Which is the best cheap frontier model — Gemini 3.5 Flash, DeepSeek V4 Flash, or Qwen 3.6 Plus?

Gemini 3.5 Flash wins on raw capability per dollar (frontier-class, 1M context, $1.50/$9 per 1M tokens, ~4x baseline speed) and is the best default for most agentic coding and long-context work. DeepSeek V4 Flash is the cheapest of the three ($0.14/$0.28 per 1M) and the strongest at math and code reasoning at low cost. Qwen 3.6 Plus ($0.325/$1.95) wins for self-hosting and tool-use workflows, especially in Asian-language tasks. Pick by workload: Gemini 3.5 Flash for breadth and quality, DeepSeek V4 Flash for cost, Qwen 3.6 Plus for self-host.

How much cheaper is DeepSeek V4 Flash than Gemini 3.5 Flash?

DeepSeek V4 Flash is roughly 10x cheaper than Gemini 3.5 Flash on input ($0.14 vs $1.50 per 1M) and ~32x cheaper on output ($0.28 vs $9.00 per 1M). For an agentic loop that generates a lot of output tokens, DeepSeek V4 Flash can be 20–30x cheaper per task. The tradeoff: Gemini 3.5 Flash has materially better long-context handling (1M proven vs DeepSeek's effective ~200K), better multimodal, and faster output speed.

Can Qwen 3.6 Plus run locally?

Qwen 3.6 Plus is available open-weight (Apache 2.0 license) and the smaller variants (Qwen3.6-7B, 32B) run well on a single H100 or even a Mac Studio with 128GB+ RAM. The full Qwen3.6-Plus (480B MoE, ~35B active) requires multi-GPU setups (4x H100 or 8x H200 for reasonable throughput). The hosted API at ~$0.325/$1.95 per 1M is dramatically cheaper than DIY for most teams. Gemini 3.5 Flash is closed-source and cannot be self-hosted; DeepSeek V4 Flash is open-weight (MIT-style license) but requires similar GPU footprint.

Which one should I use for coding agents?

Gemini 3.5 Flash is the strongest of the three for agentic coding — 76.2% on Terminal-Bench 2.1, native tool use that matches Claude Opus 4.7 patterns, and a 1M context that handles entire codebases. DeepSeek V4 Flash is a strong second at a fraction of the cost — great for high-volume agent loops where you can route hard tasks to GPT-5.5 or Opus. Qwen 3.6 Plus is the strongest tool-use model among open weights but trails on terminal-style coding agents. For most teams: Gemini 3.5 Flash as default, DeepSeek V4 Flash as the cheap fallback.

Quick Answer

Gemini 3.5 Flash vs DeepSeek V4 Flash vs Qwen 3.6 (May 2026)

Published: May 22, 2026

Gemini 3.5 Flash vs DeepSeek V4 Flash vs Qwen 3.6 (May 2026)

Three “cheap frontier” models now compete head-to-head: Google’s Gemini 3.5 Flash (launched May 19, 2026), DeepSeek V4 Flash, and Qwen 3.6 Plus. Here’s the practical comparison for builders choosing a default.

Last verified: May 22, 2026

TL;DR table

	Gemini 3.5 Flash	DeepSeek V4 Flash	Qwen 3.6 Plus
Vendor	Google	DeepSeek (China)	Alibaba (China)
Released	May 19, 2026	Q1 2026	Q1 2026
Architecture	Dense + sparse hybrid	MoE (sparse)	MoE (sparse)
License (weights)	Closed (API only)	MIT-style open	Apache 2.0 open
Context window	1,000,000	~200,000 effective	256,000
Input price / 1M	$1.50	$0.14	$0.325 (OpenRouter)
Output price / 1M	$9.00	$0.28	$1.95 (OpenRouter)
Speed	~4x baseline	Fast (MoE)	Fast (MoE)
Multimodal	Best (image + video + audio)	Image (limited)	Image + audio
Tool use	Best	Strong	Strong
Terminal-Bench 2.1	76.2%	~62%	~55%
AA Intelligence Index	~55	~45	~42
Best for	Default for everything	Cheap high-volume agents	Self-host + Asian langs

Gemini 3.5 Flash — the new default

Released May 19, 2026 at Google I/O. Frontier-class capability at Flash-tier pricing.

What stands out:

1M context that actually works — Google’s effective long-context handling is materially better than DeepSeek’s at 200K+ and Qwen’s at 256K. Recall and reasoning over the full window degrade more gracefully.
Tool use parity with Opus 4.7 — Native function calling, parallel tool use, and structured output that matches Anthropic patterns.
Speed — ~4x baseline output speed makes interactive coding agents feel snappy.
Multimodal lead — Best-in-class video reasoning, on par with Opus 4.7 on image, and audio in/out.
Powers Antigravity, AI Mode, Antigravity CLI by default. Massive real-world usage means rapid iteration.

What’s missing:

Closed weights. No self-hosting. You’re on Google.
Higher per-token price than DeepSeek/Qwen. 10x DeepSeek on input, 30x on output.
Region availability — Some regions (mainland China, certain EU compliance scenarios) get reduced or no access.

Best for: The default model for nearly any production agentic workload that isn’t price-constrained.

DeepSeek V4 Flash — the unbeatable cost

DeepSeek’s Flash-tier model. The cheapest credible-quality model in this comparison.

What stands out:

$0.14/$0.28 per 1M tokens — Cheaper by an order of magnitude than the others.
MIT-style open weights. Self-hostable if you have the GPU footprint.
Strong math and code. Inherited from DeepSeek V4 Pro’s strengths.
MoE efficiency — Only ~21B parameters active per token from ~671B total.

What’s missing:

Long context is shaky — Effective context degrades materially past 64–96K tokens.
Tool use is good but not GPT-5.5/Opus level — Multi-step agentic loops require more guardrails.
Multimodal is image-only — No video, no audio.
Geopolitics — Some US enterprises restrict Chinese model deployment regardless of self-hosting.

Best for: High-volume agent loops where you can route hard subtasks to GPT-5.5 or Opus and use V4 Flash for the 80% of cheap, repetitive work.

Qwen 3.6 Plus — the self-host champion

Alibaba’s flagship open-weight model. Strong tool use and the best open-source choice for many builders.

What stands out:

Apache 2.0 license. Most permissive of the three.
Strong open ecosystem — Excellent vLLM support, good quantizations (GGUF, AWQ, GPTQ available within days of release).
Tool use is the best among open weights. Native function calling, parallel tools, structured output that’s competitive with closed frontier models.
Asian-language quality — Strongest of the three for Mandarin, Japanese, Korean (and competitive on most others).
Cheap hosted — ~$0.325/$1.95 per 1M on OpenRouter, ~$0.78/$3.90 from Alibaba directly at list.

What’s missing:

Self-host is heavy — Full Qwen3.6-Plus is 480B MoE with 35B active. Needs 4x H100 minimum for production throughput.
Long context weaker than Gemini — 256K context works but recall degrades past ~150K.
No video understanding.
Western enterprise hesitation — Like DeepSeek, some buyers restrict Chinese-origin models.

Best for: Teams who want frontier-class capability with full open-weight control, or who are doing significant work in Asian languages.

Head-to-head: cost-per-task

For a representative agentic coding loop (~50K input tokens, ~10K output tokens, repeated 100 times per day):

	Gemini 3.5 Flash	DeepSeek V4 Flash	Qwen 3.6 Plus (OpenRouter)
Input cost / day	$7.50	$0.70	$1.625
Output cost / day	$9.00	$0.28	$1.95
Total / day	$16.50	$0.98	$3.58
Per month	$495	$29	$107
Per year	$5,940	$352	$1,290

DeepSeek V4 Flash is the runaway cost winner. The question is whether the ~30% quality gap on hard tasks is worth saving 15x.

Head-to-head: speed and latency

	Time-to-first-token	Tokens/sec output	Notes
Gemini 3.5 Flash	~250ms	~250 tok/s	Google’s own infra, very stable
DeepSeek V4 Flash	~400ms	~120 tok/s	API can be slow at peak Chinese hours
Qwen 3.6 Plus (OpenRouter)	~500ms	~150 tok/s	Varies by hosting provider

Gemini 3.5 Flash is meaningfully faster end-to-end, which matters for interactive agents and IDE integrations.

Head-to-head: when to pick which

Workload	Best pick	Why
Production coding agent (IDE)	Gemini 3.5 Flash	Speed + quality + 1M context
High-volume batch generation	DeepSeek V4 Flash	15x cheaper, quality good enough
Self-hosted enterprise deployment	Qwen 3.6 Plus	Apache 2.0, best ecosystem
Long-context (500K+ tokens)	Gemini 3.5 Flash	Only one that handles it cleanly
Asian-language tasks	Qwen 3.6 Plus	Best tokenizer and training mix
Multimodal (video / audio)	Gemini 3.5 Flash	Only one with proper video
Math-heavy reasoning	DeepSeek V4 Flash	Inherits V4 Pro’s math strength
Lowest-risk geopolitically	Gemini 3.5 Flash	US vendor, fewer enterprise blocks

Hybrid routing — the real-world answer

Most production teams in May 2026 don’t pick one; they route:

Cheap, high-volume work → DeepSeek V4 Flash
Default agentic and complex → Gemini 3.5 Flash
Highest-stakes single-shot coding → Claude Opus 4.7 or Gemini 3.5 Pro (June 2026)
Long-context retrieval (200K+) → Gemini 3.5 Flash
Self-hosted compliance scenarios → Qwen 3.6 Plus

OpenRouter, LiteLLM, and similar routers make this easy.

What’s coming

June 2026 — Gemini 3.5 Pro ships, raising Google’s ceiling.
Late 2026 — DeepSeek R3 / V5 rumored.
Q3 2026 — Qwen 4 expected.
Ongoing — Prices keep dropping; the “cheap frontier” tier has had ~5x price reduction in 12 months.

TL;DR

Gemini 3.5 Flash = best default, frontier-class, 1M context, $1.50/$9.
DeepSeek V4 Flash = unbeatable cost ($0.14/$0.28), great for high-volume.
Qwen 3.6 Plus = best self-host, Apache 2.0, strong Asian-language and tool use.
Real production answer: route between them based on workload.

Sources

Google blog: “Gemini 3.5 Flash” announcement (May 19, 2026)
Lushbinary: “Gemini 3.5 Flash vs GPT-5.5 vs Claude Opus 4.7 Compared” (May 20, 2026)
AIMadeTools: “Gemini 3.5 Flash vs Claude Opus 4.7 vs GPT-5.5” (May 20, 2026)
DeepSeekAI.guide: “DeepSeek vs Qwen (2026): Which Open Model Wins?” (April 30, 2026)
PricePerToken.com: “Deepseek vs Qwen — API Pricing & Model Comparison 2026”