Which is cheapest for agentic workloads — Gemini 3.5 Flash, GPT-5-mini, or Claude Haiku?

Depends on what you're measuring. On raw per-token pricing, Gemini 3.5 Flash ($1.50 in / $9 out per Mtok) is cheaper than Claude Haiku 4.7 (~$0.80/$4) on output, and roughly even with GPT-5-mini ($0.50/$3). But on cost-per-successful-task — the metric that matters for agents — Gemini 3.5 Flash often wins because its quality is closer to frontier models, so it completes more tasks first-try without escalation.

Why does cost-per-task matter more than cost-per-token?

Cheap models are only cheap if they actually finish the task. A model that costs $0.10 per million tokens but fails 40% of the time and requires retry/escalation can be more expensive than a model that costs $1.50 but succeeds 90% of the time. For agentic workloads (multi-step tool use, long loops), success rate compounds. Pick the cheapest model that hits your success-rate floor, not the cheapest model period.

What about Gemini 3.5 Flash's benchmark scores?

On the Artificial Analysis Intelligence Index (May 2026), Gemini 3.5 Flash scores 55 — within 2 points of Claude Opus 4.7 (57.3) and 5 points of GPT-5.5 (60.2). That's frontier-adjacent quality at one-tenth the price. On Terminal-Bench 2.1 (agentic coding), Flash scores 76.2%, beating last year's Gemini 3.1 Pro. The 'Flash that beat last year's Pro' framing isn't hyperbole.

When should I still use a frontier model instead of a Flash-class model?

Three cases. (1) Hard reasoning tasks where 5 IQ points matter — frontier models still win on math, novel algorithms, and tricky planning. (2) Very long context (300K+ tokens) where Opus 4.7's 1M window is unmatched. (3) Safety-critical or high-stakes output (legal, medical, financial) where you want the most conservative refusals. Otherwise, Flash-class models are the right choice for production agent workloads in May 2026.

Quick Answer

Gemini 3.5 Flash Cost-per-Task vs GPT-5-mini vs Claude Haiku

Published: May 23, 2026

Gemini 3.5 Flash Cost-per-Task vs GPT-5-mini vs Claude Haiku (May 2026)

Gemini 3.5 Flash launched at Google I/O on May 19, 2026 at $1.50 input / $9 output per million tokens — and changed the cost-per-task math for agentic workloads. Here’s how it stacks up against GPT-5-mini and Claude Haiku 4.7 when you measure what actually matters: cost per successful task, not cost per token.

Last verified: May 23, 2026

TL;DR table

	Gemini 3.5 Flash	GPT-5-mini	Claude Haiku 4.7
Vendor	Google	OpenAI	Anthropic
Released / latest	May 19, 2026	Q1 2026	Q2 2026
Pricing (in / out per Mtok)	$1.50 / $9.00	$0.50 / $3.00	$0.80 / $4.00
Context window	1M tokens	272K	200K
Artificial Analysis Index	55	47	49
Terminal-Bench 2.1 (agentic coding)	76.2%	62%	65%
Output speed (tokens/sec)	278	~200	~220
Best for	Frontier-adjacent quality, agentic loops	Cheapest per-token	Anthropic-native workflows
Cost per typical agent task	~$0.04-0.10 (high success rate)	~$0.03-0.15 (more retries)	~$0.05-0.13

What’s new with Gemini 3.5 Flash

From Google’s I/O announcement (May 19, 2026) and the LLM Stats blog post:

GA at launch. Available in the Gemini API and Vertex AI on day one.
Pricing: $1.50 / $9 per million tokens. Standard tier; high-thinking-effort tier costs more.
1M token context window. Same as Gemini 3.1 Pro.
Artificial Analysis Index: 55. Within 2 points of Claude Opus 4.7 (57.3) and 5 points of GPT-5.5 (60.2).
Terminal-Bench 2.1: 76.2%. Beats last year’s Gemini 3.1 Pro on agentic coding.
278 tokens/sec output speed. Rank #2 in its Artificial Analysis price class.
Used in Antigravity 2.0 for parallel subagent orchestration (Google demoed Flash running 93 parallel subagents on 15,000+ requests in one task).

The headline pitch: “frontier-adjacent quality at Flash-tier pricing.” Independent benchmarks back it up.

Why per-token pricing misleads you

Look at the raw pricing table and you’d think GPT-5-mini is the cheapest. But that’s not the question that matters for production agents. The right question is: cost per successful agent task.

A simple model:

Cost per successful task = (cost per token) × (tokens per attempt) × (1 / success rate)

A model with a 50% first-try success rate doubles its effective cost. Add retries, escalation to a frontier model on failure, and orchestration overhead — cheap models can easily end up more expensive than a slightly more capable middle-tier model.

This is exactly why Gemini 3.5 Flash is the surprise winner in May 2026 agentic benchmarks. Its first-try success rate on multi-step tasks is much closer to frontier models than its price suggests.

Real-world cost-per-task estimates

Take a “research and summarize 10 web pages, extract structured data, write a report” agentic task — typical of analyst agents.

Model	Tokens / attempt	First-try success rate	Cost per attempt	Cost per success
Gemini 3.5 Flash	~80K in / 20K out	~85%	$0.30	~$0.35
GPT-5-mini	~80K in / 20K out	~65%	$0.10	~$0.15 (low) but $0.50+ if you escalate failed runs
Claude Haiku 4.7	~80K in / 20K out	~72%	$0.14	~$0.20 (low) but $0.45+ if you escalate
GPT-5.5 (frontier)	~80K in / 20K out	~93%	$1.00	~$1.07
Claude Opus 4.7 (frontier)	~80K in / 20K out	~95%	$3.00	~$3.16

Two takeaways:

1. Gemini 3.5 Flash has the best cost-per-success ratio in this class — high success rate at moderate per-token cost. 2. The cheapest per-token models cost more once you account for retries and escalation to a frontier fallback.

(Specific numbers are illustrative — your mileage varies by task. The pattern holds across most agentic benchmarks in May 2026.)

When each model wins

Gemini 3.5 Flash wins for:

Agentic loops (multi-step tool use with high first-try success requirements).
Long-context tasks that don’t need frontier reasoning (1M token window at Flash price is unique).
Cost-sensitive Workspace integrations (built into Google’s stack).
Parallel subagent orchestration (Antigravity 2.0 demoed 93 parallel Flash agents).

GPT-5-mini wins for:

Pure cost optimization when task is simple and success rate doesn’t matter much.
Single-turn classification / extraction where one shot is enough.
OpenAI-ecosystem workflows (existing OpenAI infrastructure, Codex integration, Operator).
Voice / Realtime API workflows.

Claude Haiku 4.7 wins for:

Anthropic-native workflows (Claude Skills, Managed Agents, MCP server-side).
Conservative content generation where you want stricter refusals.
Customer support agents where Anthropic’s tuning is strong.
Workflows already on Claude where downgrading from Sonnet/Opus for cost.

When to stay on frontier models (GPT-5.5, Opus 4.7)

Don’t downgrade to a Flash-class model for:

Hard reasoning — math problems, novel algorithm design, tricky causal inference.
Very long context above 500K tokens where context coherence matters.
Safety-critical outputs — legal advice, medical content, regulatory writing.
Tasks where 5 IQ points matter — strategic planning, complex debugging, expert-level analysis.

For everything else — and that’s most agentic workloads in May 2026 — Flash-class models are the right default.

The strategic picture

The Gemini 3.5 Flash launch is part of a broader trend: the gap between Flash-tier and frontier-tier model quality is shrinking, while the price gap is widening.

Year	Flash-tier vs frontier quality gap	Flash-tier vs frontier price gap
2023	~30% (Flash was clearly worse)	5-10x cheaper
2024	~20%	5-10x cheaper
2025	~10%	8-10x cheaper
May 2026	~3-5%	8-10x cheaper

This means: most production agent workloads should default to Flash-tier models now, with frontier escalation only when genuinely needed. The Cursor Auto-mode pattern (default Composer 2.5, escalate to Opus 4.7 on hard tasks) is the canonical model.

How to choose — for engineers building agents

Default to Gemini 3.5 Flash for agentic workloads in May 2026. Highest success rate at this price point.
Use GPT-5-mini for the cheapest single-turn classification or extraction tasks.
Use Claude Haiku 4.7 when your stack is already on Anthropic (skills, MCP, Managed Agents).
Escalate to frontier (GPT-5.5, Opus 4.7) only when first-try success on Flash drops below your threshold.
Measure cost-per-success, not cost-per-token. Track success rates per workflow and recompute your routing every quarter.

Verdict

Cheapest per token: GPT-5-mini.
Cheapest per successful task: Gemini 3.5 Flash.
Best Anthropic-native cheap tier: Claude Haiku 4.7.
Best default for production agents (May 2026): Gemini 3.5 Flash.

If you’re building agents at scale and haven’t re-evaluated your default model since I/O 2026, Gemini 3.5 Flash deserves a serious A/B test. The cost-per-success math is hard to argue with.