What Is Gemini 3.5 Flash? Google's New Default AI Model (May 2026)
Gemini 3.5 Flash launched on May 19, 2026 at Google I/O and is now the default model for the Gemini app and AI Mode in Google Search. Google bills it as its first “frontier-class small model” and claims it beats Gemini 3.1 Pro on most agentic benchmarks while generating output up to 4x faster at a fraction of the cost.
Last verified: May 20, 2026
Quick facts
| Property | Value |
|---|---|
| Announced | May 19, 2026 (Google I/O 2026) |
| Vendor | Google DeepMind |
| Model family | Gemini 3.5 |
| Input context | 1,000,000 tokens |
| Output tokens | 64,000 |
| Speed | Up to 4x faster than other frontier models (output TPS) |
| SWE-bench Pro | 55.1% (single-attempt) |
| Terminal-Bench 2.1 | 76.2% |
| Default in | Gemini app, AI Mode in Search, Antigravity 2.0 |
| Powers | Gemini Spark, Information Agents in Search |
What’s actually new
Gemini 3.5 Flash isn’t a Pro-class model and Google isn’t pretending it is. The story is different: a small, fast, cheap model that hits frontier-class numbers on the workloads that matter for agents.
- Agent-first — tuned for tool use, planning, and long-horizon multi-step tasks. It’s the model that powers Antigravity 2.0 by default and Gemini Spark.
- 4x output speed — Google says output tokens per second is up to 4x faster than competing frontier models. For agentic loops that call the model repeatedly, this changes the unit economics.
- 1M-token input window — same as Gemini 1.5 Pro, but in a Flash-tier model.
- Beats 3.1 Pro on most evals — this is Google’s main marketing claim: a Flash model that outperforms the previous-generation Pro on agentic and coding tasks.
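To make the speed claim concrete, here is a rough back-of-the-envelope sketch of how output throughput affects a sequential agent loop. The step count, tokens per step, and baseline throughput are hypothetical values chosen for illustration, not measurements, and the 4x multiplier takes Google's "up to 4x" figure at its best case.

```python
def loop_output_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Seconds spent generating output tokens across a sequential agent loop."""
    return steps * tokens_per_step / tokens_per_second


# Hypothetical workload: illustrative numbers only, not benchmarks.
STEPS = 1_000
TOKENS_PER_STEP = 500
BASELINE_TPS = 50.0            # assumed throughput of a comparison model
FLASH_TPS = BASELINE_TPS * 4   # best case of the "up to 4x" claim

baseline = loop_output_seconds(STEPS, TOKENS_PER_STEP, BASELINE_TPS)
flash = loop_output_seconds(STEPS, TOKENS_PER_STEP, FLASH_TPS)

print(f"baseline: {baseline / 60:.1f} min, 4x model: {flash / 60:.1f} min")
# prints "baseline: 166.7 min, 4x model: 41.7 min"
```

The estimate covers output generation only. Time spent on input processing, tool calls, and network round trips is ignored, so real end-to-end savings will be smaller than 4x.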
Benchmarks
| Benchmark | Gemini 3.5 Flash | Gemini 3.1 Pro | GPT-5.5 | Claude Opus 4.7 |
|---|---|---|---|---|
| SWE-bench Pro (single attempt) | 55.1% | ~50% | 58.6% | 64.3% |
| Terminal-Bench 2.1 | 76.2% | 70.3% | 78.2% | 66.1% |
| Output tokens per second | up to ~4x baseline | baseline | baseline | baseline |
| Context window (input) | 1M | 1M | ~400K | 200K |
Read this honestly: Opus 4.7 is still the best coding model. GPT-5.5 leads on terminal-style agentic tasks. Gemini 3.5 Flash’s superpower is speed + price + context length — it’s the model you want in the loop when you’re running thousands of agent steps.
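The reading above can be checked directly against the table. The snippet below is a small sketch that copies the table's scores into a dictionary and reports the leader of each benchmark and how far Gemini 3.5 Flash trails it. The helper names `leader` and `gap_to_leader` are made up for this example, and the Gemini 3.1 Pro SWE-bench Pro value is entered as 50.0 only because the table gives it as "~50%".

```python
# Scores in percent, copied from the benchmark table above.
SCORES = {
    "SWE-bench Pro": {
        "Gemini 3.5 Flash": 55.1,
        "Gemini 3.1 Pro": 50.0,  # approximate: the table says "~50%"
        "GPT-5.5": 58.6,
        "Claude Opus 4.7": 64.3,
    },
    "Terminal-Bench 2.1": {
        "Gemini 3.5 Flash": 76.2,
        "Gemini 3.1 Pro": 70.3,
        "GPT-5.5": 78.2,
        "Claude Opus 4.7": 66.1,
    },
}


def leader(row: dict[str, float]) -> str:
    """Name of the model with the highest score in one benchmark row."""
    return max(row, key=row.get)


def gap_to_leader(row: dict[str, float], model: str) -> float:
    """Percentage points by which `model` trails the row's leader."""
    return round(max(row.values()) - row[model], 1)


for bench, row in SCORES.items():
    gap = gap_to_leader(row, "Gemini 3.5 Flash")
    print(f"{bench}: {leader(row)} leads; Gemini 3.5 Flash trails by {gap:.1f} points")
# prints:
# SWE-bench Pro: Claude Opus 4.7 leads; Gemini 3.5 Flash trails by 9.2 points
# Terminal-Bench 2.1: GPT-5.5 leads; Gemini 3.5 Flash trails by 2.0 points
```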
Where Gemini 3.5 Flash shows up
| Surface | What it does |
|---|---|
| Gemini app | New default model (replaces 3.1 Pro for free + Pro tiers) |
| AI Mode in Google Search | Powers the new conversational search box + Information Agents |
| Antigravity 2.0 | Default model in Google’s agent-first IDE |
| Gemini Spark | The reasoning engine for Google’s 24/7 personal AI agent |
| Gemini API / Vertex AI | Generally available for developers |
| Jules | Backs the free tier (3 concurrent / 15 daily tasks) |
Pricing
Gemini 3.5 Flash is cheaper than 3.1 Pro, and Google’s stated pitch is “frontier-class quality at Flash prices.” Exact per-token API prices are listed at ai.google.dev. For most Pro-tier users, the practical impact is higher rate limits and faster responses at the same Gemini app subscription price.
The new Google AI subscription stack as of May 20, 2026:
| Plan | Price | What you get |
|---|---|---|
| Free | $0 | Limited daily Gemini 3.5 Flash usage |
| Google AI Pro | $19.99/mo | Higher 3.5 Flash limits, Veo 3.1, NotebookLM Pro |
| Google AI Ultra | $99.99/mo | 5x usage of Pro, priority Antigravity, early Gemini Spark, 20TB cloud, YouTube Premium |
| Google AI Ultra (top tier) | $199.99/mo | 20x usage of Pro, expanded Project Genie access |
The tier that used to cost $250/mo now costs $199.99/mo, and the $99.99/mo tier is new.
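One way to compare the paid plans is price per unit of usage, treating Google AI Pro as 1x and using the 5x and 20x multipliers from the table. This is a rough sketch: the function name `price_per_pro_unit` is invented for the example, and the calculation ignores everything else bundled into the higher tiers (storage, YouTube Premium, early access).

```python
def price_per_pro_unit(monthly_price: float, usage_multiplier: int) -> float:
    """Monthly price divided by usage, measured in multiples of the Pro plan's limits."""
    return monthly_price / usage_multiplier


# (plan name, monthly price in USD, usage relative to Google AI Pro)
PLANS = [
    ("Google AI Pro", 19.99, 1),
    ("Google AI Ultra", 99.99, 5),
    ("Google AI Ultra (top tier)", 199.99, 20),
]

for name, price, multiplier in PLANS:
    unit = price_per_pro_unit(price, multiplier)
    print(f"{name}: ${unit:.2f} per Pro-equivalent of usage")
# prints:
# Google AI Pro: $19.99 per Pro-equivalent of usage
# Google AI Ultra: $20.00 per Pro-equivalent of usage
# Google AI Ultra (top tier): $10.00 per Pro-equivalent of usage
```

On usage alone, the $99.99 tier costs about the same per unit as Pro, while the top tier roughly halves the per-unit price.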
Who should care
Agent developers: Flash’s 4x output speed and 1M-token context change agentic loop economics. If you’re hitting Opus or GPT-5.5 limits on cost or latency, Flash is the obvious comparison.
Anyone using AI Mode in Google Search: you’re already using 3.5 Flash, even if you didn’t notice. The new search box, Information Agents, and conversational refinement all run on it.
Cursor / Windsurf / Antigravity users: Flash is now the default in Antigravity and is the cheaper option in the Cursor 3 and Windsurf model pickers.
Jules users: the free tier moves to 3.5 Flash, while the Pro and Ultra tiers stay on 3.1 Pro until 3.5 Pro launches in June.
What’s next
- Gemini 3.5 Pro — June 2026 launch, currently in testing.
- Gemini Omni Flash — rolling out alongside 3.5 Flash for multimodal video generation.
- Gemini Spark — beta access for Ultra subscribers in the US next week.
TL;DR
Gemini 3.5 Flash is the new floor for Google’s models, not the ceiling. It’s a Flash-tier model that beats last-gen Pro on most agentic benchmarks, runs up to 4x faster, and is now the default in the Gemini app, AI Mode in Search, Antigravity 2.0, and the Jules free tier. The actual flagship (Gemini 3.5 Pro) is due in June. For now, Flash is the workhorse, and it’s a strong one.