Antigravity CLI vs Codex CLI vs SubQ Code (May 2026)
Three terminal-native agents, three completely different strategies. Google retired Gemini CLI on May 19 for Antigravity CLI, OpenAI’s Codex CLI continues to mature, and Subquadratic shipped SubQ Code on May 5 with a 12-million-token context window. Here’s how to pick.
Last verified: May 21, 2026
TL;DR table
| | Antigravity CLI | Codex CLI | SubQ Code |
|---|---|---|---|
| Vendor | Google | OpenAI | Subquadratic |
| Released | May 19, 2026 | Late 2025 (active) | May 5, 2026 |
| Default model | Gemini 3.5 Flash | GPT-5.5 | SubQ (subquadratic) |
| Context window | 1,000,000 | ~400,000 | 12,000,000 |
| Terminal-Bench 2.1 | 76.2% | 78.2% | n/a (different bench) |
| SWE-bench | 55.1% (Pro) | 88.7% (Verified) / 58.6% (Pro) | 81.8% (Verified) |
| Multi-agent | Yes (manager view) | Yes (Codex Cloud) | Limited |
| Local vs cloud | Both | Both (Codex CLI + Cloud) | Both |
| Open source | No (closed) | Yes (Codex CLI repo) | No (waitlist) |
| Input price | $1.50 / 1M | ~$5 / 1M | TBA (preview) |
| Output price | $9 / 1M | ~$25 / 1M | TBA (preview) |
| Strength | Cheap parallel loops, 1M context | Terminal-Bench leader, Codex Cloud handoff | Whole-codebase context, niche workloads |
What each one is
Antigravity CLI — Google’s terminal bet
Replaces Gemini CLI as of May 19, 2026. Go-based, signs in with your Google account, defaults to Gemini 3.5 Flash. The same agent harness powers Antigravity 2.0 (the desktop app), AI Studio, and Managed Agents in the Gemini API — so you can hand off a session from terminal to desktop without losing context.
Pitch: the cheapest, fastest terminal agent with frontier-class capability, aimed at high-volume agentic loops that make thousands of calls.
Codex CLI — OpenAI’s open-source terminal agent
OpenAI’s open-source terminal agent. Defaults to GPT-5.5, the Terminal-Bench 2.1 leader (78.2%). Pairs with Codex Cloud — long-running cloud agents that you can dispatch and pick up later. ChatGPT integration is tight: you can promote a CLI session to a ChatGPT thread for review.
Pitch: the most capable single terminal agent on benchmarks, with cloud handoff for long jobs.
SubQ Code — the long-context outlier
Launched May 5, 2026 by Miami-based Subquadratic. Built on the SubQ model — the first commercial subquadratic-attention LLM with a native 12-million-token context window. The pitch is straightforward: load your entire codebase into a single prompt, then refactor.
Pitch: whole-codebase reasoning, not chunk-and-stitch. Vendor-reported RULER 128K score of 97% and MRCR v2 score of 83, ahead of Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro.
When each one wins
Pick Antigravity CLI for…
- High-volume parallel agentic loops — Flash-tier pricing changes unit economics
- Tight Google Cloud / Firebase / Workspace integration
- Migrating off Gemini CLI before the June 18, 2026 deprecation
- 1M context with cheap output tokens
- Shared sessions with Antigravity 2.0 desktop app
Pick Codex CLI for…
- Maximum agentic terminal benchmark scores (Terminal-Bench 2.1 78.2%)
- Long-running cloud jobs via Codex Cloud — dispatch and walk away
- ChatGPT integration (review CLI sessions in ChatGPT)
- Open-source CLI — you can fork and customize the agent harness
- Mixed coding + reasoning workloads where GPT-5.5’s all-rounder profile matters
Pick SubQ Code for…
- Whole-monorepo refactors that don’t fit in 1M tokens
- Long-document understanding — RULER 128K leader
- Multi-needle retrieval across millions of tokens — MRCR v2 leader
- Cost-per-token at long context — Subquadratic claims ~1/5 the cost of frontier models for long-context tasks
- You’re okay running on a preview/waitlist product with vendor-reported benchmarks
Benchmark deep dive
Terminal-Bench 2.1 (agentic terminal coding)
| Model | Score |
|---|---|
| GPT-5.5 (Codex CLI) | 78.2% |
| Gemini 3.5 Flash (Antigravity CLI) | 76.2% |
| Gemini 3.1 Pro | 70.3% |
| Claude Opus 4.7 | 66.1% |
| SubQ | not reported on Terminal-Bench 2.1 |
SWE-bench Pro (single attempt)
| Model | Score |
|---|---|
| Claude Opus 4.7 | 64.3% |
| GPT-5.5 (Codex CLI) | 58.6% |
| Gemini 3.5 Flash (Antigravity CLI) | 55.1% |
| SubQ | not reported on Pro variant |
Long-context retrieval — RULER 128K
| Model | Score |
|---|---|
| SubQ (SubQ Code) | 97% |
| Claude Opus 4.6 | 94% |
| Gemini 3.1 Pro | ~90% |
Multi-needle retrieval — MRCR v2
| Model | Score |
|---|---|
| SubQ (SubQ Code) | 83 |
| Claude Opus 4.6 | 78 |
| GPT-5.4 | 39 |
| Gemini 3.1 Pro | 23 |
Honest caveat: SubQ’s long-context numbers are vendor-reported and not yet independently reproduced as of May 21, 2026. The architectural pitch is real (subquadratic attention, by definition, grows more slowly with context length than the quadratic cost of standard attention), but treat the headline efficiency numbers as preview-grade.
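To make the scaling argument concrete, here is a minimal sketch of how attention cost would grow when moving from a 1M-token to a 12M-token context under three growth curves. The curves are illustrative assumptions only: Subquadratic has not published an exact complexity figure in the material covered here, so `n_log_n` and `linear` are stand-ins for "anything below quadratic", not a description of the SubQ architecture.

```python
import math

def growth_factor(n_from, n_to, cost):
    """How many times more expensive n_to tokens are than n_from tokens."""
    return cost(n_to) / cost(n_from)

# Illustrative cost curves (assumptions, not vendor figures).
def quadratic(n):
    return n ** 2            # standard full attention

def n_log_n(n):
    return n * math.log2(n)  # one possible subquadratic curve

def linear(n):
    return n                 # best case for a subquadratic design

if __name__ == "__main__":
    for name, cost in [("quadratic", quadratic),
                       ("n log n", n_log_n),
                       ("linear", linear)]:
        factor = growth_factor(1_000_000, 12_000_000, cost)
        print(f"{name:>10}: {factor:.1f}x")
```

Under these assumptions a 12x longer context costs 144x more with quadratic attention but only 12x to roughly 14x more on the subquadratic curves, which is why the architecture matters far more at 12M tokens than at 1M.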
Pricing reality
A 100K-token agent task with 5K tokens output, run 1000 times:
| | Antigravity CLI | Codex CLI |
|---|---|---|
| Input cost | $150 | ~$500 |
| Output cost | $45 | ~$125 |
| Total | $195 | ~$625 |
Antigravity CLI is roughly 3x cheaper for the same workload ($195 vs. ~$625, about 3.2x). That is the unit-economics shift Flash-tier models enable.
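The arithmetic behind that table can be reproduced with a short sketch. It assumes flat per-token list prices from the TL;DR table (the Codex CLI figures are approximate there) and ignores any caching or volume discounts. The `Pricing` class and `workload_cost` function are hypothetical helpers, not part of either CLI.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens

# List prices taken from the TL;DR table; Codex CLI values are approximate.
ANTIGRAVITY = Pricing(input_per_million=1.50, output_per_million=9.00)
CODEX = Pricing(input_per_million=5.00, output_per_million=25.00)

def workload_cost(pricing, input_tokens, output_tokens, runs):
    """Total USD cost of running one task `runs` times at flat rates."""
    input_cost = input_tokens * runs / 1_000_000 * pricing.input_per_million
    output_cost = output_tokens * runs / 1_000_000 * pricing.output_per_million
    return input_cost + output_cost

if __name__ == "__main__":
    # 100K input tokens and 5K output tokens per task, run 1000 times.
    antigravity = workload_cost(ANTIGRAVITY, 100_000, 5_000, 1_000)
    codex = workload_cost(CODEX, 100_000, 5_000, 1_000)
    print(f"Antigravity CLI: ${antigravity:.2f}")
    print(f"Codex CLI:       ${codex:.2f}")
    print(f"Ratio:           {codex / antigravity:.2f}x")
```

Swapping in your own token counts and run volume is usually more informative than the headline ratio, since output-heavy tasks shift the balance toward the output price.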
SubQ Code pricing is still preview/waitlist; expect it to land near the total cost of frontier models while offering much longer context.
Which to pick
Default recommendation: run two of them in parallel, and add the third only when your workload demands it.
- Antigravity CLI for cheap parallel agentic loops, day-to-day grinding work, and Google-stack projects.
- Codex CLI for highest-quality terminal-bench-style execution and Codex Cloud handoff.
- SubQ Code if and only if you’re hitting context-window limits with 1M tokens — i.e., monorepo refactors or massive corpus analysis.
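A rough way to check whether you are actually hitting context-window limits is to estimate your repository's token count. The sketch below assumes about 4 bytes per token, which is a common rule of thumb rather than any of these tools' real tokenizers, and reserves 20% of the window for instructions and output. The extension list, the `headroom` value, and all function names are hypothetical choices for illustration.

```python
import os

# Context windows from the TL;DR table (Codex CLI's is approximate).
CONTEXT_WINDOWS = {
    "Codex CLI": 400_000,
    "Antigravity CLI": 1_000_000,
    "SubQ Code": 12_000_000,
}

# Hypothetical set of file types to count; adjust for your stack.
SOURCE_EXTENSIONS = {".py", ".go", ".ts", ".js", ".java", ".rs", ".md"}

def repo_bytes(root):
    """Sum the sizes of source files under root, skipping hidden directories."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git in place.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if os.path.splitext(name)[1] in SOURCE_EXTENSIONS:
                total += os.path.getsize(os.path.join(dirpath, name))
    return total

def estimate_tokens(num_bytes, bytes_per_token=4.0):
    """Crude token estimate; real tokenizers vary by language and content."""
    return int(num_bytes / bytes_per_token)

def tools_that_fit(tokens, headroom=0.8):
    """Tools whose window holds `tokens` while leaving room for the reply."""
    return [name for name, window in CONTEXT_WINDOWS.items()
            if tokens <= window * headroom]

if __name__ == "__main__":
    tokens = estimate_tokens(repo_bytes("."))
    print(f"~{tokens:,} tokens; fits in: {tools_that_fit(tokens) or 'none'}")
```

If the estimate lands comfortably under 1M tokens, the whole-codebase argument for SubQ Code does not apply to that repository, and the choice comes back to price and benchmark fit.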
TL;DR
For most teams in May 2026, the answer is Antigravity CLI + Codex CLI. Antigravity gives you cheap throughput on Gemini 3.5 Flash; Codex gives you peak Terminal-Bench performance on GPT-5.5; the cost overlap is small. SubQ Code is the special-purpose tool you’d add for whole-codebase context. The terminal agent space went from “one good option” (Claude Code, late 2025) to “three real options” in six months. Pick by workload, not by vendor loyalty.