
GPT-5.5 vs Opus 4.7 vs Mythos Preview (May 2026)

The frontier model picture in May 2026 has three names: OpenAI’s GPT-5.5, Anthropic’s Claude Opus 4.7, and Anthropic’s preview-only Claude Mythos Preview. Each wins different benchmarks. Here’s how they actually compare for May 2026 production decisions.

Last verified: May 2, 2026

Headline benchmarks

| Benchmark | GPT-5.5 | Opus 4.7 | Mythos Preview |
| --- | --- | --- | --- |
| SWE-bench Verified | 83.8% (High) | 87.6% (Adaptive) | 93.9% (preview only) |
| SWE-Bench Pro | 58.6% | 64.3% | not published |
| ARC-AGI-2 | 83.3% (High) | 68.3% (High) | not published |
| Humanity’s Last Exam | leads (per Artificial Analysis) | trailing GPT-5.5 | not published |
| Long-context (Graphwalks) | strong | leads | strong (per limited reports) |
| Multi-modal | strong | strong | unknown |

The pattern is consistent across April and early May 2026 reporting:

  • Coding (SWE-bench Pro and Verified): Anthropic wins. Mythos Preview leads, Opus 4.7 second.
  • Reasoning (ARC-AGI-2, Humanity’s Last Exam): OpenAI wins. GPT-5.5 leads.
  • Long-context and agentic loops: Opus 4.7 leads the GA models; Mythos Preview likely better but not benchmarked publicly.
  • Multi-modal: rough parity between GPT-5.5 and Opus 4.7. Gemini 3.1 Pro leads multi-modal overall but sits outside this comparison.

Pricing in May 2026

| Plan | GPT-5.5 | Opus 4.7 | Mythos Preview |
| --- | --- | --- | --- |
| API input / million tokens | $5 | $5 | not priced |
| API output / million tokens | $30 | $25 | not priced |
| Consumer Pro tier | ChatGPT Plus $20/mo | Claude Pro $20/mo | not available |
| Consumer Max tier | ChatGPT Pro $200/mo | Claude Max $100/mo | not available |
| Enterprise | OpenAI Enterprise | Claude Enterprise | enterprise preview only |

Output tokens dominate cost in agentic workloads, where models generate long responses, so Opus 4.7’s $25-per-million output price versus GPT-5.5’s $30 is meaningful at scale. The $5-per-million gap means a heavy agentic workload generating 40 million output tokens per day saves $200 per day, roughly $6,000 per month.
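As a rough sanity check, the saving is just the price gap times volume. A minimal sketch using the API prices from the table above; the 40-million-token daily volume is an illustrative assumption, not a measured workload:

```python
# Illustrative output-token cost comparison at May 2026 API prices.
# The daily token volume below is an assumed example workload.

PRICE_PER_M_OUTPUT = {"gpt-5.5": 30.0, "opus-4.7": 25.0}  # USD per million output tokens

def daily_output_cost(model: str, output_tokens_per_day: int) -> float:
    """Cost in USD of one day's output tokens for the given model."""
    return PRICE_PER_M_OUTPUT[model] * output_tokens_per_day / 1_000_000

tokens_per_day = 40_000_000  # assumed heavy agentic workload
gap = daily_output_cost("gpt-5.5", tokens_per_day) - daily_output_cost("opus-4.7", tokens_per_day)
print(f"Daily saving with Opus 4.7: ${gap:.2f}")      # $200.00
print(f"Monthly saving (30 days): ${gap * 30:.2f}")   # $6000.00
```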

Where each model wins

GPT-5.5 — best for reasoning and OpenAI ecosystem

GPT-5.5 (High) leads ARC-AGI-2 at 83.3% versus Opus 4.7’s 68.3% — a 15-point gap. It also leads Humanity’s Last Exam per Artificial Analysis’s verified results. For workloads that emphasize abstract reasoning, mathematical problem solving, or novel-domain pattern recognition, GPT-5.5 is the stronger choice.

It’s also the better choice if your stack is OpenAI-native: Codex, GPT Tools, the OpenAI Agent SDK, ChatGPT Enterprise, or Microsoft Azure OpenAI Service.

Claude Opus 4.7 — best for production coding and agentic work

Claude Opus 4.7’s edge on SWE-Bench Pro (64.3% vs GPT-5.5’s 58.6%) is the headline number for any team building agentic coding tools. The 5.7-point gap on SWE-Bench Pro represents hundreds of real GitHub issues where Opus 4.7 ships working code and GPT-5.5 doesn’t.

Opus 4.7 also leads on long-context tasks and agentic loops where the model has to reason over many tool calls without losing track. Claude Code, Cursor 3, and JetBrains Air all lean on Opus 4.7 for their hardest agentic workflows in May 2026.

Claude Mythos Preview — best benchmark, but preview only

Mythos Preview’s 93.9% on SWE-bench Verified is the new frontier ceiling. It’s roughly 6 points ahead of Opus 4.7 (Adaptive) at 87.6%. But it’s preview-only — not generally available, not priced, not in Claude Code by default.

Treat Mythos as your 2027 model, not your 2026 model. Adopt Opus 4.7 today and migrate to Mythos when it goes GA.

Decision matrix

| Your priority | Pick |
| --- | --- |
| Production coding agents | Opus 4.7 |
| Long-horizon agentic loops | Opus 4.7 |
| Pure reasoning, math, novel problems | GPT-5.5 |
| OpenAI ecosystem (Codex, Agent SDK) | GPT-5.5 |
| Lowest output token cost | Opus 4.7 ($25 vs $30) |
| Frontier capability ceiling for planning | Mythos Preview (when GA) |
| Multi-modal (image, video, doc) | Tied, or use Gemini 3.1 Pro outside this matchup |
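The matrix above can be expressed as a simple task-based router. A minimal sketch; the model identifiers and task categories here are illustrative labels, not official API names:

```python
# Task-based model routing following the decision matrix above.
# Model names are illustrative labels, not official API identifiers.

ROUTES = {
    "coding": "opus-4.7",           # production coding agents
    "agentic-loop": "opus-4.7",     # long-horizon tool-calling loops
    "reasoning": "gpt-5.5",         # math, novel problems
    "openai-ecosystem": "gpt-5.5",  # Codex / Agent SDK stacks
}

def pick_model(task_type: str, mythos_ga: bool = False) -> str:
    """Return the default model for a task category.

    Frontier planning work routes to Mythos only once it goes GA;
    until then it falls back to the best GA model, Opus 4.7.
    """
    if task_type == "frontier-planning":
        return "mythos" if mythos_ga else "opus-4.7"
    return ROUTES.get(task_type, "opus-4.7")  # default to the coding workhorse

print(pick_model("reasoning"))          # gpt-5.5
print(pick_model("frontier-planning"))  # opus-4.7 (Mythos not GA yet)
```

A lookup table like this keeps the per-task defaults in one place, so swapping a model after a pricing or GA announcement is a one-line change.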

What changed in April 2026

Three things shifted the comparison in the past month:

  1. Mythos Preview leaderboard appearance. The first credible 93%+ on SWE-bench Verified, signaling SWE-bench Verified is approaching saturation.
  2. Opus 4.7 Adaptive mode rollout. Anthropic shipped a higher-quality “Adaptive” mode for Opus 4.7 that improves SWE-bench Verified from 84.2% (Standard) to 87.6% (Adaptive) at higher inference cost.
  3. GPT-5.5 (High) tightened on ARC-AGI-2. OpenAI’s High mode pushed ARC-AGI-2 to 83.3%, widening the reasoning gap with Opus 4.7.

The competitive picture is sharpening: Anthropic is deepening its coding lead, OpenAI its reasoning lead. Gemini 3.1 Pro stays best-in-class for multi-modal but trails both on pure coding and reasoning.

Real-world reliability ≠ benchmark scores

Several teams (MindStudio, Build Fast With AI, Mashable) flagged through April 2026 that benchmark scores don’t fully predict production reliability:

  • Opus 4.7 is more reliable in agentic loops than the 87.6% number alone suggests. It maintains coherence over longer multi-step tasks.
  • GPT-5.5 is faster on average and integrates more cleanly with non-coding tools. For latency-sensitive applications, this matters more than benchmark scores.
  • Both models hit context-window degradation at very long contexts (>500k tokens) despite advertised limits.

The honest read: pick Opus 4.7 for coding, GPT-5.5 for reasoning, and run small evals on your actual workload before committing.
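"Run small evals on your actual workload" can be as lightweight as scoring each candidate model on a handful of your own tasks. A hedged sketch: `call_model` is a placeholder standing in for whatever API client you actually use, and the grader and task set are yours to define:

```python
# Minimal eval harness: score candidate models on your own task set.
# call_model is a stub; swap in a real API client before use.

from typing import Callable

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real API call to the named model."""
    return "stub response"

def run_eval(models: list[str],
             tasks: list[tuple[str, str]],
             grade: Callable[[str, str], bool]) -> dict[str, float]:
    """Return the pass rate per model over (prompt, expected) task pairs."""
    scores = {}
    for model in models:
        passed = sum(grade(call_model(model, prompt), expected)
                     for prompt, expected in tasks)
        scores[model] = passed / len(tasks)
    return scores

# Example: a trivial substring grader over two toy tasks.
tasks = [("Say hello", "hello"), ("Say goodbye", "goodbye")]
grade = lambda output, expected: expected in output.lower()
print(run_eval(["gpt-5.5", "opus-4.7"], tasks, grade))
```

Even a 20-task eval in this shape surfaces workload-specific gaps that headline benchmarks hide.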

Bottom line

For May 2026: Claude Opus 4.7 wins production coding and agentic work. GPT-5.5 wins reasoning and OpenAI-stack integration. Claude Mythos Preview is the future model to plan around, but not to deploy yet. Most teams running serious AI workloads in May 2026 use Opus 4.7 as their default coding model and GPT-5.5 as their reasoning model, and watch Mythos Preview pricing announcements through Q2-Q3 2026 to plan migrations. Don’t try to standardize on one model; the per-task winners are too clear-cut to ignore.
