What Is GPT-Realtime-2? OpenAI's New Voice Model (May 2026)
OpenAI released GPT-Realtime-2 on May 7, 2026, as the new flagship of a refreshed Realtime API that simultaneously exited beta. The headline upgrades — GPT-5-class reasoning, 128K context, parallel tool calls — change what’s practical to build with voice. Here’s what GPT-Realtime-2 actually is, what it can do, and how it fits in the May 2026 voice stack.
Last verified: May 10, 2026
The announcement at a glance
| Property | Value |
|---|---|
| Released | May 7, 2026 |
| Provider | OpenAI |
| Architecture | Speech-to-speech (audio in, audio out) |
| Reasoning class | GPT-5-class |
| Context window | 128,000 tokens (up from 32K) |
| Tool calling | Parallel function calls supported |
| Reasoning effort | minimal / low / medium / high / xhigh |
| Audio input price | $32 / 1M tokens |
| Audio output price | $64 / 1M tokens |
| API status | GA — Realtime API out of beta |
What GPT-Realtime-2 actually is
GPT-Realtime-2 is the speech-to-speech version of OpenAI’s GPT-5-class frontier model. “Speech-to-speech” means the model receives audio and emits audio directly — there’s no intermediate text round-trip through ASR + LLM + TTS. That’s the architectural choice that gives Realtime models lower latency and more natural prosody than turn-based stacks.
The May 7, 2026 release shipped three audio models together:
- GPT-Realtime-2 — the flagship reasoning voice agent.
- GPT-Realtime-Translate — dedicated speech-to-speech translator (70+ input → 13 output languages, $0.034/min).
- GPT-Realtime-Whisper — streaming transcription only.
GPT-Realtime-2 replaces the original GPT-Realtime as OpenAI’s recommended default for conversational voice agents.
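To make the speech-to-speech setup concrete, here's a sketch of what a session configuration might look like. The field names mirror the shape of the existing Realtime API session object; the model identifier and the reasoning_effort key are assumptions based on the announcement, not confirmed API surface:

```python
# Hypothetical session configuration for GPT-Realtime-2.
# Field names follow the existing Realtime API session object;
# the model name and reasoning_effort key are assumptions.
session_config = {
    "model": "gpt-realtime-2",        # assumed model identifier
    "modalities": ["audio", "text"],  # audio in, audio out; text for transcripts
    "voice": "alloy",                 # any voice from the catalog
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "reasoning_effort": "medium",     # minimal | low | medium | high | xhigh
}
```

Because the model consumes and emits audio directly, there is no ASR or TTS stage to configure — the audio formats above are the whole pipeline.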
The five upgrades that actually matter
1. GPT-5-class reasoning in the voice loop
The original GPT-Realtime was capable but reasoning-light. It struggled with multi-step requests, planning, and constraint-heavy tasks. GPT-Realtime-2 brings GPT-5-class reasoning into the speech-to-speech loop, while still maintaining conversational pacing.
In practice: voice agents can now handle “book me a flight from Boston to Tokyo, layover not in San Francisco, budget under $1,500, leaving Tuesday morning” in a single turn. The original model needed to be walked through it step by step.
2. 128K context window (up from 32K)
The 4x context expansion removes the most common production workaround — chunking long calls or stripping conversation history to fit in 32K. With 128K you can:
- Hold full call history for sessions over an hour.
- Ground voice agents in product manuals, knowledge bases, or call scripts inline.
- Keep multi-tool conversation state across many turns without aggressive pruning.
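As a sketch of what the larger window buys you, the check below decides whether a session's history plus inline grounding documents fits the 128K window with headroom for the next reply. The 128K limit is from the announcement; the reply reserve and example token counts are illustrative assumptions:

```python
# Illustrative context-budget check for a long voice session.
# CONTEXT_WINDOW matches the announced 128K limit; the reply
# reserve and the example token counts are assumptions.
CONTEXT_WINDOW = 128_000
REPLY_RESERVE = 4_000  # headroom kept for the model's next turn

def tokens_to_prune(history_tokens: int, grounding_tokens: int) -> int:
    """Return how many history tokens must be dropped to fit the window."""
    used = history_tokens + grounding_tokens + REPLY_RESERVE
    return max(0, used - CONTEXT_WINDOW)

# A ~60-minute call (~40K tokens) plus a 20K-token product manual fits with room to spare:
print(tokens_to_prune(40_000, 20_000))  # 0 -> no pruning needed
```

Against the old 32K window, that same session would have forced aggressive truncation from early in the call.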
3. Parallel tool calls
GPT-Realtime-2 can issue multiple tool calls in a single turn and narrate that work to the user (“let me check your order and look up the return policy at the same time”). For voice agents that hit multiple backends per turn (CRM + product DB + ticketing) this cuts latency by 30-50% in production benchmarks.
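The client-side half of that pattern is ordinary concurrent fan-out: when the model emits several independent tool calls in one turn, dispatch them together instead of serially. The backend functions below are stand-ins for real CRM / DB / ticketing calls, not actual APIs:

```python
import asyncio

# Stand-in backends; in production these would be real CRM / DB / ticketing calls.
async def lookup_order(order_id: str) -> dict:
    await asyncio.sleep(0.1)  # simulated network latency
    return {"order_id": order_id, "status": "shipped"}

async def fetch_return_policy(sku: str) -> dict:
    await asyncio.sleep(0.1)
    return {"sku": sku, "window_days": 30}

async def handle_parallel_tool_calls() -> list:
    # Both calls run concurrently, so the turn costs ~0.1s instead of ~0.2s.
    return await asyncio.gather(
        lookup_order("A-1001"),
        fetch_return_policy("SKU-42"),
    )

results = asyncio.run(handle_parallel_tool_calls())
print(results[0]["status"], results[1]["window_days"])  # shipped 30
```

The latency win compounds with fan-out width: three or four independent backends per turn is where the sequential pattern really hurts on a voice call.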
4. Preambles and recovery
Two conversational behaviors that solve the most jarring voice agent failures:
- Preambles — when the model knows it’s about to do work that will take a moment, it fills the silence. “Let me check that for you.” This is what makes the agent feel responsive rather than dead.
- Recovery — when the user interrupts mid-sentence or speaks over a tool call, the agent recovers instead of crashing or repeating itself.
These are the “unsexy but essential” features that distinguish demo voice agents from production ones.
5. Adjustable reasoning effort
A new `reasoning_effort` parameter — `minimal`, `low`, `medium`, `high`, `xhigh` — trades latency for reasoning depth.
- minimal — drive-thru order taking, FAQ lookup, simple commands. Sub-second time-to-first-audio.
- medium (default) — most consumer voice agents. Good balance.
- high / xhigh — complex enterprise workflows, troubleshooting, scheduling under constraints. Higher latency, materially better answers.
This is the same knob OpenAI exposes on text reasoning models, now plumbed through to voice.
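In request terms this is just one more session field. The helper below validates the value against the five announced levels before building an update payload; the level names come from the announcement, while the event shape and the key's placement in the session object are assumptions:

```python
# The five announced effort levels, fastest/cheapest first.
EFFORT_LEVELS = ("minimal", "low", "medium", "high", "xhigh")

def session_update(effort: str = "medium") -> dict:
    """Build a session.update payload with a validated reasoning effort."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"reasoning_effort must be one of {EFFORT_LEVELS}")
    # Event shape mirrors the existing Realtime API; placing
    # reasoning_effort in the session object is an assumption.
    return {"type": "session.update", "session": {"reasoning_effort": effort}}

print(session_update("minimal")["session"]["reasoning_effort"])  # minimal
```

Since it rides on `session.update`, you can in principle lower the effort mid-call for simple turns and raise it when the user asks for something hard.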
Pricing breakdown
| Model | Audio input | Audio output | Notes |
|---|---|---|---|
| GPT-Realtime-2 | $32 / 1M tokens | $64 / 1M tokens | Default for conversational |
| GPT-Realtime-Translate | $0.034 / minute (flat) | included | Translation only, no dialog |
| GPT-Realtime-Whisper | Whisper streaming pricing | n/a | Transcription only |
Practical cost rule of thumb for GPT-Realtime-2: budget roughly $0.30-0.60 per minute of two-way conversation, depending on speech density and tool call frequency. Translation-only workloads are ~10x cheaper using GPT-Realtime-Translate.
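The arithmetic behind that rule of thumb is worth seeing. The prices below are from the announcement, but the rate of ~600 audio tokens per minute of speech is an assumption carried over from earlier Realtime models, not a published figure:

```python
# Back-of-envelope cost per minute of two-way conversation.
# Prices are from the announcement; ~600 audio tokens per minute
# of speech is an assumed rate, not a published figure.
INPUT_PER_M = 32.0   # $ per 1M audio input tokens
OUTPUT_PER_M = 64.0  # $ per 1M audio output tokens

def cost_per_minute(input_tokens_per_min: float, output_tokens_per_min: float) -> float:
    return (input_tokens_per_min * INPUT_PER_M
            + output_tokens_per_min * OUTPUT_PER_M) / 1_000_000

# Dense two-way speech at ~600 tokens/min each way:
print(round(cost_per_minute(600, 600), 4))  # 0.0576
```

Note the gap between this raw-audio floor (~$0.06/min) and the $0.30-0.60 rule of thumb: in real sessions, re-processing accumulated context on every turn and tool-call traffic dominate the bill, not the fresh audio itself.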
Migration from the GPT-Realtime beta
The migration path is short:
- Rename the model in your session config: `gpt-realtime` → `gpt-realtime-2`.
- Add `reasoning_effort: "medium"` as a default; tune per use case.
- Enable parallel tool calls if your tools are independent — measurable latency win.
- Render preamble text on the client UI while audio streams.
- Verify your context window assumptions — long sessions that previously had to truncate at 32K can now run native to 128K.
Most teams ship the change in a single PR.
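Concretely, the change is mostly a config edit. The before/after below assumes the session object keeps its current shape; the `parallel_tool_calls` flag name is a guess at how the toggle might be spelled, not a documented field:

```python
# Before: beta-era session config.
old_session = {"model": "gpt-realtime", "voice": "alloy"}

# After: the same session migrated per the steps above.
new_session = {
    **old_session,
    "model": "gpt-realtime-2",     # step 1: rename the model
    "reasoning_effort": "medium",  # step 2: sensible default, tune later
    "parallel_tool_calls": True,   # step 3: assumed flag name, not documented
}
print(new_session["model"])  # gpt-realtime-2
```

Everything else — voice choice, audio formats, turn detection — carries over unchanged, which is why the diff stays small.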
What GPT-Realtime-2 is not
- Not a translation specialist. Use GPT-Realtime-Translate for translation-only workloads. 10x cheaper, purpose-built.
- Not a transcription service. Use Whisper streaming.
- Not the cheapest voice option. ElevenLabs + a smaller LLM still beats it on cost when voice quality is the product but reasoning isn’t critical.
- Not on-device. Gemini Live runs natively on Pixel; GPT-Realtime-2 is API-only.
What to watch next
- Production SLA numbers — first published uptime data post-GA.
- Voice library expansion — OpenAI is expected to expand the voice catalog over Q3 2026.
- Multimodal voice agents — combining camera/screen + voice + tool use is the next frontier.
- Agent SDK support — how OpenAI’s Assistants API and the agentic stack take advantage of Realtime-2’s parallel tools.
Related reading
- GPT-Realtime-2 vs ElevenLabs vs Gemini Live
- Best AI voice agent platforms
- Best AI voice agent stack for SMB service businesses
Last verified: May 10, 2026 — sources: OpenAI Realtime API release notes, OpenAI cookbook for realtime translation, MarktechPost, TheNextWeb, Latent.Space, 9to5Mac, Microsoft Azure AI Foundry blog, OpenAI community forum.