What Is GPT-Realtime-2? OpenAI's New Voice Model (May 2026)

OpenAI released GPT-Realtime-2 on May 7, 2026, as the new flagship of a refreshed Realtime API that simultaneously exited beta. The headline upgrades — GPT-5-class reasoning, 128K context, parallel tool calls — change what’s practical to build with voice. Here’s what GPT-Realtime-2 actually is, what it can do, and how it fits in the May 2026 voice stack.

Last verified: May 10, 2026

The announcement at a glance

| Property | Value |
| --- | --- |
| Released | May 7, 2026 |
| Provider | OpenAI |
| Architecture | Speech-to-speech (audio in, audio out) |
| Reasoning class | GPT-5-class |
| Context window | 128,000 tokens (up from 32K) |
| Tool calling | Parallel function calls supported |
| Reasoning effort | minimal / low / medium / high / xhigh |
| Audio input price | $32 / 1M tokens |
| Audio output price | $64 / 1M tokens |
| API status | GA — Realtime API out of beta |

What GPT-Realtime-2 actually is

GPT-Realtime-2 is the speech-to-speech version of OpenAI’s GPT-5-class frontier model. “Speech-to-speech” means the model receives audio and emits audio directly — there’s no intermediate text round-trip through ASR + LLM + TTS. That’s the architectural choice that gives Realtime models lower latency and more natural prosody than turn-based stacks.
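The latency argument can be made concrete with a small sketch. The stage latencies below are illustrative assumptions, not published measurements: a cascaded stack pays each stage in sequence, while a speech-to-speech model has a single time-to-first-audio.

```python
# Sketch: why speech-to-speech beats a cascaded ASR + LLM + TTS stack
# on latency. All millisecond figures here are assumed for illustration.

CASCADE_MS = {
    "asr": 300,              # finish transcribing the user's turn
    "llm_first_token": 400,  # text model starts responding
    "tts_first_audio": 200,  # synthesizer emits first audio
}
DIRECT_MS = 500              # assumed time-to-first-audio, speech-to-speech

# Cascade stages run in sequence, so their latencies add up.
cascade_total = sum(CASCADE_MS.values())  # 900 ms vs 500 ms direct
```

Under these (assumed) numbers the cascade is nearly twice as slow to first audio, and it also loses prosody by flattening speech to text in the middle.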

The May 7, 2026 release shipped three audio models together:

  • GPT-Realtime-2 — the flagship reasoning voice agent.
  • GPT-Realtime-Translate — dedicated speech-to-speech translator (70+ input → 13 output languages, $0.034/min).
  • GPT-Realtime-Whisper — streaming transcription only.

GPT-Realtime-2 replaces the original GPT-Realtime as OpenAI’s recommended default for conversational voice agents.

The five upgrades that actually matter

1. GPT-5-class reasoning in the voice loop

The original GPT-Realtime was capable but reasoning-light. It struggled with multi-step requests, planning, and constraint-heavy tasks. GPT-Realtime-2 brings GPT-5-class reasoning into the speech-to-speech loop, while still maintaining conversational pacing.

In practice: voice agents can now handle “book me a flight from Boston to Tokyo, layover not in San Francisco, budget under $1500, leaving Tuesday morning” in one turn. The original model needed to be walked through it.

2. 128K context window (up from 32K)

The 4x context expansion removes the most common production workaround — chunking long calls or stripping conversation history to fit in 32K. With 128K you can:

  • Hold full call history for sessions over an hour.
  • Ground voice agents in product manuals, knowledge bases, or call scripts inline.
  • Keep multi-tool conversation state across many turns without aggressive pruning.
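A minimal sketch of the check that used to force pruning, assuming a rough 4-characters-per-token heuristic and a reserved response budget (both assumptions, not API behavior):

```python
# Sketch: decide whether full history + inline grounding fits in the
# window before falling back to the old truncation workaround.

CONTEXT_WINDOW = 128_000   # GPT-Realtime-2 (was 32_000)
RESPONSE_RESERVE = 4_000   # room left for the model's reply (assumption)

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token for English."""
    return max(1, len(text) // 4)

def fits_in_context(history: list[str], grounding_docs: list[str]) -> bool:
    """True if the session needs no pruning at all."""
    used = sum(estimate_tokens(t) for t in history + grounding_docs)
    return used + RESPONSE_RESERVE <= CONTEXT_WINDOW
```

At 32K this check failed for most hour-long calls; at 128K it passes for all but the heaviest grounding payloads.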

3. Parallel tool calls

GPT-Realtime-2 can issue multiple tool calls in a single turn and narrate that work to the user (“let me check your order and look up the return policy at the same time”). For voice agents that hit multiple backends per turn (CRM + product DB + ticketing) this cuts latency by 30-50% in production benchmarks.
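On the application side, the win comes from running independent tool calls concurrently rather than one after another. A sketch with hypothetical stand-in backends (the tool names and latencies are invented for illustration):

```python
# Sketch: when the model emits several independent tool calls in one
# turn, dispatch them concurrently. Total wait is roughly the slowest
# call, not the sum of all of them.
import asyncio

async def lookup_order(order_id: str) -> dict:
    await asyncio.sleep(0.1)          # stand-in for a CRM round-trip
    return {"order": order_id, "status": "shipped"}

async def fetch_return_policy(sku: str) -> dict:
    await asyncio.sleep(0.1)          # stand-in for a product-DB round-trip
    return {"sku": sku, "window_days": 30}

async def handle_turn() -> list:
    # Both calls are independent, so gather them in parallel.
    return await asyncio.gather(
        lookup_order("A-1001"),
        fetch_return_policy("SKU-42"),
    )

results = asyncio.run(handle_turn())
```

Serially these two calls would take ~200 ms of backend wait; gathered, ~100 ms, which is where the 30-50% per-turn savings comes from when a turn touches several systems.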

4. Preambles and recovery

Two conversational behaviors that solve the most jarring voice agent failures:

  • Preambles — when the model knows it’s about to do work that will take a moment, it fills the silence. “Let me check that for you.” This is what makes the agent feel responsive rather than dead.
  • Recovery — when the user interrupts mid-sentence or speaks over a tool call, the agent recovers instead of crashing or repeating itself.

These are the “unsexy but essential” features that distinguish demo voice agents from production ones.

5. Adjustable reasoning effort

A new reasoning_effort parameter — minimal, low, medium, high, xhigh — trades latency for reasoning depth.

  • minimal — drive-thru order taking, FAQ lookup, simple commands. Sub-second time-to-first-audio.
  • medium (default) — most consumer voice agents. Good balance.
  • high / xhigh — complex enterprise workflows, troubleshooting, scheduling under constraints. Higher latency, materially better answers.

This is the same knob OpenAI exposes on text reasoning models, now plumbed through to voice.
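A sketch of wiring the knob per use case. The field names follow the article's description; the exact Realtime API session schema is an assumption here, not the documented shape:

```python
# Sketch: choose reasoning_effort by workload, defaulting to medium.

EFFORT_BY_USE_CASE = {
    "drive_thru": "minimal",              # sub-second time-to-first-audio
    "consumer_agent": "medium",           # default balance
    "enterprise_troubleshooting": "xhigh" # depth over latency
}

def session_config(use_case: str) -> dict:
    """Build a session config with an effort level for this workload."""
    return {
        "model": "gpt-realtime-2",
        "reasoning_effort": EFFORT_BY_USE_CASE.get(use_case, "medium"),
    }
```

The useful discipline is mapping effort to workload once, centrally, rather than hardcoding it per call site.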

Pricing breakdown

| Model | Audio input | Audio output | Notes |
| --- | --- | --- | --- |
| GPT-Realtime-2 | $32 / 1M tokens | $64 / 1M tokens | Default for conversational agents |
| GPT-Realtime-Translate | $0.034 / minute (flat) | included | Translation only, no dialog |
| GPT-Realtime-Whisper | Whisper streaming pricing | n/a | Transcription only |

Practical cost rule of thumb for GPT-Realtime-2: budget roughly $0.30-0.60 per minute of two-way conversation, depending on speech density and tool call frequency. Translation-only workloads are ~10x cheaper using GPT-Realtime-Translate.
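The rule of thumb reduces to simple arithmetic on the published rates. The tokens-per-minute figures below are assumptions (real usage varies with speech density, context reprocessing, and tool-call traffic):

```python
# Sketch: back-of-envelope cost for a two-way GPT-Realtime-2 call
# at the published audio-token rates.

INPUT_RATE = 32 / 1_000_000    # dollars per audio input token
OUTPUT_RATE = 64 / 1_000_000   # dollars per audio output token

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Total dollar cost for the given audio token counts."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Assumed: ~6,000 input and ~3,500 output tokens per minute of
# two-way conversation once repeated context processing is counted.
per_minute = call_cost(6_000, 3_500)   # lands inside the $0.30-0.60 band
```

Plugging in your own measured token counts per minute is the fastest way to sanity-check a voice product's unit economics.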

Migration from the GPT-Realtime beta

The migration path is short:

  1. Rename the model in your session config: gpt-realtime → gpt-realtime-2.
  2. Add reasoning_effort: "medium" as a default; tune per use case.
  3. Enable parallel tool calls if your tools are independent — measurable latency win.
  4. Render preamble text on the client UI while audio streams.
  5. Verify your context window assumptions — long sessions that previously had to truncate at 32K can now run native to 128K.
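Steps 1, 2, 3, and 5 reduce to a config change. A before/after sketch, assuming this session shape (field names mirror the article, not a documented schema):

```python
# Sketch: the session config delta most teams ship for the migration.

old_session = {
    "model": "gpt-realtime",
    # 32K window forced aggressive history truncation
    "max_history_tokens": 32_000,
}

new_session = {
    "model": "gpt-realtime-2",        # step 1: rename the model
    "reasoning_effort": "medium",     # step 2: sensible default, tune later
    "parallel_tool_calls": True,      # step 3: only if tools are independent
    "max_history_tokens": 128_000,    # step 5: run native to the new window
}
```

Step 4 (rendering preamble text while audio streams) is the only change that touches client code rather than config.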

Most teams ship the change in a single PR.

What GPT-Realtime-2 is not

  • Not a translation specialist. Use GPT-Realtime-Translate for translation-only workloads. 10x cheaper, purpose-built.
  • Not a transcription service. Use Whisper streaming.
  • Not the cheapest voice option. ElevenLabs + a smaller LLM still beats it on cost when voice quality is the product but reasoning isn’t critical.
  • Not on-device. Gemini Live runs natively on Pixel; GPT-Realtime-2 is API-only.

What to watch next

  • Production SLA numbers — first published uptime data post-GA.
  • Voice library expansion — OpenAI is expected to expand the voice catalog over Q3 2026.
  • Multimodal voice agents — combining camera/screen + voice + tool use is the next frontier.
  • Agent SDK support — how OpenAI’s Assistants API and the agentic stack take advantage of Realtime-2’s parallel tools.

Last verified: May 10, 2026 — sources: OpenAI Realtime API release notes, OpenAI cookbook for realtime translation, MarktechPost, TheNextWeb, Latent.Space, 9to5Mac, Microsoft Azure AI Foundry blog, OpenAI community forum.