Gemini Omni is Google DeepMind's new multimodal 'world model', unveiled at Google I/O 2026 on May 19, 2026. It accepts text, images, audio, and video as input and produces realistic, physics-accurate video as output. DeepMind CEO Demis Hassabis described it as a step toward artificial general intelligence (AGI). The first model in the family, Gemini Omni Flash, is rolling out to paid Google AI subscribers.

How is Gemini Omni different from Veo 3.1?

Veo 3.1 is a dedicated text-to-video and image-to-video model. Gemini Omni is a unified multimodal world model: it can take any combination of text, image, audio, and existing video and produce video with physics-grounded behavior. Omni also supports conversational editing — you can refine generated videos through dialogue. Veo focuses on cinematic generation with strong audio; Omni focuses on world simulation and editability.

How does Gemini Omni compare to OpenAI Sora 2?

Sora 2, released in September 2025, was OpenAI's leading video model, but OpenAI deprecated the Sora product and API in April 2026 with a shutdown date of September 24, 2026. Gemini Omni is a live, supported alternative in the same category. Sora 2 had a stronger 'characters' feature (inserting real people into clips) and longer clips (up to 20 seconds); Omni leads on multimodal input, physics realism, and conversational editing, and has been shipping since May 2026.

How do I use Gemini Omni?

Gemini Omni Flash is available immediately to paid Google AI Plus, Pro, and Ultra subscribers via the Gemini app and Google Flow. A free version is being rolled out to YouTube Shorts and YouTube Create later in the week of May 19, 2026. All generated videos include a SynthID watermark.

Quick Answer

What Is Gemini Omni? Google's Multimodal World Model (May 2026)

Published: May 20, 2026

Gemini Omni is Google DeepMind’s new “world model”, unveiled at Google I/O 2026 on May 19, 2026. It takes text, image, audio, or video as input and produces physics-accurate video as output. Demis Hassabis called it a step toward AGI. The first version — Gemini Omni Flash — is shipping today.

Last verified: May 20, 2026

Quick facts

Property	Value
Announced	May 19, 2026 (Google I/O 2026)
Vendor	Google DeepMind
First model	Gemini Omni Flash
Inputs	Text, image, audio, video (any combination)
Output	Video (with physics simulation), conversationally editable
Watermark	SynthID embedded in every generated video
Available	Gemini app + Google Flow (paid), YouTube Shorts/Create (free, rolling out)
Successor positioning	Beyond Veo — a “world model,” not just a video model

What is a “world model”?

A world model is an AI system that has an internal simulation of how the physical world works — gravity, momentum, fluid dynamics, lighting, shadows, object permanence — and uses that simulation to generate or predict outcomes.

Where Veo 3.1 generates beautiful video, Omni generates video that respects physics: pour liquid and it falls correctly, push an object and it slides realistically, change the lighting and shadows update.
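To make "an internal simulation that predicts outcomes" concrete, here is a toy sketch of the idea: a state, a transition rule, and a rollout. It says nothing about how Omni works internally. The names `BallState`, `step`, and `time_to_ground` are made up for illustration, and the only physics modeled is constant gravity on a dropped ball.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2


@dataclass(frozen=True)
class BallState:
    height: float    # metres above the ground
    velocity: float  # m/s, positive means upward


def step(state: BallState, dt: float) -> BallState:
    """Predict the ball's state dt seconds later (semi-implicit Euler)."""
    velocity = state.velocity - G * dt
    height = state.height + velocity * dt
    if height <= 0.0:
        # The ball has reached the ground; this toy model stops it there.
        return BallState(0.0, 0.0)
    return BallState(height, velocity)


def time_to_ground(start_height: float, dt: float = 0.001) -> float:
    """Roll the model forward until a ball dropped from rest lands."""
    state = BallState(start_height, 0.0)
    elapsed = 0.0
    while state.height > 0.0:
        state = step(state, dt)
        elapsed += dt
    return elapsed


if __name__ == "__main__":
    # Compare the rollout with the closed-form answer sqrt(2h / g).
    print(time_to_ground(1.0), (2 * 1.0 / G) ** 0.5)
```

A learned world model replaces the hand-written `step` rule with something inferred from data, but the loop is the same: predict the next state, then the one after that.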

For Google DeepMind, this is the unification of three things that used to be separate models:

Reasoning — language-model-style understanding of intent.
Real-world knowledge — physics, materials, biology.
Generation — turning all of that into video, image, or audio output.

What Gemini Omni Flash can do

Generate video from any input — text prompt, image, audio clip, or existing video, or any combination.
Conversational editing — “make the sky brighter,” “remove the second character,” “add a chair on the left” — all in natural language, with visual consistency preserved across edits.
Physics-grounded scenes — gravity, kinetic energy, fluid dynamics, light/shadow are modeled rather than guessed.
Multi-turn refinement — keep editing the same video over multiple prompts.
Aspect-ratio control — portrait (9:16) and landscape (16:9), with frame-level guidance.
Speech input — provide a voice sample as part of the prompt (full speech generation/editing coming later).
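For the aspect-ratio item, the arithmetic behind 9:16 and 16:9 can be sketched as below. The helper `frame_size` and the 1080-pixel short side are assumptions chosen for illustration; Omni's actual output resolution has not been officially stated.

```python
from fractions import Fraction

# Width-to-height ratios for the two orientations named in the list above.
ASPECT_RATIOS = {
    "portrait": Fraction(9, 16),
    "landscape": Fraction(16, 9),
}


def frame_size(orientation, short_side):
    """Return (width, height) in pixels for a given short-side length.

    The short side is the width in portrait and the height in landscape.
    """
    ratio = ASPECT_RATIOS[orientation]
    if ratio < 1:
        width = short_side
        height = round(short_side / ratio)
    else:
        height = short_side
        width = round(short_side * ratio)
    return width, height


if __name__ == "__main__":
    # 1080 is an assumed short side, not a documented Omni setting.
    print(frame_size("portrait", 1080))
    print(frame_size("landscape", 1080))
```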

Gemini Omni vs Veo 3.1 vs Sora 2

	Gemini Omni Flash	Veo 3.1	Sora 2
Vendor	Google DeepMind	Google DeepMind	OpenAI
Released	May 19, 2026	October 2025	September 2025
Status	Live, rolling out	Live	Deprecated (sunset Sep 24, 2026)
Multimodal input	Text + image + audio + video	Text + image	Text + image + video
Audio generation	Speech samples; full editing planned	Native synced dialogue + ambient + music	Native dialogue + SFX
Editing	Conversational, multi-turn	Frame-specific, video extension	Remix + targeted edits
Physics realism	World-model grounded	Strong	Strong (e.g. gymnastics, buoyancy)
Max clip length	Short clips (dynamic)	Up to 8s	Up to 20s
Resolution	High (not officially stated)	Up to 4K	Up to 1080p
Watermark / provenance	SynthID	SynthID	Visible watermark + C2PA metadata
Best for	Physics-accurate scenes + iterative editing	Cinematic motion + audio	(Not recommended — deprecated)
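One way to read the table is as a lookup: given the inputs a project needs, which live models qualify? The sketch below encodes only the "Status" and "Multimodal input" rows. The `MODELS` dictionary and the `candidates` function are hypothetical helpers, not part of any Google or OpenAI SDK.

```python
# Input modalities and status, copied from the comparison table above.
MODELS = {
    "Gemini Omni Flash": {
        "inputs": {"text", "image", "audio", "video"},
        "live": True,
    },
    "Veo 3.1": {
        "inputs": {"text", "image"},
        "live": True,
    },
    "Sora 2": {
        "inputs": {"text", "image", "video"},
        "live": False,  # deprecated, sunset Sep 24, 2026
    },
}


def candidates(required_inputs):
    """List live models whose supported inputs cover every required one."""
    required = set(required_inputs)
    return [
        name
        for name, info in MODELS.items()
        if info["live"] and required <= info["inputs"]
    ]


if __name__ == "__main__":
    print(candidates({"text", "video"}))
    print(candidates({"text", "image"}))
```

Going by the table, a workflow that starts from existing video or audio narrows to Omni Flash, while text-and-image workflows can use either Google model.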

Why Sora 2 deprecation matters

OpenAI’s decision to shut down the Sora product and API by September 24, 2026 is one of the biggest recent changes in the AI video landscape, and it was announced only weeks before Gemini Omni launched. Net effect: of the three models compared above, the two that remain live are both Google’s (Omni for editing and world simulation, Veo 3.1 for cinematic generation). For anyone building on top of OpenAI’s video stack, Omni is a natural migration path.
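For teams planning a migration, the runway is easy to compute from the shutdown date quoted above. This is a minimal sketch; `days_until_shutdown` is an illustrative helper, and the date is the one reported in this article.

```python
from datetime import date

# Sora shutdown date as reported in this article.
SORA_SHUTDOWN = date(2026, 9, 24)


def days_until_shutdown(today):
    """Days left before the Sora shutdown, never negative."""
    return max((SORA_SHUTDOWN - today).days, 0)


if __name__ == "__main__":
    # Counting from this article's publication date, May 20, 2026.
    print(days_until_shutdown(date(2026, 5, 20)))  # prints 127
```

Counted from May 20, 2026, that leaves 127 days, or roughly four months, to move existing pipelines.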

How to use Gemini Omni

Path	Who	How
Gemini app	Paid AI Plus, Pro, Ultra	Open Gemini, switch to Omni Flash in the model picker
Google Flow	Paid AI Plus, Pro, Ultra	The video creation studio for Omni + Veo
YouTube Shorts	Everyone (rolling out)	Generate Shorts from a prompt or remix
YouTube Create	Everyone (rolling out)	Edit footage conversationally
Vertex AI / Gemini API	Developers (expected)	Programmatic access via Vertex (timing TBC)

Limits and caveats

Clip length — short by default; not yet for long-form film.
Audio output — initial release is speech-sample-conditioned; full generative audio editing comes later “responsibly.”
SynthID is mandatory — every Omni-generated frame is watermarked. This is a feature for trust, but worth knowing if you’re building on top of Omni.
API access — consumer surfaces first; programmatic Vertex/Gemini API rollout following.

Who should care

Creators / video pros — conversational editing is the workflow change.
Marketing / ad teams — physics-accurate product visualization without a 3D pipeline.
Educators — explainer videos with correct physics (drop the ball, it actually falls right).
AGI watchers — Hassabis explicitly framed this as a step toward AGI; the bet is that learning physics from video is part of the path.
Anyone building on Sora 2 — start migrating to Omni or Veo 3.1.

TL;DR

Gemini Omni is video generation built on a model of how the physical world behaves. It accepts any multimodal input, outputs physics-grounded video, and lets you edit conversationally. Sora 2 is being deprecated; Omni is a live frontier-class video model that has been shipping since May 19, 2026. If you make video, the workflow is changing this month.