Spatial AI refers to large models trained on video, gameplay, simulation, or 3D data, not just text, and designed to develop intuition about space, time, physics, and embodied behavior. Text-trained LLMs (GPT-5.5, Claude Fable 5, Gemini 2.5 Pro) excel at language and reasoning but struggle with spatial intuition. Spatial AI models target that gap: predicting what happens next in a physical scene, planning multi-step actions in environments with consequences, and supporting agents that act in 3D space. The category emerged in 2024-2025 with V-JEPA (Meta), Genie (Google DeepMind), and Cosmos (NVIDIA), and accelerated in 2026 with General Intuition's $320M Series A at a $2.3B valuation, announced on June 25. The bet across all of these is that future AI agents and robotics systems will combine LLMs for reasoning, spatial AI for physical intuition, and tools for action.

How does General Intuition compare to Google Genie 3?

General Intuition and Google DeepMind's Genie 3 are both bets on spatial AI, but they differ in data and approach. General Intuition trains primarily on curated gameplay clips from Medal (billions of hours, human-curated for interesting moments), on the thesis that such clips give the best signal at scale for goal-directed behavior, planning, and physics intuition. Genie 3 (Google DeepMind, released August 2025) is a foundation world model trained on internet video plus simulation environments, capable of generating playable 3D environments from text or image prompts. Genie 3 is accessible through Google research previews; General Intuition is at the model-development stage with no public product yet. Genie has Google's compute and research depth; General Intuition has a defensible data moat (Medal) and venture-scale focus. Because both target the same category from different data, they will likely produce complementary capabilities rather than a single winner.

How does General Intuition compare to NVIDIA Cosmos?

NVIDIA Cosmos (released January 2025) is NVIDIA's foundation model platform for physical AI: autonomous vehicles, robotics, and industrial AI. Cosmos includes World Foundation Models (WFMs) trained on a mix of synthetic and real video, with the explicit goal of accelerating physical-AI development by giving robotics companies a strong pre-trained starting point. Cosmos is positioned as platform infrastructure for the robotics market; General Intuition is positioned as a frontier-model research lab betting that gameplay training data is the breakthrough approach. Cosmos has NVIDIA's full GPU stack and customer relationships; General Intuition has Medal's data moat and Bezos's personal investment. The likely outcome is that Cosmos becomes the default platform for robotics companies that want to build on existing infrastructure, while General Intuition becomes a research-driven competitor targeting specific high-leverage capabilities (e.g., complex planning, multi-agent reasoning) that emerge from gameplay-specific training.

Should I use spatial AI models in my AI application today?

For most AI applications in mid-2026, no. Spatial AI is currently upstream research infrastructure for robotics, autonomous systems, embodied agents, and game/simulation platforms — not directly usable in chat applications, text agents, or coding workflows. The closest production-ready spatial AI options are Google Genie 3 (research preview) and NVIDIA Cosmos (available for robotics developers). General Intuition, World Labs, V-JEPA, and Decart are at research or early-product stages. If you're building robotics, AR/VR, autonomous driving, simulation, or game AI, evaluate Genie 3 and Cosmos now and watch the category through 2027. If you're building chat-based or text-based AI applications, spatial AI is not directly relevant yet — but the underlying capabilities will likely surface in your stack as multimodal model improvements within 12-24 months.

General Intuition vs Genie 3 vs Cosmos: Spatial AI 2026

Published: June 26, 2026

General Intuition raised $320M at a $2.3B valuation on June 25, 2026 to build spatial AI frontier models trained primarily on gameplay clips. Spatial AI — large models that learn from video, gameplay, simulation, or 3D data rather than just text — is one of the fastest-emerging AI categories in 2026. How does General Intuition compare to Google DeepMind Genie 3, NVIDIA Cosmos, Meta V-JEPA, World Labs, and other spatial AI bets? Short answer: each has a distinct data thesis and target market; the category is large enough for multiple winners.

Last verified: June 26, 2026.

TL;DR

General Intuition: gameplay-trained spatial AI; $320M Series A at $2.3B; Bezos invested; Medal data moat
Google DeepMind Genie 3: foundation world model from internet video + simulation; playable 3D environment generation
NVIDIA Cosmos: platform foundation models for physical AI (robotics, autonomous vehicles); NVIDIA infrastructure
Meta V-JEPA: self-supervised learning from internet video; research-focused
World Labs (Fei-Fei Li): large world models with 3D scene focus; $230M raised at a valuation above $1B (announced Sept 2024)
Decart: real-time generative video; consumer-facing apps
1X World Model: robotics-grounded world model from teleoperation data
Common thesis: AI agents and robotics need spatial intuition that text-trained LLMs cannot provide

The category framing

Each player makes a different bet on what kind of data and architecture produces the best spatial intuition.

Player	Data thesis	Target user
General Intuition	Curated gameplay clips have the best embodied-behavior signal	Robotics, embodied agents, game AI
Google Genie 3	Internet video + simulation generates playable worlds	Researchers, developers, agent platforms
NVIDIA Cosmos	Synthetic + real video covers physical-AI use cases	Robotics, autonomous vehicles, industrial AI
Meta V-JEPA	Self-supervised learning from internet video	Open research, downstream applications
World Labs	3D scenes and spatial reasoning are the key primitives	Spatial computing, AR/VR, design
Decart	Real-time generation enables consumer applications	Consumer AR/VR, gaming, content
1X World Model	Robotics teleoperation is the most relevant data	1X humanoid robots specifically

Player-by-player

General Intuition

Dimension	Detail
Founded	2024 (spun out of Medal)
Funding	$320M Series A at a $2.3B valuation (announced June 25, 2026); total raised roughly $450M or more
Lead founder	Pim de Witte (Medal CEO)
Notable investor	Jeff Bezos personally
Data moat	Medal: billions of hours of curated gameplay
Stage	Pre-product; frontier model development
Differentiator	Gameplay-specific training data with strong rights position

The gameplay-first bet is the most differentiated approach in the category. Medal’s billions of hours of human-curated clips, paired with clean licensing terms, form a data asset that other players would find hard to replicate. The open empirical risk is transfer: whether intuition learned from gameplay carries over to real-world tasks.

Google DeepMind Genie 3

Dimension	Detail
Released	Genie 3 in August 2025; Genie research line since 2024
Parent	Google DeepMind
Data	Internet video + simulation environments
Output capability	Generates playable, interactive 3D environments from text/image prompts
Stage	Research preview, broader access expanding
Differentiator	Best-known and most-cited spatial AI model; Google compute + research

Genie 3 is the most publicly visible spatial AI model. It can generate full 3D environments that users can interact with in real time: text prompt in, playable world out. Genie’s existence validates the category and sets the bar for what “spatial AI generation” looks like.

NVIDIA Cosmos

Dimension	Detail
Released	January 2025
Parent	NVIDIA
Approach	World Foundation Models (WFMs); platform for physical AI
Target market	Robotics, autonomous vehicles, industrial AI
Differentiator	Native NVIDIA GPU integration; existing robotics customer base; platform business model

Cosmos is the most commercially deployed entrant in the spatial AI category. NVIDIA has been positioning Cosmos as the foundation layer for physical AI developers, analogous to CUDA’s role in deep learning. The platform model (pre-trained models + GPUs + tooling) gives Cosmos commercial momentum even if individual model quality is contested.

Meta V-JEPA

Dimension	Detail
Released	V-JEPA (original) 2024; V-JEPA 2 in 2025
Parent	Meta FAIR (research org)
Approach	Self-supervised learning from internet video
Stage	Research releases, open-weight versions
Differentiator	Yann LeCun’s preferred architecture (JEPA); strong research direction

V-JEPA is the academic-research-oriented entrant. Yann LeCun, who championed the approach as Meta’s Chief AI Scientist, has argued for years that JEPA-style architectures are the path to “real” AI, beyond next-token prediction. Meta releases V-JEPA models openly, which makes the architecture broadly accessible. The path to commercial deployment is less obvious than for Cosmos or Genie.

World Labs

Dimension	Detail
Founded	2024
Funding	$230M raised at a valuation above $1B (announced September 2024)
Founders	Fei-Fei Li + Justin Johnson + Christoph Lassner + Ben Mildenhall
Approach	Large World Models with 3D scene focus
Stage	Pre-product or early-product
Differentiator	Founder credibility (Fei-Fei Li); 3D scene primitive focus

World Labs is the most academically credentialed entrant: Fei-Fei Li is one of the most cited AI researchers in the world. The 3D-scene focus differentiates it from video-trained alternatives (General Intuition, Genie, V-JEPA). World Labs’ target applications appear to be spatial computing, AR/VR, design, and architecture, more than robotics.

Decart

Dimension	Detail
Founded	2023
Funding	Multiple rounds; growing fast
Approach	Real-time generative video
Target market	Consumer AR/VR, gaming, content creation
Differentiator	Speed and consumer-app focus

Decart is the most consumer-product-oriented entrant. Its real-time generation capability is differentiated; its target market is closer to consumer media than to robotics or research.

1X World Model

Dimension	Detail
Released	2024-2025
Parent	1X (humanoid robotics)
Approach	World model trained on robot teleoperation data
Target market	1X’s own humanoid robots
Differentiator	Robotics-grounded training data

1X is vertically integrated: it builds the robots, collects the teleoperation data, trains the world model, and uses it in production. The data is the most directly relevant to robotics, but the scope is narrow (1X’s humanoid robots specifically).

Which approach wins?

Probably multiple. Spatial AI is too broad to be a winner-take-all market. Realistic outcomes:

Robotics platform layer

NVIDIA Cosmos is positioned to win as the default platform for robotics developers who want pre-trained foundation models with tooling. NVIDIA’s GPU lock-in and existing customer base make it the path of least resistance for those teams.

Frontier research and generation

Google Genie 3 is best-positioned for the “generate a playable 3D world from a prompt” use case. Google’s compute and research depth make this hard to beat.

Vertical / data-moat winners

General Intuition could win specific high-leverage capabilities (complex planning, multi-agent reasoning, novel-strategy emergence) where gameplay data gives a real advantage over internet video or synthetic data. 1X wins for its own robots. World Labs wins in spatial computing and 3D scene applications.

Open research

Meta V-JEPA continues to advance the academic frontier. If JEPA-style architectures prove superior over time, V-JEPA insights flow into all the others.

Consumer applications

Decart wins where real-time generation matters more than absolute model quality.

How to think about this as a builder

If you build robotics or autonomous systems

Evaluate NVIDIA Cosmos as your default platform
Watch General Intuition for first model releases (likely 2027) — could be a differentiated alternative for specific capabilities
Track 1X World Model insights for humanoid-specific applications

If you build AR/VR or spatial computing

World Labs is likely the most relevant option once its product ships
Decart for real-time generation use cases
Genie 3 for generation-from-prompt use cases

If you build game AI

General Intuition is the most closely aligned with your data and use cases, once its product is available
Use existing LLM + game-engine integration today

If you build chat or text AI applications

Spatial AI is not directly relevant in 2026
Capabilities will likely surface as multimodal upgrades to GPT, Claude, and Gemini in 2027-2028
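
The guidance above can be condensed into a small lookup table. What follows is a minimal Python sketch under stated assumptions, not an official tool: the `BUILDER_GUIDE` dictionary, its category keys, and the `recommend` function are hypothetical names introduced here for illustration, and the entries only restate the recommendations in this section.

```python
# Hypothetical lookup that restates this article's builder guidance.
# The category keys and function name are illustrative assumptions, not a real API.
BUILDER_GUIDE: dict[str, list[str]] = {
    "robotics": [
        "Evaluate NVIDIA Cosmos as your default platform",
        "Watch General Intuition for first model releases (likely 2027)",
        "Track 1X World Model insights for humanoid-specific applications",
    ],
    "ar_vr": [
        "World Labs once its product ships",
        "Decart for real-time generation use cases",
        "Genie 3 for generation-from-prompt use cases",
    ],
    "game_ai": [
        "General Intuition once its product is available",
        "Use existing LLM + game-engine integration today",
    ],
    "chat_text": [
        "Spatial AI is not directly relevant in 2026",
        "Expect capabilities to surface as multimodal LLM upgrades in 2027-2028",
    ],
}


def recommend(category: str) -> list[str]:
    """Return the article's suggestions for a builder category."""
    key = category.strip().lower()
    if key not in BUILDER_GUIDE:
        # Fail loudly on unknown categories instead of guessing.
        raise KeyError(
            f"Unknown category {category!r}; expected one of {sorted(BUILDER_GUIDE)}"
        )
    return BUILDER_GUIDE[key]


if __name__ == "__main__":
    for item in recommend("robotics"):
        print("-", item)
```

A team that spans two categories (for example, a game studio shipping an AR title) would call `recommend` once per category and merge the lists; the table deliberately carries no ranking beyond the order given in this section.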

Bottom line

Spatial AI is a real and growing category, and it sits inside a broader funding wave: in the same week as General Intuition’s $320M round, Sail Research raised $80M for agent inference and Mirendil raised $200M for AI research tools. The direction of the investment cycle is clear: the next leg of AI extends beyond text-trained LLMs into models that understand space, time, and embodied action.

General Intuition’s gameplay-first bet makes it the most differentiated entrant in the category. Google Genie 3 is the most publicly visible frontier model. NVIDIA Cosmos is the most commercially deployed platform. World Labs has the most academically credentialed team. V-JEPA continues to push the research frontier. Decart owns the real-time consumer corner. 1X owns the humanoid robotics corner.

Expect 3-5 winners across the category by 2028, not a single dominant player. Build robotics on Cosmos today, watch General Intuition and Genie for the next leg, and assume the underlying capabilities will surface in your LLM stack as multimodal upgrades over the next 24 months.