Why does 3D motion forecasting matter?

Motion forecasting is the bridge between vision-language models and physical AI. Robotics, autonomous vehicles, sports analytics, video generation, and safety monitoring all need models that can reason about where objects will be a few seconds in the future — not just what they are right now. MolmoMotion is one of the first fully open (weights + code + data) models to attempt this in 3D.

How does MolmoMotion compare to closed motion models?

Closed models like Google's Genie 3 and OpenAI's world simulation systems generate video futures. MolmoMotion is smaller and more focused: it outputs structured 3D motion trajectories that can be consumed by downstream planners (robot policies, driving stacks, animation tools). It gives up generated video in exchange for open weights and a structured, language-conditioned output.

Who is MolmoMotion for?

Robotics researchers, autonomous-driving teams, sports and biomechanics analysts, video-generation researchers who want a language-controllable motion prior, and hobbyists building 3D applications. It's Apache-licensed, so commercial use is allowed.

Quick Answer

What Is MolmoMotion? Ai2's Open 3D Motion Model (July 2026)

Q: What is MolmoMotion?

MolmoMotion is an open vision-language model from the Allen Institute for AI (Ai2), released on July 1, 2026. It takes a few seconds of video and predicts where the objects in that video will move next in 3D space. It's part of the Molmo family, extending Ai2's open multimodal work into temporal 3D motion forecasting.

Published: July 2, 2026

On July 1, 2026, the Allen Institute for AI (Ai2) released MolmoMotion, an open vision-language model that takes a few seconds of video and predicts where the objects in that video will move next in 3D space. It’s a small release next to the same week’s Claude Sonnet 5 and GPT-5.6 Sol headlines, but it’s arguably the most important open contribution to a rapidly commercializing field: language-conditioned 3D motion forecasting.

Last verified: July 2, 2026

What MolmoMotion actually does

Feed MolmoMotion:

A short video (a few seconds)
Optionally: a language prompt (“the person in the red jersey is about to jump”)

It outputs:

3D trajectories for objects in the scene: where they’ll be over the next N seconds
Uncertainty estimates for each prediction
Optional language grounding — which object the model thinks the prompt refers to

It’s not a video generator. It’s a motion prior: a compact, structured output that other systems (robot planners, animation tools, sports-analytics dashboards) can consume.
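
To make "compact structured output" concrete, here is a minimal sketch of how a downstream consumer might represent the three outputs listed above. The schema is an assumption for illustration only: the class names, field names, units, and the single-number uncertainty are hypothetical and are not taken from Ai2's release.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class TrajectoryPoint:
    t: float         # seconds after the last observed frame
    position: Vec3   # predicted (x, y, z); meters in the camera frame is an assumption
    sigma: float     # assumed isotropic 1-sigma position uncertainty, in meters


@dataclass
class ObjectForecast:
    object_id: int
    points: List[TrajectoryPoint]
    # Hypothetical score for how well this object matches the language prompt;
    # None when no prompt was given.
    grounding_score: Optional[float] = None


def grounded_object(forecasts: List[ObjectForecast]) -> Optional[ObjectForecast]:
    """Return the forecast that best matches the prompt, or None if nothing was scored."""
    scored = [f for f in forecasts if f.grounding_score is not None]
    return max(scored, key=lambda f: f.grounding_score) if scored else None


# Made-up values, for illustration only.
forecasts = [
    ObjectForecast(1, [TrajectoryPoint(0.5, (0.1, 0.0, 2.0), 0.05)], grounding_score=0.12),
    ObjectForecast(2, [TrajectoryPoint(0.5, (0.4, 0.3, 1.8), 0.08)], grounding_score=0.91),
]
target = grounded_object(forecasts)
print(target.object_id, target.points[0].position)
```

A planner or dashboard would read these records directly instead of decoding pixels, which is the practical difference from a video generator.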

Why this matters

Motion forecasting is the missing link between vision-language models and physical AI. VLMs today can describe what they see; world models like Google’s Genie 3 can hallucinate plausible futures as video. Neither directly answers the question a robot arm or a self-driving car actually needs: “where will this object be 500 ms from now?”
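
With a structured trajectory, that 500 ms question becomes a lookup rather than a video-parsing problem. The sketch below assumes the forecast arrives as timestamped 3D waypoints (a hypothetical format, not one confirmed by Ai2) and linearly interpolates between them; `position_at` is an illustrative helper, not part of any MolmoMotion API.

```python
from bisect import bisect_left
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def position_at(times: Sequence[float], positions: Sequence[Vec3], t: float) -> Vec3:
    """Linearly interpolate a predicted trajectory at time t (seconds).

    `times` must be strictly increasing. Queries outside the predicted
    horizon are clamped to the first or last waypoint, not extrapolated.
    """
    if not times or len(times) != len(positions):
        raise ValueError("times and positions must be non-empty and the same length")
    if t <= times[0]:
        return positions[0]
    if t >= times[-1]:
        return positions[-1]
    i = bisect_left(times, t)              # first index with times[i] >= t
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)               # interpolation weight in (0, 1]
    p0, p1 = positions[i - 1], positions[i]
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))


# Made-up waypoints at 0.0 s, 0.4 s and 0.8 s after the last observed frame.
times = [0.0, 0.4, 0.8]
positions = [(0.0, 0.0, 2.0), (0.2, 0.0, 2.0), (0.6, 0.1, 1.8)]
p = position_at(times, positions, 0.5)
print(p)  # approximately (0.3, 0.025, 1.95)
```

Clamping at the end of the horizon is a deliberate choice here: extrapolating past the last predicted waypoint would invent motion the model never forecast.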

Closed labs are building this internally (Waymo, Wayve, Tesla, and reportedly OpenAI, with its rumored world simulation work). Ai2 is among the first to ship a fully open attempt: weights, code, and training data recipe.

Three practical unlocks:

Robotics research. Open motion priors mean smaller labs can train manipulation policies without commissioning a proprietary world model.
Video generation. Language-conditioned motion trajectories can plug into diffusion video pipelines (Runway, Kling, Veo) as controllable motion guides.
Sports and biomechanics. Analysts can fine-tune MolmoMotion on their sport of choice without paying per-frame for a closed API.

How MolmoMotion fits in the Molmo family

Ai2’s Molmo lineup as of July 2026:

Model	Focus	Released
Molmo	Open multimodal (vision + language)	2024, iterated through 2025-2026
MolmoAct	Open action model for robotics	2025
MolmoMotion	Open 3D motion forecasting	July 1, 2026

MolmoMotion completes a family: perception (Molmo) → action (MolmoAct) → prediction (MolmoMotion). All three are Apache-licensed with open weights.

Comparison: MolmoMotion vs closed motion models

Property	MolmoMotion	Google Genie 3	Closed lab world models
License	Apache 2.0 (open weights)	Proprietary	Proprietary
Output	Structured 3D trajectories	Generated video	Varies
Language conditioning	Yes	Yes	Sometimes
Best for	Downstream planners	Video generation	Internal use
Cost	Free to self-host	API + credits	Not available externally

Limitations

Small model, small scenes. MolmoMotion is not a general-purpose world simulator. It’s tuned for clips of a few seconds and scenes with a modest number of objects.
Depth ambiguity. Predicting 3D motion from monocular video is fundamentally underconstrained. MolmoMotion outputs uncertainty estimates, but downstream systems must decide how to act on them.
No physics simulator. The model predicts what motion looks plausible, not what physics would actually produce. Downstream planners still need a real simulator or safety layer.
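
As one illustration of acting on the uncertainty estimates and adding a safety layer, the sketch below shrinks each predicted distance by a multiple of its uncertainty before comparing it to a clearance threshold. Everything in it is an assumption: the `(position, sigma)` pairing, the isotropic sigma in meters, and the `k` and `clearance` values are hypothetical choices, not MolmoMotion outputs or recommended settings.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Prediction = Tuple[Vec3, float]  # (predicted position, assumed 1-sigma uncertainty in meters)


def min_safe_margin(predicted: Sequence[Prediction], robot_point: Vec3, k: float = 2.0) -> float:
    """Smallest (distance - k * sigma) over the forecast horizon.

    Subtracting k * sigma treats each prediction as a sphere of possible
    positions, so more uncertain predictions count as closer.
    """
    return min(math.dist(p, robot_point) - k * sigma for p, sigma in predicted)


def should_stop(predicted: Sequence[Prediction], robot_point: Vec3,
                clearance: float = 0.3, k: float = 2.0) -> bool:
    """True if the conservative margin drops below the required clearance."""
    return min_safe_margin(predicted, robot_point, k) < clearance


# Made-up forecast: the object approaches the origin while uncertainty grows.
forecast = [((1.0, 0.0, 0.0), 0.05), ((0.6, 0.0, 0.0), 0.10), ((0.5, 0.0, 0.0), 0.20)]
robot = (0.0, 0.0, 0.0)

print(should_stop(forecast, robot))         # uncertainty-aware check
print(should_stop(forecast, robot, k=0.0))  # same check with uncertainty ignored
```

With these made-up numbers the uncertainty-aware check stops and the check that ignores sigma does not, which is the failure mode the depth-ambiguity caveat warns about. A check like this complements a real simulator or certified safety layer and does not replace one.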

What to watch

Community fine-tunes on sports (NBA/NFL analytics), driving (nuScenes, Waymo Open), and manipulation datasets
Integrations with open robotics tooling and datasets (LeRobot, DROID) and open video generators (Runway ML’s open-source components, Stable Video Diffusion successors)
Ai2’s next model: a Molmo-scale world simulator would narrow the gap with Google’s Genie 3

Bottom line

MolmoMotion is a small but important release: one of the first fully open, language-conditioned 3D motion forecasting models. It won’t compete with closed world models on generative video quality, but it gives the open community a compact, composable motion prior that can plug into robotics, video generation, and analytics pipelines. In a week dominated by proprietary flagship model announcements, it’s arguably the most useful open contribution.