

What Is MolmoMotion? Ai2's Open 3D Motion Model (July 2026)

On July 1, 2026, the Allen Institute for AI (Ai2) released MolmoMotion, an open vision-language model that takes a few seconds of video and predicts where the objects in it will move next in 3D space. It’s a smaller release than the same-week Claude Sonnet 5 and GPT-5.6 Sol headlines, but it’s the most important open contribution to a rapidly commercializing field: language-conditioned 3D motion forecasting.

Last verified: July 2, 2026

What MolmoMotion actually does

Feed MolmoMotion:

  • A short video (a few seconds)
  • Optionally: a language prompt (“the person in the red jersey is about to jump”)

It outputs:

  • 3D trajectories for objects in the scene — where they’ll be in the next N seconds
  • Uncertainty estimates for each prediction
  • Optional language grounding — which object the model thinks the prompt refers to

It’s not a video generator. It’s a motion prior — a compact structured output that other systems (robot planners, animation tools, sports-analytics dashboards) can consume.
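To make "compact structured output" concrete, here is a minimal sketch of what a downstream consumer might work with. Every name in it (TrajectoryPoint, ObjectForecast, grounding_score, sigma) is a hypothetical illustration and not MolmoMotion’s actual schema, and the units and the single-number uncertainty are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrajectoryPoint:
    # Hypothetical fields; the real output format may differ.
    t: float                          # seconds after the last observed frame
    xyz: Tuple[float, float, float]   # predicted position (assumed metres)
    sigma: float                      # assumed 1-sigma position uncertainty (metres)


@dataclass
class ObjectForecast:
    object_id: str
    points: List[TrajectoryPoint]
    # How strongly the language prompt refers to this object; None if no prompt.
    grounding_score: Optional[float] = None


def grounded_object(forecasts: List[ObjectForecast]) -> Optional[ObjectForecast]:
    """Return the forecast the prompt most likely refers to, or None without a prompt."""
    scored = [f for f in forecasts if f.grounding_score is not None]
    if not scored:
        return None
    return max(scored, key=lambda f: f.grounding_score)


# Example: two tracked objects, prompt "the person in the red jersey".
player = ObjectForecast(
    "player_7", [TrajectoryPoint(0.5, (1.0, 0.0, 4.0), 0.2)], grounding_score=0.9
)
ball = ObjectForecast(
    "ball", [TrajectoryPoint(0.5, (2.0, 1.0, 5.0), 0.4)], grounding_score=0.3
)
target = grounded_object([player, ball])
```

A planner or analytics tool would read positions and uncertainties from a structure like this rather than decoding video frames.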

Why this matters

Motion forecasting is the missing link between vision-language models and physical AI. VLMs today can describe what they see; world models like Google’s Genie 3 can hallucinate plausible futures as video. Neither directly answers the question a robot arm or a self-driving car actually needs: “where will this object be 500ms from now?”
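As a rough illustration of that question, the sketch below answers "where will this object be 500ms from now?" by linearly interpolating between predicted trajectory samples. The sample times and positions are made-up values, and the function is a hypothetical helper, not part of any MolmoMotion API.

```python
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def position_at(times: Sequence[float], positions: Sequence[Point3], t_query: float) -> Point3:
    """Linearly interpolate a 3D position at t_query.

    Assumes `times` is strictly increasing and aligned with `positions`.
    """
    if not times[0] <= t_query <= times[-1]:
        raise ValueError("t_query is outside the predicted horizon")
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t_query <= t1:
            w = (t_query - t0) / (t1 - t0)  # 0 at t0, 1 at t1
            p0, p1 = positions[i], positions[i + 1]
            return tuple(a + w * (b - a) for a, b in zip(p0, p1))
    raise ValueError("times must contain at least two samples")


# Made-up forecast: samples at 0.0 s, 0.25 s and 0.75 s after the last frame.
times = [0.0, 0.25, 0.75]
positions = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (2.0, 1.0, 2.0)]
print(position_at(times, positions, 0.5))  # (1.5, 0.5, 2.0)
```

Linear interpolation is the simplest choice; a real controller might fit a spline or re-query the model at a finer time step.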

Closed labs are building this internally (Waymo, Wayve, Tesla, and OpenAI’s rumored world simulation work). Ai2 is the first to ship a fully open attempt: weights, code, and a training data recipe.

Three practical unlocks:

  1. Robotics research. Open motion priors mean smaller labs can train manipulation policies without commissioning a proprietary world model.
  2. Video generation. Language-conditioned motion trajectories can plug into diffusion video pipelines (Runway, Kling, Veo) as controllable motion guides.
  3. Sports and biomechanics. Analysts can fine-tune MolmoMotion on their sport of choice without paying per-frame for a closed API.

How MolmoMotion fits in the Molmo family

Ai2’s Molmo lineup as of July 2026:

Model         Focus                                  Released
Molmo         Open multimodal (vision + language)    2024, iterated through 2025-2026
MolmoAct      Open action model for robotics         2025
MolmoMotion   Open 3D motion forecasting             July 1, 2026

MolmoMotion completes a family: perception (Molmo) → action (MolmoAct) → prediction (MolmoMotion). All three are Apache-licensed with open weights.

Comparison: MolmoMotion vs Closed Motion Models

Property                MolmoMotion                  Google Genie 3     Closed lab world models
License                 Apache 2.0 (open weights)    Proprietary        Proprietary
Output                  Structured 3D trajectories   Generated video    Varies
Language conditioning   Yes                          Yes                Sometimes
Best for                Downstream planners          Video generation   Internal use
Cost                    Free to self-host            API + credits      Not available externally

Limitations

  • Small model, small scenes. MolmoMotion is not a general-purpose world simulator. It’s tuned for clips of a few seconds and a modest number of objects.
  • Depth ambiguity. Predicting 3D motion from monocular video is fundamentally underconstrained. MolmoMotion outputs uncertainty, but users must handle it.
  • No physics simulator. The model predicts what motion looks plausible, not what physics would actually produce. Downstream planners still need a real simulator or safety layer.
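
One simple way a downstream planner could "handle" the uncertainty mentioned above is to widen its safety margin as the prediction gets less certain. The sketch below assumes a single isotropic 1-sigma value in metres per predicted position; the function name, the default clearance, and the 3-sigma factor are illustrative choices, not values from Ai2.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]


def is_clear(
    waypoint: Point3,
    predicted_xyz: Point3,
    sigma: float,
    min_clearance: float = 0.10,  # assumed base clearance in metres
    k: float = 3.0,               # assumed multiplier on the 1-sigma uncertainty
) -> bool:
    """True if the waypoint stays at least min_clearance + k * sigma from the prediction."""
    distance = math.dist(waypoint, predicted_xyz)
    return distance >= min_clearance + k * sigma


waypoint = (0.0, 0.0, 0.0)
predicted = (0.5, 0.0, 0.0)

confident = is_clear(waypoint, predicted, sigma=0.05)  # needs 0.25 m, has 0.5 m
uncertain = is_clear(waypoint, predicted, sigma=0.20)  # needs 0.70 m, has 0.5 m
```

With the same predicted position, the confident forecast passes and the uncertain one is rejected. A check like this is a margin heuristic only and does not replace a physics simulator or a proper safety layer.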

What to watch

  • Community fine-tunes on sports (NBA/NFL analytics), driving (nuScenes, Waymo Open), and manipulation datasets
  • Integrations with open robotics stacks (LeRobot, DROID) and open video generators (Runway ML’s open-source components, Stable Video Diffusion successors)
  • Ai2’s next model: a Molmo-scale world simulator would close the gap with Google’s Genie 3

Bottom line

MolmoMotion is a small but important release: the first fully open, language-conditioned 3D motion forecasting model. It won’t compete with closed world models on generative video quality, but it gives the open community a compact, composable motion prior that can plug into robotics, video generation, and analytics pipelines. In a week dominated by proprietary flagship model announcements, it’s the most useful open contribution.

