HyperFrames Review: HeyGen's HTML-to-Video for AI Agents

TL;DR

HyperFrames is the open-source rendering engine HeyGen quietly carved out of its internal video stack and dropped on GitHub in mid-April 2026. The pitch fits on a sticker: write HTML, render video, built for agents. Compositions are plain index.html files with data-start and data-duration attributes; a headless Chrome renderer seeks each frame, FFmpeg encodes it, and the result is a deterministic MP4. No React, no proprietary timeline, no bundler. Key facts:

Apache 2.0, no per-render fees, no commercial-use thresholds
21,894 GitHub stars and growing fast — one of the most-starred new AI repos of Q2 2026
Node.js 22+ and FFmpeg are the only hard requirements
Adapter-based animation: GSAP, CSS, Lottie, Three.js, Anime.js, WAAPI, or your own runtime
Deterministic by design: same HTML in, same bytes out — built for CI, regression tests, and reproducible renders
Agent skills shipped on day one for Claude Code, Cursor, Gemini CLI, Codex
AWS Lambda render path included for distributed rendering at scale
In production at HeyGen with community use at tldraw and TanStack
Repository: github.com/heygen-com/hyperframes

The interesting bet here isn’t “another video framework.” It’s that AI coding agents already know HTML — so if your video format is HTML, agents become competent video editors essentially for free. After a week of trying it, that claim mostly holds up.

Why “HTML as a video format” is the real story

Most video frameworks treat the timeline as a first-class object. You drag clips, set keyframes, and the tool serialises that into JSON, JSX, or a proprietary DSL. HyperFrames inverts that. The timeline is the HTML document:

<div id="stage" data-composition-id="launch"
     data-start="0" data-width="1920" data-height="1080">
  <video class="clip" data-start="0" data-duration="6"
         data-track-index="0" src="intro.mp4" muted playsinline></video>

  <h1 id="title" class="clip" data-start="1" data-duration="4"
      data-track-index="1">Launch day</h1>

  <audio data-start="0" data-duration="6" data-track-index="2"
         data-volume="0.5" src="music.wav"></audio>

  <script src="https://cdn.jsdelivr.net/npm/gsap@3/dist/gsap.min.js"></script>
  <script>
    const tl = gsap.timeline({ paused: true });
    tl.from("#title", { opacity: 0, y: 40, duration: 0.8 }, 1);
    window.__timelines = window.__timelines || {};
    window.__timelines.launch = tl;
  </script>
</div>

That’s a complete six-second composition: background video, fade-in title, background music at half volume. Open index.html in Chrome, hit play, and you can preview it without any tooling. Run npx hyperframes render and the same file becomes an MP4.
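The data-start and data-duration pair amounts to a visibility window per clip. Here is a minimal sketch of that idea, assuming a half-open window of [start, start + duration). The boundary rule, the clip ids for the video and audio elements, and the object shape are my own assumptions for illustration, not something taken from the HyperFrames source.

```javascript
// Hypothetical model of the three clips in the composition above.
// Field names mirror the data-* attributes; this is not HyperFrames' internal API.
const clips = [
  { id: "intro-video", start: 0, duration: 6, track: 0 },
  { id: "title", start: 1, duration: 4, track: 1 },
  { id: "music", start: 0, duration: 6, track: 2 },
];

// A clip counts as active on the half-open interval [start, start + duration).
// The half-open boundary is an assumption made for this sketch.
function activeClips(allClips, t) {
  return allClips
    .filter((c) => t >= c.start && t < c.start + c.duration)
    .map((c) => c.id);
}

console.log(activeClips(clips, 0.5)); // [ 'intro-video', 'music' ]
console.log(activeClips(clips, 3));   // [ 'intro-video', 'title', 'music' ]
console.log(activeClips(clips, 5.5)); // [ 'intro-video', 'music' ]
```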

The trick that makes it work is the paused: true GSAP timeline registered on window.__timelines.<id>. The renderer doesn’t play your animation in wall-clock time; it seeks it. For each frame, it sets the timeline’s progress to the correct point, waits for layout, and takes a screenshot. Because each frame depends only on its own timestamp, not on the frames rendered before it, animations stay frame-accurate at 60fps and you can shard a render across Lambda workers without drift.
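To make the "seek, don’t play" idea concrete, here is a rough sketch of why sharding is seamless. It uses a stub timeline rather than GSAP (a real GSAP timeline exposes a comparable seek(time) method), a linear fade instead of GSAP’s default ease, and hypothetical helper names throughout. It illustrates the principle and is not the renderer’s actual code.

```javascript
// Frame i is sampled at i / fps, computed from the index each time rather than
// by accumulating a per-frame delta, so rounding error cannot build up.
function frameTime(frameIndex, fps) {
  return frameIndex / fps;
}

// Split [0, totalFrames) into contiguous ranges, at most one per worker.
function shardFrames(totalFrames, workers) {
  const size = Math.ceil(totalFrames / workers);
  const shards = [];
  for (let start = 0; start < totalFrames; start += size) {
    shards.push({ start, end: Math.min(start + size, totalFrames) });
  }
  return shards;
}

// Stand-in for a paused, seekable timeline: its state depends only on the
// time passed to seek(). The screenshot step is omitted.
function makeStubTimeline() {
  let current = 0;
  return {
    seek(t) { current = t; },
    // Title fade from the example: 0.8 s starting at t = 1 s (linear for simplicity).
    titleOpacity() { return Math.min(1, Math.max(0, (current - 1) / 0.8)); },
  };
}

// Each shard builds its own timeline and seeks it; no state is shared.
function renderShard(shard, fps) {
  const tl = makeStubTimeline();
  const samples = [];
  for (let i = shard.start; i < shard.end; i++) {
    tl.seek(frameTime(i, fps));
    samples.push(tl.titleOpacity());
  }
  return samples;
}

const fps = 60;
const totalFrames = 6 * fps; // the six-second composition above
const shards = shardFrames(totalFrames, 4);
const sharded = shards.flatMap((s) => renderShard(s, fps));
const single = renderShard({ start: 0, end: totalFrames }, fps);
console.log(shards.length, sharded.every((v, i) => v === single[i])); // 4 true
```

Four independent workers produce exactly the same samples as one sequential pass, which is the property that makes Lambda sharding safe.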

For agents this is a huge unlock. Claude Code doesn’t need to learn a new “compositions and sequences” mental model — it writes the kind of HTML it already writes ten times a day, sprinkles in a few data attributes from the skill prompt, and the framework handles the rest.

Install and first render

# Prereqs: Node 22+, FFmpeg
brew install ffmpeg     # macOS
sudo apt install ffmpeg # Ubuntu/Debian

# Scaffold a project
npx hyperframes init my-video
cd my-video

# Preview in browser with live reload
npx hyperframes preview

# Render to MP4
npx hyperframes render

The init template gives you a working composition, a hyperframes.config.json, and a folder structure that won’t surprise anyone who’s used Vite or Next. Preview runs on port 4200 by default with live reload. The first render of a 10-second 1080p composition takes 35–50 seconds on an M2 Pro. That isn’t blazing, but the deterministic property means identical re-renders hit the cache.

The CLI is non-interactive by default: no [y/N] prompts, no spinners that confuse an agent’s pty parser. Whoever designed this clearly had claude and codex in mind, not humans typing into a terminal.

Using HyperFrames from Claude Code

This is where it stops being “another Remotion alternative” and starts being interesting. One command installs the official skill:

npx skills add heygen-com/hyperframes

Then in a Claude Code session:

Using /hyperframes, create a 10-second product intro for our blog post about HyperFrames. Title fades in at 1s, a background video plays the whole time, and there’s a kinetic caption from 4s to 9s. Render at 1080p.

What I got back, end-to-end and unattended:

A scaffolded composition directory
An index.html with three tracks correctly wired
A GSAP timeline with the fade-in and a slide-up caption
A preview run, then a render run, then a ffprobe on the output to verify duration
A summary message with the file path and runtime
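The duration check at the end of that list is easy to reproduce in your own pipeline. Here is a minimal sketch, assuming a standard ffprobe duration query. The output file name and the 0.05-second tolerance are my placeholders, and I don’t know exactly which ffprobe flags the agent used.

```javascript
// Arguments for a common ffprobe duration query; it prints a bare number
// such as "10.000000" on stdout.
function ffprobeDurationArgs(file) {
  return [
    "-v", "error",
    "-show_entries", "format=duration",
    "-of", "default=noprint_wrappers=1:nokey=1",
    file,
  ];
}

// Compare the probed duration with the expected one, within a tolerance.
// The default tolerance is an arbitrary choice for this sketch.
function durationMatches(ffprobeStdout, expectedSeconds, toleranceSeconds = 0.05) {
  const actual = Number.parseFloat(ffprobeStdout.trim());
  if (!Number.isFinite(actual)) return false;
  return Math.abs(actual - expectedSeconds) <= toleranceSeconds;
}

// Typical wiring (not executed here, since it needs ffprobe and a rendered file):
//   const { execFileSync } = require("node:child_process");
//   const out = execFileSync("ffprobe", ffprobeDurationArgs("out.mp4"), { encoding: "utf8" });
//   if (!durationMatches(out, 10)) throw new Error("unexpected duration");
console.log(durationMatches("10.000000\n", 10)); // true
console.log(durationMatches("9.5\n", 10));       // false
```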

This is the production loop the skills teach: plan → write HTML → wire animation → add media → lint → preview → render. The lint step is doing more work than it sounds — it catches missing data-duration, audio tracks that overrun the composition, and timeline IDs the renderer can’t find. Those are the exact errors that would otherwise make an agent loop endlessly trying to figure out why its render is silent or black.
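For a sense of what those three lint checks involve, here is a rough sketch over a plain-object model of a composition. The object shape, the field names, and the message wording are all hypothetical. The real linter works on the HTML itself and I haven’t read its implementation.

```javascript
// Hypothetical composition model; field names are illustrative only.
function lintComposition(comp) {
  const errors = [];
  for (const clip of comp.clips) {
    if (clip.duration === undefined) {
      errors.push(`${clip.id}: missing data-duration`);
      continue; // nothing more can be checked without a duration
    }
    if (clip.kind === "audio" && clip.start + clip.duration > comp.duration) {
      errors.push(`${clip.id}: audio overruns the composition`);
    }
  }
  // The renderer looks the timeline up by composition id.
  if (!comp.registeredTimelines.includes(comp.id)) {
    errors.push(`timeline "${comp.id}" is not registered`);
  }
  return errors;
}

const broken = {
  id: "launch",
  duration: 6,
  registeredTimelines: [],
  clips: [
    { id: "title", kind: "text", start: 1 },               // no duration
    { id: "music", kind: "audio", start: 0, duration: 8 }, // 2 s too long
  ],
};
console.log(lintComposition(broken).length); // 3
```

Each of those findings maps to a silent or black render if it slips through, which is why surfacing them before the render step saves an agent so many retries.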

After a week of using it daily, the main failure mode is the agent over-engineering animations: defaulting to Three.js when CSS transforms would do, or pulling in Lottie for what should be a text-shadow keyframe. The catalog mostly fixes that:

npx hyperframes add flash-through-white   # shader transition
npx hyperframes add instagram-follow      # social overlay
npx hyperframes add data-chart            # animated chart

If you point the agent at the catalog up front (Use blocks from the HyperFrames catalog where possible), output quality goes up noticeably.

HyperFrames vs Remotion: the honest comparison

Remotion has been the default answer for “video as code” since 2021. It’s mature, it’s well-documented, and Remotion Lambda is genuinely impressive. HyperFrames is openly inspired by it. So when does each one make sense?

Question	HyperFrames	Remotion
Authoring model	HTML + CSS + seekable animation	React components
Build step	None — `index.html` plays as-is	Bundler required
Agent handoff	Plain HTML files	JSX/TSX in a React project
Animation timing	Frame-accurate via seek adapters	Frame-driven via `useCurrentFrame()`; wall-clock code needs care for determinism
Distributed rendering	Local + AWS Lambda	Remotion Lambda (mature, more features)
License	Apache 2.0	Source-available Remotion License
Commercial cost above N renders	Zero	Paid tiers above thresholds
Component ecosystem	Catalog (small but growing)	Large React ecosystem
Learning curve for a React team	Slight relearn	Already there
Learning curve for a coding agent	Trivial	Higher (full React project mental model)

If you have a React team already shipping Remotion compositions, there’s no reason to switch: Remotion is more mature for human authors, and its Lambda story has years of polish that HyperFrames’ does not. If you’re building an automated content pipeline where the author is an agent, HyperFrames is a step change. The Apache 2.0 license also matters at scale: it carries none of the per-render or seat-count thresholds that apply under the Remotion License once you’re past hobby usage.

The deterministic-output property is a real technical win regardless of which side you’re on. Byte-identical MP4s from byte-identical HTML means you can put renders behind a content hash and skip rebuilds, run regression tests with diff, and shard work across workers without seams.
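The content-hash idea can be sketched in a few lines of Node. This is my own illustration, not a HyperFrames feature: the function names, the settings object, and the in-memory Map standing in for a cache directory are all assumptions. In a real pipeline you would also fold the bytes of referenced media (intro.mp4, music.wav) into the hash, since the HTML alone doesn’t capture them.

```javascript
const { createHash } = require("node:crypto");

// Cache key = SHA-256 over the composition HTML plus the render settings.
// JSON.stringify is order-sensitive, so build `settings` with a stable key order.
function renderCacheKey(html, settings) {
  return createHash("sha256")
    .update(html)
    .update("\0") // separator so html/settings boundaries cannot collide
    .update(JSON.stringify(settings))
    .digest("hex");
}

// `cache` is any Map-like store; `render` is whatever actually produces the MP4.
function renderWithCache(cache, html, settings, render) {
  const key = renderCacheKey(html, settings);
  if (!cache.has(key)) cache.set(key, render(html, settings));
  return { key, output: cache.get(key) };
}

// Usage with a fake renderer that just counts invocations.
let renderCalls = 0;
const fakeRender = () => { renderCalls += 1; return `video-${renderCalls}.mp4`; };
const cache = new Map();
const settings = { width: 1920, height: 1080, fps: 60 };

renderWithCache(cache, "<div>v1</div>", settings, fakeRender);
renderWithCache(cache, "<div>v1</div>", settings, fakeRender); // cache hit
renderWithCache(cache, "<div>v2</div>", settings, fakeRender); // new content
console.log(renderCalls); // 2
```

This shortcut is only safe because the output is deterministic. If identical input could produce different bytes, a cached MP4 would no longer be a trustworthy stand-in for a fresh render.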

What people are saying

The launch hit r/ClaudeCode and r/ClaudeAI in mid-April and the reaction has been more measured than the typical “next-Remotion-killer” thread. A representative slice:

“Output is byte-identical across runs, so CI caching and shard-parallel rendering work. There is a frame-adapter pattern that lets GSAP, Lottie, CSS, Three.js, and (experimentally) Remotion coexist in one composition.” — top comment on the r/ClaudeAI launch thread, by a contributor
“Setup isn’t trivial and sometimes you spend more time debugging [Remotion] than creating. So when I saw HyperFrames calling itself ‘agent-native’ I got curious.” — r/buildinpublic
“This is the right primitive. Video as deterministic function of HTML. Everything else is a wrapper.” — top reply on the r/coolgithubprojects thread

The skeptical takes are mostly about ecosystem maturity (“the catalog is thin compared to React/Remotion ecosystem”) and about whether HTML is really the right abstraction for complex sequences. Both are fair. The counter-argument from the maintainers is that “agents write HTML” beats “humans write JSX” by enough margin to justify the smaller initial surface area.

Worth noting: the Medium piece from AI Engineering and the coverage in The Agent Times both lead with “deterministic video primitive” rather than “Remotion killer,” and I think that’s the more accurate framing.

Honest limitations after a week

Things that bit me, in rough order of severity:

1. Render speed isn’t competitive on a single machine. Headless Chrome seek-per-frame is slower than Remotion’s parallel-frame approach for simple compositions. A 30-second 1080p render that takes ~3 minutes locally takes ~25 seconds on Remotion Lambda. The HyperFrames AWS Lambda path closes that gap, but its bundle and setup are heavier than Remotion’s.

2. Audio sync is fiddly with seek-based animation. Because animations are seeked rather than played, audio that’s tightly choreographed to timeline events needs careful data-start math. The default audio mixer handles offsets and volume fine, but Lottie-style audio-reactive animation isn’t really a supported pattern yet.

3. The catalog is small. Right now it’s transitions, overlays, captions, charts, and a handful of effects. If you want a Lottie-style ecosystem of polished components, you’re going to build them. The blocks that exist are well-made, but expect to write a lot of HTML.

4. The Git LFS content for the test baselines is ~240MB. If you’re cloning to contribute, the GIT_LFS_SKIP_SMUDGE=1 environment variable mentioned in the README is your friend. The repo will build fine without the LFS content; you just can’t run the visual regression suite.

5. No first-party React adapter (yet). There’s an experimental Remotion frame adapter, but if you have a deep React component library you want to reuse, you’re going to wrap it yourself or wait. For greenfield work this doesn’t matter; for migrations it might.

6. The Studio is “available, evolving.” The browser-based editor exists but it’s clearly not the primary surface — that’s the agent skills. If you want a polished GUI editor with timeline scrubbing and visual track management, this isn’t it yet.
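On the audio-sync point (item 2), the data-start math is plain arithmetic once you write it down. Here is a minimal sketch of how I ended up computing it, assuming you know where the cue sits inside the audio file and want it to land on a timeline event. The helper names and the snap-to-frame step are my own; HyperFrames doesn’t ship these as far as I know.

```javascript
// Snap a time in seconds to the nearest frame boundary.
function snapToFrame(t, fps) {
  return Math.round(t * fps) / fps;
}

// data-start for an audio clip whose cue (cueOffset seconds into the file)
// should coincide with a timeline event at eventTime.
function audioStartFor(eventTime, cueOffset, fps) {
  const start = snapToFrame(eventTime - cueOffset, fps);
  if (start < 0) {
    throw new RangeError("clip would have to start before t = 0; trim the audio instead");
  }
  return start;
}

// The title fade in the earlier example ends at 1 + 0.8 = 1.8 s.
// If the hit in the sound file is 0.25 s in, the clip should start at 1.55 s.
console.log(audioStartFor(1.8, 0.25, 60)); // 1.55
```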

None of these are deal-breakers. The framework is doing exactly what it advertises; it’s just early.

When HyperFrames is the right call

Reach for it when:

You’re building an automated content pipeline where an agent generates the video (newsletter highlights, PR changelog reels, social posts from blog content)
You need deterministic output for CI caching, regression tests, or shard-parallel rendering
The Apache 2.0 license matters because you’ll be doing thousands of renders
Your team is comfortable writing HTML/CSS/JS and doesn’t want to adopt React just to make a video
You want a low-ceremony local preview that doesn’t require a dev server

Skip it (for now) when:

You already have a Remotion pipeline that works and a React team to maintain it
You need polished visual editing — Studio isn’t there yet
You need the maximum-throughput cloud render path with zero setup; Remotion Lambda is more mature
Your sequences are heavily React-component-driven and porting them isn’t worth it

FAQ

Q: Is HyperFrames actually open source, or is this a HeyGen-services play with an OSS wrapper? A: It’s Apache 2.0 with no usage caps, no telemetry requirement, and no “you must use our cloud for renders” clause. The Lambda render package is a self-deploy SDK — you stand up the Lambda functions in your own AWS account. HeyGen presumably benefits from ecosystem mindshare and from people who eventually want hosted authoring, but the OSS framework stands on its own.

Q: Do I need HeyGen credentials, an API key, or any cloud account to use it? A: No. Local CLI usage requires nothing except Node 22+ and FFmpeg. The AWS Lambda path requires an AWS account (yours, not HeyGen’s). hyperframes.dev is an optional community playground for sharing compositions.

Q: How does it compare to Manim or Motion Canvas? A: Manim and Motion Canvas are oriented toward mathematical/educational animation with their own DSLs. HyperFrames is generalist video composition with HTML as the format. For lecture-style geometric animations Manim is still better; for product videos, social cuts, or dashboards-to-video, HyperFrames fits better.

Q: Can I use existing React components inside a HyperFrames composition? A: Sort of. You can render React to a DOM node and have HyperFrames screenshot it like anything else, but you’ll lose the determinism benefits if your components rely on wall-clock state. The clean integration story is to render React to static HTML+CSS first and let HyperFrames handle timing. The experimental Remotion adapter is the closest thing to first-class React support today.
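The wall-clock caveat is the whole game here, so a tiny sketch may help. The two functions below are hypothetical and framework-agnostic. The point is only that anything derived from Date.now() changes between captures, while anything derived from the composition time does not.

```javascript
// Non-deterministic: the value depends on when the frame happens to be captured.
function spinnerAngleWallClock() {
  return ((Date.now() / 1000) * 90) % 360;
}

// Deterministic: the value depends only on the composition time the renderer seeks to.
function spinnerAngleAt(t) {
  return (t * 90) % 360;
}

console.log(spinnerAngleAt(2), spinnerAngleAt(2)); // 180 180
console.log(spinnerAngleAt(5));                    // 90
```

A component written in the second style survives being seeked in any order; one written in the first style will render differently on every run.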

Q: What’s the actual ceiling — could I render a feature-length film with this? A: Technically yes; practically no one would. HyperFrames is optimised for short-to-medium compositions (seconds to a few minutes) where determinism, agent authorship, and HTML-native authoring are the wins. For anything over 10 minutes you’d want a different toolchain — or you’d want to compose many HyperFrames outputs together with FFmpeg directly. The deterministic primitive does make that kind of pipeline easier.
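If you do go the stitch-it-together route, FFmpeg’s concat demuxer is the usual tool. Here is a minimal sketch that builds the list file and the argument array. The segment file names are placeholders, and stream copy (-c copy) only works when every segment shares the same codec settings, which should hold if they all come from the same render configuration.

```javascript
// Build the text of a concat-demuxer list file: one "file '<path>'" line per segment.
// Single quotes inside a path are escaped the way the FFmpeg docs describe.
function concatListFile(paths) {
  return paths.map((p) => `file '${p.replace(/'/g, "'\\''")}'`).join("\n") + "\n";
}

// Arguments for: ffmpeg -f concat -safe 0 -i <list> -c copy <output>
function concatArgs(listPath, outputPath) {
  return ["-f", "concat", "-safe", "0", "-i", listPath, "-c", "copy", outputPath];
}

// Placeholder segment names for illustration.
const segments = ["part-001.mp4", "part-002.mp4", "part-003.mp4"];
process.stdout.write(concatListFile(segments));
console.log(["ffmpeg", ...concatArgs("segments.txt", "full.mp4")].join(" "));
```

Write the list text to segments.txt, run the printed command, and the segments are joined without re-encoding.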

Q: Will HeyGen keep maintaining this or quietly abandon it? A: It’s used in production at HeyGen itself, commits come from active engineers there, and HeyGen staff respond on the community Discord. Listed adopters include tldraw and TanStack. The risk profile feels similar to corporate-sponsored OSS that’s load-bearing for the sponsor.

Verdict

HyperFrames is the first video framework I’ve used where “AI agent writes the video” feels like the intended workflow rather than a happy accident. The HTML-as-format bet is genuinely insightful: it dramatically lowers the surface area an agent has to learn while giving humans an authoring model they already know. The deterministic rendering property is the kind of unsexy technical decision that pays off for years.

It’s not as mature as Remotion. The catalog is thin, render throughput on a single machine is unimpressive, and the Studio is still a work in progress. But for the specific job of “wire video generation into an automated content pipeline driven by Claude Code or similar,” nothing else comes close right now.

If you’re building agent workflows that need to produce video, install the skill and try a prompt today. If you’re a Remotion shop with React infrastructure and human authors, there’s no need to switch. If you’re somewhere in between, watch the catalog and Studio progress over the next two quarters — that’s where the gap will close.

For me, this goes on the short list of repos I expect to be load-bearing in agent stacks by the end of 2026.

Try it: github.com/heygen-com/hyperframes · Docs · Catalog · Discord