TL;DR
- What: Open-source CLI that gives AI coding agents a real browser to verify their UI work
- Creator: AmElmo on GitHub
- How it works: Records video, captures screenshots, collects console/server errors, bundles everything into a self-contained HTML report
- Works with: Claude Code, Cursor, Codex, Gemini CLI, Windsurf, GitHub Copilot — any agent that can run shell commands
- Install: npm install -g proofshot && proofshot install
- License: Open source on GitHub
- HN traction: 60+ points and 44 comments on Show HN
- Key insight: AI agents write code blind — they can’t see if the UI they built actually looks right. ProofShot closes that feedback loop.
Quick Reference Table
| Detail | Value |
|---|---|
| Repository | AmElmo/proofshot |
| Package | proofshot on npm |
| Language | TypeScript (ESM-only) |
| Browser engine | agent-browser (headless Chromium) |
| Install | npm install -g proofshot |
| Supported agents | Claude Code, Cursor, Codex, Gemini CLI, Windsurf, GitHub Copilot |
| License | Open source |
| Error detection | 10+ languages (Node.js, Python, Ruby, Go, Java, Rust, PHP, C#, Elixir, more) |
The Problem: AI Agents Build UI Blind
Here’s the uncomfortable truth about AI coding agents in 2026: they can write thousands of lines of frontend code, but they have no idea what the result actually looks like.
Claude Code can scaffold an entire React dashboard. Cursor can refactor your component library. Codex can implement a checkout flow. But none of them can open a browser and check whether the login button is actually visible, whether the layout broke on mobile, or whether the form throws a console error when you click submit.
The feedback loop is broken. The agent writes code, you review it manually, you find visual bugs, you go back to the agent, repeat. For UI-heavy work, this manual verification step eats 30-40% of development time.
ProofShot fixes this by giving AI agents what they’ve been missing: a real browser they can drive, record, and report on.
How ProofShot Works
The workflow is deliberately simple — three commands that any AI agent can run:
Step 1: Start a Session
```bash
proofshot start --run "npm run dev" --port 3000 --description "Verify checkout flow"
```
This launches your dev server, opens a headless Chromium browser via agent-browser, and starts recording everything — video, console output, server logs.
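Before recording can begin, the CLI has to wait for the dev server's port to accept connections. A minimal sketch of that readiness check in TypeScript — a hypothetical helper for illustration, not ProofShot's actual implementation:

```typescript
import net from "node:net";

// Poll a TCP port until it accepts a connection or the deadline passes.
// Hypothetical helper illustrating the readiness check a tool like
// `proofshot start` must perform before it begins recording.
export function waitForPort(
  port: number,
  host = "127.0.0.1",
  timeoutMs = 30_000
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const attempt = () => {
      const socket = net.connect({ port, host });
      socket.once("connect", () => {
        socket.end();
        resolve();
      });
      socket.once("error", () => {
        socket.destroy();
        if (Date.now() > deadline) {
          reject(new Error(`port ${port} not ready after ${timeoutMs}ms`));
        } else {
          setTimeout(attempt, 250); // retry until the server is up
        }
      });
    };
    attempt();
  });
}
```

The retry loop matters because dev servers like Vite or Next.js can take several seconds to bind their port after the process starts.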
Step 2: The Agent Drives the Browser
```bash
# See what's on the page
agent-browser snapshot -i
# Navigate to the target page
agent-browser open http://localhost:3000/login
# Fill in a form
agent-browser fill @e2 "[email protected]"
# Click a button
agent-browser click @e5
# Capture a screenshot as proof
agent-browser screenshot ./proofshot-artifacts/step-login.png
```
The agent interacts with real DOM elements using element references (@e2, @e5) that agent-browser discovers via snapshot. This isn’t simulated — it’s actual browser automation.
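Conceptually, the snapshot pairs each interactive element with a stable reference the agent can act on. A rough sketch of how an agent-side script might extract those references — note that the line format shown here (`@e2 textbox "Email"`) is a guessed illustration, not agent-browser's documented output:

```typescript
// Parse element references like "@e2" out of a snapshot listing.
// The assumed line shape ('@e2 textbox "Email"') is hypothetical;
// agent-browser's real output format may differ.
interface ElementRef {
  ref: string;   // e.g. "@e2"
  role: string;  // e.g. "textbox", "button"
  label: string; // accessible label
}

export function parseSnapshot(snapshot: string): ElementRef[] {
  const refs: ElementRef[] = [];
  for (const line of snapshot.split("\n")) {
    const m = line.trim().match(/^(@e\d+)\s+(\S+)\s+"([^"]*)"/);
    if (m) refs.push({ ref: m[1], role: m[2], label: m[3] });
  }
  return refs;
}
```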
Step 3: Stop and Bundle Proof
```bash
proofshot stop
```
This stops recording and generates a timestamped folder in ./proofshot-artifacts/ containing:
| File | What it is |
|---|---|
| session.webm | Full video recording of the browser session |
| viewer.html | Standalone interactive viewer with scrub bar, timeline, and log tabs |
| SUMMARY.md | Markdown report with errors, screenshots, and video |
| step-*.png | Screenshots captured at key moments |
| session-log.json | Action timeline with timestamps and element data |
| server.log | Dev server stdout/stderr |
| console-output.log | Browser console output |
The viewer.html is the standout feature — it’s a self-contained HTML file you can open in any browser. You get the video recording with a scrub bar, action markers on the timeline, and tabs for console and server logs with error highlighting synced to the video timestamps.
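Syncing log entries to the scrub bar comes down to converting each entry's wall-clock timestamp into an offset from the recording start. A simplified sketch of that mapping — the entry shape here is hypothetical, not ProofShot's internal type:

```typescript
// Hypothetical shape of a parsed log entry; ProofShot's real
// session-log.json schema may differ.
interface LogEntry {
  timestampMs: number; // wall-clock epoch ms
  message: string;
}

// Convert wall-clock timestamps into video offsets (seconds) so a
// viewer can jump the scrub bar to the moment an error fired.
// Entries from before recording started clamp to offset 0.
export function toVideoOffsets(sessionStartMs: number, entries: LogEntry[]) {
  return entries.map((e) => ({
    offsetSec: Math.max(0, (e.timestampMs - sessionStartMs) / 1000),
    message: e.message,
  }));
}
```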
The Skill System: Zero-Config Agent Integration
The clever part is how ProofShot teaches agents to use it. Running proofshot install auto-detects your AI coding tools and installs a skill file at the user level:
```bash
proofshot install                # Interactive — picks all detected tools
proofshot install --only claude  # Just Claude Code
proofshot install --skip cursor  # Everything except Cursor
```
Here’s where each skill gets installed:
| Agent | Skill Location |
|---|---|
| Claude Code | ~/.claude/skills/proofshot/SKILL.md |
| Cursor | ~/.cursor/rules/proofshot.mdc |
| Codex (OpenAI) | ~/.codex/skills/proofshot/SKILL.md |
| Gemini CLI | Appends to ~/.gemini/GEMINI.md |
| Windsurf | Appends to ~/.codeium/windsurf/memories/global_rules.md |
Once installed, the agent just knows how to use ProofShot. You say “verify this with proofshot” and the agent handles the entire start → test → stop workflow automatically. No per-project configuration needed.
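Auto-detection most likely amounts to checking for each tool's config directory under the home folder. A hedged sketch of that idea — the directories mirror the skill-location table above, but the detection logic itself is an assumption about how `proofshot install` works:

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";

// Map each supported agent to the directory whose presence suggests
// it is installed. Paths mirror the skill-location table; the
// detection strategy itself is a guess, not ProofShot's actual code.
const TOOL_DIRS: Record<string, string> = {
  claude: ".claude",
  cursor: ".cursor",
  codex: ".codex",
  gemini: ".gemini",
  windsurf: join(".codeium", "windsurf"),
};

export function detectTools(home = homedir()): string[] {
  return Object.entries(TOOL_DIRS)
    .filter(([, dir]) => existsSync(join(home, dir)))
    .map(([name]) => name);
}
```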
GitHub PR Integration
This is where ProofShot becomes a real workflow tool, not just a demo:
```bash
proofshot pr            # Auto-detect PR from current branch
proofshot pr 42         # Target a specific PR number
proofshot pr --dry-run  # Preview markdown without posting
```
proofshot pr finds all verification sessions recorded on the current branch, uploads screenshots and video to GitHub, and posts a formatted comment on the PR with inline media. Your reviewer sees exactly what the agent built — video proof, screenshots, and error logs — right in the PR.
Requires GitHub CLI (gh) to be installed and authenticated. If ffmpeg is available, it converts .webm video to .mp4 for better compatibility.
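The posted comment is ultimately just markdown with inline media links. A rough sketch of assembling such a body from a session's artifacts — the function name, input shape, and exact layout are hypothetical, not ProofShot's actual output:

```typescript
// Hypothetical artifact summary; ProofShot's real data model may differ.
interface SessionArtifacts {
  description: string;
  videoUrl?: string;        // uploaded video URL, if any
  screenshotUrls: string[]; // uploaded step-*.png URLs
  errors: string[];         // error lines pulled from the logs
}

// Build a PR comment body of the kind `proofshot pr` might post via `gh`.
export function buildPrComment(s: SessionArtifacts): string {
  const lines = [`### ProofShot verification: ${s.description}`];
  if (s.videoUrl) lines.push(`[Session video](${s.videoUrl})`);
  for (const url of s.screenshotUrls) lines.push(`![screenshot](${url})`);
  if (s.errors.length > 0) {
    lines.push(
      `**${s.errors.length} error(s) detected:**\n` +
        s.errors.map((e) => "- `" + e + "`").join("\n")
    );
  } else {
    lines.push("No errors detected.");
  }
  return lines.join("\n\n");
}
```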
Visual Regression Testing
ProofShot also supports visual diff comparisons:
```bash
proofshot diff --baseline ./previous-artifacts
```
This compares current screenshots against a baseline set, catching visual regressions that code review alone would miss. For teams doing rapid AI-assisted iteration, this is a safety net against layout breakage.
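A byte-level comparison is the simplest form of such a check; a real visual diff decodes pixels and allows a tolerance. This deliberately simplified sketch just flags screenshots whose contents changed or that are missing from the baseline — it is not ProofShot's actual algorithm:

```typescript
import { readdirSync, readFileSync, existsSync } from "node:fs";
import { join } from "node:path";
import { createHash } from "node:crypto";

// Compare screenshots in `current` against `baseline` by content hash.
// A real visual diff would decode pixels and tolerate minor rendering
// noise; this byte-exact version only illustrates the workflow.
export function diffScreenshots(baseline: string, current: string) {
  const changed: string[] = [];
  const added: string[] = [];
  const hash = (p: string) =>
    createHash("sha256").update(readFileSync(p)).digest("hex");
  for (const file of readdirSync(current).filter((f) => f.endsWith(".png"))) {
    const basePath = join(baseline, file);
    if (!existsSync(basePath)) {
      added.push(file); // screenshot with no baseline counterpart
    } else if (hash(basePath) !== hash(join(current, file))) {
      changed.push(file); // bytes differ from the baseline
    }
  }
  return { changed, added };
}
```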
What the Community Is Saying
The Show HN thread (60+ points, 44 comments) shows strong developer interest:
The positive reactions:
- Developers are excited about closing the AI coding feedback loop
- Integration with existing tools (Claude Code, Cursor) means low adoption friction
- The PR upload feature resonated — reviewers want visual proof, not just code diffs
- Potential to reduce manual QA cycles by 30-40% for UI work
The concerns:
- How well can it detect subtle visual issues vs. just functional correctness?
- Performance overhead of running headless Chromium alongside dev server
- Whether the element reference system (@e2, @e5) is stable enough for complex SPAs
The broader interest:
- Several commenters highlighted potential integration with GitHub Copilot for end-to-end automation
- Discussion around using ProofShot for rapid prototyping verification
- Interest in extending it to mobile viewport testing
Error Detection Across 10+ Languages
ProofShot doesn’t just record video — it actively detects errors from server logs across multiple backend languages: JavaScript/Node.js, Python, Ruby/Rails, Go, Java/Kotlin, Rust, PHP, C#/.NET, Elixir/Phoenix, and more.
The error patterns are defined in src/utils/error-patterns.ts and are extensible. When a server error fires during verification, it shows up in the viewer timeline synced to the exact moment it occurred in the video.
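The pattern table is essentially a set of per-language regexes run over each server log line. A hedged sketch in the spirit of src/utils/error-patterns.ts — these specific patterns are illustrative, not copied from ProofShot:

```typescript
// Per-language regexes matched against server log lines, in the spirit
// of ProofShot's src/utils/error-patterns.ts. The exact patterns here
// are illustrative assumptions, not the project's real table.
const ERROR_PATTERNS: Record<string, RegExp> = {
  node: /(?:UnhandledPromiseRejection|TypeError|ReferenceError):/,
  python: /Traceback \(most recent call last\)/,
  ruby: /(?:RuntimeError|NoMethodError)/,
  go: /panic: /,
  java: /Exception in thread|\.\w+Exception:/,
  rust: /thread '.*' panicked at/,
};

// Return the languages whose error signature matches this log line.
export function detectErrors(logLine: string): string[] {
  return Object.entries(ERROR_PATTERNS)
    .filter(([, re]) => re.test(logLine))
    .map(([lang]) => lang);
}
```

Keeping the patterns in one extensible table is what lets a single recorder flag failures from whatever backend happens to be serving the page.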
Architecture Under the Hood
ProofShot is built on:
- agent-browser by Vercel Labs — headless Chromium with an element reference system designed for AI agents
- TypeScript (ESM-only) with tsup for builds and vitest for tests
- GCD-style session management — each session produces isolated artifacts in timestamped folders
- Zero cloud dependency — everything runs locally, artifacts stay on your machine
The key architectural decision is using agent-browser instead of raw Playwright or Puppeteer. The element reference system (@e2, @e5) gives agents stable handles to interact with DOM elements, which is more reliable than CSS selectors or XPath for AI-driven automation.
Getting Started: Try It in 5 Minutes
```bash
# Install globally
npm install -g proofshot
proofshot install

# Clone the repo for sample apps
git clone https://github.com/AmElmo/proofshot.git
cd proofshot
npm install && npm run build && npm link

# Set up sample app
cd test/fixtures/sample-app
npm install
```
Then open your AI agent in the test/fixtures/sample-app/ directory and prompt:
Verify the sample app with proofshot. Start on the homepage, check the hero section, navigate to the Dashboard and check the metrics, then go to Settings and update the profile name. Screenshot each page.
Or run without an agent:
```bash
bash test-proofshot.sh
```
Check proofshot-artifacts/ for the generated video, screenshots, and report.
Who Should Use This (and Who Shouldn’t)
Use ProofShot if:
- You’re using AI coding agents for UI/frontend work and tired of manually checking every change
- Your team wants visual proof in PR reviews, not just code diffs
- You’re iterating rapidly with AI agents and need a fast verification loop
- You want automated error detection across multiple backend languages
Skip it if:
- You’re not doing UI work (backend-only codebases won’t benefit much)
- You need pixel-perfect visual testing (dedicated tools like Percy or Chromatic are more mature)
- You’re working on mobile-native apps (browser-only for now)
Comparison with Alternatives
| Tool | Focus | AI Agent Integration | Local/Cloud | PR Integration |
|---|---|---|---|---|
| ProofShot | AI agent UI verification | Built-in skill system | Local only | GitHub via proofshot pr |
| Playwright | E2E testing | Manual scripting | Local | Via CI |
| Percy | Visual regression | No AI integration | Cloud | GitHub/GitLab |
| Chromatic | Storybook visual testing | No AI integration | Cloud | GitHub |
| Meticulous | AI-generated E2E tests | Generates tests, doesn’t verify agent work | Cloud | GitHub |
ProofShot occupies a unique niche: it’s not a testing framework, it’s a verification layer specifically for AI coding agents. The skill system and self-contained artifacts make it practical for the “agent builds, human reviews” workflow that most teams actually use.
FAQ
Does ProofShot work with any AI coding agent?
Yes — any agent that can execute shell commands can use ProofShot. The proofshot install command has built-in support for Claude Code, Cursor, Codex, Gemini CLI, Windsurf, and GitHub Copilot, but the CLI itself is agent-agnostic.
Does it require a cloud service or API key?
No. ProofShot is fully local. No cloud dependency, no vendor lock-in. Artifacts stay on your machine. The only external integration is optional GitHub PR uploads via the gh CLI.
How much overhead does it add?
ProofShot runs a headless Chromium instance via agent-browser alongside your dev server. On a modern machine with 16GB+ RAM, the overhead is minimal. The video recording and screenshot capture run asynchronously.
Can I use it for visual regression testing in CI?
The proofshot diff command supports baseline comparisons, but ProofShot is primarily designed for developer-time verification, not CI pipelines. For production visual regression testing, tools like Percy or Chromatic are more battle-tested.
What about mobile viewport testing?
Currently browser-only with desktop viewports. Mobile viewport simulation could be added via Chromium’s device emulation, but it’s not built-in yet.
Is the video recording resource-intensive?
The .webm recording uses efficient browser-native capture. Session videos are typically 5-20MB depending on length. The self-contained viewer.html file adds minimal overhead since it embeds the video reference rather than the video itself.