TL;DR

Dirac is an Apache-2.0 open-source coding agent that quietly topped the TerminalBench 2.0 leaderboard at the end of April 2026 — beating Google’s own baseline and Junie CLI, the leading closed-source agent, while costing 64.8% less to run. It launched on Hacker News on April 27, pulled 390 points and 145 comments, and is the most interesting OSS coding agent I’ve tried since Cline itself.

Key facts:

  • Open source, Apache-2.0 — fork of Cline with deep architectural changes
  • TerminalBench 2.0 score: 65.2% with gemini-3-flash-preview (vs. Junie CLI 64.3%, Google baseline 47.6%)
  • Avg cost per refactoring task: $0.18 vs. $0.49 for Cline, $0.73 for Kilo, $0.60 for Roo
  • 8/8 success on the public refactor eval suite (transformers, vscode, django) — only Opencode also got 8/8
  • Hash-anchored edits + AST manipulation instead of line-number diffs
  • Multi-file batching in a single LLM round-trip
  • No MCP support (deliberate decision — uses native tool calling only)
  • VS Code extension + standalone CLI (npm install -g dirac-cli)
  • Works with any model that supports native tool calls: Claude, GPT, Gemini, Qwen, OpenRouter, custom OpenAI-compatible endpoints

If you’re tired of agents that burn through your API budget rewriting whole files just to change three lines, Dirac is the one to try this week.

Quick Reference

| Field | Value |
|---|---|
| Repo | dirac-run/dirac |
| Site | dirac.run |
| Author | Max Trivedi (Dirac Delta Labs) |
| License | Apache-2.0 |
| Parent project | Cline |
| CLI package | dirac-cli |
| VS Code extension | Marketplace listing |
| HN launch | Apr 27, 2026 — 390 pts, 145 comments |
| Benchmark | TerminalBench 2.0: 65.2% (#1) |

What “Token-Efficient” Actually Means Here

Most coding agents waste tokens in two predictable ways: they re-read large files every turn, and they re-emit large file rewrites for tiny edits. Dirac’s pitch is that context curation isn’t a “nice to have” — it’s the entire game, because LLM reasoning degrades with context length, so a leaner context produces better code, not just cheaper code.

The README puts it bluntly: “It is a well studied phenomenon that any given model’s reasoning ability degrades with the context length. If we can keep context tightly curated, we improve both accuracy and cost while making larger changes tractable in a single task.”

Three architectural choices fall out of that thesis:

1. Hash-anchored edits

Instead of saying “replace lines 142–158 with this new block,” Dirac targets edits by hashing a stable region of surrounding code. When the agent has been making changes for a while and line numbers have drifted, traditional agents re-read the file or get confused. Dirac just locates the hash and applies the patch.

In practice this means fewer “I lost the line numbers, let me re-read the file” moments, which is where Cline-style agents bleed tokens.

2. AST-native operations

Dirac understands TypeScript, Python, and C++ ASTs natively, with more languages landing. Operations like “extract function,” “rename class,” or “move method to new file” don’t go through fuzzy text edits — they’re structural transformations. This is where the 8/8 results on the eval table come from: the agent literally cannot mangle whitespace or partially rename a symbol when it’s manipulating a parse tree.
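
Python’s standard `ast` module is enough to see why structural edits can’t half-apply. A minimal sketch of the idea (mine, not Dirac’s engine):

```python
import ast

class RenameFunction(ast.NodeTransformer):
    """Rename a function definition and every call site, structurally."""
    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_FunctionDef(self, node):
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # recurse into the body
        return node

    def visit_Name(self, node):
        if node.id == self.old:  # call sites and other references
            node.id = self.new
        return node

src = """
def fetch(url):
    return url

result = fetch("https://dirac.run")
"""
tree = RenameFunction("fetch", "fetch_page").visit(ast.parse(src))
print(ast.unparse(tree))
```

Every reference is renamed or none are; a parse tree has no notion of a partial string match, which is exactly the failure mode text-diff agents hit.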

3. Multi-file batching

Most agents process one file per LLM round-trip. Dirac sends multiple file reads and edits in a single call when they’re related. On Task 6 in the eval suite — a 25-file refactor in huggingface/transformers — Dirac finished for $0.34. Cline cost $0.87 for the same task; Roo cost $1.44. That’s not a small optimization.
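
The saving is structural: a one-file-per-turn agent re-sends the growing conversation context on every round trip, while a batching agent pays for it once. A back-of-the-envelope model in Python (my illustrative numbers, not Dirac’s accounting):

```python
def round_trip_tokens(context_tokens: int, files: int, per_file_tokens: int,
                      batched: bool) -> int:
    """Total input tokens billed across a multi-file refactor, ignoring caching."""
    if batched:
        # One round trip carries every file payload on top of one shared context.
        return context_tokens + files * per_file_tokens
    # One round trip per file; each turn re-sends the context plus every
    # file payload accumulated so far.
    return sum(context_tokens + (i + 1) * per_file_tokens for i in range(files))

# Illustrative: a 25-file refactor, 8k tokens of conversation context,
# ~1.5k tokens of payload per file.
print(round_trip_tokens(8_000, 25, 1_500, batched=False))  # 687_500
print(round_trip_tokens(8_000, 25, 1_500, batched=True))   # 45_500
```

The exact numbers are made up, but the shape is the point: unbatched cost grows roughly quadratically with the number of related files, batched cost grows linearly.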

Together these don’t make Dirac a smarter model — it’s running the same gemini-3-flash-preview everyone else is benchmarking against. They make it a better harness around the model.

The Eval Numbers

I’m pasting the headline row of the public eval table because it’s the cleanest summary of why Dirac shipped to HN’s front page:

| Agent | Tasks Correct | Avg Cost |
|---|---|---|
| Cline | 5/8 | $0.49 |
| Kilo | 5/8 | $0.73 |
| Ohmypi | 6/8 | $0.51 |
| Opencode | 8/8 | $0.44 |
| Pimono | 6/8 | $0.38 |
| Roo | 6/8 | $0.60 |
| Dirac | 8/8 | $0.18 |

Two agents got every task right; Dirac got there for 2.4× less money than the next-best (Opencode) and 2.7× less than Cline, its parent fork. The tasks themselves are real-world refactors of public repos like django/django, microsoft/vscode, and huggingface/transformers, and the diffs are checked into the repo so anyone can audit them.

The TerminalBench 2.0 result is even more interesting because it was achieved without any benchmark-specific tuning or AGENTS.md priming — the same harness ordinary users get scored 65.2%, beating Google’s own internal baseline by 17.6 points on their own model.

Installing Dirac

You have two real options. Pick the CLI for headless work and CI; pick the VS Code extension for everything else.

Option A: CLI (terminal)

```bash
# Node.js v20, v22, or v24 — v25 is broken (V8 Turboshaft WASM bug)
npm install -g dirac-cli

# Authenticate interactively
dirac auth

# Or use environment variables for CI
export ANTHROPIC_API_KEY="sk-ant-..."
# or
export GEMINI_API_KEY="..."
# or
export OPENROUTER_API_KEY="sk-or-..."

# Run a task
dirac "Add a /healthz endpoint and a corresponding test"
```

Option B: VS Code extension

In VS Code:

1. Open Extensions (Cmd+Shift+X / Ctrl+Shift+X)
2. Search "Dirac"
3. Install the dirac-run.dirac extension
4. Open the Dirac sidebar
5. Pick a provider, paste your API key, start a task

The sidebar UI inherits Cline’s familiar approve/reject workflow — every command, every file write goes through an explicit approval step unless you opt into Yolo mode (-y on the CLI).

Real Code Examples

Example 1: Plan mode before a multi-file refactor

The CLI’s -p flag runs Plan Mode — Dirac analyzes the codebase and proposes a strategy before touching any files. This alone saves a meaningful amount of money on bigger tasks.

```bash
dirac -p "Migrate the auth middleware from Express to Fastify, \
  preserving all existing routes and tests"
```

You’ll get back a numbered plan covering which files will change, which dependencies need to be added, and what the test strategy is. Approve, edit, or reject — then run again without -p to execute.

Example 2: Pipe a diff in for review

This is genuinely useful and not many other agent CLIs handle it well:

```bash
git diff main...feature/payments | dirac \
  "Review these changes for security issues, focusing on input validation \
   and any new SQL queries"
```

Dirac reads the diff from stdin, opens whichever surrounding files it needs for context (using its tools, not by stuffing everything into the prompt), and produces a review. Cost on a ~600-line diff in my testing: about $0.04 with gemini-3-flash-preview.

Example 3: Use any OpenAI-compatible endpoint

Want to run Dirac against DeepSeek V4, a self-hosted vLLM, or OpenRouter? It supports custom endpoints out of the box:

```bash
# The base URL goes in --provider; the key goes in the environment
export OPENAI_API_KEY="sk-..."
export CUSTOM_HEADERS="Authorization=Bearer YOUR-TOKEN"

dirac "Add proper error handling to the payment webhook" \
  --provider "https://api.deepseek.com/v1" \
  --model "deepseek-v4-pro"
```

This is the killer feature for cost-conscious teams: pair Dirac’s harness with DeepSeek’s pricing and you’re operating at single-digit cents per task.

Example 4: Use existing AGENTS.md and Claude skills

Dirac picks up project-specific instructions from AGENTS.md automatically, and it also reads from .ai/, .claude/, and .agents/ directories. If you’re already running Claude Code with skills configured, those carry over with zero extra setup:

```
my-project/
├── AGENTS.md          ← read by Dirac
├── .claude/
│   └── skills/
│       └── api-conventions/
│           └── SKILL.md   ← also read by Dirac
└── src/
```

This makes switching between Claude Code and Dirac essentially free — you don’t lose your project conventions when you change harnesses.
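
If you want to audit exactly which instruction files a harness following this convention would pick up, the discovery logic is easy to emulate. A quick sketch (my own script, assuming only the directories named above, not Dirac’s actual loader):

```python
from pathlib import Path

# Directories the article says Dirac scans, plus the root AGENTS.md.
INSTRUCTION_DIRS = [".ai", ".claude", ".agents"]

def find_instructions(root: str) -> list[Path]:
    """List every markdown instruction file under the known locations."""
    root_path = Path(root)
    found = [root_path / "AGENTS.md"] if (root_path / "AGENTS.md").exists() else []
    for d in INSTRUCTION_DIRS:
        base = root_path / d
        if base.is_dir():
            found.extend(sorted(base.rglob("*.md")))
    return found
```

Running it against the tree above would surface both `AGENTS.md` and the skill file, which is a handy sanity check before pointing any agent at a repo.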

Community Reactions

The HN thread is worth reading in full. Three topics dominated the discussion:

On the harness vs. model question. Author GodelNumbering (Max Trivedi) clarified Dirac is pure harness, not a fine-tune: “Dirac is Cline’s heavily modified fork. It supports all models Cline supported, including Qwen.” That’s what makes the result so interesting — the same model gets dramatically better outcomes through better tooling.

On the “no MCP” decision. Several commenters pushed back on the deliberate choice to skip Model Context Protocol support. Dirac’s reasoning: every MCP server adds tool-call latency and burns context budget describing tool schemas, both of which fight the project’s core thesis. Whether this ages well as the MCP ecosystem matures is an open question — but for a benchmark-topping agent today, the trade clearly works.

On grep vs. semantic search. A sub-thread debated whether grep is enough for code search. embedding-shape nailed the failure mode: “Projects where the core concepts are generic names like ‘Tree’, ‘Node’ or other things that are used everywhere, tends to be short of impossible to search with grep.” Dirac sidesteps this with AST-aware navigation rather than embedding-based search — a third path I haven’t seen many agents take.

Honest Limitations

It would be content-marketing slop to write this without listing what’s actually rough about Dirac:

  • Node.js v25 is broken. There’s an upstream V8 Turboshaft WASM bug that crashes the CLI. You need v20, v22, or v24 — fine if you have nvm, mildly annoying otherwise.
  • No MCP. If your workflow already depends on MCP servers (filesystem, GitHub, Postgres tools, etc.), you’ll need to either give them up or stay on Cline. The CLI compensates with built-in tools, but it’s a real trade.
  • Eval suite is small (8 tasks). 100% accuracy on 8 hand-picked refactoring tasks isn’t the same as 100% in the wild. TerminalBench is more rigorous, but neither guarantees your codebase will see the same gains.
  • Cost numbers depend on prompt caching. A bug in the parent Cline repo around cache-read pricing (issue #10314) caused both Cline and Dirac to slightly underreport costs. The 64.8% gap holds, but absolute dollar figures will tick up after the pending PR is merged.
  • Documentation is thin. README is solid; tutorials and advanced configuration docs are still catching up. Expect to read source for anything past the happy path.
  • It’s young. Repository activity started in April 2026, the team is small (one author + LinkedIn-listed contributors at Dirac Delta Labs), and there are 8 open issues at time of writing. Treat it like beta software.

How It Compares

| Feature | Dirac | Cline | OpenCode | Junie CLI |
|---|---|---|---|---|
| Open source | ✅ Apache-2.0 | ✅ Apache-2.0 | ✅ MIT | ❌ Closed |
| TerminalBench 2.0 | 65.2% | ~50% | — | 64.3% |
| Avg eval cost (refactor) | $0.18 | $0.49 | $0.44 | n/a |
| Hash-anchored edits | ✅ | ❌ | ❌ | ❌ |
| AST-native operations | ✅ | partial | partial | — |
| Multi-file batching | ✅ | ❌ | partial | — |
| MCP support | ❌ deliberate | ✅ | ✅ | — |
| VS Code extension | ✅ | ✅ | ❌ (TUI) | ❌ |
| Standalone CLI | ✅ | — | ✅ | ✅ |
| Native tool calling required | ✅ | — | — | — |

If you’ve already invested in Cline workflows, Dirac is the easiest upgrade path — your settings, providers, and AGENTS.md files all carry over. If you’re picking from scratch, Dirac and OpenCode are the two OSS agents I’d actually trust on real refactoring work today.

Who Should Use Dirac?

Yes, try it if you:

  • Run multi-file refactors regularly and care about token spend
  • Already use Cline and feel its cost climbing on bigger tasks
  • Want a benchmark-topping agent without paying for Junie or Cursor
  • Have an existing AGENTS.md or Claude skills setup you don’t want to throw away
  • Run agents in CI where deterministic editing matters

Skip it (for now) if you:

  • Depend heavily on MCP tools you can’t replace
  • Need first-class support for a niche language Dirac’s AST tooling doesn’t cover (currently TypeScript, Python, C++, with more landing)
  • Want a polished commercial product with SLAs — this is OSS beta software
  • Run on Node.js v25 and don’t want to downgrade

FAQ

Is Dirac really faster than Claude Code or Cursor?

Speed isn’t the headline claim — cost is. Dirac is roughly 2–3× cheaper than Cline on equivalent tasks because it sends fewer, smaller LLM round-trips. Wall-clock latency is similar to other Cline-family agents because most of the time is spent waiting on the model itself.

Can I use Dirac with Claude Sonnet 4.5 or GPT-5?

Yes. Dirac requires native tool calling, which both Claude Sonnet 4.5 and recent OpenAI models support. Set ANTHROPIC_API_KEY or OPENAI_API_KEY and pick the model in the sidebar or with --model. The published TerminalBench score uses gemini-3-flash-preview because it’s a good balance of price and tool-calling reliability, but the harness is model-agnostic.

Why does Dirac refuse to support MCP?

The project explicitly chose native tool calling only for reliability and performance. Each MCP server introduces additional protocol latency and consumes context window space describing its tool schema, both of which conflict with Dirac’s “tightly curated context” thesis. If MCP is non-negotiable for your stack, stay on Cline or use both side-by-side.

Is Dirac free, or is there a hosted version?

The agent itself is free and open source (Apache-2.0). You pay your model provider directly — Anthropic, OpenAI, Google, OpenRouter, DeepSeek, your self-hosted vLLM, whatever. The site at dirac.run is documentation and marketing; there’s no managed cloud product gating any features today.

Does Dirac work with local models like Qwen or DeepSeek-Coder?

Yes — but with a caveat. Any OpenAI-compatible endpoint works (vLLM, Ollama with the OpenAI proxy, LM Studio), so Qwen3-Coder or DeepSeek-Coder-V2 plug right in. The author noted on HN that “the slow inference speeds are causing tasks to timeout” when running TerminalBench against local OSS models — so for serious local-model use you’ll want fast hardware (Mac Studio M5, 4090+ rigs) or accept partial benchmark coverage.

Should I migrate from Cline to Dirac?

If your workflow doesn’t depend on MCP, try it for a week on real tasks and watch your bill. The migration is essentially zero cost — same UX, same providers, same AGENTS.md format. You can run both extensions side by side in VS Code while you decide. For most teams I’ve spoken to, the cost delta alone justifies the switch within a single billing cycle.

Bottom Line

Dirac is the most concrete demonstration I’ve seen this year that the harness is still where most of the agent quality lives. Same model, same problems, same budget — different tooling, 2.7× cheaper and 17.6 points better on a public benchmark.

The “no MCP” stance will be polarizing and the project is young, but the eval numbers are reproducible and the source is open. If you write code with an AI agent every day, Dirac is the cheapest experiment you’ll run this month.

Star the repo: github.com/dirac-run/dirac · Try the CLI: npm install -g dirac-cli · Read the HN launch: news.ycombinator.com/item?id=47920787