CodeGraph Review: Pre-Indexed Knowledge Graph for AI Agents

TL;DR

CodeGraph is an open-source MCP server from Colby McHenry that gives AI coding agents a pre-indexed, AST-based knowledge graph of your codebase. It’s currently the #2 repo on GitHub Trending this week — 31,090 stars total, 21,424 gained in seven days. Highlights:

MCP-native — auto-configures Claude Code, Cursor, Codex CLI, opencode, Hermes Agent, Gemini CLI, Antigravity, and Kiro
Tree-sitter AST extraction across 20+ languages — symbols, call graphs, import chains, and references stored in local SQLite
No embeddings, no vector DB, no API keys — pure structural graph + FTS5 full-text search
Auto-syncing via native FSEvents/inotify with debounced re-indexing
35% cheaper, 57% fewer tokens, 46% faster, 71% fewer tool calls in published median-of-4 benchmarks on Claude Opus 4.7 across 7 real codebases
Framework-aware routing for 14 web frameworks (Django, Flask, FastAPI, Express, NestJS, Laravel, Rails, Spring, ASP.NET, etc.)
Cross-language bridging for Swift ↔ ObjC and React Native (legacy bridge, TurboModules, Fabric, Expo)
Apache 2.0, bundles its own runtime, one-command install on macOS, Linux, or Windows

If Claude Context is the vector-DB answer to “stop my AI agent from grepping the same files 50 times,” CodeGraph is the structural answer — and the two are looking like complementary halves of the same problem.

Quick Reference


Repository	github.com/colbymchenry/codegraph
License	Apache 2.0
Language	TypeScript
Author	Colby McHenry
Stars	31,090 (+21,424 this week)
Install	`curl -fsSL https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.sh | sh`
NPM	`@colbymchenry/codegraph` (also works via `npx`)
Storage	Local SQLite, FTS5 full-text search
Requires	Nothing (bundled runtime) — or Node ≥ 18 for npm install

What Is CodeGraph?

When Claude Code answers an architecture question — “how does X talk to Y here?” — it doesn’t actually know your codebase. It launches Explore sub-agents that fan out across grep, glob, and Read, follow imports, re-read the same files, and spend most of their token budget on discovery before they can answer.

CodeGraph attacks that discovery cost by pre-computing what those sub-agents would otherwise have to learn from scratch. It parses your repo with tree-sitter, extracts every symbol, call site, import, and reference, and stores them as nodes and edges in a local SQLite database. An MCP server exposes that graph through three primary tools: codegraph_context, codegraph_explore, and codegraph_status.

No embeddings, no API keys, no Docker, no vector store. The SQLite file lives in .codegraph/. A native filesystem watcher debounces edits and re-indexes only the files that changed.
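As a rough illustration of what "debounces edits" means, here is a minimal sketch of a debouncer that batches changed paths until edits go quiet. The class, its method names, and the explicit `now` timestamps are hypothetical and chosen for testability; this is not CodeGraph's actual watcher code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Debouncer:
    """Collects changed paths and releases them once edits go quiet."""
    quiet_seconds: float = 2.0  # mirrors the 2-second debounce quoted in the FAQ
    pending: Set[str] = field(default_factory=set)
    last_event: Optional[float] = None

    def on_change(self, path: str, now: float) -> None:
        # Every new event restarts the quiet-period timer.
        self.pending.add(path)
        self.last_event = now

    def poll(self, now: float) -> List[str]:
        # Release the batch only after quiet_seconds without a new event.
        if self.last_event is None or now - self.last_event < self.quiet_seconds:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        self.last_event = None
        return batch


d = Debouncer()
d.on_change("src/a.ts", now=0.0)
d.on_change("src/b.ts", now=1.0)
d.on_change("src/a.ts", now=1.5)   # same file saved twice, indexed once
print(d.poll(now=2.0))             # [] (only 0.5 s since the last edit)
print(d.poll(now=3.5))             # ['src/a.ts', 'src/b.ts']
```

The point of the design is that a burst of saves to the same file costs one re-index, not one per save.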

The pitch: the agent already pays for tool calls — make those tool calls answer with structure instead of bytes.

Three things converged this week to push it up GitHub Trending:

1. Reproducible benchmarks. The README’s headline numbers — 35% cheaper, 57% fewer tokens, 46% faster, 71% fewer tool calls — are eye-catching, but what convinced Hacker News was the methodology: claude -p Opus 4.7 headless, --strict-mcp-config, 4 runs per arm, median reported, raw per-repo numbers published. Reproducible, not marketing.

2. Structural counter-pitch to vector search. A month after Claude Context brought BM25 + embeddings to MCP, CodeGraph argues you don’t need either. Symbol graphs are deterministic, lossless, and don’t drift when you rename a function. For “who calls processOrder?”, a graph answers in one query. Embeddings have to guess.

3. Zero infrastructure. No vector DB, no embedding API, no API keys. Run codegraph init -i and the MCP server is wired into each coding agent you select on your machine.
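To make the "one query" claim in point 2 concrete, here is a minimal sketch of a symbol graph in SQLite. The table names, column names, and sample symbols are assumptions made for illustration; the article does not document CodeGraph's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: symbols are nodes, relationships are edges.
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, kind TEXT, name TEXT, file TEXT);
    CREATE TABLE edges (src INTEGER, dst INTEGER, kind TEXT);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", [
    (1, "function", "processOrder", "orders.ts"),
    (2, "function", "checkout", "cart.ts"),
    (3, "function", "retryFailedOrders", "jobs.ts"),
    (4, "function", "sendReceipt", "mail.ts"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (2, 1, "calls"),   # checkout -> processOrder
    (3, 1, "calls"),   # retryFailedOrders -> processOrder
    (1, 4, "calls"),   # processOrder -> sendReceipt
])


def callers_of(name):
    """Answer 'who calls <name>?' with a single join over the edge table."""
    return conn.execute("""
        SELECT caller.name, caller.file
        FROM edges
        JOIN nodes AS callee ON callee.id = edges.dst
        JOIN nodes AS caller ON caller.id = edges.src
        WHERE edges.kind = 'calls' AND callee.name = ?
        ORDER BY caller.name
    """, (name,)).fetchall()


print(callers_of("processOrder"))
# [('checkout', 'cart.ts'), ('retryFailedOrders', 'jobs.ts')]
```

The result is exact and repeatable for the same index, which is the property the article contrasts with similarity search.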

How It Works

CodeGraph is three layers stacked on top of tree-sitter:

Parse layer. Tree-sitter grammars produce ASTs for 20+ languages — TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Objective-C, Swift, Kotlin, Dart, Lua, Luau, Svelte, Liquid, and Pascal/Delphi. CodeGraph walks each AST and extracts symbol declarations, references, imports, exports, and call sites.
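The extraction step can be sketched with Python's built-in ast module standing in for tree-sitter. This is a swapped-in technique for illustration only: CodeGraph uses tree-sitter grammars, and the function below is hypothetical.

```python
import ast

SOURCE = '''
import json
from pathlib import Path

def load(path):
    return json.loads(Path(path).read_text())

def main():
    cfg = load("config.json")
    print(cfg)
'''


def extract(source):
    """Walk the AST and collect declarations, imports, and direct call sites."""
    tree = ast.parse(source)
    symbols, imports, calls = [], [], []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            symbols.append((node.name, node.lineno))
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.append(node.module or ".")
    # Attribute each plain-name call to its enclosing function.
    # (Attribute calls such as json.loads are skipped to keep the sketch short.)
    for fn in ast.walk(tree):
        if isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for node in ast.walk(fn):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    calls.append((fn.name, node.func.id))
    return sorted(symbols), sorted(imports), sorted(calls)


symbols, imports, calls = extract(SOURCE)
print(symbols)  # [('load', 5), ('main', 8)]
print(imports)  # ['json', 'pathlib']
print(calls)    # [('load', 'Path'), ('main', 'load'), ('main', 'print')]
```

Each (caller, callee) pair is the raw material for an edge in the graph layer described next.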

Graph layer. Symbols become nodes; calls, references, and imports become edges. Stored in SQLite with an FTS5 virtual table for name search. On top of that, CodeGraph layers two enrichments:

Framework-aware route recognition for 14 web frameworks. Django path(), FastAPI @router.get(...), Express router.post(...), NestJS @Controller + @Get, Rails get '/x', to: 'users#index', Spring @GetMapping, ASP.NET [HttpGet] — each becomes a route node linked to its handler. “Who handles /api/orders?” now jumps straight to the controller.
Cross-language bridging for iOS / React Native / Expo. Swift ↔ ObjC @objc auto-bridging, JS NativeModules.X.fn(...) linked to ObjC RCT_EXPORT_METHOD or Java/Kotlin @ReactMethod, Fabric components, TurboModule specs, native → JS event emitters, and Expo’s Module { Name("X"); AsyncFunction("fn") } DSL. The kind of thing static parsers normally drop on the floor.
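Route recognition of the FastAPI kind listed above can be approximated as follows. This is a sketch under stated assumptions: it uses Python's ast module instead of tree-sitter, only matches decorators shaped like router.get("/path"), and the function name and tuple layout are hypothetical.

```python
import ast

SOURCE = '''
@router.get("/api/orders")
def list_orders():
    ...

@router.post("/api/orders")
async def create_order(payload):
    ...

def helper():
    ...
'''

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def extract_routes(source):
    """Return (METHOD, path, handler) for each decorated route handler."""
    routes = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            # Match the shape <object>.<http method>("<literal path>", ...).
            if (isinstance(dec, ast.Call)
                    and isinstance(dec.func, ast.Attribute)
                    and dec.func.attr in HTTP_METHODS
                    and dec.args
                    and isinstance(dec.args[0], ast.Constant)
                    and isinstance(dec.args[0].value, str)):
                routes.append((dec.func.attr.upper(), dec.args[0].value, node.name))
    return sorted(routes)


print(extract_routes(SOURCE))
# [('GET', '/api/orders', 'list_orders'), ('POST', '/api/orders', 'create_order')]
```

Once each tuple is stored as a route node with an edge to its handler, a question like "who handles /api/orders?" becomes a lookup instead of a text search.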

MCP layer. A Node server speaks MCP and exposes three primary tools: codegraph_context(area) (entry points + related symbols), codegraph_explore(symbol) (full source plus immediate neighbors), and codegraph_status (pending edits, freshness banner).
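On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 request whose method is tools/call. The sketch below builds such a request for codegraph_explore. The tool name comes from the article, but the argument key "symbol" is an assumption inferred from the codegraph_explore(symbol) signature shown above, not a documented parameter name.

```python
import json


def tool_call(request_id, tool, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# "symbol" is a hypothetical argument key; check the server's tool schema.
request = tool_call(1, "codegraph_explore", {"symbol": "processOrder"})
print(json.dumps(request, indent=2))
```

Because the envelope is standard MCP, any client that speaks the protocol can issue the same call, which is why the server is client-agnostic.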

The Benchmark, In Detail

The README’s benchmark is the most-discussed part of the project. Here’s the raw shape (medians of 4 runs per arm, Claude Opus 4.7 headless, --strict-mcp-config):

Codebase	Language · Files	Cost (WITH → WITHOUT)	Tokens (WITH → WITHOUT)	Time (WITH → WITHOUT)	Tool calls (WITH → WITHOUT)
VS Code	TS · ~10k	$0.60 → $0.80	601k → 2.8M	1m 10s → 2m 26s	8 → 55
Excalidraw	TS · ~640	$0.43 → $0.90	344k → 3.5M	48s → 2m 58s	3 → 79
Django	Py · ~3k	$0.59 → $0.67	739k → 1.2M	1m 19s → 1m 38s	9 → 19
Tokio	Rust · ~790	$0.42 → $2.41	379k → 2.6M	53s → 3m 2s	4 → 53
OkHttp	Java · ~645	$0.47 → $0.47	636k → 730k	42s → 1m 1s	6 → 11
Gin	Go · ~110	$0.37 → $0.47	444k → 675k	44s → 1m 0s	6 → 10
Alamofire	Swift · ~110	$0.61 → $1.14	1.0M → 2.8M	1m 17s → 2m 27s	12 → 69

Three things stand out:

Gains loosely track codebase size. On VS Code (~10k files) the no-CodeGraph arm needs 55 tool calls and reads 2.8M tokens. On Gin (~110 files), native grep is already cheap and CodeGraph’s edge shrinks to 21% cheaper. The trend is not strict (Tokio, at ~790 files, shows the largest cost gap in the table), but as a rule of thumb ROI is real around the 1k-file mark and dramatic above 5k.
Tool calls drop harder than tokens. On Excalidraw it’s 3 vs 79 — a 96% reduction. The WITHOUT arm spawns Explore sub-agents that themselves read files, multiplying calls. CodeGraph short-circuits the tree at the parent.
OkHttp is the honest outlier. 2% cheaper (both arms round to $0.47 in the table), and tokens fell only about 13% (636k vs 730k). Its query hits a small, localized part of the code where grep was already efficient. Not every question rewards a graph.
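The percentages quoted in these observations can be recomputed from the table. The short script below does that arithmetic on the rounded cost figures as printed; the helper name is hypothetical, and because the inputs are rounded the results can differ slightly from the README's own numbers.

```python
from statistics import mean, median

# (WITH CodeGraph, WITHOUT) cost in USD, copied from the table above.
COST = {
    "VS Code": (0.60, 0.80),
    "Excalidraw": (0.43, 0.90),
    "Django": (0.59, 0.67),
    "Tokio": (0.42, 2.41),
    "OkHttp": (0.47, 0.47),
    "Gin": (0.37, 0.47),
    "Alamofire": (0.61, 1.14),
}


def reduction_pct(with_cg, without):
    """Percentage saved relative to the WITHOUT arm, to one decimal place."""
    return round(100 * (without - with_cg) / without, 1)


savings = {repo: reduction_pct(w, wo) for repo, (w, wo) in COST.items()}
for repo, pct in sorted(savings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{repo:<11} {pct:5.1f}% cheaper")

print("mean across repos:  ", round(mean(list(savings.values())), 1))
print("median across repos:", median(list(savings.values())))
print("Excalidraw tool calls:", reduction_pct(3, 79), "% fewer")
```

On these rounded inputs the per-repo cost savings average about 34%, close to the 35% headline, while the median repo saves 25%. Gin works out to about 21% and Excalidraw's tool-call reduction to about 96%, matching the figures quoted above.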

The author’s own caveat is healthy: CodeGraph only helps when queried directly — if the parent agent delegates exploration to a file-reading sub-agent, the graph never gets called and becomes overhead. The system prompt shim matters as much as the index.

Getting Started

The installer is one command:

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.sh | sh

# Windows PowerShell
irm https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.ps1 | iex

# Or via npm
npx @colbymchenry/codegraph

Then, in your project root:

cd your-project
codegraph init -i

The -i flag launches an interactive installer that detects every coding agent on your system — Claude Code, Cursor, Codex CLI, opencode, Hermes Agent, Gemini CLI, Antigravity, Kiro — and writes the MCP config and instruction shim into each one you select. No manual JSON editing.

Open Claude Code and start asking architecture questions. The first session triggers the initial index (seconds for small repos, minutes for VS Code-scale). The filesystem watcher keeps it current after that.

To uninstall: codegraph uninstall strips MCP config from every agent it touched; codegraph uninit removes .codegraph/ from the project. That makes for one of the cleaner uninstall stories in the MCP ecosystem.

Real-World Use Cases

Onboarding to a large codebase. Drop CodeGraph on a 100k-line monorepo, ask “how does authentication flow work end to end?” — get a routed answer hitting middleware, the JWT verifier, and the user store in one tool call.
Refactor impact analysis. “What breaks if I change the signature of processPayment?” One codegraph_explore call returns every caller and callee.
Cross-language iOS apps. Swift ↔ ObjC and React Native bridge support means “where does this JS prop end up on the native side?” actually resolves across the boundary.
Cutting Claude/OpenAI API spend. Reddit reports in r/ClaudeCode put the saving at 30–50% on long sessions, consistent with the README’s 35% headline figure.
Auto-fresh long sessions. The file watcher debounces and re-indexes, so multi-hour agent sessions don’t drift from the working tree.
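The refactor-impact use case is a graph traversal at heart. Here is a minimal sketch of how transitive callers could be collected with a recursive SQL query over a call-edge table. The table layout, function name, and depth cap are assumptions; the article does not say whether CodeGraph walks the graph this way or returns only immediate neighbors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified edge table: one row per call site.
conn.executescript("""
    CREATE TABLE calls (caller TEXT, callee TEXT);
    INSERT INTO calls VALUES
        ('checkoutHandler', 'placeOrder'),
        ('placeOrder', 'processPayment'),
        ('retryJob', 'processPayment'),
        ('processPayment', 'chargeCard'),
        ('adminPanel', 'retryJob');
""")


def transitive_callers(symbol, max_depth=10):
    """Everything that reaches `symbol` through calls, with the shortest hop count."""
    return conn.execute("""
        WITH RECURSIVE impacted(name, depth) AS (
            SELECT caller, 1 FROM calls WHERE callee = ?
            UNION
            SELECT calls.caller, impacted.depth + 1
            FROM calls JOIN impacted ON calls.callee = impacted.name
            WHERE impacted.depth < ?   -- depth cap keeps call cycles from looping forever
        )
        SELECT name, MIN(depth) AS hops FROM impacted GROUP BY name ORDER BY hops, name
    """, (symbol, max_depth)).fetchall()


print(transitive_callers("processPayment"))
# [('placeOrder', 1), ('retryJob', 1), ('adminPanel', 2), ('checkoutHandler', 2)]
```

Direct callers (one hop) are the code that breaks on a signature change; the deeper hops show how far the change can ripple.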

First Impressions From the Community

Reception on Hacker News and r/ClaudeCode this week has been warm — partly the reproducible benchmark, partly the painless install:

“The MCP config auto-writes into every agent on your machine in one shot. codegraph init -i and Claude Code suddenly stops grepping.” — r/ClaudeCode

“Symbol graph beats embeddings for ‘who calls this?’ questions. Embeddings are fuzzy by design. CodeGraph just knows.” — Hacker News

“On a 200k-line legacy Java service we cut Claude Code’s average session cost from $4 to $1.50.” — Reddit testimonial

The common gripe is the converse: for fuzzy semantic questions (“find the place that probably handles edge cases in checkout”), a symbol graph isn’t as good as a vector store. Several commenters already run CodeGraph and Claude Context side by side.

Honest Limitations

CodeGraph is impressive, but worth knowing before you bet on it:

Symbol graphs don’t help with fuzzy questions. If you don’t know what the symbol is called, the graph can’t find it. Vector search degrades more gracefully here.
First-index time scales with repo size. A 10k-file TS repo takes a couple of minutes to parse. Incremental after that, but the initial wait is real.
Tree-sitter coverage varies. Top-tier languages (TS, JS, Python, Go, Rust, Java) are excellent. Pascal/Delphi and Liquid work but with thinner symbol coverage. Anything outside the 20+ list falls back to FTS5 text search.
Benchmark is one question per repo. Real sessions ask many questions, and the graph handles some of them worse than grep does. Expect typical field savings to land closer to the lower half of the table than to the headline average.
No multi-repo workspace yet. Index lives per project. Microservices repos mean multiple .codegraph/ directories with no cross-repo query.
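For files in languages without a grammar, the FTS5 fallback mentioned above behaves like plain full-text search. A minimal sketch, assuming a hypothetical two-column table and Terraform files as the unsupported language (it needs a Python build whose SQLite includes FTS5, which most do):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one FTS5 row per file, path plus raw text.
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body)")
conn.executemany("INSERT INTO chunks VALUES (?, ?)", [
    ("deploy/main.tf", "resource aws_s3_bucket logs { bucket = var.log_bucket }"),
    ("deploy/vars.tf", "variable log_bucket { type = string }"),
    ("README.md", "Deployment notes for the logging stack"),
])


def search(term):
    """Return paths whose text matches `term`, treated as one quoted phrase."""
    phrase = '"' + term.replace('"', '""') + '"'
    rows = conn.execute("SELECT path FROM chunks WHERE chunks MATCH ?", (phrase,))
    return sorted(path for (path,) in rows)


print(search("log_bucket"))  # ['deploy/main.tf', 'deploy/vars.tf']
print(search("logging"))     # ['README.md']
```

The search finds where the text appears but carries no notion of definition versus use, which is the "thinner coverage" trade-off described in this list.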

Who Should Use This (And Who Shouldn’t)

Use CodeGraph if:

You work on a 1k+ file codebase and your AI agent burns tokens on discovery
You want zero infrastructure — no embedding API, no vector DB, no Docker
You hop between Claude Code, Cursor, and Codex CLI and want one index across all of them
You work on iOS / React Native and lose context at the bridge

Skip CodeGraph if:

Your repo is under ~300 files (native grep is fast enough)
Your questions are mostly semantic (“find the place that handles X” without knowing the symbol name) — Claude Context fits better
You can’t run a local SQLite file or are in a sandboxed environment with no filesystem watcher (CODEGRAPH_NO_DAEMON=1 works, but you’ll need manual codegraph sync)

CodeGraph vs. Claude Context vs. Other Indexers

The MCP code-search space has crystallized into two distinct approaches, structural graphs and embedding-based search:

Tool	Approach	Storage	Local-only	MCP	Multi-client	Best for
CodeGraph	AST symbol graph	SQLite	✅ always	✅	✅ 8+	Structural questions, refactors
Claude Context	Hybrid BM25 + embeddings	Milvus / Zilliz	⚠️ via Ollama	✅	✅ 13+	Semantic questions, vague queries
Cursor Codebase Index	Embeddings	Cursor cloud	❌	❌	❌ Cursor only	Cursor users
Aider repo-map	Tree-sitter graph	In-memory	✅	❌	❌ Aider only	Aider users
Sourcegraph Cody	Hybrid + graph	Sourcegraph	✅ enterprise	❌	❌	Enterprise
Continue @codebase	Embeddings	LanceDB	✅	❌	❌ Continue only	Continue users

Symbol graphs vs. embeddings is not a winner-take-all fight. The two answer different question shapes. CodeGraph nails “who calls X?”, “what’s the route for /api/orders?”, “what breaks if I rename Y?”. Claude Context nails “find the place that handles the corner case where users have two emails.” Several commenters this week are running both — the graph as the structural source of truth, the vector store for fuzzy recall.

If you only have time to add one MCP server this week and your codebase is over 1k files: CodeGraph is the lower-friction install (no API keys, no Docker) and lands the bigger token reduction on architecture questions, which is what most agents waste budget on.

FAQ

Does CodeGraph work with Cursor and Codex CLI, or only Claude Code?

It auto-configures eight clients: Claude Code, Cursor, Codex CLI, opencode, Hermes Agent, Gemini CLI, Antigravity IDE, and Kiro. The interactive installer (codegraph init -i) detects which are present and lets you choose. The MCP server itself is client-agnostic — anything that speaks MCP can connect.

How does CodeGraph compare to Claude Context (Zilliz’s MCP indexer)?

CodeGraph uses a tree-sitter symbol graph in local SQLite. Claude Context uses BM25 + dense embeddings in Milvus. CodeGraph wins on structural questions (“who calls X?”, “what’s the route for Y?”) and zero-infrastructure setup. Claude Context wins on fuzzy semantic questions and recall when you don’t know the symbol name. They’re complementary, and several teams run both.

Is CodeGraph really 100% local?

Yes. No API keys, no embeddings, no external services. The graph is a SQLite database in .codegraph/ inside your project. The MCP server runs as a local Node process. Nothing leaves your machine.

Do I need Node.js installed?

No. The native installer (install.sh / install.ps1) bundles its own runtime. If you already have Node, npx @colbymchenry/codegraph works too — both paths land at the same binary.

How does the auto-sync work? Do I need to run `codegraph sync` manually?

You don’t. A native filesystem watcher (FSEvents on macOS, inotify on Linux, ReadDirectoryChangesW on Windows) catches every file change and re-indexes after a 2-second debounce (tunable). On reconnect, the MCP server does a fast (size, mtime) + content-hash reconciliation. Manual codegraph sync only matters in sandboxed environments where the watcher is disabled.
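The reconnect check described here is a two-stage comparison: cheap metadata first, content hash only when the metadata differs. A minimal sketch of that idea follows; the function names and the recorded-snapshot format are hypothetical, and SHA-256 is an assumed hash choice since the article does not name one.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path):
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(path):
    """Record what the index knew about a file at its last sync."""
    st = path.stat()
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "sha256": sha256_of(path)}


def needs_reindex(path, recorded):
    st = path.stat()
    # Cheap check first: identical size and mtime, so skip hashing entirely.
    if (st.st_size, st.st_mtime_ns) == (recorded["size"], recorded["mtime_ns"]):
        return False
    # Metadata differs: hash the content so a touch-only change is not re-parsed.
    return sha256_of(path) != recorded["sha256"]


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "orders.py"
    src.write_text("def process_order(): ...\n")
    recorded = snapshot(src)
    print(needs_reindex(src, recorded))   # False: nothing changed

    st = src.stat()
    os.utime(src, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    print(needs_reindex(src, recorded))   # False: mtime bumped, same bytes

    src.write_text("def process_order(order_id): ...\n")
    print(needs_reindex(src, recorded))   # True: content differs
```

Most files pass the metadata check untouched, so the hash is computed only for the handful that look different, which keeps reconciliation fast on large repos.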

Are the benchmark numbers reproducible?

Yes. The README publishes the methodology (claude -p Opus 4.7 headless, --strict-mcp-config, 4 runs per arm, median reported), the exact query for each of the 7 repos, and the raw WITH → WITHOUT medians per cell. You can clone any of the benchmark repos at --depth 1 and run the same comparison yourself.

Bottom Line

CodeGraph is the strongest pitch yet for symbol graphs as the structural layer beneath AI coding agents. The benchmark is reproducible, the install story is the lowest friction in MCP code search, and the 21,424 stars in seven days suggest a lot of developers had the same thought: I’m tired of watching Claude Code re-grep the same files.

If your repo is over 1,000 files and you’re paying for an agent’s tool calls, CodeGraph likely pays for itself in this week’s Claude bill. Run it alongside Claude Context for fuzzy recall and you have the closest thing to a complete MCP code-intelligence stack that exists today.

Repo: github.com/colbymchenry/codegraph — Apache 2.0 licensed, 31K stars and gaining ~3K/day.