TL;DR
Claude Context is an open-source MCP plugin from Zilliz (the team behind Milvus) that gives AI coding agents semantic search across your entire codebase. It’s currently one of the hottest repos on GitHub Trending — 9,977 stars total with 3,725 stars gained this week, putting it firmly at the top of the weekly chart. Highlights:
- MCP-first — works with Claude Code, Cursor, Codex CLI, Gemini CLI, Qwen Code, Cline, Roo Code, Windsurf, Augment, and Claude Desktop
- Hybrid search combines BM25 (keyword) + dense vector embeddings for better recall than either alone
- AST-aware chunking splits TypeScript, JavaScript, Python, Java, Go, Rust, C++, C#, PHP, Ruby, Swift, Kotlin, Scala, and Markdown along syntax boundaries
- Incremental indexing via Merkle trees — only re-embeds files that actually changed
- ~40% token reduction in Zilliz’s own evaluation while matching retrieval quality
- Pluggable embeddings — OpenAI, VoyageAI, Gemini, or local Ollama
- Pluggable vector store — self-hosted Milvus or managed Zilliz Cloud
- MIT licensed, runs as a single `npx @zilliz/claude-context-mcp@latest` command
If you’ve been frustrated with Claude Code or Cursor losing the plot in a 200-file repo, Claude Context is the most legitimate fix that’s emerged from the MCP ecosystem this year.
Quick Reference
| Field | Value |
|---|---|
| Repository | github.com/zilliztech/claude-context |
| License | MIT |
| Language | TypeScript |
| Vendor | Zilliz (Milvus team) |
| Stars | 9,977 (+3,725 this week) |
| Install | `npx @zilliz/claude-context-mcp@latest` |
| NPM | `@zilliz/claude-context-mcp`, `@zilliz/claude-context-core` |
| VS Code Extension | "Semantic Code Search" by zilliz |
| Requires | Node ≥ 20, OpenAI/Voyage/Gemini/Ollama key, Milvus or Zilliz Cloud |
What Is Claude Context?
Most AI coding agents have the same architectural blind spot: when you ask a question about your codebase, they either dump entire directories into the context window (expensive, slow, runs out of tokens fast) or rely on file globs and grep (cheap, but blind to anything that isn’t a literal string match).
Claude Context fixes this by indexing your repo into a vector database — chunked along AST boundaries, embedded with a model of your choice — and exposing that index through the Model Context Protocol (MCP). Once it’s installed, your agent can call a search_code tool with a natural-language query like “find functions that handle JWT refresh” and get back semantically relevant code from across millions of lines without ever pulling the full files into context.
It’s basically Cursor’s “Codebase Index” feature, except open source, model-agnostic, and usable from any MCP-compatible client. That last part matters a lot in 2026 — Claude Code, Codex CLI, Gemini CLI, Cursor, Windsurf, and Cline all speak MCP now, but each ships its own (closed) indexer. Claude Context is the first credible attempt to standardize that layer.
Why It’s Trending Now
Three reasons it’s exploded in late April 2026:
- The “context window math” problem got worse, not better. As Claude Sonnet 4.5, GPT-5.1, and Gemini 3 pushed effective context to 1M+ tokens, cost per request exploded. Loading a 200K-line monorepo into context for every question is technically possible and economically insane. Claude Context’s evaluation shows ~40% token reduction at equal retrieval quality — that translates to real money.
- MCP became the dominant agent integration protocol. With Anthropic, OpenAI, Google, Cursor, and Windsurf all shipping MCP support in Q1 2026, a single MCP server now reaches every major coding agent. Claude Context shipped first-class configs for 13+ clients in its README, which is why it’s being adopted faster than competing indexers.
- Zilliz reputation. Milvus is the most-deployed open-source vector database in the world. When the people who built Milvus ship an MCP plugin, the AI infra crowd takes it seriously.
Key Features
1. Hybrid Search (BM25 + Dense Vectors)
Pure semantic search misses exact identifiers. Pure keyword search misses paraphrases. Claude Context does both at the same time and merges results — a technique that’s been state-of-the-art in retrieval research for years but is still rare in code-search MCPs.
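The merge step can be sketched with Reciprocal Rank Fusion, a common way to combine two ranked lists. This is an illustration of the general technique, not Claude Context's actual merge code; the chunk IDs and the `k` constant are assumptions.

```python
# Sketch of hybrid-search result fusion using Reciprocal Rank Fusion (RRF).
# Hypothetical chunk IDs; k=60 is the conventional RRF damping constant.

def rrf_merge(bm25_ranking, dense_ranking, k=60):
    """Merge two ranked lists of chunk IDs into one hybrid ranking."""
    scores = {}
    for ranking in (bm25_ranking, dense_ranking):
        for rank, chunk_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank); chunks that appear in
            # both lists accumulate score and outrank chunks that top
            # only one of them.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["auth.py#L10", "token.py#L42", "readme.md#L1"]   # keyword hits
dense = ["jwt.py#L7", "token.py#L42", "auth.py#L10"]     # semantic hits
print(rrf_merge(bm25, dense))
```

Note how `token.py#L42`, ranked second in both lists, ends up ahead of `jwt.py#L7`, which topped only the dense list: that is the recall boost the hybrid approach buys you.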
2. AST-Based Chunking
Most RAG-for-code systems chunk files into fixed-size character windows, which routinely splits a function in half. Claude Context parses each file with a tree-sitter-style AST splitter and only chunks on syntactic boundaries — function bodies, class definitions, top-level blocks. When the AST parser fails (weird grammars, partial files), it falls back to LangChain’s character splitter so nothing breaks.
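The boundary-or-fallback behavior is easy to see in miniature. Claude Context uses tree-sitter grammars across many languages; this sketch uses only Python's stdlib `ast` module on Python source, but the shape — whole syntactic units when parsing succeeds, character windows when it fails — is the same idea.

```python
import ast

def chunk_python_source(src, window=400):
    """Split Python source along top-level def/class boundaries.

    A minimal stand-in for an AST-aware splitter: each top-level function
    or class becomes one chunk, and unparseable input falls back to
    fixed-size character windows (the LangChain-style escape hatch).
    """
    try:
        tree = ast.parse(src)
    except SyntaxError:
        # Fallback: dumb character windows, as the article describes.
        return [src[i:i + window] for i in range(0, len(src), window)]
    lines = src.splitlines()
    return [
        "\n".join(lines[node.lineno - 1:node.end_lineno])
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]

src = "def login(u):\n    return u\n\nclass Session:\n    pass\n"
# Each chunk is a whole function or class, never a truncated body.
print(chunk_python_source(src))
```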
3. Incremental Indexing via Merkle Trees
Re-indexing a 1M-line monorepo on every commit is unworkable. Claude Context maintains a Merkle tree over your file hashes and only re-embeds files whose hash changed since last index. In practice, a typical commit triggers re-embedding for 1–5 files instead of thousands.
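The core trick can be sketched in a few lines. This is a flat hash-of-hashes rather than a full Merkle tree (a real tree adds interior nodes so whole unchanged directories can be skipped without per-file comparisons), but the payoff is the same: one root comparison answers "did anything change?", and a leaf diff yields exactly the files to re-embed.

```python
import hashlib

def file_hashes(files):
    """Map path -> SHA-256 of contents (the tree's leaves)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}

def root_hash(hashes):
    """Hash of all leaf hashes in path order: a single comparison
    tells you whether anything in the snapshot changed at all."""
    h = hashlib.sha256()
    for path in sorted(hashes):
        h.update(hashes[path].encode())
    return h.hexdigest()

def changed_files(old, new):
    """Only these paths need re-embedding after a commit."""
    return [p for p in new if old.get(p) != new[p]]

before = file_hashes({"a.py": b"print(1)", "b.py": b"print(2)"})
after = file_hashes({"a.py": b"print(1)", "b.py": b"print(3)"})
assert root_hash(before) != root_hash(after)  # something changed
print(changed_files(before, after))           # only b.py is re-embedded
```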
4. Pluggable Embeddings & Vector Stores
You’re not locked into OpenAI or Zilliz Cloud:
- Embeddings: OpenAI (`text-embedding-3-small`/`large`), VoyageAI (`voyage-code-3`, optimized for code), Gemini, or local Ollama
- Vector store: Self-hosted Milvus (free, runs in Docker) or Zilliz Cloud (managed, has a free tier)
This is huge for privacy-sensitive shops — you can run the entire stack on-prem with Ollama embeddings + self-hosted Milvus and never send a line of code to a third party.
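An on-prem setup looks like the Cursor config from the install section below, with the cloud credentials swapped for local endpoints. `EMBEDDING_PROVIDER=ollama` and `MILVUS_ADDRESS` appear in the project's docs, but treat the exact variable names and the local Milvus URL here as assumptions to verify against the README:

```json
{
  "mcpServers": {
    "claude-context": {
      "command": "npx",
      "args": ["-y", "@zilliz/claude-context-mcp@latest"],
      "env": {
        "EMBEDDING_PROVIDER": "ollama",
        "MILVUS_ADDRESS": "http://localhost:19530"
      }
    }
  }
}
```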
5. MCP Tools Exposed
Once running, the server exposes four MCP tools:
- `index_codebase` — index a directory
- `search_code` — hybrid search over the index
- `clear_index` — wipe a codebase’s index
- `get_indexing_status` — progress / completion check
Your agent calls them just like any other MCP tool.
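On the wire, that call is a standard MCP JSON-RPC request. The `tools/call` envelope with `name` and `arguments` is the MCP spec; the argument keys shown here (`path`, `query`, `limit`) are illustrative assumptions, so check the server's published tool schema for the real ones.

```python
import json

# What an MCP client sends over stdio when the agent invokes search_code.
# Envelope per the MCP spec; argument keys are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_code",
        "arguments": {
            "path": "/home/you/your-project",
            "query": "functions that handle JWT refresh",
            "limit": 10,
        },
    },
}
print(json.dumps(request, indent=2))
```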
Installation: Real Commands
1. Get a vector database
The fastest path is Zilliz Cloud’s free tier — sign up at cloud.zilliz.com, create a serverless cluster, and copy the public endpoint + API token.
If you’d rather self-host, run Milvus locally:
docker run -d --name milvus \
-p 19530:19530 -p 9091:9091 \
milvusdb/milvus:latest standalone
2. Install in Claude Code
claude mcp add claude-context \
-e OPENAI_API_KEY=sk-your-openai-api-key \
-e MILVUS_ADDRESS=your-zilliz-cloud-public-endpoint \
-e MILVUS_TOKEN=your-zilliz-cloud-api-key \
-- npx @zilliz/claude-context-mcp@latest
3. Install in Codex CLI
Edit ~/.codex/config.toml:
[mcp_servers.claude-context]
command = "npx"
args = ["@zilliz/claude-context-mcp@latest"]
env = { OPENAI_API_KEY = "sk-...", MILVUS_TOKEN = "your-zilliz-token" }
startup_timeout_ms = 20000
4. Install in Cursor
Add to ~/.cursor/mcp.json:
{
"mcpServers": {
"claude-context": {
"command": "npx",
"args": ["-y", "@zilliz/claude-context-mcp@latest"],
"env": {
"OPENAI_API_KEY": "sk-...",
"MILVUS_ADDRESS": "https://in03-xxx.api.gcp-us-west1.zillizcloud.com",
"MILVUS_TOKEN": "your-zilliz-token"
}
}
}
}
5. First Run
In Claude Code or any client:
> Index this codebase
[Claude Context indexes ~/your-project]
> Find functions that handle user authentication
[Returns semantically relevant code chunks across the repo]
> Check the indexing status
[Reports % complete, last indexed file]
The first index pass is the slow part — for a 100K-line repo with OpenAI embeddings, expect 3–10 minutes and a one-time cost of a few cents to a few dollars depending on your embedding model. Subsequent commits re-index in seconds thanks to Merkle-tree diffing.
Architecture: How It Works
The repo is a TypeScript monorepo with three packages:
- `@zilliz/claude-context-core` — language-agnostic indexing engine. Walks the file tree, runs the AST splitter, calls the embedding provider, writes vectors to Milvus, maintains the Merkle tree.
- `@zilliz/claude-context-mcp` — thin MCP server that exposes `index_codebase`, `search_code`, `clear_index`, and `get_indexing_status` over stdio.
- VS Code extension (“Semantic Code Search”) — same core, different UI; gives you a search box inside the editor.
A query flows like this: agent calls search_code("auth flow") → MCP server embeds the query with the same model used for indexing → Milvus runs hybrid BM25 + dense search → top-K chunks come back with file paths, line ranges, and snippets → agent inlines those snippets into its working context. The full files never enter the LLM context unless the agent explicitly opens them afterward.
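That flow, reduced to a stub pipeline. Every function below is a placeholder standing in for a real component (the embedding API, Milvus's hybrid search); only the shape of the data moving through — query in, file-path/line-range/snippet triples out — mirrors Claude Context.

```python
def embed(text):
    # Stand-in for the embedding provider; in the real system this must be
    # the same model that was used at indexing time.
    return [float(len(text))]

def hybrid_search(index, query_vector, query_text, top_k=3):
    # Stand-in for Milvus's BM25 + dense search; here, naive keyword scoring.
    scored = sorted(
        index,
        key=lambda chunk: sum(w in chunk["snippet"] for w in query_text.split()),
        reverse=True,
    )
    return scored[:top_k]

# Hypothetical index contents.
index = [
    {"file": "auth/jwt.py", "lines": "40-72", "snippet": "def refresh_token(...)"},
    {"file": "db/models.py", "lines": "1-30", "snippet": "class Order(...)"},
]
hits = hybrid_search(index, embed("token refresh"), "token refresh")
for hit in hits:
    # The agent inlines only these snippets, never the whole files.
    print(f'{hit["file"]}:{hit["lines"]}  {hit["snippet"]}')
```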
Real-World Use Cases (From the Community)
- Onboarding to a giant repo. “My first day at the new job, I dropped Claude Context on our 800K-line Python monolith and started asking ‘how does X work?’ questions. Got useful answers in minutes instead of weeks.” — Reddit r/ClaudeAI
- Refactoring sweeps. Find every place a deprecated API is used semantically (not just textually), even when callers wrap or alias it.
- Cross-repo bug triage. Index a microservices monorepo, ask “where does the order ID get re-validated after payment?” and get hits across 6 services.
- Reducing Claude Code’s bill. Teams report 30–50% lower API spend on long sessions because the agent stops dumping whole directories into context.
- Replacing Cursor’s “Codebase Index” while keeping the rest of the Cursor workflow — useful for orgs that don’t want a managed vendor indexing their code.
First Impressions From the Community
“Basically using the tool can achieve ~40% reduction in token usage in addition to some quality gain in complex problems.” — Zilliz, in their published evaluation, cross-posted on r/ClaudeAI
“Context7 complements codebase retrieval by supplying version-specific library documentation alongside results from semantic code search.” — Augment Code, recommending it in their MCP directory
“The MCP server can be integrated with any MCP-compatible client by running `npx @zilliz/claude-context-mcp@latest`.” — official docs; the install simplicity is genuinely the headline feature for most users
The dominant sentiment on Hacker News and r/LocalLLaMA threads is that it’s the first MCP indexer that actually feels production-ready. The most common gripe is the dependency on a vector database — people want a fully local, single-binary version. (See “Limitations” below.)
Honest Limitations
Claude Context is impressive, but it’s not magic and the docs gloss over a few things:
- You need a vector database. Even with self-hosted Milvus, that’s a new piece of infrastructure to run. There’s no SQLite-style zero-dependency mode (yet).
- Embedding cost is real on first index. A 1M-line repo with `text-embedding-3-large` can cost $5–$20 to index initially. Use `text-embedding-3-small` or VoyageAI’s code-tuned model to cut that 5–10x.
- No re-ranking by default. Hybrid search is good, but a cross-encoder re-ranker on the top-50 hits would push quality higher. Not yet built in.
- AST splitter coverage is uneven. Top-tier languages (TS, JS, Python, Go, Rust, Java) work great. Edge cases — Elixir, Clojure, Solidity — fall back to character splitting, which hurts recall.
- Multi-repo indexing is manual. You can index multiple codebases, but there’s no first-class “workspace” concept that searches across them in one query.
- OpenAI key required by default. Yes, you can swap to Ollama, but the README leads with OpenAI and a lot of users miss the local option.
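The re-ranking gap noted above is straightforward to bolt on yourself: score each (query, chunk) pair with a cross-encoder and re-sort the top hits. The sketch below uses a toy word-overlap scorer so it runs anywhere; a real deployment would swap in an actual cross-encoder model, which reads query and chunk together and is slower but more accurate than first-pass retrieval.

```python
def rerank(query, hits, score_fn, keep=10):
    """Re-order hybrid-search hits by a (query, chunk) relevance score.

    score_fn stands in for a cross-encoder; everything else is generic
    control flow you could wrap around any retriever's output.
    """
    scored = sorted(hits, key=lambda h: score_fn(query, h), reverse=True)
    return scored[:keep]

def overlap_score(query, chunk):
    # Toy stand-in for a cross-encoder: fraction of query words in the chunk.
    words = query.lower().split()
    return sum(w in chunk.lower() for w in words) / len(words)

hits = ["def refresh_jwt(token): ...", "# TODO auth", "class JwtRefresher: ..."]
print(rerank("jwt refresh", hits, overlap_score, keep=2))
```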
Who Should Use This (And Who Shouldn’t)
Use Claude Context if:
- You work in a 50K+ line repo and your AI agent keeps “forgetting” parts of it
- You’re already paying Claude/OpenAI/Gemini API bills and want to cut tokens
- You want one indexer that works across Claude Code, Cursor, Codex, and Gemini CLI
- You’re comfortable running (or paying for) a vector DB
Skip it if:
- Your repo is under 5K lines — `grep` and `Read` tool calls are still fine
- You can’t run Milvus and don’t want a managed vendor in the loop
- You need on-prem with no managed services and don’t want Docker — there’s no SQLite-only path yet
Comparison With Alternatives
| Tool | Open source | MCP | Hybrid search | AST chunking | Local-only option | Multi-client |
|---|---|---|---|---|---|---|
| Claude Context | ✅ MIT | ✅ | ✅ | ✅ | ⚠️ (needs Milvus + Ollama) | ✅ 13+ clients |
| Cursor Codebase Index | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ Cursor only |
| Sourcegraph Cody | Partial | ❌ | ✅ | ✅ | ✅ enterprise | ❌ |
| Aider repo-map | ✅ | ❌ | ❌ (graph-based) | ✅ | ✅ | ❌ Aider only |
| Continue @codebase | ✅ Apache | ❌ | ✅ | ✅ | ✅ LanceDB | ❌ Continue only |
| Greptile | ❌ SaaS | ✅ | ✅ | ✅ | ❌ | ✅ |
The unique slot Claude Context fills is open-source + MCP + multi-client. Cursor’s index is best-in-class but only works in Cursor. Continue’s index is good but only in Continue. Greptile works across clients but is closed SaaS. Claude Context is the only tool that’s all three at once.
FAQ
Does Claude Context work with Cursor, Codex CLI, and Gemini CLI, or only Claude Code?
It works with 13+ MCP-compatible clients: Claude Code, Cursor, Codex CLI, Gemini CLI, Qwen Code, Windsurf, Cline, Roo Code, Augment, Zencoder, Claude Desktop, Void, and Cherry Studio. The README has copy-paste configs for each. The “Claude” in the name is misleading — it’s a generic MCP server.
How much does it cost to run on a 100K-line codebase?
Initial indexing with `text-embedding-3-small` costs roughly $0.02 per 1M tokens of code. A 100K-line repo works out to a few million tokens, so the first pass costs cents rather than dollars; even `text-embedding-3-large` at $0.13 per 1M tokens stays well under a dollar at that scale. Storage on Zilliz Cloud’s free tier covers most personal projects. With self-hosted Milvus + Ollama embeddings (nomic-embed-text), the marginal cost is zero.
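The back-of-envelope arithmetic, worth running yourself since estimates floating around vary widely. The per-million-token prices are OpenAI's published rates; tokens-per-line is the load-bearing assumption here, and dense code or chunk overlap can multiply the total severalfold.

```python
# Back-of-envelope embedding cost for a first index.
# tokens_per_line=10 is a rough assumption; real repos vary a lot.

def index_cost(lines, tokens_per_line=10, price_per_million=0.02):
    tokens = lines * tokens_per_line
    return tokens / 1_000_000 * price_per_million

# 100K-line repo, text-embedding-3-small ($0.02 / 1M tokens): cents.
print(f"${index_cost(100_000):.2f}")

# 1M-line repo, text-embedding-3-large ($0.13 / 1M tokens): dollars.
print(f"${index_cost(1_000_000, price_per_million=0.13):.2f}")
```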
Can I run Claude Context fully offline / on-prem?
Yes. Use Ollama for embeddings (set EMBEDDING_PROVIDER=ollama) and self-hosted Milvus in Docker for the vector store. No code or queries leave your machine. The README hides this option a few clicks deep but it’s fully supported.
How is this different from Cursor’s built-in codebase indexing?
Cursor’s index is closed source, lives only inside Cursor, and uses Cursor’s hosted infrastructure. Claude Context is MIT-licensed, MCP-based, and works across 13+ clients with your choice of embedding model and vector store. If you’re a Cursor-only shop, the built-in index is fine. If you switch between Claude Code, Cursor, and Codex, Claude Context gives you one consistent index across all three.
Does it actually save tokens, or is that marketing?
Zilliz published its evaluation methodology and results — controlled benchmarks showing ~40% token reduction at equivalent retrieval quality. The evaluation set is in the repo so you can reproduce it. Real-world reports on r/ClaudeAI confirm 30–50% reductions in long coding sessions.
Is it safe to use on private/proprietary code?
If you use OpenAI / VoyageAI / Gemini embeddings, your code chunks are sent to those providers’ embedding endpoints. They typically don’t train on API data, but check the terms for your account tier. For full isolation, switch to Ollama embeddings + self-hosted Milvus — nothing leaves your network.
Bottom Line
Claude Context is the first MCP-native code-search tool that feels production-ready. The ~40% token reduction is real, the AST-aware chunking beats fixed-size windows, and the 13-client compatibility matrix means you set it up once regardless of which agent you use this week. The 3,725 stars in a single week suggest a lot of developers agree.
If you’re already running Claude Code, Cursor, or Codex CLI on a serious codebase, Claude Context is worth 30 minutes of setup tonight.
Repo: github.com/zilliztech/claude-context — MIT licensed, ~10K stars and climbing fast.