TL;DR
Claude Context is an open-source MCP plugin from Zilliz (the team behind Milvus) that gives AI coding agents semantic search across your entire codebase. It’s currently one of the hottest repos on GitHub Trending — 9,977 stars total with 3,725 stars gained this week, putting it firmly at the top of the weekly chart. Highlights:
- MCP-first — works with Claude Code, Cursor, Codex CLI, Gemini CLI, Qwen Code, Cline, Roo Code, Windsurf, Augment, and Claude Desktop
- Hybrid search combines BM25 (keyword) + dense vector embeddings for better recall than either alone
- AST-aware chunking splits TypeScript, JavaScript, Python, Java, Go, Rust, C++, C#, PHP, Ruby, Swift, Kotlin, Scala, and Markdown along syntax boundaries
- Incremental indexing via Merkle trees — only re-embeds files that actually changed
- ~40% token reduction in Zilliz’s own evaluation while matching retrieval quality
- Pluggable embeddings — OpenAI, VoyageAI, Gemini, or local Ollama
- Pluggable vector store — self-hosted Milvus or managed Zilliz Cloud
- MIT licensed, runs as a single `npx @zilliz/claude-context-mcp@latest` command
If you’ve been frustrated with Claude Code or Cursor losing the plot in a 200-file repo, Claude Context is the most legitimate fix that’s emerged from the MCP ecosystem this year.
Quick Reference
| Field | Value |
|---|---|
| Repository | github.com/zilliztech/claude-context |
| License | MIT |
| Language | TypeScript |
| Vendor | Zilliz (Milvus team) |
| Stars | 9,977 (+3,725 this week) |
| Install | `npx @zilliz/claude-context-mcp@latest` |
| NPM | `@zilliz/claude-context-mcp`, `@zilliz/claude-context-core` |
| VS Code Extension | "Semantic Code Search" by zilliz |
| Requires | Node ≥ 20, OpenAI/Voyage/Gemini/Ollama key, Milvus or Zilliz Cloud |
What Is Claude Context?
Most AI coding agents have the same architectural blind spot: when you ask a question about your codebase, they either dump entire directories into the context window (expensive, slow, runs out of tokens fast) or rely on file globs and grep (cheap, but blind to anything that isn’t a literal string match).
Claude Context fixes this by indexing your repo into a vector database — chunked along AST boundaries, embedded with a model of your choice — and exposing that index through the Model Context Protocol (MCP). Once it’s installed, your agent can call a search_code tool with a natural-language query like “find functions that handle JWT refresh” and get back semantically relevant code from across millions of lines without ever pulling the full files into context.
It’s basically Cursor’s “Codebase Index” feature, except open source, model-agnostic, and usable from any MCP-compatible client. That last part matters a lot in 2026 — Claude Code, Codex CLI, Gemini CLI, Cursor, Windsurf, and Cline all speak MCP now, but each ships its own (closed) indexer. Claude Context is the first credible attempt to standardize that layer.
Why It’s Trending Now
Three reasons it’s exploded in late April 2026:
- The “context window math” problem got worse, not better. As Claude Sonnet 4.5, GPT-5.1, and Gemini 3 pushed effective context to 1M+ tokens, cost per request exploded. Loading a 200K-line monorepo into context for every question is technically possible and economically insane. Claude Context’s evaluation shows ~40% token reduction at equal retrieval quality — that translates to real money.
- MCP became the dominant agent integration protocol. With Anthropic, OpenAI, Google, Cursor, and Windsurf all shipping MCP support in Q1 2026, a single MCP server now reaches every major coding agent. Claude Context shipped first-class configs for 13+ clients in its README, which is why it’s being adopted faster than competing indexers.
- Zilliz reputation. Milvus is the most-deployed open-source vector database in the world. When the people who built Milvus ship an MCP plugin, the AI infra crowd takes it seriously.
Key Features
1. Hybrid Search (BM25 + Dense Vectors)
Pure semantic search misses exact identifiers. Pure keyword search misses paraphrases. Claude Context does both at the same time and merges results — a technique that’s been state-of-the-art in retrieval research for years but is still rare in code-search MCPs.
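The merge step can be sketched with Reciprocal Rank Fusion, a common way to combine two ranked lists. This is an illustration of the general technique, not Claude Context's actual merge code; the chunk IDs and the `k` constant are assumptions.

```python
# Sketch of hybrid-search result fusion using Reciprocal Rank Fusion (RRF).
# Hypothetical chunk IDs; k=60 is the conventional RRF damping constant.

def rrf_merge(bm25_ranking, dense_ranking, k=60):
    """Merge two ranked lists of chunk IDs into one hybrid ranking."""
    scores = {}
    for ranking in (bm25_ranking, dense_ranking):
        for rank, chunk_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank); chunks that appear in
            # both lists accumulate score and outrank chunks that top
            # only one of them.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["auth.py#L10", "token.py#L42", "readme.md#L1"]   # keyword hits
dense = ["jwt.py#L7", "token.py#L42", "auth.py#L10"]     # semantic hits
print(rrf_merge(bm25, dense))
```

Note how `token.py#L42`, ranked second in both lists, ends up ahead of `jwt.py#L7`, which topped only the dense list: that is the recall boost the hybrid approach buys you.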
2. AST-Based Chunking
Most RAG-for-code systems chunk files into fixed-size character windows, which routinely splits a function in half. Claude Context parses each file with a tree-sitter-style AST splitter and only chunks on syntactic boundaries — function bodies, class definitions, top-level blocks. When the AST parser fails (weird grammars, partial files), it falls back to LangChain’s character splitter so nothing breaks.
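The boundary-or-fallback behavior is easy to see in miniature. Claude Context uses tree-sitter grammars across many languages; this sketch uses only Python's stdlib `ast` module on Python source, but the shape — whole syntactic units when parsing succeeds, character windows when it fails — is the same idea.

```python
import ast

def chunk_python_source(src, window=400):
    """Split Python source along top-level def/class boundaries.

    A minimal stand-in for an AST-aware splitter: each top-level function
    or class becomes one chunk, and unparseable input falls back to
    fixed-size character windows (the LangChain-style escape hatch).
    """
    try:
        tree = ast.parse(src)
    except SyntaxError:
        # Fallback: dumb character windows, as the article describes.
        return [src[i:i + window] for i in range(0, len(src), window)]
    lines = src.splitlines()
    return [
        "\n".join(lines[node.lineno - 1:node.end_lineno])
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]

src = "def login(u):\n    return u\n\nclass Session:\n    pass\n"
# Each chunk is a whole function or class, never a truncated body.
print(chunk_python_source(src))
```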
3. Incremental Indexing via Merkle Trees
Re-indexing a 1M-line monorepo on every commit is unworkable. Claude Context maintains a Merkle tree over your file hashes and only re-embeds files whose hash changed since last index. In practice, a typical commit triggers re-embedding for 1–5 files instead of thousands.
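The core trick can be sketched in a few lines. This is a flat hash-of-hashes rather than a full Merkle tree (a real tree adds interior nodes so whole unchanged directories can be skipped without per-file comparisons), but the payoff is the same: one root comparison answers "did anything change?", and a leaf diff yields exactly the files to re-embed.

```python
import hashlib

def file_hashes(files):
    """Map path -> SHA-256 of contents (the tree's leaves)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}

def root_hash(hashes):
    """Hash of all leaf hashes in path order: a single comparison
    tells you whether anything in the snapshot changed at all."""
    h = hashlib.sha256()
    for path in sorted(hashes):
        h.update(hashes[path].encode())
    return h.hexdigest()

def changed_files(old, new):
    """Only these paths need re-embedding after a commit."""
    return [p for p in new if old.get(p) != new[p]]

before = file_hashes({"a.py": b"print(1)", "b.py": b"print(2)"})
after = file_hashes({"a.py": b"print(1)", "b.py": b"print(3)"})
assert root_hash(before) != root_hash(after)  # something changed
print(changed_files(before, after))           # only b.py is re-embedded
```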
4. Pluggable Embeddings & Vector Stores
You’re not locked into OpenAI or Zilliz Cloud:
- Embeddings: OpenAI (`text-embedding-3-small`/`large`), VoyageAI (`voyage-code-3`, optimized for code), Gemini, or local Ollama
- Vector store: Self-hosted Milvus (free, runs in Docker) or Zilliz Cloud (managed, has a free tier)
This is huge for privacy-sensitive shops — you can run the entire stack on-prem with Ollama embeddings + self-hosted Milvus and never send a line of code to a third party.
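An on-prem setup looks like the Cursor config from the install section below, with the cloud credentials swapped for local endpoints. `EMBEDDING_PROVIDER=ollama` and `MILVUS_ADDRESS` appear in the project's docs, but treat the exact variable names and the local Milvus URL here as assumptions to verify against the README:

```json
{
  "mcpServers": {
    "claude-context": {
      "command": "npx",
      "args": ["-y", "@zilliz/claude-context-mcp@latest"],
      "env": {
        "EMBEDDING_PROVIDER": "ollama",
        "MILVUS_ADDRESS": "http://localhost:19530"
      }
    }
  }
}
```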
5. MCP Tools Exposed
Once running, the server exposes four MCP tools:
- `index_codebase` — index a directory
- `search_code` — hybrid search over the index
- `clear_index` — wipe a codebase’s index
- `get_indexing_status` — progress / completion check
Your agent calls them just like any other MCP tool.
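On the wire, that call is a standard MCP JSON-RPC request. The `tools/call` envelope with `name` and `arguments` is the MCP spec; the argument keys shown here (`path`, `query`, `limit`) are illustrative assumptions, so check the server's published tool schema for the real ones.

```python
import json

# What an MCP client sends over stdio when the agent invokes search_code.
# Envelope per the MCP spec; argument keys are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_code",
        "arguments": {
            "path": "/home/you/your-project",
            "query": "functions that handle JWT refresh",
            "limit": 10,
        },
    },
}
print(json.dumps(request, indent=2))
```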
Installation: Real Commands
1. Get a vector database
The fastest path is Zilliz Cloud’s free tier — sign up at cloud.zilliz.com, create a serverless cluster, and copy the public endpoint + API token.
If you’d rather self-host, run Milvus locally:
docker run -d --name milvus \
-p 19530:19530 -p 9091:9091 \
milvusdb/milvus:latest standalone
2. Install in Claude Code
claude mcp add claude-context \
-e OPENAI_API_KEY=sk-your-openai-api-key \
-e MILVUS_ADDRESS=your-zilliz-cloud-public-endpoint \
-e MILVUS_TOKEN=your-zilliz-cloud-api-key \
-- npx @zilliz/claude-context-mcp@latest
3. Install in Codex CLI
Edit ~/.codex/config.toml:
[mcp_servers.claude-context]
command = "npx"
args = ["@zilliz/claude-context-mcp@latest"]
env = { OPENAI_API_KEY = "sk-...", MILVUS_TOKEN = "your-zilliz-token" }
startup_timeout_ms = 20000
4. Install in Cursor
Add to ~/.cursor/mcp.json:
{
"mcpServers": {
"claude-context": {
"command": "npx",
"args": ["-y", "@zilliz/claude-context-mcp@latest"],
"env": {
"OPENAI_API_KEY": "sk-...",
"MILVUS_ADDRESS": "https://in03-xxx.api.gcp-us-west1.zillizcloud.com",
"MILVUS_TOKEN": "your-zilliz-token"
}
}
}
}
5. First Run
In Claude Code or any client:
> Index this codebase
[Claude Context indexes ~/your-project]
> Find functions that handle user authentication
[Returns semantically relevant code chunks across the repo]
> Check the indexing status
[Reports % complete, last indexed file]
The first index pass is the slow part — for a 100K-line repo with OpenAI embeddings, expect 3–10 minutes and a one-time cost of a few cents to a few dollars depending on your embedding model. Subsequent commits re-index in seconds thanks to Merkle-tree diffing.
Architecture: How It Works
The repo is a TypeScript monorepo with three packages:
- `@zilliz/claude-context-core` — language-agnostic indexing engine. Walks the file tree, runs the AST splitter, calls the embedding provider, writes vectors to Milvus, maintains the Merkle tree.
- `@zilliz/claude-context-mcp` — thin MCP server that exposes `index_codebase`, `search_code`, `clear_index`, and `get_indexing_status` over stdio.
- VS Code extension (“Semantic Code Search”) — same core, different UI; gives you a search box inside the editor.
A query flows like this: agent calls search_code("auth flow") → MCP server embeds the query with the same model used for indexing → Milvus runs hybrid BM25 + dense search → top-K chunks come back with file paths, line ranges, and snippets → agent inlines those snippets into its working context. The full files never enter the LLM context unless the agent explicitly opens them afterward.
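That flow, reduced to a stub pipeline. Every function below is a placeholder standing in for a real component (the embedding API, Milvus's hybrid search); only the shape of the data moving through — query in, file-path/line-range/snippet triples out — mirrors Claude Context.

```python
def embed(text):
    # Stand-in for the embedding provider; in the real system this must be
    # the same model that was used at indexing time.
    return [float(len(text))]

def hybrid_search(index, query_vector, query_text, top_k=3):
    # Stand-in for Milvus's BM25 + dense search; here, naive keyword scoring.
    scored = sorted(
        index,
        key=lambda chunk: sum(w in chunk["snippet"] for w in query_text.split()),
        reverse=True,
    )
    return scored[:top_k]

# Hypothetical index contents.
index = [
    {"file": "auth/jwt.py", "lines": "40-72", "snippet": "def refresh_token(...)"},
    {"file": "db/models.py", "lines": "1-30", "snippet": "class Order(...)"},
]
hits = hybrid_search(index, embed("token refresh"), "token refresh")
for hit in hits:
    # The agent inlines only these snippets, never the whole files.
    print(f'{hit["file"]}:{hit["lines"]}  {hit["snippet"]}')
```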
Real-World Use Cases (From the Community)
- Onboarding to a giant repo. “My first day at the new job, I dropped Claude Context on our 800K-line Python monolith and started asking ‘how does X work?’ questions. Got useful answers in minutes instead of weeks.” — Reddit r/ClaudeAI
- Refactoring sweeps. Find every place a deprecated API is used semantically (not just textually), even when callers wrap or alias it.
- Cross-repo bug triage. Index a microservices monorepo, ask “where does the order ID get re-validated after payment?” and get hits across 6 services.
- Reducing Claude Code’s bill. Teams report 30–50% lower API spend on long sessions because the agent stops dumping whole directories into context.
- Replacing Cursor’s “Codebase Index” while keeping the rest of the Cursor workflow — useful for orgs that don’t want a managed vendor indexing their code.
First Impressions From the Community
“Basically using the tool can achieve ~40% reduction in token usage in addition to some quality gain in complex problems.” — Zilliz, in their published evaluation, cross-posted on r/ClaudeAI
“Context7 complements codebase retrieval by supplying version-specific library documentation alongside results from semantic code search.” — Augment Code, recommending it in their MCP directory
“The MCP server can be integrated with any MCP-compatible client by running `npx @zilliz/claude-context-mcp@latest`.” — official docs; the install simplicity is genuinely the headline feature for most users
The dominant sentiment on Hacker News and r/LocalLLaMA threads is that it’s the first MCP indexer that actually feels production-ready. The most common gripe is the dependency on a vector database — people want a fully local, single-binary version. (See “Limitations” below.)
Honest Limitations
Claude Context is impressive, but it’s not magic and the docs gloss over a few things:
- You need a vector database. Even with self-hosted Milvus, that’s a new piece of infrastructure to run. There’s no SQLite-style zero-dependency mode (yet).
- Embedding cost is real on first index. A 1M-line repo with `text-embedding-3-large` can cost $5–$20 to index initially. Use `text-embedding-3-small` or VoyageAI’s code-tuned model to cut that 5–10x.
- No re-ranking by default. Hybrid search is good, but a cross-encoder re-ranker on the top-50 hits would push quality higher. Not yet built in.
- AST splitter coverage is uneven. Top-tier languages (TS, JS, Python, Go, Rust, Java) work great. Edge cases — Elixir, Clojure, Solidity — fall back to character splitting, which hurts recall.
- Multi-repo indexing is manual. You can index multiple codebases, but there’s no first-class “workspace” concept that searches across them in one query.
- OpenAI key required by default. Yes, you can swap to Ollama, but the README leads with OpenAI and a lot of users miss the local option.
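The re-ranking gap noted above is straightforward to bolt on yourself: score each (query, chunk) pair with a cross-encoder and re-sort the top hits. The sketch below uses a toy word-overlap scorer so it runs anywhere; a real deployment would swap in an actual cross-encoder model, which reads query and chunk together and is slower but more accurate than first-pass retrieval.

```python
def rerank(query, hits, score_fn, keep=10):
    """Re-order hybrid-search hits by a (query, chunk) relevance score.

    score_fn stands in for a cross-encoder; everything else is generic
    control flow you could wrap around any retriever's output.
    """
    scored = sorted(hits, key=lambda h: score_fn(query, h), reverse=True)
    return scored[:keep]

def overlap_score(query, chunk):
    # Toy stand-in for a cross-encoder: fraction of query words in the chunk.
    words = query.lower().split()
    return sum(w in chunk.lower() for w in words) / len(words)

hits = ["def refresh_jwt(token): ...", "# TODO auth", "class JwtRefresher: ..."]
print(rerank("jwt refresh", hits, overlap_score, keep=2))
```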
Who Should Use This (And Who Shouldn’t)
Use Claude Context if:
- You work in a 50K+ line repo and your AI agent keeps “forgetting” parts of it
- You’re already paying Claude/OpenAI/Gemini API bills and want to cut tokens
- You want one indexer that works across Claude Code, Cursor, Codex, and Gemini CLI
- You’re comfortable running (or paying for) a vector DB
Skip it if:
- Your repo is under 5K lines — `grep` and `Read` tool calls are still fine
- You can’t run Milvus and don’t want a managed vendor in the loop
- You need on-prem with no managed services and don’t want Docker — there’s no SQLite-only path yet
Comparison With Alternatives
| Tool | Open source | MCP | Hybrid search | AST chunking | Local-only option | Multi-client |
|---|---|---|---|---|---|---|
| Claude Context | ✅ MIT | ✅ | ✅ | ✅ | ⚠️ (needs Milvus + Ollama) | ✅ 13+ clients |
| Cursor Codebase Index | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ Cursor only |
| Sourcegraph Cody | Partial | ❌ | ✅ | ✅ | ✅ enterprise | ❌ |
| Aider repo-map | ✅ | ❌ | ❌ (graph-based) | ✅ | ✅ | ❌ Aider only |
| Continue @codebase | ✅ Apache | ❌ | ✅ | ✅ | ✅ LanceDB | ❌ Continue only |
| Greptile | ❌ SaaS | ✅ | ✅ | ✅ | ❌ | ✅ |
The unique slot Claude Context fills is open-source + MCP + multi-client. Cursor’s index is best-in-class but only works in Cursor. Continue’s index is good but only in Continue. Greptile works across clients but is closed SaaS. Claude Context is the only tool that’s all three at once.
FAQ
Does Claude Context work with Cursor, Codex CLI, and Gemini CLI, or only Claude Code?
It works with 13+ MCP-compatible clients: Claude Code, Cursor, Codex CLI, Gemini CLI, Qwen Code, Windsurf, Cline, Roo Code, Augment, Zencoder, Claude Desktop, Void, and Cherry Studio. The README has copy-paste configs for each. The “Claude” in the name is misleading — it’s a generic MCP server.
How much does it cost to run on a 100K-line codebase?
Initial indexing with `text-embedding-3-small` costs roughly $0.02 per 1M tokens of code. A 100K-line repo works out to a few million tokens, so the first pass costs cents rather than dollars; even `text-embedding-3-large` at $0.13 per 1M tokens stays well under a dollar at that scale. Storage on Zilliz Cloud’s free tier covers most personal projects. With self-hosted Milvus + Ollama embeddings (nomic-embed-text), the marginal cost is zero.
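The back-of-envelope arithmetic, worth running yourself since estimates floating around vary widely. The per-million-token prices are OpenAI's published rates; tokens-per-line is the load-bearing assumption here, and dense code or chunk overlap can multiply the total severalfold.

```python
# Back-of-envelope embedding cost for a first index.
# tokens_per_line=10 is a rough assumption; real repos vary a lot.

def index_cost(lines, tokens_per_line=10, price_per_million=0.02):
    tokens = lines * tokens_per_line
    return tokens / 1_000_000 * price_per_million

# 100K-line repo, text-embedding-3-small ($0.02 / 1M tokens): cents.
print(f"${index_cost(100_000):.2f}")

# 1M-line repo, text-embedding-3-large ($0.13 / 1M tokens): dollars.
print(f"${index_cost(1_000_000, price_per_million=0.13):.2f}")
```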
Can I run Claude Context fully offline / on-prem?
Yes. Use Ollama for embeddings (set EMBEDDING_PROVIDER=ollama) and self-hosted Milvus in Docker for the vector store. No code or queries leave your machine. The README hides this option a few clicks deep but it’s fully supported.
How is this different from Cursor’s built-in codebase indexing?
Cursor’s index is closed source, lives only inside Cursor, and uses Cursor’s hosted infrastructure. Claude Context is MIT-licensed, MCP-based, and works across 13+ clients with your choice of embedding model and vector store. If you’re a Cursor-only shop, the built-in index is fine. If you switch between Claude Code, Cursor, and Codex, Claude Context gives you one consistent index across all three.
Does it actually save tokens, or is that marketing?
Zilliz published its evaluation methodology and results — controlled benchmarks showing ~40% token reduction at equivalent retrieval quality. The evaluation set is in the repo so you can reproduce it. Real-world reports on r/ClaudeAI confirm 30–50% reductions in long coding sessions.
Is it safe to use on private/proprietary code?
If you use OpenAI / VoyageAI / Gemini embeddings, your code chunks are sent to those providers’ embedding endpoints. They typically don’t train on API data, but check the terms for your account tier. For full isolation, switch to Ollama embeddings + self-hosted Milvus — nothing leaves your network.
Bottom Line
Claude Context is the first MCP-native code-search tool that feels production-ready. The ~40% token reduction is real, the AST-aware chunking beats fixed-size windows, and the 13-client compatibility matrix means you set it up once regardless of which agent you use this week. The 3,725 stars in a single week suggest a lot of developers agree.
If you’re already running Claude Code, Cursor, or Codex CLI on a serious codebase, Claude Context is worth 30 minutes of setup tonight.
Repo: github.com/zilliztech/claude-context — MIT licensed, ~10K stars and climbing fast.