TL;DR
CocoIndex is an open-source Python framework (with a Rust core) that solves the most underrated problem in production AI: your agent’s RAG index goes stale the moment the data changes. Instead of rebuilding the whole vector store every hour, CocoIndex tracks per-row provenance and only reprocesses the delta when a source file, a chunking function, or an embedding model changes. It’s trending hard on GitHub right now — +1,798 stars this week, ~9,700 total — and the framework has been pitched as “React for data engineering” because you declare the target state and the engine keeps it in sync forever.
Key facts:
- Incremental by design — change one file in a 10,000-document corpus and only that file’s chunks re-embed; the other 99.9% stay cached
- Rust core + Python API — production-grade ingestion under the hood, but you write your pipeline in 20 lines of Python
- Connectors — local filesystem, Postgres, Qdrant, Neo4j, Kafka, plus custom source connectors for any API
- Lineage built in — every vector or graph node in the target traces back to the exact source byte that produced it
- Code-aware caching — `@coco.fn(memo=True)` hashes both input and function source, so editing your splitter only re-runs the affected branch
- Apache 2.0, Python 3.10–3.13, ships as `pip install cocoindex`
- 20+ working examples in the repo: code embedding, PDF embedding, Hacker News trending topics, knowledge graph from conversations, CSV-to-Kafka, and more
- Flagship product on top: CocoIndex-code, an MCP server for Claude Code / Cursor that exposes an AST-aware semantic code index with sub-second freshness
- Honest limitation — it’s infrastructure, not a magic agent button. You still own the data model, chunking strategy, and embedding choices; incremental correctness depends on your invalidation logic being sound.
If you’re shipping an AI agent that has to reason over data that actually changes — a codebase, a CRM, a wiki, an email inbox — CocoIndex is currently the most ergonomic open-source way to keep its memory fresh without re-embedding the world every cycle.
The Problem: Stale RAG Is Quietly Killing Your Agent
Every team building a production AI agent hits the same wall. You stand up a beautiful demo where the agent answers questions over your docs, your code, your Slack history. It works. You ship it. And then, two weeks in, the complaints start:
- “The agent doesn’t know about the new pricing page.”
- “It keeps citing the deprecated API.”
- “Why does it think Sarah is still on the team?”
The answer is always the same: the index is stale. Your batch pipeline runs once a night, takes 90 minutes, and burns $40 in embeddings. So you only run it nightly. So your agent is always at least a few hours out of date — and on a busy product day, half a day behind reality.
The naive fix is “just rebuild more often.” But for a real corpus — even 50,000 documents — full rebuilds quickly become economically and computationally prohibitive. You don’t want to re-embed the entire repository because one CLAUDE.md file changed. You want to re-embed that file.
This is the problem CocoIndex was built to solve. It treats your RAG index the way React treats the DOM: you declare what the target should contain as a function of the source, and the engine handles the diffing.
Why It’s Trending NOW
Three things converged in the last 60 days:
- Long-horizon agents are the new shape of AI workloads. Coding agents like Claude Code, Cursor, and OpenAI’s Codex CLI now run for hours over a single repo. They need to see current code, not last night’s snapshot. CocoIndex’s flagship CocoIndex-code MCP server is aimed straight at that use case.
- MCP made fresh context a portable problem. Once Anthropic standardized the Model Context Protocol, it became obvious that whoever ships the best “live, semantic context server” wins a slice of every agent. CocoIndex’s positioning — fresh context as a service — slots cleanly into that gap.
- The Rust core just hit production maturity. Recent releases added parallel chunking, zero-copy transforms, and failure isolation so one bad PDF doesn’t stall the flow. That’s the difference between a clever side project and something you’d actually run in front of customers.
The result: 1,798 stars in seven days, a Trendshift badge, and a wave of “Show HN” and Reddit discussion threads where people are reporting real cost savings on their embedding bills.
How It Works: Target = F(Source)
The mental model is one line:
Target = F(Source)
You describe the transformation F as a Python function. CocoIndex’s engine watches the source, watches the function source code, and keeps the target in sync — forever.
Here’s the canonical example from the README:
```python
import cocoindex as coco
from cocoindex.connectors import localfs, postgres
from cocoindex.ops.text import RecursiveSplitter

# Assumed defined elsewhere (elided in this excerpt): PG, a Postgres
# connection config, and embed(), the embedding function of your choice.

@coco.fn(memo=True)  # ← cached by hash(input) + hash(code)
async def index_file(file, table):
    for chunk in RecursiveSplitter().split(await file.read_text()):
        table.declare_row(text=chunk.text, embedding=embed(chunk.text))

@coco.fn
async def main(src):
    table = await postgres.mount_table_target(PG, table_name="docs")
    table.declare_vector_index(column="embedding")
    await coco.mount_each(index_file, localfs.walk_dir(src).items(), table)

coco.App(coco.AppConfig(name="docs"), main, src="./docs").update_blocking()
```
Run it once: it backfills. Run it again tomorrow: nothing re-embeds, because nothing changed. Edit one Markdown file: only that file’s chunks re-embed, the affected Postgres rows update, and stale rows get retired. Change the splitter from RecursiveSplitter to a smarter one: only the rows whose outputs depended on RecursiveSplitter’s code re-run.
That last point is the magic. Because the @coco.fn(memo=True) decorator hashes the function’s source code, refactoring your transformation invalidates exactly the right portion of the index — no manual cache busting, no awkward versioning scheme, no global “delete and rebuild.”
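If you want intuition for how that works, here is a toy version of code-aware memoization in plain Python. This is illustrative only: the names `memo_by_code` and `_cache` are mine, and CocoIndex’s real engine persists this state durably and tracks per-row lineage rather than using an in-process dict.

```python
# Illustrative only: a toy sketch of code-aware memoization, NOT
# CocoIndex's implementation. The real engine persists this state and
# tracks per-row provenance; this just demonstrates the cache-key idea.
import hashlib
import inspect
import pickle

_cache: dict[str, object] = {}

def memo_by_code(fn):
    def wrapper(*args, **kwargs):
        # The key covers the inputs AND the function's source text, so
        # editing the function body invalidates exactly its own entries.
        key = hashlib.sha256(
            inspect.getsource(fn).encode()
            + pickle.dumps((args, sorted(kwargs.items())))
        ).hexdigest()
        if key not in _cache:
            _cache[key] = fn(*args, **kwargs)
        return _cache[key]
    return wrapper

@memo_by_code
def split_paragraphs(text: str) -> list[str]:
    return text.split("\n\n")  # edit this body and old cache entries go cold
```

Swap the body of `split_paragraphs` and the next run recomputes; leave it alone and every previously seen input is a cache hit. CocoIndex applies the same principle per row, per function, across runs.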
Key Features (With Code)
1. Real connectors, not just file globs
Out of the box CocoIndex supports:
- Sources: local filesystem, S3-compatible blob storage, Postgres CDC, Slack, Notion, REST APIs (via custom source connectors), Hacker News (yes, really)
- Targets: Postgres (with pgvector), Qdrant, Neo4j (for knowledge graphs), Kafka (as an output topic), data warehouses
The custom source connector pattern is just a Python class — there’s an example in the repo of a Hacker News connector that pulls threads, recursively walks comments, and only re-runs the LLM topic extraction on threads that changed.
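For a sense of the shape, here is a hypothetical connector class against the real Hacker News Firebase API. The method names (`list_items`, `read_item`) are illustrative, not CocoIndex’s documented interface; consult the repo’s Hacker News example for the actual contract.

```python
# Hypothetical connector shape; method names are illustrative, not
# CocoIndex's documented API. The HN Firebase endpoints are real.
import httpx

class HackerNewsSource:
    """Yields top-story items so the engine can diff them against the last run."""

    def __init__(self, limit: int = 50):
        self.limit = limit

    def list_items(self) -> list[int]:
        # Stable IDs let the engine detect adds and removes between runs.
        ids = httpx.get(
            "https://hacker-news.firebaseio.com/v0/topstories.json"
        ).json()
        return ids[: self.limit]

    def read_item(self, item_id: int) -> dict:
        # A content hash of this payload is what decides whether downstream
        # transforms (LLM topic extraction, embedding) need to re-run.
        return httpx.get(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ).json()
```

The important design property is stable item IDs plus hashable content: that is what lets the engine decide which threads changed and re-run only their topic extraction.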
2. Built-in ops for the boring stuff
You don’t have to write your own chunker, OCR step, or embedder for the common cases:
```python
from cocoindex.ops.text import RecursiveSplitter, MarkdownSplitter
from cocoindex.ops.vision import OCR
from cocoindex.ops.embed import OpenAIEmbedder, SentenceTransformerEmbedder
```
These are first-class operators that participate in the incremental graph — their outputs are cached and invalidated like any other transformation.
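A rough sketch of composing them, with the caveat that only `split()` appears verbatim in the canonical example above; the embedder call shape and model name are assumptions, not verified signatures:

```python
# Assumed usage: only RecursiveSplitter().split() appears verbatim in the
# canonical example; the embedder constructor arg and .embed() method
# shape are illustrative.
from cocoindex.ops.text import MarkdownSplitter
from cocoindex.ops.embed import SentenceTransformerEmbedder

markdown_text = open("README.md").read()

splitter = MarkdownSplitter()
embedder = SentenceTransformerEmbedder("all-MiniLM-L6-v2")  # hypothetical arg

for chunk in splitter.split(markdown_text):
    vector = embedder.embed(chunk.text)  # cached per (input, model, fn source)
```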
3. Knowledge graphs, not just vectors
A surprising number of teams discover halfway through their RAG project that a flat vector index doesn’t actually model their domain. People, tickets, customers, codebases — these are graphs. CocoIndex lets you emit nodes and edges into Neo4j from the same flow:
```python
# Assumed in scope (elided in this excerpt): llm, an extraction client,
# and PersonOrCompany, the output schema type.
@coco.fn(memo=True)
async def extract_entities(doc, graph):
    entities = await llm.extract(doc.text, schema=PersonOrCompany)
    for e in entities:
        graph.upsert_node(label="Person", id=e.name, props={"role": e.role})
        graph.upsert_edge(src=doc.id, dst=e.name, label="MENTIONS")
```
Incremental graph updates are hard to get right by hand. The engine retiring stale edges when a document changes is genuinely useful.
4. CocoIndex-code: the flagship for coding agents
The team’s most aggressive bet is a separate product called CocoIndex-code — an MCP server that exposes an AST-aware, incremental, semantic code index to any MCP-compatible agent (Claude Code, Cursor, Continue). Their claims:
- 70% fewer tokens per turn (because the agent retrieves just the relevant symbols, not 200KB of file dumps)
- 80–90% cache hits on re-index after a commit
- Sub-second freshness from `git commit` to “agent sees the new function”
- Supports Python, TypeScript, Rust, Go
If you’re building or evaluating coding agents, this is the most concrete proof point for the framework. The same incremental engine powers it.
Community Reception
The reaction on HN and Reddit has been notably substantive — fewer “looks cool, starred” comments, more “here’s how I’d use this”:
- On the Show HN thread, one founder reported saving “a significant amount of time” updating vector embeddings for a startup product, calling out the step-by-step tutorial.
- On r/cocoindex, users have been posting their custom source connectors — the Hacker News one, a Linear ticket connector, a Confluence one — which suggests the extension API is actually usable, not just theoretical.
- A recurring theme in discussion: people grasp the “React for data” metaphor immediately and then the questions get good — about invalidation correctness, partial failures, and how the system handles schema migrations.
- One critical voice on HN pushed back that incremental systems shift correctness work onto the user: if your `@coco.fn` is non-deterministic or has hidden inputs, the cache will silently serve wrong answers. This is a fair critique — CocoIndex’s recommendation is to keep transformation functions pure and route side effects through declared connectors (a concrete sketch of this failure mode follows at the end of this section).
The signal-to-noise ratio is high. This is a tool being adopted by people who have shipped production RAG before and know exactly what it costs them to not have incrementality.
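To make that critique concrete, here is what a hidden input looks like in practice. This is a sketch reusing the decorator from the canonical example; `os.environ` and `datetime.now()` stand in for any input the cache key cannot see.

```python
# Sketch: why purity matters. The memo key covers (inputs, function source),
# so anything read outside the arguments is invisible to invalidation.
# Assumes `import cocoindex as coco` as in the canonical example.
import datetime
import os

@coco.fn(memo=True)
async def bad_summary(doc):
    template = os.environ["PROMPT_TEMPLATE"]     # hidden input: key won't change
    stamp = datetime.datetime.now().isoformat()  # non-deterministic output
    return f"{stamp}: {template.format(text=doc)}"

@coco.fn(memo=True)
async def good_summary(doc, template: str):
    # Everything the output depends on is an argument, so editing the
    # template invalidates exactly the affected rows.
    return template.format(text=doc)
```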
Getting Started
Install:
```bash
pip install -U cocoindex

# plus whatever target you're using
docker run -d -p 5432:5432 \
  -e POSTGRES_PASSWORD=cocoindex \
  pgvector/pgvector:pg16
```
Clone an example to use as a starting point:
```bash
git clone https://github.com/cocoindex-io/cocoindex
cd cocoindex/examples/code_embedding
python flow.py
```
That one walks a local git repo, splits Python/TypeScript files by AST, embeds the chunks with a model of your choice, and writes them to Postgres with a pgvector index. Edit a source file, re-run, and watch only the affected rows update — that’s the “aha” moment.
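Once the flow has populated Postgres, retrieval is plain SQL against the pgvector column. A minimal sketch using the real `psycopg` and `pgvector` client libraries, assuming the `docs` table and `embedding` column from the earlier example; the query-time model must match whatever you indexed with:

```python
# Retrieval sketch using real client libraries:
#   pip install "psycopg[binary]" pgvector sentence-transformers
# Table/column names match the earlier example; the connection string and
# embedding model are assumptions you should adjust.
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # must match your index model
conn = psycopg.connect("postgresql://postgres:cocoindex@localhost:5432/postgres")
register_vector(conn)  # teaches psycopg to pass numpy vectors as pgvector

query_vec = model.encode("how do I configure retries?")
rows = conn.execute(
    "SELECT text FROM docs ORDER BY embedding <=> %s LIMIT 5",  # cosine distance
    (query_vec,),
).fetchall()
for (text,) in rows:
    print(text[:80])
```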
If you’re driving CocoIndex from inside a coding agent (Claude Code, Cursor), the team ships a CocoIndex skill file you can drop into your agent’s context. It packs the concepts, APIs, and patterns into one file so the agent writes correct v1 code instead of hallucinating decorator names.
Who Should Use This (And Who Shouldn’t)
Good fits:
- You’re shipping an AI agent that reads from data sources that actually change — codebases, CRMs, internal wikis, ticket systems
- Your corpus is large enough (>10K docs) that nightly full rebuilds are painful or expensive
- You care about lineage and explainability — “why did the agent say that?” should be answerable
- You want to use Postgres or Neo4j as your vector/graph store instead of yet another managed service
- You’re building an MCP server or coding agent and need semantic, incremental code search
Not the right fit:
- Your corpus is small (a few hundred docs) and changes once a week — a daily cron rebuilding into Chroma or FAISS is simpler and fine
- You need a hosted, click-to-deploy RAG service — CocoIndex is a framework you run, not a SaaS
- Your team has zero Python or Postgres operational experience — there’s a learning curve, even though the API is clean
- You want a no-code UI — CocoIndex is firmly a developer tool
How CocoIndex Compares
| Tool | Incremental? | Lineage | Graph support | Code-aware | License |
|---|---|---|---|---|---|
| CocoIndex | ✅ Per-row + per-fn-source | ✅ Built in | ✅ (Neo4j) | ✅ (CocoIndex-code) | Apache 2.0 |
| LlamaIndex | Partial (manual) | ❌ | Partial | ❌ | MIT |
| LangChain | ❌ (rebuild) | ❌ | Partial | ❌ | MIT |
| Haystack | ❌ (rebuild) | ❌ | ❌ | ❌ | Apache 2.0 |
| Pathway | ✅ (streaming) | Partial | ❌ | ❌ | BUSL → MIT |
| Unstructured.io | ❌ (parsing only) | ❌ | ❌ | ❌ | Apache 2.0 |
The closest comparable in spirit is Pathway (also incremental, streaming-first), but Pathway leans heavier on the streaming-engine framing while CocoIndex leans into the “declarative target = F(source)” mental model. For most RAG-style workloads, CocoIndex’s API surface is smaller and easier to onboard onto.
If you’ve already invested in LlamaIndex or LangChain, CocoIndex isn’t necessarily a replacement — it’s the layer under them. You can have CocoIndex maintain a fresh Postgres + pgvector index and point your LlamaIndex retriever at it.
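A sketch of that layering, using LlamaIndex’s real `PGVectorStore`. Connection parameters are illustrative, and one caveat is worth flagging: LlamaIndex expects its own table schema, so verify column compatibility (or fall back to a thin SQL retriever like the one above) before treating this as drop-in.

```python
# Layering sketch: CocoIndex keeps the table fresh; LlamaIndex queries it.
# Assumes the query-time embed model matches the indexing model and that
# the table schema is compatible with what PGVectorStore expects.
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.postgres import PGVectorStore

vector_store = PGVectorStore.from_params(
    host="localhost",
    port=5432,
    database="postgres",
    user="postgres",
    password="cocoindex",
    table_name="docs",
    embed_dim=384,  # must match the indexing model's dimensionality
)
index = VectorStoreIndex.from_vector_store(vector_store)
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("how does incremental invalidation work?")
```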
Honest Limitations
A few sharp edges worth knowing before you adopt:
- Postgres-centric defaults. Other targets work, but the happy path runs through Postgres. If you’re a SQLite or DuckDB shop, expect some legwork.
- Async-only Python API. Everything is `async def` — fine for new projects, occasionally awkward if you’re embedding it inside a sync codebase (see the bridging sketch after this list).
- You own correctness. As one HN commenter put it: incremental systems are only as correct as your invalidation logic. Non-deterministic transforms or hidden side effects will silently corrupt your index. The fix is hygiene (pure functions, declared connectors) but it’s hygiene the framework can’t enforce.
- Operational footprint. Running a Rust binary + Postgres + your own embedding service is real ops work. For a hobby project this is overkill; for a production agent it’s table stakes.
- No managed offering yet. There’s an enterprise page on the site, but as of writing this is still primarily a self-host story.
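On the async point specifically, the standard asyncio bridges apply and none of this is CocoIndex-specific. A sketch for wrapping a blocking legacy parser inside an async transform (`legacy_sync_parser` is hypothetical, and `embed()` is the elided helper from the canonical example):

```python
# Generic asyncio bridging, not a CocoIndex API. Assumes the imports and
# decorator from the canonical example above.
import asyncio

@coco.fn(memo=True)
async def index_legacy(file, table):
    # Run the blocking parser in a worker thread so it doesn't stall
    # the event loop that drives the rest of the flow.
    text = await asyncio.to_thread(legacy_sync_parser, file)  # hypothetical
    table.declare_row(text=text, embedding=embed(text))
```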
None of these are deal-breakers, but they should shape how you scope your first project — start with one source, one target, one transformation, and grow from there.
FAQ
Is CocoIndex a RAG framework like LlamaIndex?
Not exactly. LlamaIndex and LangChain are retrieval and orchestration frameworks — they help you wire LLMs to data at query time. CocoIndex sits one layer below: it builds and maintains the index that those frameworks query. The cleanest pattern is to use CocoIndex to keep a Postgres + pgvector store fresh, then point your LlamaIndex retriever at it. They’re complementary, not competitive.
How does CocoIndex compare to Pathway for incremental RAG?
Both are genuinely incremental. Pathway is positioned as a streaming computation engine — closer in spirit to Apache Flink — and tends to suit event-driven, low-latency workloads. CocoIndex is positioned as a declarative data pipeline with React-style mental model and a more compact Python API. For typical RAG workloads (rebuild an index as the corpus drifts), CocoIndex is the simpler onboarding. For high-throughput streaming with windowed joins, Pathway has more depth.
Can I use it without Postgres?
Yes — Qdrant, Neo4j, and Kafka are first-class targets, and the connector API is open. But the documentation and examples lean Postgres-heavy, so be prepared to read source code for less-trodden targets.
Will my embedding bill actually go down?
In practice, yes — significantly, if your corpus is large and your change rate is small (which it almost always is). The pathological case is a corpus that changes 50% per day, where incrementality buys you less. For a typical codebase or doc set where 0.1–1% of files change per day, you can expect 50–100x reductions in re-embedding cost: re-embedding 100 changed files out of 10,000 is a 100x saving before engine overhead.
Is this production-ready?
The Rust core is described by the maintainers as “production-grade from day zero,” with retries, exponential backoff, dead-letter queues, and per-record failure isolation. That said: 9,700 stars and 1,800-a-week growth means the user base is still relatively young. Treat it the way you’d treat any Apache-licensed framework in its growth phase — pin versions, read the changelog, and have a rollback plan.
CocoIndex is one of the most interesting infrastructure projects in the AI stack right now precisely because it’s not trying to be another agent framework. It’s tackling the much less glamorous, much more valuable problem of keeping the agent’s view of the world current. If you’re building anything that has to answer “what’s in the data right now” instead of “what was in the data last night,” it’s worth a serious afternoon of evaluation.
Repo: github.com/cocoindex-io/cocoindex
Docs: cocoindex.io/docs
License: Apache 2.0