Cognee Review: Open-Source AI Memory Platform for Agents

TL;DR

Cognee is an open-source AI memory platform that gives agents persistent long-term memory across sessions. It combines vector embeddings, graph reasoning, and cognitive-science-grounded ontology generation into a single Python package with a four-verb API: remember, recall, forget, and improve. It just crossed 26,000 GitHub stars (6,400+ new this week). Notably, Cognee 1.0 can run the entire memory layer (graph, vectors, sessions, metadata) on a single Postgres instance instead of the usual Neo4j + Redis + vector-DB stack.

Key facts:

Four-verb API: cognee.remember(), cognee.recall(), cognee.forget(), cognee.improve() — that’s the whole surface area
Two memory tiers: session memory (fast cache) + permanent knowledge graph (background sync)
Postgres-native: pgvector for embeddings, Postgres graph backend for relationships, SQL session cache. Run everything on one database, or swap in Neo4j/Redis/Qdrant
BEAM benchmark: 0.79 at 100K tokens (previous SOTA 0.735) and 0.67 at 10M tokens (previous SOTA 0.641), ahead of the prior state of the art on long-context memory at both scales
Claude Code plugin: hooks into SessionStart, UserPromptSubmit, PostToolUse, Stop, PreCompact, SessionEnd for automatic memory capture and context injection
Multi-language clients: Python, TypeScript (@cognee/cognee-ts), Rust (cognee-rs)
26,000+ GitHub stars, 6,400+ new stars this week, Apache 2.0

Quick Reference

Property	Value
Repository	github.com/topoteretes/cognee
Author	Topoteretes (Vasilije Markovic + team)
License	Apache 2.0
Languages	Python 3.10–3.14, TypeScript, Rust
Install	`uv pip install cognee`
GitHub Stars	26,229 (6,417 this week)
Docker Images	`cognee/cognee`, `cognee/cognee-mcp`
Cloud	cognee.ai (managed)
Research paper	arXiv:2505.24478
Benchmark	BEAM 0.79 @ 100K, 0.67 @ 10M tokens

What It Is

Cognee is the piece almost every serious agent build eventually needs but nobody wants to write from scratch: the memory layer. LLMs, on their own, have no persistent memory. RAG stores documents but doesn’t build relationships. Vector databases embed chunks but can’t reason about how facts connect. Cognee sits underneath your agent and turns raw inputs into a knowledge graph plus a vector index plus a session cache — all of which get queried together when the agent needs context.

The mental model is straightforward. You call remember() with text (or documents, or structured data). Cognee runs an ingestion pipeline that extracts entities, generates embeddings, builds subject–relation–object triplets, and stores everything in a knowledge graph that also knows how to be searched by meaning. When you later call recall() with a question, Cognee picks the best retrieval strategy — pure vector search, graph traversal, or a hybrid — and returns context that the agent can slot into its prompt.
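
To make the triplet idea concrete, here is a minimal sketch of a subject-relation-object store with multi-hop traversal. It is a toy, not Cognee's implementation: the TripletGraph class and its methods are hypothetical names invented for illustration, and a real backend would add embeddings, persistence, and entity resolution on top.

```python
from collections import defaultdict


class TripletGraph:
    """Hypothetical toy store for subject-relation-object triplets (not a Cognee class)."""

    def __init__(self):
        # subject -> list of (relation, object) edges
        self._edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self._edges[subject].append((relation, obj))

    def paths(self, start, max_hops=2):
        """Return every path of 1..max_hops edges that begins at `start`."""
        results = []
        frontier = [(start, [])]
        for _ in range(max_hops):
            next_frontier = []
            for node, path in frontier:
                for relation, obj in self._edges.get(node, []):
                    new_path = path + [(node, relation, obj)]
                    results.append(new_path)
                    next_frontier.append((obj, new_path))
            frontier = next_frontier
        return results


graph = TripletGraph()
graph.add("Cognee", "stores", "knowledge graph")
graph.add("knowledge graph", "lives_in", "Postgres")

# Multi-hop question: where does the thing Cognee stores end up living?
for path in graph.paths("Cognee", max_hops=2):
    print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in path))
```

The two-hop path is the part a plain vector index cannot produce on its own: neither stored fact mentions both Cognee and Postgres, yet the traversal connects them.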

What makes it different from Graphiti, Zep, or standard GraphRAG stacks: Cognee ships the whole thing as one Python package with sensible defaults (SQLite + LanceDB + Kuzu for local dev, Postgres + pgvector for production) and a four-verb API. You don’t need to stand up Neo4j to start; you just install the package and call remember.

Installation and First-Run

The quickstart is refreshingly boring, which is the highest compliment:

uv pip install cognee

import cognee
import asyncio
import os

os.environ["LLM_API_KEY"] = "sk-..."  # OpenAI, or configure any provider

async def main():
    # Store permanently in the knowledge graph
    await cognee.remember("Cognee turns documents into AI memory.")

    # Store in session memory (fast cache, syncs to graph in background)
    await cognee.remember(
        "User prefers detailed explanations.",
        session_id="chat_1",
    )

    # Query with auto-routing
    results = await cognee.recall("What does Cognee do?")
    for r in results:
        print(r)

    # Session-scoped recall (falls through to permanent graph if needed)
    results = await cognee.recall(
        "What does the user prefer?",
        session_id="chat_1",
    )
    for r in results:
        print(r)

asyncio.run(main())

That’s a functioning agent memory layer in about 20 lines of code. No graph database to configure, no vector store to provision, no schema to design. The first remember() call takes a few seconds because Cognee spins up the local SQLite + LanceDB + Kuzu stack and runs the entity-extraction pipeline; every call after that is fast.

There’s also a CLI for the shell-first crowd:

cognee-cli remember "Cognee turns documents into AI memory."
cognee-cli recall "What does Cognee do?"
cognee-cli forget --all
cognee-cli -ui   # local web UI (Docker required)

The Postgres-Only Deployment

This is the headline feature of Cognee 1.0 and the reason it’s trending this week. Traditionally, “graph memory” means running four services in production:

Memory layer	Traditional stack	Cognee on Postgres
Relationships	Neo4j (or another graph DB)	Cognee’s Postgres graph backend
Embeddings	Dedicated vector DB (Qdrant, Weaviate, Milvus)	pgvector
Sessions	Redis	SQL session-cache backend
Metadata	Relational DB	Same Postgres

Cognee collapses all four into one Postgres instance. The graph still exists — it just lives inside the same Postgres-backed memory layer as the text, metadata, and embeddings, so retrieval doesn’t cross service boundaries. In the project’s own CI benchmarks, this setup ran ~10% faster than the separate graph-plus-vector configuration.

To switch on Postgres mode:

pip install "cognee[postgres]"

# .env
DB_PROVIDER=postgres
VECTOR_DB_PROVIDER=pgvector
GRAPH_DATABASE_PROVIDER=postgres
CACHE_BACKEND=postgres

DB_HOST=localhost
DB_PORT=5432
DB_USERNAME=cognee
DB_PASSWORD=cognee
DB_NAME=cognee_db

For teams that already run Postgres, this is a large operational win. One database to back up, one to secure, one to patch. If your workload later demands specialized backends, you can still swap in Neo4j or Neptune for graphs, Redis for sessions, and Qdrant/Weaviate/Milvus/Chroma for vectors via community adapters. Nothing about the API changes.
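
Because everything now hangs off one set of DB_* settings, a small preflight check can catch a missing variable before the first remember() call. The sketch below is an assumption-laden helper, not part of Cognee: postgres_dsn and REQUIRED_KEYS are hypothetical names, and the only inputs it relies on are the variable names from the .env example above.

```python
import os
from urllib.parse import quote

# Variable names taken from the .env example; the helper itself is hypothetical.
REQUIRED_KEYS = ("DB_HOST", "DB_PORT", "DB_USERNAME", "DB_PASSWORD", "DB_NAME")


def postgres_dsn(env=None):
    """Build a postgresql:// URL from the DB_* settings, or raise if any are missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    # Percent-encode credentials so characters like '@' or ':' cannot break the URL.
    user = quote(env["DB_USERNAME"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    return f"postgresql://{user}:{password}@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"


example = {
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
    "DB_USERNAME": "cognee",
    "DB_PASSWORD": "cognee",
    "DB_NAME": "cognee_db",
}
print(postgres_dsn(example))
```

The resulting URL can be handed to psql or any Postgres client to confirm the instance is reachable before Cognee tries to use it.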

Two-Tier Memory: Session vs Permanent

Cognee splits memory into two tiers, which is a good design decision borrowed from how human memory actually works:

Session memory is short-term working memory. Cognee loads relevant embeddings and graph fragments into a fast cache scoped by session_id. Reads and writes are cheap and low-latency, ideal for the currently-open chat or task.
Permanent memory is the long-term knowledge graph. User data, resolved interaction traces, learned patterns, and stable facts land here. Session data syncs into permanent memory in the background (or on session end).

The nice consequence: your hot path stays fast because you’re hitting the session cache, and your cold path is a proper graph query that can reason across everything the agent has ever learned.

# Session write — fast, scoped to this conversation
await cognee.remember(
    "The user is debugging a Postgres connection issue on Fly.io.",
    session_id="chat_2026_07_01",
)

# Session read first, fall through to permanent graph if needed
answer = await cognee.recall(
    "What was the user working on last?",
    session_id="chat_2026_07_01",
)
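
The fall-through behavior is easier to see in a stripped-down model. The following is a sketch of the two-tier idea only, assuming keyword matching in place of embeddings and an explicit end_session() call in place of background sync; TwoTierMemory is a hypothetical class, not Cognee's internals.

```python
class TwoTierMemory:
    """Hypothetical toy model of a session tier backed by a permanent tier."""

    def __init__(self):
        self.sessions = {}   # session_id -> list of facts (fast tier)
        self.permanent = []  # long-term tier shared by all sessions

    def remember(self, fact, session_id=None):
        if session_id is None:
            self.permanent.append(fact)
        else:
            self.sessions.setdefault(session_id, []).append(fact)

    def recall(self, keyword, session_id=None):
        """Search the session tier first; fall through to the permanent tier on a miss."""
        needle = keyword.lower()
        if session_id is not None:
            hits = [f for f in self.sessions.get(session_id, []) if needle in f.lower()]
            if hits:
                return hits
        return [f for f in self.permanent if needle in f.lower()]

    def end_session(self, session_id):
        """Sync a session's facts into the permanent tier and drop the cache entry."""
        self.permanent.extend(self.sessions.pop(session_id, []))


memory = TwoTierMemory()
memory.remember("Cognee turns documents into AI memory.")
memory.remember("The user is debugging a Postgres connection issue.", session_id="chat_1")

print(memory.recall("postgres", session_id="chat_1"))   # served from the session tier
print(memory.recall("documents", session_id="chat_1"))  # session miss, permanent hit
memory.end_session("chat_1")
print(memory.recall("postgres"))                        # now visible without a session_id
```

Until end_session() runs, the Postgres fact is invisible to other sessions, which is the trade-off the two-tier design makes: cheap scoped writes now, global visibility after sync.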

Under the Hood

Cognee’s ingestion pipeline is where the work happens. remember() is shorthand for add + cognify + improve:

add — ingest the raw document (text, PDF, code, JSON, structured data).
cognify — extract entities and subject–relation–object triplets via LLM calls, generate embeddings, write both into the graph and vector store.
improve — a feedback loop that refines the graph over time. Weights shift, redundant nodes merge, ontology grounding tightens.
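
The review does not spell out how improve works internally, so the following is only an illustration of one idea it names, merging redundant nodes. The sketch assumes that "redundant" means names differing only by case or whitespace, and uses repeat counts as a stand-in for edge weights; merge_redundant is a hypothetical function.

```python
from collections import Counter


def merge_redundant(triplets):
    """Collapse triplets that differ only by case or whitespace; repeats become weight."""

    def norm(name):
        # Lowercase and squeeze whitespace so "Cognee " and "cognee" map to one node.
        return " ".join(name.lower().split())

    weights = Counter()
    for subject, relation, obj in triplets:
        weights[(norm(subject), relation, norm(obj))] += 1
    return dict(weights)


raw = [
    ("Cognee", "uses", "Postgres"),
    ("cognee ", "uses", "postgres"),
    ("Cognee", "uses", "LanceDB"),
]
for triplet, weight in merge_redundant(raw).items():
    print(triplet, "weight:", weight)
```

A production system would need fuzzier matching (aliases, abbreviations, coreference), but the shape is the same: fewer nodes, heavier edges.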

recall() auto-routes: it inspects the query and picks between pure vector similarity, graph traversal, or a combined “GraphRAG” mode that pulls both. You can override with explicit search parameters when needed.
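
What "inspects the query" might look like can be sketched with a deliberately naive keyword heuristic. Cognee's actual routing logic is not described here, so treat every name below (route, RELATION_CUES, SIMILARITY_CUES) as hypothetical and the cue lists as placeholders.

```python
# Placeholder cue lists; a real router would likely use a classifier or an LLM call.
RELATION_CUES = ("between", "connected to", "depends on", "related to", "caused")
SIMILARITY_CUES = ("similar", "like", "about", "summarize")


def route(query):
    """Pick 'graph', 'vector', or 'hybrid' from simple substring cues in the query."""
    text = query.lower()
    wants_graph = any(cue in text for cue in RELATION_CUES)
    wants_vector = any(cue in text for cue in SIMILARITY_CUES)
    if wants_graph and wants_vector:
        return "hybrid"
    if wants_graph:
        return "graph"
    return "vector"  # default: plain similarity search


for question in (
    "What does Cognee do?",
    "What is the relationship between Alice and the billing team?",
    "Summarize tickets related to the Postgres outage",
):
    print(route(question), "<-", question)
```

Defaulting to vector search keeps the cheap path cheap; the graph is consulted only when the question signals that relationships matter.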

Cognee also supports OWL ontologies for domain-specific knowledge modeling — useful in regulated domains (medical, legal, financial) where you want the graph structured against a formal schema rather than whatever the LLM extracts on the fly.

Claude Code Plugin: Persistent Memory Across Sessions

If you use Claude Code, the Cognee memory plugin is the most useful integration to try. Install it once from the shell, before launching Claude Code:

claude plugin marketplace add topoteretes/cognee-integrations
claude plugin install cognee-memory@cognee

export LLM_API_KEY="sk-..."   # local mode
claude

The plugin hooks into Claude Code’s lifecycle events:

SessionStart — selects mode (local vs cloud) and sets up identity
UserPromptSubmit — injects dataset-scoped context from memory into the prompt
PostToolUse — captures tool call traces
Stop — writes the assistant’s answer into session memory
PreCompact — preserves memory across Claude Code’s context resets (this is the killer feature — no more losing the plot when the context window fills up)
SessionEnd — final sync into the permanent graph

You get a “Cognee Memory Connected” system message on startup. In practice, this means Claude Code remembers what you were working on last week, what conventions your codebase follows, and which fixes you’ve already tried and rejected — without you copy-pasting summaries into every new session.
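
The lifecycle pattern itself is simple to sketch. The code below is not the plugin's source: it assumes each hook event has already been parsed into a dict carrying a hook_event_name key, uses an in-memory list where the plugin would talk to Cognee, and covers only three of the six events. The handler names and the prompt, tool_name, and session_id fields are illustrative assumptions.

```python
def make_dispatcher():
    """Build a dispatch function plus the log it writes to (a stand-in for memory)."""
    log = []

    def on_prompt(event):
        # The plugin would look up context here and inject it into the prompt.
        log.append(("context_injected", event.get("prompt", "")))

    def on_tool(event):
        log.append(("tool_trace", event.get("tool_name", "")))

    def on_end(event):
        # The plugin would sync session memory into the permanent graph here.
        log.append(("synced", event.get("session_id", "")))

    handlers = {
        "UserPromptSubmit": on_prompt,
        "PostToolUse": on_tool,
        "SessionEnd": on_end,
    }

    def dispatch(event):
        handler = handlers.get(event.get("hook_event_name"))
        if handler is None:
            return False  # events without a registered handler are ignored
        handler(event)
        return True

    return dispatch, log


dispatch, log = make_dispatcher()
dispatch({"hook_event_name": "UserPromptSubmit", "prompt": "Why does the build fail?"})
dispatch({"hook_event_name": "PostToolUse", "tool_name": "Bash"})
dispatch({"hook_event_name": "SessionEnd", "session_id": "abc123"})
print(log)
```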

Benchmarks: BEAM

Cognee published results on BEAM, a long-context benchmark designed to test whether a memory system can track a long, evolving conversation — a more useful test than the usual needle-in-a-haystack setups.

Benchmark	Setting	Cognee	Previous SOTA	Obsidian / RAG baseline
BEAM	100K tokens	0.79 (>0.8 with per-question routing)	0.735	~0.33
BEAM	10M tokens	0.67	0.641	~0.33

Two things to note. First, these numbers are with Cognee’s default settings and standard open-source features — no BEAM-specific pipelines, no custom fine-tunes. Second, the vanilla RAG baseline sits at ~0.33, less than half of what Cognee delivers, which lines up with what most people building agent memory report anecdotally: RAG alone is not enough once the context gets long or the questions get multi-hop.
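
To put the table in relative terms, the short calculation below works out the gain over the previous SOTA and the ratio to the RAG baseline. It uses only the figures from the table above; note that the RAG baseline is reported as an approximate ~0.33, so the ratios are approximate too.

```python
# Scores copied from the BEAM table; the RAG figure is approximate (~0.33).
scores = {
    "100K": {"cognee": 0.79, "previous_sota": 0.735, "rag": 0.33},
    "10M": {"cognee": 0.67, "previous_sota": 0.641, "rag": 0.33},
}


def relative_gain(new, old):
    """Percentage improvement of `new` over `old`."""
    return (new - old) / old * 100


for setting, row in scores.items():
    gain = relative_gain(row["cognee"], row["previous_sota"])
    ratio = row["cognee"] / row["rag"]
    print(f"{setting}: +{gain:.1f}% over previous SOTA, {ratio:.2f}x the RAG baseline")
```

The gain over the previous SOTA works out to roughly 7.5% at 100K tokens and 4.5% at 10M, while the margin over the RAG baseline is a bit more than 2x at both scales. The second gap is the more robust of the two findings.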

An earlier evaluation on HotpotQA also reported 87% answer accuracy with human labeling (evaluations docs), consistent with the “top choice for knowledge-intensive scenarios and multi-hop reasoning” verdict from third-party reviewers.

Fair caveat that Cognee itself calls out: these numbers are directional. Long-context memory benchmarks are genuinely hard and the field is still figuring out what to measure. But even discounting the SOTA claim, the gap over vanilla RAG is large enough to justify trying it on your own workload.

Two Concrete Use Cases

Customer Support Agent. The agent resolves issues using a customer’s data across finance, support, and product history. Cognee tracks past interactions, failed actions, and resolved cases. When the customer says “my invoice looks wrong and the issue is still not resolved,” the agent can pull similar billing cases resolved last month and reply with a concrete diagnosis rather than a scripted response. Memory updates after execution so the agent never repeats the same failed step.

SQL Copilot. Cognee stores expert SQL queries, workflow patterns, and schema structures. When a junior asks “how do I calculate customer retention for this dataset?” Cognee matches the current schema to a known structure and adapts an expert’s retention query. Every successful implementation feeds back into the graph.

The pattern is the same in both cases: cognee handles the “what did we do before, and what worked” question, so the agent doesn’t have to.

Community Reactions

Cognee has been building momentum on Hacker News and Reddit for over a year — the Show HN in June 2025 framed it as “the AI memory layer that remembers context,” and the founder is active in r/LLMDevs and r/AIMemory.

Recurring themes:

Praise for the API surface: developers like that remember/recall is the whole thing.
Skepticism about graph overhead: some HN commenters worry graph construction is slow on large corpora. Cognee runs it in the background and lets you tune it.
Comparisons to Graphiti / Zep / Mem0: the founder positions Cognee as similar to Graphiti but more modular, and not limited to time graphs. Graphiti is opinionated about temporal edges; Cognee lets you shape the graph how you want.
Postgres-only mode draws the most consistent praise: it was the top-requested feature for months, and it is the main reason the current release is trending.

The WeavAI review from May 2026 rates cognee 8.5/10 overall and calls it “the top choice for knowledge-intensive scenarios and multi-hop reasoning applications.”

Honest Limitations

First-run cost: ingesting a large corpus makes a lot of LLM calls (entity extraction, triplet generation, embedding). Point cognee at a 100MB knowledge base with OpenAI defaults and your first bill is not zero. Local LLM providers are supported and shrink this concern.
Graph queries are slower than pure vector search. If your workload is really just similarity retrieval, you’re paying for graph infrastructure you don’t need. Cognee is designed for cases where relationships matter.
Ontology tooling is Python-centric. TypeScript and Rust clients cover the API but don’t have first-class support for defining custom OWL ontologies — that lives in the Python SDK.
Docker requirement for cognee-cli -ui. The MCP-backed UI needs Docker or Colima installed.
Multi-tenant isolation is documented but young. Cognee ships user/tenant isolation primitives; if you’re deploying to enterprise with strict data-boundary requirements, build your own test harness before trusting it in production.

Deployment Options

Cognee ships 1-click deploy configurations for most modern platforms: Cognee Cloud (managed, await cognee.serve()), Modal (serverless, GPU), Railway (PaaS + native Postgres), Fly.io (edge + volumes), Render, Daytona (cloud sandboxes), and plain Docker Compose for self-hosters. docker compose up with the right profiles gets you the API server, frontend, MCP server, Postgres, and optionally Neo4j — all in one file.

FAQ

How is Cognee different from a normal RAG pipeline?

RAG chunks documents, embeds them, and does similarity search. It has no idea that “the customer” in chunk 47 is the same person as “the user” in chunk 302. Cognee builds a knowledge graph on top of the embeddings, so entities and relationships are explicit. When you query, Cognee can traverse the graph as well as run vector search, which is why it scores 0.79 on BEAM at 100K tokens versus ~0.33 for vanilla RAG.

Do I need to run Neo4j?

No. Cognee 1.0 can run the entire memory layer on a single Postgres instance (pgvector for embeddings + Postgres graph backend for relationships + SQL session cache + metadata in the same DB). Local dev works with SQLite + LanceDB + Kuzu, no services required. Neo4j is still supported if you want it.

How does Cognee compare to Graphiti, Zep, or Mem0?

The founder’s own framing (in r/LLMDevs): “similar to Graphiti but a bit more modular and customizable, and not just time graphs.” Graphiti is opinionated around temporal edges; Cognee is more general-purpose. Zep is a hosted product; Cognee is open source with a cloud option. Mem0 is more of a session-memory library; Cognee spans session + long-term graph.

Can I use Cognee without OpenAI?

Yes. Cognee’s LLM provider layer is pluggable: point it at an OpenAI-compatible endpoint (local Ollama, LM Studio, vLLM) or at another supported provider such as Anthropic or Gemini, all via environment variables. See the LLM Provider docs.

Is there a JavaScript/TypeScript client?

Yes: @cognee/cognee-ts on npm. There’s also a Rust client, cognee-rs, installed with cargo add cognee. All three clients hit the same Cognee API, so you can mix languages across an application.

What’s the license?

Apache 2.0. You can self-host, modify, and use it commercially. There’s a managed Cognee Cloud tier for teams that don’t want to run the infra.

Verdict

Cognee earns the “top of GitHub trending” spot because it does one thing well: it’s the memory layer for AI agents that you don’t have to build yourself. The four-verb API (remember, recall, forget, improve) is small enough to fit in your head, the Postgres-only deployment removes the biggest operational objection to graph memory, and the BEAM benchmarks are strong enough — and honestly presented enough — to take seriously.

If you’re building anything more sophisticated than a stateless chatbot, you will eventually need agent memory. Cognee is the shortest path from “we need memory” to “we have memory” that currently exists in open source. Install it, spend an afternoon on the Claude Code plugin or the Python quickstart, and see if the two-tier session-plus-graph model fits your workload.

Repo: github.com/topoteretes/cognee · Docs: docs.cognee.ai · Discord: discord.gg/NQPKmU5CCg