MemPalace Review: Local AI Memory With 96.6% Recall

TL;DR

MemPalace is a local-first, open-source AI memory system that stores conversation history verbatim and retrieves it with semantic search — no summarization, no LLM rewriting, no API calls required. It currently leads its category with 96.6% R@5 on the LongMemEval benchmark in raw mode and 98.4% on a held-out hybrid run, and the GitHub repo has exploded to 55,500 stars with 1,819 added this week.

The pitch is straightforward: every AI memory tool you’ve used so far probably summarizes your past sessions into “facts,” loses nuance, and then can’t tell you what you actually said. MemPalace keeps your raw text — your Claude Code session, your Cursor history, your project notes — indexed in a structured palace (wings → rooms → drawers), and retrieves the original transcript chunk when you ask “why did we switch to GraphQL?” Nothing leaves your machine.

Key facts:

55,500 GitHub stars, 1,819 added this week — currently top-3 trending Python repo
96.6% R@5 on LongMemEval (raw, no LLM, no API key) — best public open-source number
98.4% on held-out 450 questions with hybrid v4 (keyword + temporal boosting)
Verbatim storage — no summarization, no paraphrasing, no information loss
Pluggable backend — ChromaDB default, plus sqlite_exact, Qdrant, and pgvector
29 MCP tools for palace reads/writes, knowledge graph, agent diaries
Auto-save hooks for Claude Code, Codex CLI, and Cursor IDE
Temporal knowledge graph with validity windows, backed by local SQLite
MIT licensed, Python 3.9+, runs entirely offline once the embedding model is downloaded

Why “facts extraction” memory tools fail

There’s a familiar pattern in AI memory tools: ingest a conversation, ask an LLM to extract “facts,” store them as embeddings, retrieve facts on demand. Mem0, Zep, Supermemory, Hindsight, and dozens of others work this way. It’s what every memory startup pitch deck shows.

It also has a problem that becomes obvious after a month: you can never get back what you actually said. The “facts” are a lossy summarization, and the LLM that wrote them had no idea which of your offhand asides would matter later. By the time you ask “wait, what was that OAuth library I mentioned?”, the original is gone — replaced with “the user uses OAuth” or nothing at all.

MemPalace’s bet is the opposite: store verbatim text, retrieve with semantic search, never summarize. The “palace” structure — people and projects become wings, topics become rooms, content lives in drawers — is purely an index for scoping. The drawers hold raw transcript chunks.

Lossless storage + good retrieval beats clever summarization at any task where you eventually need the original.

The benchmark numbers (and why they matter)

MemPalace publishes more reproducible benchmark detail than most commercial memory products. The headline result on LongMemEval (500 questions, recall@5):

Mode	R@5	LLM required
Raw (semantic search, no heuristics)	96.6%	None
Hybrid v4, held-out 450q	98.4%	None
Hybrid v4 + LLM rerank (full 500)	≥99%	Any capable model

A few things to notice:

96.6% with zero LLM calls, no API key, no cloud — this is the embedding-only path. The cost-per-query is effectively zero after install.
The 98.4% number is the held-out result: the hybrid heuristics were tuned on 50 dev questions and scored on the other 450, which were never tuned against. That’s the honest, generalisable figure.
The maintainers explicitly refuse to publish a 100% number, calling it “teaching to the test”: the remaining gap to 99%+ was closed by inspecting wrong answers, which is exactly the overfitting failure mode that benchmarks in this field invite.
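To make the metric concrete, here is a minimal sketch of recall@k and a dev/held-out split. It assumes the common "any relevant chunk in the top k counts as a hit" definition; the function names and toy data are illustrative and are not taken from MemPalace's benchmark code.

```python
from typing import Dict, List, Set, Tuple


def recall_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int = 5) -> float:
    """Fraction of questions with at least one relevant chunk in the top k results."""
    hits = 0
    for qid, gold in relevant.items():
        top_k = ranked.get(qid, [])[:k]
        if any(chunk_id in gold for chunk_id in top_k):
            hits += 1
    return hits / len(relevant) if relevant else 0.0


def split_dev_heldout(question_ids: List[str], dev_size: int = 50) -> Tuple[List[str], List[str]]:
    """Tune heuristics on the first slice only; report on the untouched rest."""
    ordered = sorted(question_ids)
    return ordered[:dev_size], ordered[dev_size:]


# Toy data: q1's gold chunk is found at rank 2, q2's gold chunk is never retrieved.
ranked = {"q1": ["c9", "c1", "c4"], "q2": ["c7", "c8"]}
relevant = {"q1": {"c1"}, "q2": {"c2"}}
score = recall_at_k(ranked, relevant, k=5)

dev, heldout = split_dev_heldout([f"q{i:03d}" for i in range(500)])
```

The point of the split is that any heuristic tuned by looking at `dev` answers is only credible when scored on `heldout`.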

For comparison, MemPalace’s reported results on other benchmarks: R@5 of 80.3% on MemBench (ACL 2025, 8,500 items), 88.9% on LoCoMo (top-10, hybrid v5), and an average of 92.9% recall across categories on ConvoMem. You can verify these numbers yourself with the commands in benchmarks/BENCHMARKS.md.

What’s not in the README — pointedly — is a side-by-side against Mem0, Mastra, Supermemory, or Zep. The maintainers’ position: those projects publish different metrics on different splits, and stacking retrieval recall next to end-to-end QA accuracy isn’t an honest comparison. That’s the right call, but it does leave you to do your own bake-off if you’re picking between them.

How the palace is structured

The metaphor is load-bearing. Your data flows in as raw conversation, transcripts, or files, and gets indexed into a three-level hierarchy:

Palace
├── Wing      (people, projects)
│   ├── Room      (topics)
│   │   └── Drawer    (verbatim text chunks)

When you search, you can scope to a wing (“everything in the andrew.ooo project”) or query globally. The semantic index sits on the drawer contents; the wing/room labels are metadata for filtering. This avoids the failure mode of pure vector DBs where every query searches the entire corpus and noise drowns out signal in long-lived stores.
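The scoping idea can be sketched in a few lines of Python. This is a toy under stated assumptions: word-count cosine similarity stands in for a real embedding model, and the `Drawer` class and `search` function are hypothetical names, not MemPalace's API.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Drawer:
    wing: str   # person or project
    room: str   # topic
    text: str   # verbatim chunk, never rewritten


def _vector(text: str) -> Counter:
    # Stand-in for a real embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(drawers: List[Drawer], query: str, wing: Optional[str] = None, k: int = 5) -> List[Drawer]:
    # Filter on metadata first, then rank only the surviving drawers.
    candidates = [d for d in drawers if wing is None or d.wing == wing]
    q = _vector(query)
    ranked = sorted(candidates, key=lambda d: _cosine(q, _vector(d.text)), reverse=True)
    return ranked[:k]


palace = [
    Drawer("myapp", "api", "We switched to GraphQL because the REST endpoints kept over-fetching."),
    Drawer("myapp", "auth", "Decided to keep session cookies for the admin panel."),
    Drawer("blog", "api", "GraphQL is overkill for the blog, stay on REST."),
]

scoped = search(palace, "why did we switch to GraphQL", wing="myapp", k=1)
```

Because the wing filter runs before ranking, the unrelated `blog` drawer never competes with the `myapp` result, however similar its wording.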

The pluggable backend layer is the part most people will care about as the project matures:

ChromaDB (default) — zero-config, local, fine for individual use
sqlite_exact — exact-vector correctness checks; useful for benchmarking and debugging recall
Qdrant — REST backend; opt-in via MEMPALACE_QDRANT_URL
pgvector — Postgres + JSONB; opt-in via MEMPALACE_PGVECTOR_DSN

The Qdrant and pgvector paths are explicitly opt-in because they send your verbatim drawer text to the configured server. That’s the correct posture for a local-first tool: cloud is possible, but never the default.
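If you want a quick check of whether your shell is set up to send drawer text off the machine, a small preflight script can help. The sketch below uses only the two environment variable names quoted above; the function names are illustrative, and it makes no claim about how MemPalace itself resolves its backend.

```python
import os
from typing import List, Mapping

# Variable names are the ones quoted in this review; everything else is illustrative.
REMOTE_BACKEND_VARS = {
    "MEMPALACE_QDRANT_URL": "Qdrant",
    "MEMPALACE_PGVECTOR_DSN": "pgvector",
}


def remote_backends(env: Mapping[str, str]) -> List[str]:
    """Return the opt-in backends that are configured in this environment."""
    return [name for var, name in REMOTE_BACKEND_VARS.items() if env.get(var, "").strip()]


def describe(env: Mapping[str, str]) -> str:
    active = remote_backends(env)
    if not active:
        return "No remote backend configured; drawer text stays on this machine."
    return "Remote backend(s) configured: " + ", ".join(active) + ". Verbatim drawer text will be sent there."


if __name__ == "__main__":
    print(describe(os.environ))
```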

Real install — 60 seconds

# uv (recommended — isolated, on your PATH)
uv tool install mempalace

# Initialize a palace in your project
mempalace init ~/projects/myapp

# Mine project files into the palace
mempalace mine ~/projects/myapp

# Mine your existing Claude Code session history
mempalace mine ~/.claude/projects/ --mode convos

# Search
mempalace search "why did we switch to GraphQL"

# Load context for a new session
mempalace wake-up

The --mode convos flag is the killer feature for daily Claude Code users. Point it at ~/.claude/projects/ and it backfills your entire conversation history — months of sessions — into the palace. Combined with the auto-save hooks, every future session also lands in the palace, scoped by --wing so each project stays isolated.

For Cursor IDE there’s a separate hooks setup that adds session-start recall and a transcript snapshot before context compaction — that last part matters because Cursor’s compaction is exactly when you lose detail.

Wiring it into Claude Code (MCP)

This is the path most readers will take. MemPalace ships an MCP stdio server with 29 tools covering palace reads/writes, the knowledge graph, cross-wing navigation, drawer management, and agent diaries.

In your Claude Code config:

{
  "mcpServers": {
    "mempalace": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-v", "mempalace-data:/data", "mempalace"]
    }
  }
}

The same server can also run natively, without Docker, if you installed MemPalace with uv tool install. The auto-save hooks are documented for Claude Code, Codex CLI, and Cursor IDE. Wire those up before you start a real project: backfilling existing JSONL transcripts works (mempalace mine ~/.claude/projects/ --mode convos), but it’s nicer to capture cleanly from day one.

The knowledge graph

This is the under-marketed feature. MemPalace ships a temporal entity-relationship graph with validity windows — backed by local SQLite — that lets you encode relationships between people, projects, and concepts that change over time.

In practice: “Andrew uses Claude Code (valid: 2025-08 onwards)” can later be invalidated by “Andrew migrated to Codex (valid: 2026-06 onwards)” without overwriting the original. Queries respect the temporal window. This is what real long-term memory looks like — not “facts” extracted into a flat KV store, but a graph that knows what was true when.
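A validity-window graph is easy to picture as a SQLite table. The sketch below is one way to model it, assuming a hypothetical `facts` table and helper names; MemPalace's actual schema may look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE facts (
        subject    TEXT NOT NULL,
        predicate  TEXT NOT NULL,
        object     TEXT NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO year-month, so string comparison sorts correctly
        valid_to   TEXT             -- NULL means still valid
    )
    """
)


def assert_fact(subject, predicate, obj, valid_from):
    # Close the window on the currently open fact instead of deleting it.
    conn.execute(
        "UPDATE facts SET valid_to = ? WHERE subject = ? AND predicate = ? AND valid_to IS NULL",
        (valid_from, subject, predicate),
    )
    conn.execute(
        "INSERT INTO facts VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, valid_from),
    )


def query_as_of(subject, predicate, when):
    # Return the object whose validity window contains `when`, or None.
    row = conn.execute(
        """
        SELECT object FROM facts
        WHERE subject = ? AND predicate = ?
          AND valid_from <= ? AND (valid_to IS NULL OR ? < valid_to)
        """,
        (subject, predicate, when, when),
    ).fetchone()
    return row[0] if row else None


assert_fact("Andrew", "uses", "Claude Code", "2025-08")
assert_fact("Andrew", "uses", "Codex", "2026-06")

print(query_as_of("Andrew", "uses", "2026-01"))  # Claude Code
print(query_as_of("Andrew", "uses", "2026-07"))  # Codex
```

The older row is never overwritten; it just gets an end date, which is what lets a query ask what was true at a given time.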

For multi-agent setups, each specialist agent gets its own wing and its own diary. There’s a mempalace_list_agents tool for runtime discovery, so you don’t have to stuff agent context into the system prompt of every session.
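For readers writing their own client, this is roughly what a call to that tool looks like on the wire. The `tools/call` method and its `name`/`arguments` parameters come from the MCP specification. The empty argument object is an assumption, since the tool's input schema isn't covered here, and a real client has to complete the MCP `initialize` handshake before sending this.

```python
import json
from itertools import count

_ids = count(1)


def tools_call(name: str, arguments: dict) -> str:
    """Encode one MCP tools/call request as a single line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so each message is one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


# Assumption: mempalace_list_agents takes no arguments.
line = tools_call("mempalace_list_agents", {})
```

In practice an MCP-aware agent builds this message for you; the sketch is mainly useful for debugging a server by hand.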

Community reception

Reception on r/LocalLLaMA and r/ClaudeAI over the last two weeks has been unusually positive for a memory tool. Three themes recur:

“Finally, verbatim.” Users who got burned by Mem0 and Supermemory summarizing away the details they wanted later are the loudest fans. The pitch (“we don’t summarize”) lands.
“The benchmarks are reproducible.” Most memory tools publish marketing numbers; MemPalace publishes the exact commands. The R@5 96.6% number is hard to argue with when you can run it yourself in ten minutes.
“The MCP integration is the cleanest in the category.” 29 tools, all documented, all locally served. No hosted-API dependency, no rate limits, no privacy hand-wringing.

The most consistent critique: the palace metaphor takes a session to internalise. New users don’t immediately know whether their content should be a wing, a room, or a drawer. The docs have improved a lot, but expect a 30-minute learning curve before the structure clicks.

Secondary critique: there’s no built-in dedup against your existing Claude Code transcripts when you backfill. If you’ve been using another memory tool that also mined your transcripts, you’ll end up with overlapping content until you point both at the same backend or pick one to drop.
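Until dedup is built in, one workaround is to fingerprint chunks yourself before ingesting them anywhere. The sketch below is a generic content-hash filter with hypothetical function names; it is not part of MemPalace, and you would have to run it over your own exported chunks.

```python
import hashlib
import re
from typing import Iterable, List, Set


def fingerprint(chunk: str) -> str:
    """Hash a chunk after collapsing whitespace, so reflowed copies still match."""
    normalized = re.sub(r"\s+", " ", chunk).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(chunks: Iterable[str], seen: Set[str]) -> List[str]:
    """Return only chunks whose fingerprint is new, updating `seen` in place."""
    fresh = []
    for chunk in chunks:
        digest = fingerprint(chunk)
        if digest not in seen:
            seen.add(digest)
            fresh.append(chunk)  # the original text is kept untouched
    return fresh


seen: Set[str] = set()
first = drop_duplicates(["use  authlib for OAuth\n", "switch to GraphQL"], seen)
second = drop_duplicates(["use authlib for OAuth", "add rate limiting"], seen)
```

Only the fingerprint is normalized; the stored text stays verbatim, which keeps the filter compatible with the no-rewriting premise.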

Honest limitations

Embedding model download is ~300 MB on first run. Onboarding offers embeddinggemma-300m (multilingual, recommended) or all-MiniLM-L6-v2 (English, ~30 MB).
No built-in privacy redaction. If your sessions contain secrets, they go into the palace verbatim. The local-first posture makes this safer than cloud tools, but it’s still on you to scrub.
External backends (Qdrant, pgvector) are previews. The Chroma path is what’s battle-tested; treat the others as solid but not yet bulletproof.
No GUI yet. Everything is CLI + MCP. A web UI is on the roadmap; for now, you query via mempalace search or via your agent.
It’s not a replacement for prompt caching. MemPalace is for long-term memory across sessions. For the same-session context-window pressure, you still want provider-native compaction and a compression layer like Headroom.
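On the redaction point in the list above: since scrubbing is on you, a pre-pass over copies of your transcripts is one option. The patterns below are illustrative and far from exhaustive, and the `redact` helper is hypothetical rather than something MemPalace ships.

```python
import re
from typing import List, Pattern, Tuple

# Illustrative patterns only; extend them for the credentials your projects actually use.
SECRET_PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("aws_access_key", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("bearer_token", re.compile(r"(?i)\bbearer\s+[a-z0-9._~+/-]{20,}=*")),
    ("assigned_secret", re.compile(r"(?i)\b(api[_-]?key|secret|password)\b(\s*[:=]\s*)\S+")),
]


def redact(text: str) -> str:
    """Replace likely secrets with a labelled placeholder, leaving other text as-is."""
    for label, pattern in SECRET_PATTERNS:
        if label == "assigned_secret":
            # Keep the key name and separator so the transcript stays readable.
            text = pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
        else:
            text = pattern.sub(f"[REDACTED:{label}]", text)
    return text


sample = "export API_KEY=abc123 and key AKIAABCDEFGHIJKLMNOP in the log"
cleaned = redact(sample)
```

Regex scrubbing will miss things, so treat it as a safety net rather than a guarantee, and keep real secrets out of agent sessions where you can.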

Compared to the alternatives

Tool	Storage model	Local-first	Open-source	Benchmark transparency
MemPalace	Verbatim	Yes	MIT	Reproducible, R@5 96.6%
Mem0	LLM-summarized facts	Optional cloud	Apache 2.0	Marketing numbers
Supermemory	Embeddings + facts	Cloud-first	Closed core	Published, some reproducible
Zep	Temporal KG + facts	Optional self-host	Apache 2.0	Published, partial
Hindsight	Embeddings	Yes	MIT	Limited
Mastra Memory	Vector + KV	Optional cloud	Open-source	Limited

The differentiation is real: verbatim + local + reproducible benchmarks is a triple that no other major tool in this category currently offers. Whether that combination matters to you depends on whether you trust LLM-summarized “facts” to capture what you’ll want later. After a year of using these tools, my answer is no, and MemPalace’s reproducible retrieval numbers show the verbatim approach holds up without that trade-off.

Should you use MemPalace?

Use it if:

You run daily Claude Code, Codex, or Cursor sessions and want them to remember across runs
You’ve been burned by other memory tools summarizing away the details you needed
You want a local-first tool with no cloud dependency for personal-use memory
You’re building multi-agent setups and need per-agent diaries with shared knowledge graph

Skip it if:

You only need same-session memory (provider-native compaction is enough)
Your stack already standardised on Mem0 / Zep and switching costs are high
You need a hosted SaaS with a polished web UI today

For most Claude Code daily drivers, the install path is: uv tool install mempalace → mempalace mine ~/.claude/projects/ --mode convos → wire up the auto-save hooks → wire up the MCP server. Twenty minutes of work, and your agent now remembers months of context.

FAQ

Q: Does MemPalace require an LLM or API key? No, not for the core path. The 96.6% R@5 raw benchmark is reached using embeddings + semantic search only — zero API calls, no cloud. The LLM rerank pipeline that pushes the number to ≥99% is opt-in and works with any model (Claude Haiku, Sonnet, or minimax-m2.7 via Ollama Cloud).

Q: How is this different from Mem0 or Supermemory? The biggest difference is no summarization. Mem0 and Supermemory extract LLM-generated “facts” from your conversations and discard the original text. MemPalace stores the verbatim text and searches it directly. When you ask “what was that OAuth library I mentioned?”, MemPalace can find it; fact-extraction tools usually can’t.

Q: Will it work with non-Claude agents? Yes. The MCP server speaks JSON-RPC over stdio, so any MCP-aware client works: Claude Code, Gemini CLI, Antigravity, Codex CLI, Cursor, and custom agents. Auto-save hooks are documented for Claude Code, Codex CLI, and Cursor IDE today; other integrations use the MCP tools directly.

Q: What backend should I pick? Start with ChromaDB (default). It’s local, zero-config, and handles single-user palaces well past 1M drawers. Move to Qdrant (REST) or pgvector if you want multi-tenant isolation or you already run that infrastructure. Use sqlite_exact only for benchmarking exact-vector correctness.

Q: How much disk does the palace use? The embedding model is ~300 MB on first install (or ~30 MB if you pick all-MiniLM-L6-v2). After that, each drawer is roughly the size of the raw text plus the embedding vector — typical Claude Code session histories land around 200-500 MB after months of use.
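The "raw text plus embedding vector" rule is easy to turn into a back-of-envelope estimate. The sketch below assumes float32 vectors, 384 dimensions for all-MiniLM-L6-v2 and 768 for embeddinggemma-300m, and a hypothetical palace of 100,000 drawers averaging 2 KB of text. It ignores index and metadata overhead, so real usage will be somewhat higher.

```python
def vector_bytes(dimensions: int, bytes_per_value: int = 4) -> int:
    """Size of one dense embedding, stored as float32 by default."""
    return dimensions * bytes_per_value


def estimate_palace_bytes(drawers: int, avg_text_bytes: int, dimensions: int) -> int:
    """Raw text plus one vector per drawer; ignores index and metadata overhead."""
    return drawers * (avg_text_bytes + vector_bytes(dimensions))


# Assumed inputs: 100,000 drawers, 2 KB of text each, two embedding sizes.
small = estimate_palace_bytes(100_000, 2_048, 384)   # 358,400,000 bytes
large = estimate_palace_bytes(100_000, 2_048, 768)   # 512,000,000 bytes
```

With these inputs the vector is comparable in size to the text itself, so the choice of embedding model affects disk usage about as much as the amount of history you keep.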

Q: Can I encrypt the palace? There’s no built-in encryption-at-rest, but since everything is local-first and the data lives under a single directory (/data in Docker, or your configured palace root), filesystem-level encryption (FileVault, LUKS, BitLocker) covers it. For multi-user setups, prefer the Qdrant or pgvector backend with namespace isolation.

Q: Can a single query use both the knowledge graph and verbatim storage? Yes. The knowledge graph and the drawer index are complementary. A query like “what’s the current OAuth library?” first hits the temporal KG for the latest valid relationship, then optionally pulls supporting drawer context if you need the original conversation.

Q: Is the verbatim approach really better than fact extraction? For retrieval-heavy use cases (debugging, code review, project archaeology), yes, measurably. The R@5 numbers tell the story: lossless storage + good retrieval beats clever summarization on benchmarks where the question is “what did I actually say?” For pure summary use cases (give me a TL;DR of last week), summarization tools are arguably easier; MemPalace can also do this via reranking, but it’s not the primary design goal.

Bottom line

MemPalace is the strongest open-source AI memory project in the category right now. Verbatim storage, local-first by default, reproducible 96.6% R@5 on LongMemEval, 29 MCP tools, auto-save hooks for the three agents most people actually use — there’s no major box left unchecked.

If you run Claude Code daily and you’ve been irritated by other memory tools losing the details you needed, install it tonight: uv tool install mempalace, point it at ~/.claude/projects/, and check whether the recall matches the benchmarks on your actual data. Twenty minutes will tell you.

→ Try it: uv tool install mempalace