TL;DR
Supermemory is the memory and context layer for AI that hit GitHub Trending this week with 25,615 stars, 2,740 of them gained in the last seven days. It is currently #1 on all three major AI memory benchmarks (LongMemEval, LoCoMo, and ConvoMem) and ships an experimental agent-swarm variant that scored ~99% on LongMemEval_s, a 500-question conversational memory benchmark from ICLR 2025. Key facts:
- Triple-crown benchmark holder: 81.6% on LongMemEval (#1), #1 on LoCoMo (single-hop, multi-hop, temporal, adversarial recall), and #1 on Salesforce’s ConvoMem (personalization)
- Memory ≠ RAG: Supermemory does both — extracts and tracks facts about users over time (memory) AND retrieves document chunks (RAG) in a single hybrid query
- Auto-forgetting: temporary facts like “I have an exam tomorrow” expire after the date passes; contradictions like “I just moved to SF” automatically supersede “I live in NYC”
- One API, full stack: memory engine, user profiles, hybrid search, file processing (PDF/image/video/code), and real-time connectors for Google Drive, Gmail, Notion, OneDrive, GitHub
- ~50 ms profile calls — one request returns static facts (preferences, role) + dynamic context (recent activity) ready to inject into your system prompt
- MCP server out of the box — works with Claude Desktop, Cursor, Windsurf, VS Code, Claude Code, OpenCode, OpenClaw, and Hermes via OAuth or API key
- Founded by 20-year-old Dhravya Shah and positioned as a research lab: the team publishes Supermemory Research papers and open-sourced MemoryBench for reproducible head-to-head comparisons with Mem0, Zep, and others
- No vector DB config, no embedding pipelines, no chunking strategy decisions: drop in client.add() and client.profile() and ship
If you’ve been hand-rolling vector DB plumbing every time you bolt memory onto an agent, this is the first SOTA-on-paper, drop-in-on-code memory layer that actually clears the “is this just another RAG wrapper?” bar.
Quick Reference
| Field | Value |
|---|---|
| Repo | supermemoryai/supermemory |
| Org | Supermemory (research lab) — founder Dhravya Shah |
| Stars | 25,615 (+2,740 this week) |
| Language | TypeScript (engine), Python + Node SDKs |
| License | Open source (engine + plugins) |
| Install (Node) | npm install supermemory |
| Install (Python) | pip install supermemory |
| MCP install | npx -y install-mcp@latest https://mcp.supermemory.ai/mcp --client claude --oauth=yes |
| Dashboard | console.supermemory.ai |
| Docs | supermemory.ai/docs |
| Benchmark | 81.6% LongMemEval (#1), ~99% LongMemEval_s (agent-swarm variant) |
What It Is
Supermemory is a memory and context engine that sits between your AI application and whatever model you call. Instead of stuffing every past conversation into a context window (expensive) or building your own RAG stack from Pinecone + LangChain + a chunking heuristic (slow, brittle), you call client.add() with text and client.profile() with a containerTag (typically the user id), and Supermemory:
- Extracts facts from the text — preferences, stable identity facts, ongoing projects, recent events.
- Builds a user profile with two layers: static (long-term facts: “Senior engineer at Acme, prefers Vim”) and dynamic (current context: “Working on auth migration, debugging rate limits”).
- Resolves contradictions over time: when you say “I just moved to SF,” the prior “I live in NYC” fact is automatically superseded, not appended.
- Forgets temporary facts when they expire — “I have an exam tomorrow” doesn’t follow you into a conversation three weeks later.
- Serves both memory and RAG in a single hybrid query: client.search.memories({ q: "how do I deploy?" }) returns both relevant deployment docs (RAG) and the user’s deploy preferences (memory).
Critically, Supermemory frames this as the difference between memory and RAG. RAG retrieves document chunks — stateless, the same results for everyone. Memory extracts and tracks facts about a specific user over time. Most “AI memory” tools collapse these into the same vector search; Supermemory keeps them as separate stores with a single query interface.
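To make the supersede-and-expire behavior concrete, here is a minimal Python sketch of a per-user fact store. It is a toy model under stated assumptions, not Supermemory's implementation: the Fact and ToyMemory names, the key-based contradiction rule, and the read-time expiry check are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    key: str                        # what the fact is about, e.g. "home_city"
    value: str
    expires: Optional[date] = None  # None means the fact never expires


class ToyMemory:
    """Hypothetical per-user fact store; not Supermemory's implementation."""

    def __init__(self) -> None:
        self._facts: dict[str, dict[str, Fact]] = {}

    def add(self, user: str, fact: Fact) -> None:
        # Writing the same key again replaces the old value (supersede, not append).
        self._facts.setdefault(user, {})[fact.key] = fact

    def recall(self, user: str, today: date) -> dict[str, str]:
        # Facts whose expiry date has passed are filtered out at read time.
        return {
            f.key: f.value
            for f in self._facts.get(user, {}).values()
            if f.expires is None or f.expires >= today
        }


memory = ToyMemory()
memory.add("user_123", Fact("home_city", "NYC"))
memory.add("user_123", Fact("home_city", "SF"))  # "I just moved to SF"
memory.add("user_123", Fact("exam", "exam tomorrow", expires=date(2026, 6, 2)))

before = memory.recall("user_123", today=date(2026, 6, 1))
after = memory.recall("user_123", today=date(2026, 6, 22))
```

In this sketch, before still includes the exam while after keeps only the SF address. A real engine has to infer both the key and the expiry from free text, which is where the difficulty lies.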
Why It’s Trending NOW
Three things converged in late May / early June 2026:
1. The 99% LongMemEval_s result (March 2026) — Supermemory’s experimental agent-swarm variant cleared ~99% accuracy on the ICLR 2025 benchmark, the closest anyone has come to “solved” on long-term conversational memory. That post went viral on AI Twitter and HN, but the production engine result (81.6%) was the more durable headline because it’s what the SDK actually ships.
2. Scira AI’s public switch from Mem0 — a documented case study where Scira moved off Mem0 to Supermemory citing latency, unreliable indexing, and context recall gaps. That gave the project its first big “X migrated to Supermemory” social-proof moment, which independent reviewers like atlan.com picked up.
3. MCP plugin proliferation — Supermemory shipped official MCP plugins for Claude Code, OpenCode, OpenClaw, and Hermes in May, plus a one-line npx install-mcp for Claude Desktop, Cursor, Windsurf, and VS Code. This is the same install-everywhere strategy that made tools like Honcho and AgentMemory discoverable inside coding agents, but Supermemory’s MCP install is shorter and OAuth-first.
The combination — SOTA benchmark, public migration story, instant install everywhere — is what’s driving 2,740 stars in a week.
Key Features (With Code)
1. One-line memory add
import Supermemory from "supermemory";

// Assumes SUPERMEMORY_API_KEY is set in the environment
const client = new Supermemory();

await client.add({
  content: "User loves TypeScript and prefers functional patterns",
  containerTag: "user_123",
});
That’s the whole write path. Supermemory parses the text, extracts facts, decides whether they’re static or dynamic, and persists them under the containerTag namespace.
2. Profile + search in one call (~50 ms)
const { profile, searchResults } = await client.profile({
  containerTag: "user_123",
  q: "What programming style does the user prefer?",
});

// profile.static  → ["Loves TypeScript", "Prefers functional patterns"]
// profile.dynamic → ["Working on API integration"]
// searchResults   → Relevant memories ranked by similarity
The profile() endpoint is the killer feature. It returns the user profile and runs a search query in a single round trip. Drop it into your system prompt at the start of every turn and your agent has personalized context for ~50 ms of latency.
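As a rough sketch of the "drop it into your system prompt" step, the helper below formats the static and dynamic lists into prompt text. The build_system_prompt name and the prompt layout are assumptions for illustration; only the static/dynamic split comes from the profile shape shown above.

```python
def build_system_prompt(static: list[str], dynamic: list[str]) -> str:
    """Format profile facts as a system prompt (layout is an assumption)."""
    lines = ["You are a helpful assistant."]
    if static:
        lines.append("Long-term facts about the user:")
        lines.extend(f"- {fact}" for fact in static)
    if dynamic:
        lines.append("Current context:")
        lines.extend(f"- {fact}" for fact in dynamic)
    return "\n".join(lines)


# Values mirror the profile example above.
prompt = build_system_prompt(
    static=["Loves TypeScript", "Prefers functional patterns"],
    dynamic=["Working on API integration"],
)
print(prompt)
```

Keeping the two layers under separate headings lets the model tell stable preferences apart from what the user is doing right now; an empty profile falls back to the bare base prompt.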
3. Hybrid search (RAG + memory together)
const results = await client.search.memories({
  q: "how do I deploy?",
  containerTag: "user_123",
  searchMode: "hybrid", // default
});

// Returns deployment docs (RAG) + user's deploy preferences (memory)
searchMode accepts "hybrid" (default), "memories", or "documents". Most production use cases want hybrid — you almost always want the docs and the user’s preferences in the same query.
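One way to picture the three modes is as a dispatcher over two stores: documents shared by everyone and memories scoped to a containerTag. The Python sketch below is a toy under stated assumptions. DOCS, MEMORIES, words, and search are hypothetical names, and the word-overlap check merely stands in for real similarity ranking.

```python
import re

DOCS = ["Deploy guide: run the release pipeline from main"]  # same for every user
MEMORIES = {"user_123": ["Prefers to deploy with Docker"]}   # scoped per container tag


def words(text: str) -> set[str]:
    # Lowercased word set; punctuation such as "?" is dropped.
    return set(re.findall(r"\w+", text.lower()))


def search(q: str, container_tag: str, search_mode: str = "hybrid") -> list[tuple[str, str]]:
    if search_mode not in {"hybrid", "memories", "documents"}:
        raise ValueError(f"unknown search_mode: {search_mode}")
    results: list[tuple[str, str]] = []
    if search_mode in {"hybrid", "documents"}:
        results += [("document", d) for d in DOCS if words(q) & words(d)]
    if search_mode in {"hybrid", "memories"}:
        user_memories = MEMORIES.get(container_tag, [])
        results += [("memory", m) for m in user_memories if words(q) & words(m)]
    return results


hybrid = search("how do I deploy?", "user_123")
other_user = search("how do I deploy?", "user_456")
memories_only = search("how do I deploy?", "user_123", search_mode="memories")
```

Here hybrid returns the shared guide plus user_123's preference, while other_user gets only the guide, which is the stateless-versus-per-user split described earlier.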
4. Drop-in wrappers for every major framework
// Vercel AI SDK
import { openai } from "@ai-sdk/openai";
import { withSupermemory } from "@supermemory/tools/ai-sdk";

const model = withSupermemory(openai("gpt-4o"), {
  containerTag: "user_123",
  customId: "conv-1",
});

// Mastra (a separate file; config is your existing agent config)
import { Agent } from "@mastra/core/agent";
import { withSupermemory } from "@supermemory/tools/mastra";

const agent = new Agent(withSupermemory(config, "user-123", { mode: "full" }));
Wrappers exist for Vercel AI SDK, LangChain, LangGraph, OpenAI Agents SDK, Mastra, Agno, Claude Memory Tool, and n8n. If you’ve already written your agent with one of these, adding Supermemory is a single import.
5. MCP server for coding agents
{
  "mcpServers": {
    "supermemory": {
      "url": "https://mcp.supermemory.ai/mcp"
    }
  }
}
Once the MCP server is connected, your AI gets three tools: memory (save/forget), recall (search by query), and context (inject full profile at conversation start — type /context in Cursor or Claude Code).
6. Connectors (real-time sync)
Auto-sync from Google Drive, Gmail, Notion, OneDrive, GitHub, and a generic web crawler. Documents are automatically processed, chunked, and made searchable, with webhooks firing on updates. This is what makes Supermemory viable as a “company brain” instead of just a per-user memory store.
Architecture & How It Works
Conceptually, Supermemory is five things stacked under one query interface:
Your app / AI tool
↓
Supermemory
│
├── Memory Engine Extracts facts, tracks updates, resolves
│ contradictions, auto-forgets expired info
├── User Profiles Static facts + dynamic context, always fresh
├── Hybrid Search RAG + Memory in one query
├── Connectors Real-time sync from Drive, Gmail, Notion, GitHub
└── File Processing PDFs, images (OCR), videos (transcription),
code (AST-aware chunking)
The memory engine is the hard part. It doesn’t just store the text — it runs an extraction pass to pull facts, classify them as static vs. dynamic, detect contradictions with prior facts, and assign expiry where the text implies one (“tomorrow,” “this week,” explicit dates). On read, the engine produces a profile object rather than raw chunks.
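The expiry-assignment step can be approximated with a few rules. This Python sketch is a deliberately naive stand-in, assuming simple keyword cues and ISO-formatted dates; infer_expiry is a hypothetical helper, and nothing here reflects how Supermemory's extraction pass actually works.

```python
import re
from datetime import date, timedelta
from typing import Optional


def infer_expiry(text: str, today: date) -> Optional[date]:
    """Return the last day a fact is valid, or None if it looks permanent."""
    lowered = text.lower()
    explicit = re.search(r"\b\d{4}-\d{2}-\d{2}\b", lowered)
    if explicit:
        # Explicit ISO date, e.g. "launch is on 2026-07-15".
        return date.fromisoformat(explicit.group(0))
    if "tomorrow" in lowered:
        return today + timedelta(days=1)
    if "this week" in lowered:
        # Valid through Sunday of the current week (Monday is weekday 0).
        return today + timedelta(days=6 - today.weekday())
    return None


today = date(2026, 6, 1)
exam = infer_expiry("I have an exam tomorrow", today)  # date(2026, 6, 2)
launch = infer_expiry("Launch is on 2026-07-15", today)
permanent = infer_expiry("Prefers Vim", today)         # None
```

Rules like these break quickly on phrasing such as "next Friday" or "until the migration ships", which is presumably why the extraction is the hard part.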
For file processing, Supermemory advertises AST-aware code chunking — meaningful in practice because it’s the difference between chunking a 2,000-line TypeScript file by paragraph (useless for code search) versus by function/class/module (useful).
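The chunk-by-definition idea is easy to demonstrate with Python's standard ast module. This sketch uses Python source as a stand-in for the TypeScript case and handles only top-level functions and classes; chunk_by_definition is a hypothetical helper, not Supermemory's chunker.

```python
import ast


def chunk_by_definition(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact source text spanned by the node.
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append(segment)
    return chunks


SOURCE = '''
def add(a, b):
    return a + b


class Greeter:
    def hello(self):
        return "hi"
'''

chunks = chunk_by_definition(SOURCE)
```

Each chunk is a complete definition, so a search hit on add returns the whole function, not a fragment cut off at an arbitrary character or paragraph boundary.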
The agent-swarm variant that hit ~99% on LongMemEval_s is not the production engine, and the team is explicit about this. It’s a research prototype that uses multiple coordinated agents to solve a fixed benchmark; the shipped engine scored 81.6%, which is still SOTA on the same benchmark.
Real-World Use Cases
- Coding agents that remember your stack — Cursor/Claude Code agents that recall your TypeScript style, deployment targets, and ongoing migration without re-prompting.
- Customer support agents — per-user profile loaded as system prompt so the agent knows the customer’s plan, recent tickets, and preferences.
- Personal “second brain” apps: the consumer-facing app.supermemory.ai, with a browser extension, learns from every conversation across LLMs.
- Knowledge-base RAG with personalization — internal docs (RAG) + employee context (memory) returned in one query.
- Multi-tenant SaaS: containerTag is effectively a tenant id, so one account serves arbitrarily many isolated memory stores.
First Impressions from the Community
atlan.com (April 8, 2026) ran the head-to-head and noted: “Supermemory claims 85.4% on LongMemEval (GPT-4o), 36 points above Mem0’s temporal sub-task score, and resolved the three failures that drove Scira AI’s documented switch from Mem0: latency, unreliable indexing, and context recall gaps.”
aihola.com (March 22, 2026) covered the agent-swarm 99% result and emphasized that the team called out the distinction between the research prototype and the production engine — “to be absolutely clear upfront: this is not our main production Supermemory engine.” That kind of preemptive scope-setting is rare in benchmark posts and earned the project goodwill.
Across HN and Reddit AI threads, the recurring positive signals are:
- “Finally, an AI memory tool that doesn’t make me run my own vector DB.”
- “The profile() call is the API I’ve been writing by hand.”
- “MCP install was actually one command. Worked first try with Claude Code.”
The recurring critical signals:
- Pricing on the hosted tier is opaque at the upper end — fine for hobby projects, but enterprise tiers require a sales call.
- The agent-swarm result blurs into the production marketing in some recaps even though the team itself was clear. Read the fine print on the 99% number.
- Some users report the auto-forgetting being slightly too aggressive in test workloads — easy to miss a fact you thought was permanent.
Getting Started
Option A — Hosted (fastest)
# Get an API key from console.supermemory.ai
export SUPERMEMORY_API_KEY=sm_...
# Node
npm install supermemory
# Python
pip install supermemory

# Python: add a memory, then read the profile back
# (assumes SUPERMEMORY_API_KEY is set, as exported above)
from supermemory import Supermemory

client = Supermemory()
client.add(
    content="User loves TypeScript and prefers functional patterns",
    container_tag="user_123",
)
result = client.profile(container_tag="user_123", q="programming style")
print(result.profile.static)   # Long-term facts
print(result.profile.dynamic)  # Recent context
Option B — MCP into Claude Code / Cursor / Claude Desktop
npx -y install-mcp@latest https://mcp.supermemory.ai/mcp \
--client claude --oauth=yes
Replace claude with cursor, windsurf, vscode, etc. Once connected, type /context in your coding agent to inject your profile.
Option C — Self-host the open-source engine
git clone https://github.com/supermemoryai/supermemory
cd supermemory
# Follow repo README for self-hosted setup (Docker compose + Postgres)
The engine, MCP server, and plugins are all open source. For most teams the hosted API is the right starting point — the value is in not running your own vector infra.
Who Should Use This (and Who Shouldn’t)
Use Supermemory if:
- You’re building an AI app where the user expects the agent to remember them across sessions.
- You’ve already written addMemory() and getProfile() by hand and want to delete that code.
- You need RAG and memory in the same query without writing two retrieval paths.
- You’re shipping an MCP-enabled coding agent and want users to keep their context.
- You want a benchmark-backed memory layer where you can actually point at the leaderboard.
Skip it if:
- You’re a hobbyist who just wants chat.history.push(message); you don’t need this.
- You need to keep all data fully on-prem with no external dependency; self-hosting works but the hosted API is the smoother path.
- You’re on a strict zero-cost plan and your memory volume exceeds the free tier — sales-call pricing on enterprise tiers is a real friction point.
- You’ve already standardized on Mem0, Zep, or Honcho and the switching cost beats the benchmark delta for your workload.
Comparison with Alternatives
| Tool | SOTA Benchmark? | MCP? | RAG + Memory in one query? | Hosted API |
|---|---|---|---|---|
| Supermemory | ✅ #1 on LongMemEval, LoCoMo, ConvoMem | ✅ | ✅ Hybrid mode default | ✅ |
| Mem0 | Strong, behind Supermemory on temporal sub-tasks | ⚠️ Community | ❌ Memory-only | ✅ |
| Zep | Strong on long-context | ⚠️ Community | ⚠️ Separate APIs | ✅ |
| Honcho | Not benchmarked head-to-head | ⚠️ Community | ❌ Memory-focused | ✅ |
| AgentMemory | Not benchmarked head-to-head | ✅ Coding-agent focused | ❌ Memory-only | ❌ |
| Roll your own (Pinecone + LangChain) | N/A | ❌ DIY | DIY | DIY |
The honest summary: Supermemory wins on benchmark, wins on having both RAG and memory in one query, and wins on official MCP everywhere. Mem0 and Zep are mature competitors with their own strengths; Honcho and AgentMemory are more focused tools we’ve reviewed before that may fit narrower use cases better.
FAQ
Is Supermemory just RAG with a wrapper?
No. RAG retrieves document chunks — stateless, returns the same results for the same query regardless of who’s asking. Supermemory’s memory engine extracts and tracks facts about individual users over time, resolves contradictions, and auto-expires temporary facts. The hybrid search mode runs both in one query, but the memory side is genuinely different from RAG.
Can I self-host the whole thing?
Yes — the engine, MCP server, and plugins are all open source on GitHub. The hosted API at supermemory.ai is the recommended starting point, but for compliance-sensitive deployments self-hosting via Docker compose + Postgres is supported.
How does the 99% LongMemEval_s number relate to the production engine?
It doesn’t, directly. The 99% result is from an experimental agent-swarm research variant, a coordinated multi-agent system tuned for that specific benchmark. The production SDK ships the 81.6% engine, which is still #1 on the official leaderboard. The team has been explicit about the distinction.
What’s the MCP install line again?
npx -y install-mcp@latest https://mcp.supermemory.ai/mcp --client claude --oauth=yes
Replace claude with cursor, windsurf, vscode, openclaw, etc. OAuth flow handles the API key automatically.
Does it work with the Vercel AI SDK and LangChain?
Yes. There are official drop-in wrappers for Vercel AI SDK, LangChain, LangGraph, OpenAI Agents SDK, Mastra, Agno, Claude Memory Tool, and n8n. The pattern is withSupermemory(yourModelOrAgent, { containerTag }).
How is it different from Honcho or AgentMemory?
Honcho is focused on social-cognition modeling of users (theory-of-mind style); AgentMemory is focused specifically on coding agents and IDEs. Supermemory is the broadest of the three (memory + RAG + connectors + file processing as one API) and the only one currently topping the public memory benchmarks. For a coding-agent-only deployment, AgentMemory may still be the right fit; for general-purpose AI app memory, Supermemory is the strongest current default.
Can I use Supermemory inside OpenClaw?
Yes — there’s a dedicated openclaw-supermemory plugin that wires the MCP server into OpenClaw’s plugin system. Install with the standard MCP install command and use --client openclaw.
Bottom line: Supermemory is the first AI memory layer that’s both genuinely SOTA on the public benchmarks and easy enough to drop into an existing app in under five minutes. If you’re building anything where “the AI should remember me” is part of the product, this is now the default to evaluate first.