TL;DR

Local Deep Research (LDR) is an open-source AI research assistant from LearningCircuit that does what ChatGPT’s “Deep Research” and Perplexity Pro do — but on your own hardware, against your own LLM, with your own search backend, and with everything stored in an AES-256 encrypted SQLite database that even the server admin can’t read.

Key facts as of early May 2026:

  • ~5,950 GitHub stars, ~1,100 added this week — currently on GitHub’s weekly Python trending list
  • ~95% accuracy on SimpleQA (preliminary, GPT-4.1-mini + SearXNG + focused-iteration strategy) — broadly comparable to closed-source deep research products
  • Apache-2.0 licensed, packaged on PyPI as local-deep-research, with signed Docker images on Docker Hub
  • 20+ research strategies including a new langgraph-agent mode where the LLM decides which engines to use and when to synthesize
  • 10+ search engines out of the box: arXiv, PubMed, Semantic Scholar, Wikipedia, SearXNG, GitHub, Wayback Machine, The Guardian, Wikinews, plus Tavily, Google (SerpAPI), and Brave as paid options
  • Any LLM: Ollama, llama.cpp, LM Studio, vLLM locally; OpenAI, Anthropic, Google, Mistral via API
  • SQLCipher per-user encrypted databases, no telemetry, no analytics — cosign verify on the Docker image will pass
  • MCP server for Claude Desktop / Claude Code so a coding agent can delegate research tasks to it
  • Honest caveat: the 95% number is preliminary on a single benchmark with a strong cloud model — local 27B-class models land in a noticeably different place, and the new LangGraph agent strategy is explicitly labeled “early results”

If you’ve ever wanted Perplexity Pro or OpenAI Deep Research without sending your queries to a third party, LDR is the closest open-source alternative shipping today.

Why LDR is showing up everywhere

Three reasons it’s trending hard right now.

The SimpleQA result. SimpleQA is OpenAI’s open-domain factuality benchmark — short, fact-seeking questions with a single correct answer. Hitting ~95% with a research loop is the “Perplexity-class” threshold, and LDR gets there with GPT-4.1-mini (a small, cheap model) plus SearXNG. That suggests the architecture is doing real work, not just memorizing the dataset.

The timing. OpenAI Deep Research, Anthropic Research, Perplexity Deep Research, and Google Deep Research all shipped inside a 12-month window. Self-hosters have been asking “where’s the open one?” since Perplexity Pro Search launched. LDR is the first credible answer that runs end-to-end on a single 3090.

The privacy story holds up. Plenty of “private” AI tools quietly phone home for analytics. LDR’s README is explicit: no telemetry, no analytics, no crash reporting. Docker images are signed with Cosign, include SLSA provenance attestations, and ship with SBOMs. Per-user databases are SQLCipher AES-256 with no password recovery — drop the password, drop the data.

Install in three minutes

Docker Compose is the fastest path — it wires up Ollama, SearXNG, and LDR in one shot.

CPU-only (macOS, Windows, Linux):

curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
docker compose up -d

NVIDIA GPU on Linux:

curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.gpu.override.yml
docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml up -d

After ~30 seconds, open http://localhost:5000. First-run setup creates your encrypted user database and prompts for a model.
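
If something doesn’t come up, a quick health check (the service name below is an assumption; check the service key in the compose file you downloaded):

docker compose ps                           # all three services should show "running"
docker compose logs -f local-deep-research  # watch startup if the UI isn't responding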

Manual three-container path if you want each piece explicit:

docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull gpt-oss:20b
docker run -d -p 8080:8080 --name searxng searxng/searxng
# --network host lets LDR reach Ollama and SearXNG on localhost (Linux);
# port publishing (-p) is ignored under host networking, so it's omitted.
docker run -d --network host \
  --name local-deep-research \
  --volume "deep-research:/data" \
  -e LDR_DATA_DIR=/data \
  localdeepresearch/local-deep-research

Or skip Docker entirely:

pip install local-deep-research
ldr  # web UI at http://localhost:5000

The PyPI package ships SQLCipher pre-built wheels — no C toolchain needed. PDF export on Windows still wants Pango installed separately.

Verify the Docker image before any production-adjacent run. Cosign 2.x keyless verification requires the signing identity; the issuer and identity regexp below assume GitHub Actions signing, so check the repo’s security docs for the exact values:

cosign verify \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  --certificate-identity-regexp 'github.com/LearningCircuit/local-deep-research' \
  localdeepresearch/local-deep-research:latest

What it actually does end-to-end

The mental model is straightforward:

  1. You ask a question — anything from “what is the latest on FDA approval for X” to “compile a 30-source literature review on Y.”
  2. LDR picks (or you pick) a research strategy. There are ~20 of them, ranging from quick-summary (~30 seconds, web only) to focused-iteration (the SimpleQA-winning one) to the new langgraph-agent mode (LLM picks engines on the fly).
  3. The strategy issues sub-queries against the configured search engines — say SearXNG + arXiv + PubMed + your own indexed PDFs.
  4. Each result is scraped, chunked, and fed back to the LLM with citations.
  5. The sources it finds are downloaded into your encrypted local library, then indexed and embedded for next time.
  6. You get a Markdown / PDF report with proper citations and a research history entry you can re-open later.

The library piece is what quietly makes LDR more useful than “Ollama plus a search tool.” Today’s session on “GLP-1 mechanism of action” puts 12 PubMed PDFs into your encrypted library; tomorrow’s session on “GLP-1 cardiovascular outcomes” can search both the live web and yesterday’s papers in the same query.

flowchart LR
    R[Research] --> D[Download Sources]
    D --> L[(Library)]
    L --> I[Index & Embed]
    I --> S[Search Your Docs]
    S -.-> R

A real Python API session

LDR ships an authenticated Python client. The simplest possible end-to-end script:

from local_deep_research.api import LDRClient, quick_query

# Option A: one-shot
summary = quick_query("alice", "s3cret", "What is quantum computing?")
print(summary)

# Option B: client, multiple operations
client = LDRClient()
client.login("alice", "s3cret")
result = client.quick_research(
    "What are the latest advances in quantum computing?"
)
print(result["summary"])

That quick_research call returns a dict with summary, findings, sources, and report_path, plus a research history ID you can re-open in the web UI later.
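
A minimal sketch of consuming that dict. The four keys come from the description above; the exact shape of each sources entry is an assumption, so print one before relying on specific fields:

for source in result["sources"]:
    print(source)  # inspect the shape before parsing specific fields
print("Report written to:", result["report_path"])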

If you have an existing knowledge base — say, a Chroma or FAISS vector store of your company’s docs — you can hand it to LDR as a first-class search engine:

from local_deep_research.api import quick_summary

result = quick_summary(
    query="What are our deployment procedures?",
    retrievers={"company_kb": your_langchain_retriever},
    search_tool="company_kb",
)

This works with any LangChain-compatible retriever — FAISS, Chroma, Pinecone, Weaviate, Elasticsearch — which means you can plug LDR on top of an existing RAG stack without rewriting your indexing pipeline. You get the deep-research orchestration for free.
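
As a concrete sketch of that hand-off, here’s a toy FAISS store built in-process and registered as the retriever. The langchain-community imports and the Ollama embedding model are assumptions; substitute whatever your RAG stack already uses:

# pip install langchain-community faiss-cpu
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from local_deep_research.api import quick_summary

# Toy corpus standing in for your real document index
docs = [
    "Deployments go through staging before production.",
    "Rollbacks flip the blue/green load-balancer target.",
]
store = FAISS.from_texts(docs, OllamaEmbeddings(model="nomic-embed-text"))

result = quick_summary(
    query="What are our deployment procedures?",
    retrievers={"company_kb": store.as_retriever()},
    search_tool="company_kb",
)
print(result["summary"])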

The repo ships ready-to-use HTTP API examples under examples/api_usage/http/ that handle automatic user creation, CSRF, and result polling — useful if you’re calling LDR from Node, Go, or a shell script. The web UI and HTTP API share routes, so you do need a CSRF token dance; copy the examples instead of reinventing the polling loop.
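
The general shape of those examples is sketched below. Every endpoint path and header name here is an illustrative placeholder, not LDR’s real route table; copy the actual paths from examples/api_usage/http/ before using this:

import time
import requests

BASE = "http://localhost:5000"
session = requests.Session()  # keeps auth + CSRF cookies across calls

# 1. Log in and grab a CSRF token (placeholder paths -- see the repo examples)
csrf = session.get(f"{BASE}/auth/csrf").json()["csrf_token"]
session.post(f"{BASE}/auth/login",
             json={"username": "alice", "password": "s3cret"},
             headers={"X-CSRFToken": csrf})

# 2. Kick off research, then poll until it finishes
job = session.post(f"{BASE}/api/research",
                   json={"query": "What is quantum computing?"},
                   headers={"X-CSRFToken": csrf}).json()
while True:
    status = session.get(f"{BASE}/api/research/{job['id']}").json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(2)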

MCP server: hand it to Claude Code

This is the integration that’s quietly the biggest deal for Claude Code and Claude Desktop users. LDR ships an MCP (Model Context Protocol) server, so you can register it as a tool and let Claude delegate deep research instead of trying to do it inline.

pip install "local-deep-research[mcp]"

Then in claude_desktop_config.json:

{
  "mcpServers": {
    "local-deep-research": {
      "command": "ldr-mcp",
      "env": {
        "LDR_LLM_PROVIDER": "openai",
        "LDR_LLM_OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Now when you ask Claude Code to “research the current state of WebGPU adoption,” it can route the long-tail tool calls to LDR running locally — and LDR will burn through SearXNG + arXiv + Wikipedia in parallel without filling Claude’s context window with raw HTML.

Security note from the maintainers: the MCP server is for local STDIO use only. There’s no built-in auth or rate limiting. Don’t expose it over a network without putting your own gateway in front.

Picking a model: use the community benchmarks

The single most useful link buried in the README is the LDR Benchmarks dataset on Hugging Face. Community contributors run LDR against SimpleQA with different models, search engines, and strategies, then upload the results.

Before you pull a 27B-parameter model that’s going to sit on your SSD for the next month, this is where you check whether it actually works for deep research. Some 30B-class models punch well above their weight; some name-brand 70B models surprisingly fall over because they can’t reliably emit JSON-formatted tool calls under the strategy’s instructions.
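
Pulling the runs locally is one load_dataset call. The repo ID below is a placeholder, since the README links the real one, and the column names may differ, so print a row before filtering:

from datasets import load_dataset  # pip install datasets

# Placeholder repo ID -- copy the real one from the LDR README link
runs = load_dataset("PLACEHOLDER/ldr-benchmarks", split="train")
print(runs[0])  # inspect columns (model, strategy, engine, accuracy, ...)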

Practical heuristics from the published runs:

  • GPT-4.1-mini + SearXNG + focused-iteration: the published ~95% SimpleQA result. This is the “I just want it to work” baseline if you’re okay with cloud.
  • Local 20–30B models: land in the 70–85% range on SimpleQA depending on quantization and search engine. Still very useful, much cheaper, no data leaves your machine.
  • Anything below ~13B: works but expect rough edges on multi-hop questions.

Honest limitations

A few things to know before you commit:

  • The 95% number is on a single benchmark. SimpleQA is short factual questions. LDR’s performance on long-form synthesis (“write me a 30-page literature review”) is qualitatively good but not benchmarked the same way. Don’t generalize a single number into “as good as Perplexity Pro for everything.”
  • Local models need real hardware. A 3090 is the floor for the 20B-class models the team tests with. On an M-series Mac with 16 GB unified memory you’ll be living on the edge of memory pressure if you also run SearXNG locally.
  • langgraph-agent is early. The new agentic strategy that picks engines on the fly is explicitly marked “early results.” It’s adaptive and finds more sources, but it’s not (yet) the default for a reason.
  • Some sites block honest scrapers. LDR respects robots.txt and identifies itself, which means a small percentage of pages won’t fetch. The maintainers consider this the right trade-off; if you need stealth scraping, you need a different tool.
  • No password recovery. This is a security feature, not a bug — but it bites people. Back up your encrypted database file (see the backup sketch after this list), or set LDR_BOOTSTRAP_ALLOW_UNENCRYPTED=true if you genuinely don’t need encryption (homelab single-user case).
  • PDF export on Windows is fiddly. WeasyPrint depends on Pango, which is not pip-installable. Markdown export works everywhere; PDF needs a one-time native dep install on Windows.
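
For the backup point above, a one-liner that snapshots the Docker volume from the manual install path (the container name and /data mount are taken from that section; compose users should check docker volume ls for their actual volume name):

docker run --rm --volumes-from local-deep-research \
  -v "$PWD:/backup" alpine \
  tar czf /backup/ldr-data-backup.tgz /data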

Community reactions

Recurring themes from GitHub issues, r/LocalDeepResearch, and the project Discord:

  • “The encrypted-by-default story is what convinced me.” For people coming off Perplexity or ChatGPT Deep Research, data ownership beats the accuracy number as the clincher.
  • “The library accumulating across sessions is the killer feature.” It’s the real differentiator from a one-shot search-and-summarize agent.
  • “20+ strategies is too many.” Most people land on quick-summary for chat-style questions, focused-iteration for benchmark-shaped questions, langgraph-agent when exploring.
  • “Adding SearXNG is the biggest single quality jump.” Reportedly bigger than going up two parameter classes in the model.

Where it fits — and where it doesn’t

Use LDR when:

  • You want deep research over private data plus the live web in the same query.
  • You’re building an internal research tool and can’t ship queries to OpenAI/Anthropic for compliance reasons.
  • You already run Ollama or llama.cpp and want to put a real workflow on top.
  • You’re a Claude Code or Claude Desktop user who wants research delegated via MCP instead of stuffing search results into context.
  • You want a research knowledge base that compounds over time instead of starting from scratch every query.

Skip LDR when:

  • You want a fully local experience but don’t have a 3090-class GPU and aren’t willing to fall back on cloud APIs. (You can still run it pointed at OpenAI, but at that point Perplexity is cheaper than the engineering time.)
  • You need stealth scraping of sites that block honest crawlers.
  • You want a single-binary CLI with zero infrastructure. LDR is a web app + Docker stack; that’s the trade-off for the multi-user encrypted database story.

FAQ

Q: How does LDR compare to Perplexity Pro or OpenAI Deep Research?

For factual questions on SimpleQA, the published numbers are roughly comparable when LDR is configured with GPT-4.1-mini + SearXNG. The differentiators run the other direction: LDR gives you full source access, an encrypted local library, no usage caps, and the ability to point it at private documents — none of which closed-source competitors offer. The trade-off is you operate the infrastructure.

Q: Can I run it 100% offline?

Yes, with caveats. Ollama or llama.cpp gives you the LLM. SearXNG running locally still needs upstream search engines for live web data — so “fully offline” really means “live web is off-limits.” If you’ve populated your library with PDFs and run searches scoped to local_documents, it’s genuinely offline.
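
A minimal sketch of that offline mode, reusing the quick_summary call from earlier and the local_documents scope named above (this assumes your library is already populated and a local Ollama model is configured):

from local_deep_research.api import quick_summary

result = quick_summary(
    query="Summarize the GLP-1 cardiovascular outcome papers in my library",
    search_tool="local_documents",  # no live web engines are touched
)
print(result["summary"])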

Q: What’s the difference between the research strategies?

quick-summary does one or two search rounds and returns a paragraph. detailed-research does multiple rounds with structured findings. report-generation produces a long-form report with sections and a TOC. focused-iteration (the SimpleQA-winning one) iterates until it converges on a confident answer. langgraph-agent is the new one where the LLM picks search engines per query. Start with quick-summary for chat-shaped questions, escalate from there.

Q: How does it handle citations and hallucination?

Every claim in a generated report is tied back to a source URL or document ID, and the Journal Quality System automatically flags predatory or low-reputation sources. It’s not bulletproof — LLMs can still misattribute facts — but the citation surface is real and clickable, not made up.

Q: Is the data really encrypted at rest?

Yes. Each user gets their own SQLCipher database (AES-256), and there’s no password recovery path. In-process credentials are held in memory while you’re logged in, which is the same trade-off password managers and browsers make. If an attacker has memory-read access on your box, encryption-at-rest is not your line of defense; if your laptop is stolen powered-off, your data is unreadable.
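
One way to sanity-check the at-rest claim yourself: a plain SQLite file begins with the bytes “SQLite format 3”, while a SQLCipher database looks like random bytes from the first byte onward. The file path below is a placeholder; point it at your per-user database:

# Plain SQLite would print "SQLite format 3"; SQLCipher prints random bytes
head -c 16 /path/to/your/encrypted-user.db | xxd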

Q: How does this play with andrew.ooo’s existing stack?

Pretty cleanly. If you’re already running OpenClaw or Claude Code, wire LDR in via MCP and your coding agent can delegate research instead of paying tokens to read raw web pages. If you’re running serena or any other MCP-aware tooling, the same model applies — LDR is one of the cleanest “research as a tool” MCP servers shipping today.

Verdict

LDR is the first open-source deep-research project where the architecture — encrypted per-user DBs, signed images, no telemetry, MCP integration, library-that-compounds — feels as carefully thought through as the benchmark number. The 95% SimpleQA result will get the headlines, but the part that will make you keep using it is that every research session leaves your local knowledge base measurably better.

If you’re a self-hoster who’s been waiting for “Perplexity, but mine,” this is the first one I’d actually recommend installing this week. Pull it down, point it at SearXNG, and run one real research question against it — that’s the single best 10-minute investment in your local AI stack right now.

Repository: github.com/LearningCircuit/local-deep-research
License: Apache-2.0
Docs: Installation guide · Architecture · Benchmarks