Graphify Review: Turn Your Codebase Into a Queryable Graph

TL;DR

Graphify is an open-source AI coding assistant skill that turns any folder — code, SQL schemas, docs, PDFs, images, even videos — into a queryable knowledge graph your AI agent can search instead of grep-walking through files. You type /graphify . in Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot CLI, Aider, OpenClaw or any of ~20 supported assistants, and you get back three artifacts: an interactive graph.html, a GRAPH_REPORT.md with the surprising connections, and a graph.json you can query for the rest of the session.

The repo is #1 trending on GitHub this week with 67,416 stars (+5,478 in seven days) and is backed by Y Combinator. Built by /safishamsi with help from /claude, /cursoragent, /TheFedaikin, and /jippi.

Key facts:

67,416 GitHub stars, +5,478 this week, #1 trending repo
PyPI package: graphifyy (double-y — there’s a squatter on graphify), CLI command is still graphify
~20 supported AI coding assistants — Claude Code, Codex, OpenCode, Cursor, Gemini CLI, GitHub Copilot CLI, Aider, OpenClaw, Factory Droid, Trae, Hermes, Kimi Code, Kiro, Pi, Devin CLI, Google Antigravity, Amp, Kilo Code, and more
36 tree-sitter grammars plus Salesforce Apex, Terraform/HCL, MCP configs, Office docs, Google Workspace, PDFs, images, video/audio
Confidence-tagged edges, with every relationship labeled EXTRACTED, INFERRED, or AMBIGUOUS
Persistent and queryable — graphify query "<question>" works for the whole session without re-reading files
Y Combinator–backed, MIT-style open source on GitHub

If you’ve ever watched Claude Code burn 40K tokens grepping for “where is auth validated?” — Graphify is the obvious fix.

What Graphify actually does

Most AI coding assistants are stateless re-readers. Ask “how does login work?” and the agent runs Glob, then Grep, then opens five files, then reads each one top to bottom. That’s the same shape every time. It costs tokens, it’s slow, and it misses anything the grep regex doesn’t match — like a relationship that lives in a PDF spec, a Mermaid diagram, or a comment 200 lines below the function name.

Graphify replaces the grep-and-read loop with a one-time extraction pass followed by scoped graph queries for the rest of the session.

The extraction pass walks your repository, runs language-specific extractors (tree-sitter for code, AST parsers for SQL/Terraform, OCR/VLM for images, faster-whisper for video), and emits a graph where:

Nodes are concepts: functions, classes, tables, env vars, MCP servers, design rationales, diagrams, PDF sections.
Edges are relationships: calls, imports, reads_table, documented_by, depends_on, plus design-time edges from comments (# WHY:, # HACK:) and docstrings.
God nodes — the most-connected concepts — get surfaced separately so you immediately see what everything routes through.
Surprising connections are ranked by how unexpected they are (e.g., a Terraform module that references a function three repos away).
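
To make the node and edge vocabulary concrete, here is a minimal Python sketch that treats the graph as a list of typed edges and ranks god nodes by plain degree. The Edge class, the god_nodes helper, and the node ids are mine for illustration. Ranking by degree is my assumption; Graphify's actual data model and scoring may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str    # hypothetical node id, e.g. "handlers/auth.py::logout"
    relation: str  # e.g. "calls", "imports", "reads_table"
    target: str


def god_nodes(edges: list[Edge], top: int = 3) -> list[tuple[str, int]]:
    """Rank nodes by degree (incoming plus outgoing edges)."""
    degree: Counter = Counter()
    for edge in edges:
        degree[edge.source] += 1
        degree[edge.target] += 1
    # Sort by degree descending, then by name so ties are deterministic.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top]


edges = [
    Edge("handlers/auth.py::logout", "calls", "auth/session.py::revoke_session"),
    Edge("handlers/admin.py::force_logout", "calls", "auth/session.py::revoke_session"),
    Edge("jobs/expiry.py::cleanup", "calls", "auth/session.py::revoke_session"),
    Edge("auth/session.py::revoke_session", "reads_table", "redis_sessions"),
]

print(god_nodes(edges, top=1))
```

In this toy graph, revoke_session touches all four edges, so it comes out on top.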

Once the graph exists, the skill rewrites your assistant’s behavior so codebase questions hit the graph first. On Claude Code, Gemini CLI, CodeBuddy, Codex, and Kilo Code, PreToolUse hooks intercept search-style tool calls and nudge the agent toward graphify query before it grep-walks. On Cursor it’s a .cursor/rules/graphify.mdc file with alwaysApply: true. On OpenClaw, Aider, and others it works via persistent instruction files.
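
The hook idea is easy to picture as a small decision function. This is a sketch under assumptions, not Graphify's actual hook: I'm assuming the hook is handed a JSON payload with a tool_name field and that the search-style tools are called Grep and Glob. The nudge_for function and its message are hypothetical.

```python
import json
from typing import Optional

# Assumed names of the search-style tools worth intercepting.
SEARCH_TOOLS = {"Grep", "Glob"}


def nudge_for(event: dict, graph_exists: bool) -> Optional[str]:
    """Return a nudge message for search-style tool calls, else None."""
    if not graph_exists:
        return None  # nothing to point the agent at yet
    if event.get("tool_name") not in SEARCH_TOOLS:
        return None  # leave edits, shell commands, etc. alone
    return 'A knowledge graph exists. Try graphify query "<question>" first.'


payload = json.loads('{"tool_name": "Grep", "tool_input": {"pattern": "revoke"}}')
print(nudge_for(payload, graph_exists=True))
```

The point of the design is that the hook only suggests. The agent can still fall back to a plain search if the graph does not answer the question.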

Quickstart in 60 seconds

The install is genuinely tiny:

# 1. Install the package (uv is recommended — pipx works too)
uv tool install graphifyy

# 2. Register the skill with your AI assistant
graphify install

# 3. Open your AI assistant and type
/graphify .

That’s it. About 20–60 seconds later (depending on repo size) you get:

graphify-out/
├── graph.html        # open in any browser — click nodes, filter, search
├── GRAPH_REPORT.md   # key concepts, surprising connections, suggested questions
└── graph.json        # the full graph — query it anytime

You can also run graphify export callflow-html to get a readable architecture page with Mermaid call-flow diagrams baked in.

Project-scoped install (instead of writing to your user profile) is graphify install --project. The skill goes under .claude/skills/graphify/SKILL.md (or .agents/skills/graphify/SKILL.md, etc.) and the CLI even prints a git add hint for the files that should be committed.

For agents that need a nudge to use the graph after build, run the platform-specific bind once (graphify claude install, graphify cursor install, graphify codex install, graphify copilot install, graphify gemini install, graphify claw install, graphify aider install, graphify droid install). This writes the per-agent config that tells your assistant to prefer graphify query "<question>" over Read/Glob/Grep for architecture questions. GRAPH_REPORT.md stays available for broad reviews.

Querying the graph

Once the graph exists, the LLM (or you) can query it like a small structured search index:

# Plain question — the skill maps it to graph traversals
graphify query "where is request auth validated?"

# Scoped queries
graphify query --node "User"       # everything connected to the User node
graphify query --edge "calls" --from "handleLogin"

# Inspect god nodes — most-connected concepts
graphify report --god-nodes

# See surprising cross-module connections
graphify report --surprising

The skill exposes the same interface to the agent. A typical Claude Code interaction now looks like:

You: "Where do we revoke a session?"

Claude (with Graphify): "Per the graph, sessions are revoked in
auth/session.py::revoke_session(), which is called by /logout
(handlers/auth.py), the admin force-logout endpoint
(handlers/admin.py), and a TTL cleanup job in jobs/expiry.py.
The revoke writes through to redis_sessions and emits a
session.revoked event picked up by audit/listener.py.
Confidence: EXTRACTED for direct calls, INFERRED for the
event link (matched on event name)."

No grep storm. No 40K of tool output. The agent loaded a few graph slices and answered.

Supported inputs

The headline is “knowledge graph from code,” but the multi-modal extractors are what make Graphify hard to clone:

Type	Extensions	Notes
Code	36 tree-sitter grammars: `.py .ts .js .jsx .tsx .mjs .go .rs .java .c .cpp .rb .cs .kt .scala .php .swift .lua .luau .zig .ps1 .ex .exs .m .mm .jl .vue .svelte .astro .groovy .gradle .dart .v .sv .svh .sql .f90 .pas .sh .bash .json .dm`	Plus Salesforce Apex (regex), Terraform/HCL (`[terraform]` extra)
MCP configs	`.mcp.json`, `claude_desktop_config.json`	Extracts server nodes, package refs, env var requirements
Docs	`.md .mdx .qmd .html .txt .rst .yaml .yml`	Headings, links, code blocks become nodes
Office	`.docx .xlsx`	`[office]` extra
Google Workspace	`.gdoc .gsheet .gslides`	Opt-in; needs `gws auth` and `--google-workspace`
PDFs	`.pdf`	Section-level extraction
Images	`.png .jpg .webp .gif`	VLM-described nodes linked to surrounding context
Video/Audio	`.mp4 .mov .mp3 .wav` and more	`[video]` extra (faster-whisper + yt-dlp); YouTube URLs work directly

That means a PDF spec sitting next to your code can contribute nodes that link to actual functions. A Mermaid diagram in ARCHITECTURE.md becomes traversable. A whiteboard photo committed to docs/ becomes nodes. For domains where the why lives outside the code — fintech, healthcare, infra-as-code, games — that’s a massive context win.

A clean detail: the extractor also pulls inline rationale (# NOTE:, # WHY:, # HACK:) and docstrings out as separate nodes linked to the code they explain. So your agent can say “this looks like dead code, but the # WHY: comment three lines above says it covers an iOS 16 bug” instead of cheerfully proposing to delete it.
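
Here is roughly what pulling rationale comments into their own nodes could look like. It is a regex sketch for Python-style # comments only, and the dictionary fields (kind, text, line, explains) are names I made up. Graphify's extractor is presumably more careful than this.

```python
import re

# Matches comments such as "# WHY: covers an iOS 16 bug".
RATIONALE = re.compile(r"#\s*(NOTE|WHY|HACK):\s*(.+)")


def rationale_nodes(source: str) -> list[dict]:
    """Pair each rationale comment with the next non-blank, non-comment line."""
    lines = source.splitlines()
    nodes = []
    for index, line in enumerate(lines):
        match = RATIONALE.search(line)
        if not match:
            continue
        # Look ahead for the code this comment explains.
        explains = next(
            (later.strip() for later in lines[index + 1:]
             if later.strip() and not later.strip().startswith("#")),
            None,
        )
        nodes.append({
            "kind": match.group(1),
            "text": match.group(2).strip(),
            "line": index + 1,  # 1-based line number of the comment
            "explains": explains,
        })
    return nodes


sample = '''
# WHY: covers an iOS 16 bug in the share sheet
def legacy_share_fallback():
    return None
'''

for node in rationale_nodes(sample):
    print(node)
```

With the comment attached to legacy_share_fallback as its own node, an agent has something to consult before it proposes deleting the function.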

Confidence tags — the part I really like

Every edge in the graph carries one of three confidence tags:

EXTRACTED — directly observable in the source (a Python import, a SQL JOIN, a function call AST node).
INFERRED — derived from naming, file colocation, or pattern matching (an event name that appears as both an emit string and a listener handler).
AMBIGUOUS — multiple candidate targets; the graph keeps all of them with weights.

That’s a real differentiator. Most “code intelligence” tools quietly mash extracted and inferred edges into one bucket and let the LLM hallucinate as a result. Graphify makes confidence a first-class property of every edge, so the agent can say “definitely calls X, probably emits Y, possibly reads Z” instead of asserting all three.
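
A small sketch shows how the three tags can fall out of extraction and then shape the wording of an answer. Everything here is illustrative: TaggedEdge, link_event, describe, and the emits_to relation are hypothetical names, and I have left out the per-candidate weights for brevity.

```python
from dataclasses import dataclass

# How an agent might hedge its wording per tag.
HEDGE = {"EXTRACTED": "definitely", "INFERRED": "probably", "AMBIGUOUS": "possibly"}


@dataclass(frozen=True)
class TaggedEdge:
    source: str
    relation: str
    target: str
    confidence: str  # "EXTRACTED", "INFERRED", or "AMBIGUOUS"


def link_event(emitter: str, event: str,
               listeners: dict[str, list[str]]) -> list[TaggedEdge]:
    """Link an emitter to the handlers registered under the same event name.

    A name match is only an inference: one candidate yields INFERRED,
    several candidates yield AMBIGUOUS, and all of them are kept.
    """
    candidates = listeners.get(event, [])
    tag = "INFERRED" if len(candidates) == 1 else "AMBIGUOUS"
    return [TaggedEdge(emitter, "emits_to", handler, tag) for handler in candidates]


def describe(edge: TaggedEdge) -> str:
    return f"{edge.source} {HEDGE[edge.confidence]} {edge.relation} {edge.target}"


# A direct call seen in the AST would be EXTRACTED.
edges = [TaggedEdge("handlers/auth.py::logout", "calls",
                    "auth/session.py::revoke_session", "EXTRACTED")]
edges += link_event("auth/session.py::revoke_session", "session.revoked",
                    {"session.revoked": ["audit/listener.py::on_revoked"]})

for edge in edges:
    print(describe(edge))
```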

Community reactions

Sentiment from Reddit and dev.to has been unusually positive for a tool that grew this fast:

r/ClaudeCode: A “My experience with Graphify” thread compares it to code-review-graph and reports Graphify works better on large polyglot codebases thanks to tree-sitter coverage and graph queries, while code-review-graph is sharper for review-focused diff context.
dev.to: A “Graphify + code-review-graph” combo tutorial argued for running both — Graphify for the persistent knowledge graph, code-review-graph for per-PR overlay. It’s one of the reasons the repo’s weekly traffic is so spiky.
knightli.com (May 2026) called it “Claude Code’s biggest limitation, solved” — the limitation being long-running coding sessions degrading as the context window fills with redundant file reads.
graphify.net went live in April 2026 as the marketing page; the project graduated to Y Combinator shortly after.

The recurring praise comes down to three things: it stops the agent from re-grepping the world, it picks up rationale from docs, diagrams, and comments, and the EXTRACTED/INFERRED/AMBIGUOUS tags make the answers trustworthy.

Honest limitations

The README is unusually candid, which I appreciate.

PyPI naming gotcha. The package is graphifyy (two y’s). Other graphify* packages on PyPI are squatters/unrelated. Use uv tool install graphifyy or pipx install graphifyy.
pip install is fragile on macOS/Windows. The skill resolves Python at runtime from graphify-out/.graphify_python. If pip install lands the module in a different interpreter, you get ModuleNotFoundError. uv tool and pipx isolate the env and avoid this.
Git hooks need a reinstall after upgrades. graphify hook install embeds the interpreter path into the post-commit hook. Re-run it after upgrades or the hook silently fails in GUI git clients and CI runners.
Sequential extraction on OpenClaw and Aider. Parallel subagent dispatch lands on Claude Code, Codex, Trae, Factory Droid, CodeBuddy, and Gemini CLI. First-time builds on OpenClaw/Aider are slower.
Codex needs multi_agent = true under [features] in ~/.codex/config.toml for parallel extraction.
Codex command is $graphify not /graphify. Easy gotcha if you switch between assistants.
PowerShell users run graphify . — the leading slash is a path separator on Windows.
Leiden community detection ([leiden] extra) is incompatible with Python 3.13. Drop to 3.10–3.12 if you need it.
Optional extras pile up. Each backend (PDF, Office, video, Neo4j, FalkorDB, SQL, Postgres, Terraform, Ollama, OpenAI, Gemini, Anthropic, Bedrock, Azure) is its own extra. [all] works but it’s heavy.
It’s an extraction pass, not a watcher. The graph rebuilds on /graphify or the post-commit hook. Between rebuilds the graph can drift from your working tree.
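
On that last point, drift is cheap to detect yourself. The sketch below is my own helper, not a Graphify command: it lists source files whose modification time is newer than the graph file. I'm assuming the graph lives at graphify-out/graph.json as shown in the quickstart; the stale_sources name and the default suffix filter are hypothetical.

```python
from pathlib import Path


def stale_sources(root: Path, graph_file: Path,
                  suffixes: tuple[str, ...] = (".py",)) -> list[Path]:
    """List source files modified after the graph file was last written."""
    if not graph_file.exists():
        raise FileNotFoundError(f"{graph_file} not found; build the graph first")
    built_at = graph_file.stat().st_mtime
    return sorted(
        path for path in root.rglob("*")
        if path.is_file()
        and path.suffix in suffixes
        and path.stat().st_mtime > built_at
    )
```

Calling stale_sources(Path("."), Path("graphify-out/graph.json")) from the repo root returns an empty list when nothing has changed since the last build. Anything else is a hint to rerun the extraction.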

Where it fits in the stack

Graphify isn’t the only “give your AI agent better code context” tool — and the author actually points at the others:

code-review-graph — same author, narrower scope. Builds a per-PR overlay graph for review context. Pairs cleanly with Graphify (persistent project graph + per-PR delta).
Headroom — compresses tool outputs, RAG chunks, and logs before they reach the LLM. Graphify reduces what you ask for; Headroom compresses whatever you still send. They stack.
Serena MCP — IDE-level coding agent skill. Graphify gives Serena a queryable graph instead of a grep loop.
Tree-sitter / ctags — Graphify is more or less “tree-sitter + AST parsers + multi-modal extractors + an LLM-friendly query layer + skill bindings for every coding assistant” wrapped together. If you only want code symbols, classic ctags is enough. If you want PDFs, diagrams, MCP configs, design rationale, and Terraform in the same graph, that’s Graphify.

The unique value isn’t the graph itself. It’s the distribution + skill integration: one graphify install makes ~20 different coding assistants behave like they have a shared semantic index of your project.

FAQ

Does Graphify send my code to a third-party server?

No. The default extractors run locally — tree-sitter, AST parsers, faster-whisper, OCR libs. You can opt into an LLM backend (--backend claude, --backend openai, --backend gemini, --backend bedrock, --backend azure, or --backend ollama for fully local) for richer relationship inference and image/video descriptions, but it’s not required for the core graph.

What’s the difference between `graphify` and `graphifyy` on PyPI?

The official package is graphifyy (double-y). The CLI command is still graphify. Other graphify* packages on PyPI are unrelated/squatters — don’t install them.

How big a repo can Graphify handle?

Reports on r/ClaudeCode put it at “comfortable on 100K LOC, slow but workable on 1M+ LOC with parallel extraction enabled.” The incremental cache and the post-commit hook are the steady-state answer — full rebuilds on giant monorepos can run several minutes, but per-commit deltas are fast.
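
Why are per-commit deltas fast? The usual trick behind an incremental cache is content hashing, sketched below. This shows the general technique, not Graphify's implementation: file_digest, changed_files, and the in-memory cache dictionary are hypothetical, and the real cache format is something I have not inspected.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths: list[Path], cache: dict[str, str]) -> list[Path]:
    """Return paths whose content hash differs from the cached one.

    The cache is updated in place, so a second call with no edits
    in between returns an empty list.
    """
    changed = []
    for path in paths:
        digest = file_digest(path)
        if cache.get(str(path)) != digest:
            changed.append(path)
            cache[str(path)] = digest
    return changed
```

Only the files returned here would need re-extraction, which is why a single commit in a large monorepo can be cheap even when a full rebuild is not.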

Does Graphify work with OpenClaw?

Yes. Install with graphify install --platform claw (or the legacy graphify install --platform openclaw), then graphify claw install to register the skill in your project. Extraction on OpenClaw is sequential as of v8, so first builds are slower than on Claude Code, but the steady-state experience is the same.

Can the graph be pushed to Neo4j or FalkorDB?

Yes. uv tool install "graphifyy[neo4j]" or uv tool install "graphifyy[falkordb]" adds push support, and the CLI exposes graphify export neo4j / graphify export falkordb for one-shot ingest. Useful if you want to query the graph from BI tools, a data app, or alongside other organizational graphs.

Is Graphify production-ready?

For “AI-coding-assistant context,” yes: that’s its primary use case, and the integration surface is the most mature part. For “use the graph as a build-time source of truth for a CI gate,” it’s getting there, but check confidence tags before you fail builds on INFERRED edges. Every relationship carries a confidence tag for exactly this reason.
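
If you do wire the graph into CI, a conservative gate fails only on EXTRACTED edges and downgrades everything else to a warning. The sketch below assumes each edge is a dictionary with source, target, and confidence keys. That shape is my assumption for illustration, not the documented graph.json schema, and split_violations is a hypothetical helper.

```python
def split_violations(edges: list[dict],
                     forbidden: set[tuple[str, str]]) -> tuple[list[dict], list[dict]]:
    """Split forbidden edges into build-failing errors and advisory warnings.

    Only EXTRACTED edges fail the build; INFERRED and AMBIGUOUS ones
    are reported as warnings for a human to check.
    """
    errors, warnings = [], []
    for edge in edges:
        if (edge["source"], edge["target"]) not in forbidden:
            continue
        if edge["confidence"] == "EXTRACTED":
            errors.append(edge)
        else:
            warnings.append(edge)
    return errors, warnings


edges = [
    {"source": "web", "target": "db", "confidence": "EXTRACTED"},
    {"source": "web", "target": "db", "confidence": "INFERRED"},
    {"source": "web", "target": "api", "confidence": "EXTRACTED"},
]
errors, warnings = split_violations(edges, forbidden={("web", "db")})
print(len(errors), "error(s),", len(warnings), "warning(s)")
```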

Verdict

Graphify is the rare AI tool that earns its trending position. It’s a clear win for anyone running Claude Code, Codex, Cursor, or OpenClaw on a non-trivial codebase: the agent stops grep-walking, picks up rationale that lives outside the code, and labels its own confidence. The multi-modal extractors are what set it apart — PDFs, Mermaid diagrams, MCP configs, even videos become first-class nodes alongside your code.

For a 60-second install and one PyPI naming gotcha (graphifyy, not graphify), you save tokens and latency and get a better-informed agent on day one of any new repo. Pair it with Headroom and you’ve got the cleanest “make my AI coding assistant cheaper and smarter” stack currently on GitHub.

Repo: github.com/safishamsi/graphify
Install: uv tool install graphifyy && graphify install
License: MIT (verify on the repo before commercial use)
Backed by: Y Combinator