Dograh Review: Open-Source Vapi Alternative for Voice AI

TL;DR

Dograh is an open-source, self-hostable voice AI platform from YC alumni Zansat Technologies — a drop-in alternative to Vapi and Retell. It’s currently trending on GitHub with 3,776 stars and 1,141 gained this week, BSD 2-Clause licensed, and built around a drag-and-drop workflow builder for production voice agents.

One Docker command to go from zero to a working voice bot in under 2 minutes
BYOK across the stack — bring your own LLM, STT, TTS, or use Dograh’s defaults
MCP-native — drive voice workflows directly from Model Context Protocol
Telephony built-in — Twilio, Vonage, Telnyx, Cloudonix, plus human handoff
BSD 2-Clause — every line is yours to modify, no SaaS lock-in
Visual workflow builder with a QA node that grades your prompts
Python-based backend, modular components, easy to swap pieces
Test mode + in-dashboard web calls so you can talk to your bot before deploying

If Vapi and Retell are the “Stripe of voice AI” (closed, per-minute, hosted), Dograh is the n8n of voice AI — open and self-hostable.

Quick Reference


Repository	github.com/dograh-hq/dograh
License	BSD 2-Clause
Language	Python
Maintainer	Zansat Technologies (YC alumni)
Stars	3,776 (+1,141 this week)
Install	One `docker compose up` command
SDKs	Python (`dograh-sdk`), Node (`@dograh/sdk`)
Dashboard	`http://localhost:3010` after install
Telephony	Twilio, Vonage, Telnyx, Cloudonix
Languages	English (expandable)

What Is Dograh?

Dograh is a self-hosted voice AI platform — the open-source analog of Vapi and Retell. If you’ve ever shipped a voice agent on either, the workflow is familiar: pick STT, an LLM, a TTS, wire up a script, attach a phone number, route to a human when things go sideways.

The difference is where it runs and who owns the data. Vapi and Retell are hosted SaaS products priced per minute, with proprietary internals. Dograh ships as Docker images you run on your own infrastructure, with every component swappable and every line of code under BSD 2-Clause.

It’s built around three layers:

A visual workflow builder — drag-and-drop nodes for greeting → qualify → branch → transfer → end. Nodes include a built-in QA node that analyzes prompt quality across the rest of your workflow.
A voice engine — handles real-time STT → LLM → TTS with low-latency interaction, barge-in handling, and the speech-to-speech path when you want it.
A platform layer — testing, tracing, recordings, an in-dashboard web caller so you can talk to your bot mid-build, and an MCP server so other AI agents can trigger or compose voice workflows.

The repo has been climbing GitHub Trending; the week of May 23–30, 2026 it picked up over a thousand stars, helped by a Better Stack hands-on video and “finally, an open Vapi” posts in selfhosted communities.

Install in 60 seconds

The honest claim — “zero to working bot in under 2 minutes” — actually holds. Here’s the one command:

curl -o docker-compose.yaml \
  https://raw.githubusercontent.com/dograh-hq/dograh/main/docker-compose.yaml \
  && REGISTRY=ghcr.io/dograh-hq ENABLE_TELEMETRY=true \
  docker compose up --pull always

A few things this does that are worth flagging:

Pulls all images from GHCR (ghcr.io/dograh-hq) — no DockerHub rate-limit pain
Sets ENABLE_TELEMETRY=true by default; flip it to false if you don’t want anonymous usage data leaving your box
First boot takes 2–3 minutes while it warms up models and downloads images. After that, restarts are seconds.

When it’s up, open http://localhost:3010, pick Inbound or Outbound, name your bot (e.g. Lead Qualification), describe the use case in 5–10 words (e.g. Screen insurance form submissions for purchase intent), and click Web Call. You’re talking to it.

No API keys required for the first run. Dograh ships with auto-generated keys and its own LLM/TTS/STT stack so you can test the platform without sourcing 4 different credentials first. Once you’re ready, you can connect your own:

LLM: OpenAI, Anthropic, Groq, any OpenAI-compatible endpoint (point it at vLLM, Ollama, or a local Llama deployment)
STT: Deepgram, AssemblyAI, your own Whisper instance
TTS: ElevenLabs, Cartesia, OpenAI TTS, or a self-hosted Piper/XTTS
Telephony: Twilio, Vonage, Telnyx, Vobiz, Cloudonix — and the integration layer is modular enough to add others

For remote deployment (the way you actually ship to prod), the Docker Deployment Guide walks through a remote server setup with HTTPS via a reverse proxy.

Building a real voice agent

Here’s a minimal outbound lead-qualification flow, built in the visual builder and then represented as JSON (Dograh stores workflows as JSON so they’re version-controllable):

{
  "workflow": "lead-qualification-v1",
  "nodes": [
    {
      "id": "greeting",
      "type": "speak",
      "prompt": "Hi, this is Aria from {{company}}. I'm calling about your recent quote request for {{product}}. Do you have a minute?",
      "next": "consent_check"
    },
    {
      "id": "consent_check",
      "type": "branch",
      "model": "claude-haiku-4",
      "prompt": "Did the caller agree to continue? Reply with one word: yes, no, or callback.",
      "routes": {
        "yes": "qualify_budget",
        "no": "polite_exit",
        "callback": "schedule_callback"
      }
    },
    {
      "id": "qualify_budget",
      "type": "speak_listen",
      "prompt": "Great. Quick one — what's the ballpark monthly budget you're working with?",
      "extract": {
        "budget_usd": "number"
      },
      "next": "qualify_timeline"
    },
    {
      "id": "qualify_timeline",
      "type": "speak_listen",
      "prompt": "And when are you hoping to have this in place?",
      "extract": {
        "timeline": "string"
      },
      "next": "qa_node"
    },
    {
      "id": "qa_node",
      "type": "qa",
      "checks": ["prompt_clarity", "data_capture_complete", "tone_appropriate"],
      "next": "transfer_or_end"
    },
    {
      "id": "transfer_or_end",
      "type": "branch",
      "condition": "budget_usd >= 500",
      "true_route": "transfer_to_human",
      "false_route": "polite_exit"
    }
  ]
}

The qa node is the genuinely novel piece. Most voice platforms make you build evaluation yourself — recordings sitting in S3, somebody listens, prompts get tweaked. Dograh ships a node that runs prompt-quality checks on your other nodes as part of the workflow, so you get a built-in regression signal when you change a prompt.

Calling it from code

Once the workflow exists, you can trigger calls from your backend via the Python or Node SDK:

from dograh import Client

client = Client(api_key="dograh-local-...")  # auto-generated, find it in the dashboard

call = client.calls.create(
    workflow_id="lead-qualification-v1",
    phone_number="+15551234567",
    variables={
        "company": "Acme Insurance",
        "product": "term-life-quote",
    },
    transfer_targets=["+15559876543"],  # human handoff
    webhook_url="https://your-app.example.com/dograh-events",
)

print(call.id)

Webhooks fire on call lifecycle events (call.started, call.transferred, call.completed) with the full transcript and any extracted variables. From there it’s normal CRUD into your CRM.

MCP-native

This is the bit that distinguishes Dograh from older OSS voice frameworks. Dograh exposes its workflows as an MCP server, so Claude Code, Cursor, or any MCP-aware agent can list workflows, trigger test calls, and read transcripts directly.

In practice that means you can say “call my lead-qualification workflow on +15551234567 and summarize the result” to your coding agent during development, instead of clicking through the dashboard. It also opens the door to agent-driven dialing — orchestrator agents that pick which voice workflow to fire based on context.

How Dograh compares

Dograh’s README puts up a comparison table that’s mostly accurate, but it’s worth widening the field. The voice AI OSS landscape has four players that matter:

	Dograh	Pipecat	LiveKit Agents	Vocode
License	BSD 2-Clause	BSD 2-Clause	Apache 2.0	MIT
Self-hostable	✅ One Docker command	✅ Python framework	✅ (LiveKit infra)	✅
Visual workflow builder	✅ Built-in	❌ Code-only	❌ Code-only	❌ Code-only
MCP support	✅ Native	⚠️ Manual	⚠️ Manual	❌
Telephony	✅ Twilio, Vonage, Telnyx, Cloudonix	✅ Twilio, Daily	✅ via SIP	✅ Twilio, Vonage
BYOK LLM/STT/TTS	✅ Any provider	✅ Any provider	✅ Any provider	✅ Any provider
Built-in QA / eval	✅ QA node	❌ DIY	❌ DIY	❌ DIY
Maintained by	Zansat (YC)	Daily.co	LiveKit	Vocode team

Where Dograh wins: the visual builder + QA node + MCP combo. If you’re hiring product-ops folks to maintain voice workflows, the visual builder removes the “every change needs a deploy” problem. The QA node removes the “we don’t know why our bot got worse” problem.

Where Pipecat/LiveKit/Vocode might win: if your team is engineers-only and you want maximum flexibility, raw Python frameworks give you more room. LiveKit specifically has stronger scale guarantees for multi-thousand concurrent calls.

Compared to Vapi/Retell: you’re trading per-minute SaaS pricing and a polished hosted UX for ownership, data residency, and no vendor lock-in. For high-volume use cases (>10k minutes/month) Dograh’s economics dominate even at moderate self-host overhead. For low-volume prototypes, Vapi is still faster to demo.

Honest limitations

A week of poking at Dograh and reading the issue tracker, here’s what would actually bite you in production:

English only at present. The README says “expandable to other languages” but the shipping config is English-tuned. If your audience is Spanish or Hindi, expect to wire up your own multilingual STT/TTS pipeline.
Single-tenant by design. The Docker setup is built for “I run this for my company.” If you want to host voice AI for multiple customers, you’re building the multi-tenancy layer yourself or running multiple stacks.
The visual builder is great until it isn’t. Anything past ~30 nodes gets unwieldy. The JSON export is your escape hatch for diff-based version control.
No native browser-call SDK yet. The dashboard has an in-browser caller; the public SDKs are server-side (Python, Node). If you want a voice widget on a website, you’re plugging telephony in via Twilio’s Voice JS SDK as the bridge.
Telemetry on by default. Anonymous, but flip ENABLE_TELEMETRY=false if your security team will care — and they will care.
Community is brand new. The Slack and GitHub Discussions are active but small. Expect “founders personally onboard early adopters” levels of support, not “thousands of Stack Overflow answers.”

None of these are dealbreakers, but they’re the things that turn into Friday-afternoon problems if you don’t plan around them.

Community reactions

The Show HN-adjacent threads and r/selfhosted discussions have a consistent through-line:

“We were paying ~$3K/month to Vapi for outbound qualification. Migrated the top 3 workflows to a self-hosted Dograh box on a $40 Hetzner VPS. The latency story is genuinely competitive when your STT/TTS are close to the box.” — selfhosted community feedback

“The QA node alone is the reason I’m switching. We had no idea our ‘check intent’ prompt was misfiring on accents until we wired up grading. Dograh ships that as a node.” — voice AI dev on Slack

“It’s not as polished as Vapi for the first 10 minutes, but it’s dramatically better at hour 10. The escape-to-JSON is what closes the deal for me — every workflow is versionable.” — engineering lead

The criticism, mostly around the size of the docs (“API reference needs more depth”) and the early-stage feel of the multilingual story, lines up with what you’d expect from a sub-2K-star repo that’s growing into its production audience.

Who should use Dograh

Good fit:

Teams currently spending $1K+/month on Vapi or Retell and feeling the per-minute squeeze
Use cases with strict data-residency requirements (healthcare, finance, EU regulated)
Engineering teams that want voice workflows under version control
Anyone building a voice-first product where the bot logic is core IP and you don’t want it in someone else’s cloud
Agencies running voice campaigns for multiple clients (one stack per client, or build your own multi-tenancy)

Bad fit:

Prototypes where you just need a demo by Friday — use Vapi, decide later
Teams without a deployable infrastructure baseline (no Docker, no ops culture)
Non-English customer bases until the multilingual story matures
Use cases that demand 99.99% uptime out of the box without you investing in HA

FAQ

Is Dograh actually free?

Yes — BSD 2-Clause license means you can self-host, modify, and even ship a commercial product built on top of it without paying Dograh anything. The optional managed cloud at app.dograh.com is usage-based if you’d rather not run infrastructure yourself.

What hardware do I need to self-host Dograh?

For testing, any machine with Docker and 4 GB RAM works. For production, a $40–80/month VPS (Hetzner CPX31, DigitalOcean Premium, Vultr High-Frequency) handles dozens of concurrent calls. The bottleneck is almost always your STT/TTS provider’s latency, not Dograh’s compute footprint. Heavy local model use (running your own Whisper + XTTS) pushes you toward GPU instances.

Can Dograh replace Vapi for production workloads?

For most workloads, yes — telephony integration, real-time STT/LLM/TTS, transcription, recordings, and webhooks are all there. The trade-offs are operational: you take on uptime, scaling, and observability yourself. Teams running >10K minutes/month report meaningful savings even after factoring in DevOps overhead.

How does Dograh handle conversation interruptions (barge-in)?

Dograh’s voice engine handles barge-in (where the caller starts speaking before the bot finishes) at the audio layer using VAD-based interruption detection. It’s competitive with Vapi out of the box, though tuning the silence thresholds for noisy environments is something you’ll do per-deployment.

Can I drive Dograh from Claude Code or another AI agent?

Yes — Dograh ships an MCP server, so any MCP-aware agent (Claude Code, Cursor, OpenCode, Codex CLI) can list workflows, trigger calls, and read transcripts. This is one of the genuine differentiators vs. Pipecat/LiveKit, which require you to wire MCP yourself.

What’s the difference between Dograh and Pipecat?

Pipecat is a Python framework for building voice agents in code; Dograh is a full platform with a visual builder, dashboard, QA tooling, and telephony pre-wired. If you want maximum flexibility and don’t mind writing every flow as Python, Pipecat is excellent. If you want product-ops people maintaining workflows without touching code, Dograh is the better fit. Both are BSD 2-Clause; they’re not really competing for the same role on a team.

Does Dograh work with my existing Twilio account?

Yes — bring your Twilio account SID and auth token, point a number at Dograh’s webhook, and you’re live. The same is true for Vonage, Telnyx, and Cloudonix. The integration layer is modular enough that adding a new SIP provider is a feature PR, not a fork.

Verdict

Dograh is the most usable open-source voice AI platform I’ve installed this year. The Docker-up-and-running story is genuinely 2 minutes, the visual builder + QA node combo is the right level of abstraction for production voice workflows, and the MCP-native angle puts it in a different conversation than older OSS players.

It’s not as polished as Vapi for your first 10 minutes — Vapi’s onboarding is a masterclass — but Dograh is dramatically better at hour 10, day 30, and month 6, when ownership, debuggability, and cost compound. If you’re paying voice-AI SaaS bills today and the bills are getting larger every month, this is the project to clone, run on a Hetzner VPS for a weekend, and benchmark against your existing stack.

Star, fork, ship. github.com/dograh-hq/dograh