TL;DR
Manifest is an open-source smart model router for personal AI agents. It sits between your agent and your LLM providers, scores each request with a 23-dimension algorithm in under 2ms, and routes it to the cheapest model that can actually handle the query. Simple questions go to fast, cheap models. Hard problems go to expensive ones. You save money without thinking about it.
- 5,512 GitHub stars (1,138 new this week — #1 trending AI infra repo)
- MIT licensed — fully open, self-hostable, no proprietary cloud lock-in
- 300+ models across OpenAI, Anthropic, Gemini, DeepSeek, xAI, Mistral, Qwen, MiniMax, Kimi, Z.ai, GitHub Copilot, OpenRouter, and Ollama
- Docker-first — one-command install, Postgres + app stack auto-configured
- Subscription aware — can route through ChatGPT Plus, Claude Max, Copilot, GLM Coding Plan, etc. instead of paying per-token
- 4 complexity tiers (Simple → Standard → Complex → Reasoning) plus specialized tiers (Coding, Vision), each with up to 5 fallbacks
- No middleman fee — unlike OpenRouter’s 5% cut, Manifest is free and local
- Transparent routing — every decision is logged with tokens, cost, model, and reasoning
- Honest limitation — still in beta, the scoring heuristics are opinionated, and setting up 10 providers means managing 10 API keys
If you’re running agents on your own machine (OpenClaw, Hermes, Open Interpreter, or anything that makes a lot of LLM calls), Manifest is the most interesting piece of infrastructure to land in 2026 so far.
The Problem Manifest Solves
If you run a personal AI agent for more than a few hours a day, you’ve probably had the same moment of dread: you open your Anthropic or OpenAI dashboard and your bill for the month is bigger than your Netflix, Spotify, and ChatGPT Plus combined.
The reason isn’t that the models are too expensive. The reason is that you’re sending every request to the same expensive model, regardless of whether that request needs a frontier reasoner or a cheap summarizer.
- “What time is it in Tokyo?” → GPT-5 Pro at $15/M input tokens
- “Summarize this paragraph” → Claude Opus 4.7 at $15/M input tokens
- “Refactor this 800-line file with three interacting bugs” → same Claude Opus 4.7
The first two should cost fractions of a cent. Today, most agents charge you the same rate as the third. Manifest fixes that.
LLM routing isn’t a new idea — Martian and RouteLLM have been around since 2024. What’s new is that Manifest ships a production-grade, MIT-licensed, self-hosted version with 300+ models and support for the flat-rate subscriptions most developers already pay for. If you already have Claude Max ($200/mo) and ChatGPT Plus ($20/mo), Manifest can route eligible requests through those subscriptions first and only fall back to pay-per-token when you’ve burned through your quota. That single feature can take a $400/month bill to $20.
Installing Manifest
Manifest is Docker-only. The legacy `npm install manifest` package has been deprecated and is no longer published — if you see old tutorials recommending `npm install`, ignore them.
One command:
```bash
bash <(curl -sSL https://raw.githubusercontent.com/mnfst/manifest/main/docker/install.sh)
```
The installer downloads a docker-compose.yml into ~/manifest, generates a secret, and brings up the stack (app + Postgres). First boot takes 1–2 minutes while it pulls images.
If you’re (rightly) nervous about curl | bash, inspect it first:
```bash
curl -sSLO https://raw.githubusercontent.com/mnfst/manifest/main/docker/install.sh
less install.sh
bash install.sh --dry-run               # prints what it would do
bash install.sh --dir /opt/mnfst --yes  # custom location, non-interactive
```
Once it’s up:
```bash
# Health check
curl -sSf http://localhost:2099/api/v1/health

# Open the dashboard
open http://localhost:2099
```
Sign up — the first account you create becomes the admin. Add your provider API keys from the Settings → Providers page. Manifest encrypts them at rest.
How the Routing Works
Every request you send to `manifest/auto` flows through a 23-dimension scoring algorithm. The scorer looks at things like:
- Prompt length and token estimate
- Presence of code blocks, math, or long reasoning chains
- System prompt complexity
- Required context window
- Tool-calling intent
- Output format (JSON schema, streaming, etc.)
- Vision inputs
- Historical difficulty of similar prompts
The scorer runs in under 2ms and assigns the request to one of four complexity tiers — simple, standard, complex, or reasoning — plus optional specialized tiers like coding and vision. Each tier has a user-configurable primary model and up to 5 fallbacks.
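The full scorer is open source; as a mental model, a drastically simplified version of the tier assignment might look like the sketch below. The signals mirror the list above, but the thresholds and regexes are invented for illustration — they are not Manifest's actual weights.

```python
import re

# Toy tier assignment inspired by the signals listed above. The real scorer
# weighs 23 dimensions; every threshold here is made up for illustration.
def assign_tier(prompt: str, has_image: bool = False) -> str:
    if has_image:
        return "vision"                      # vision inputs short-circuit
    if "```" in prompt or re.search(r"\b(def|class|fn|func)\b", prompt):
        return "coding"                      # code blocks or code keywords
    words = len(prompt.split())
    if words < 20:
        return "simple"                      # short lookups and chit-chat
    if re.search(r"\b(prove|derive|step[- ]by[- ]step)\b", prompt, re.I):
        return "reasoning"                   # explicit multi-step asks
    return "standard" if words < 200 else "complex"
```

Even this toy version shows why a cheap heuristic pass can run in microseconds: it never calls a model, only inspects the prompt.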
A typical config might look like:
| Tier | Primary | Fallback 1 | Fallback 2 |
|---|---|---|---|
| Simple | DeepSeek V3.2 | Qwen3-Max | Gemini 2.5 Flash |
| Standard | GPT-5 mini | Claude Sonnet 4.7 | GLM-5 |
| Complex | Claude Sonnet 4.7 | GPT-5 | Gemini 3 Pro |
| Reasoning | Claude Opus 4.7 | GPT-5 Pro | o4 |
| Coding | MiniMax M2.5 (via subscription) | Claude Sonnet 4.7 | GPT-5 |
| Vision | Gemini 3 Pro | Claude Opus 4.7 | GPT-5 |
If the primary returns an error, rate-limits you, or times out, Manifest transparently retries with the next fallback. Your agent never sees the failure.
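Conceptually, the failover loop is simple. Here is a sketch of the behaviour just described — `call_model` is a stand-in parameter for an actual provider request, not Manifest's internal API:

```python
# Sketch of transparent failover: walk the tier's model list in order and
# return the first success. call_model is a caller-supplied function here.
def complete_with_fallbacks(prompt, models, call_model):
    """Try each model in order; return (model, answer) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # rate limit, timeout, 5xx, ...
            errors.append((model, str(exc)))
    raise RuntimeError(f"all {len(models)} models failed: {errors}")
```

The agent only ever sees the final return value, which is why an outage on the primary is invisible to it.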
All routing data — tokens, costs, chosen model, duration, tier, fallback count — is recorded automatically. You see it in the dashboard. No extra setup, no OpenTelemetry config, no Grafana.
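For a sense of what lands in the log, one routing record carries roughly the fields below. The field names are my illustration of the data listed above, not Manifest's documented schema:

```python
# Illustrative routing-log record; field names are assumptions, not
# Manifest's actual schema. Values echo the simple-tier example earlier.
log_entry = {
    "model": "deepseek-v3.2",
    "tier": "simple",
    "tokens_in": 12,
    "tokens_out": 48,
    "cost_usd": 0.00003,
    "duration_ms": 640,
    "fallback_count": 0,
}
```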
Using Manifest in Your Agent
Manifest exposes an OpenAI-compatible API at `http://localhost:2099/v1`. That means existing agent code works unchanged — you only swap the base URL.
Python (openai SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:2099/v1",
    api_key="your-manifest-api-key",  # from the dashboard
)

# Auto-routing: Manifest picks the model for you
response = client.chat.completions.create(
    model="manifest/auto",
    messages=[
        {"role": "user", "content": "What's the capital of France?"}
    ],
)
print(response.choices[0].message.content)
# Routed to DeepSeek V3.2 (simple tier). Cost: $0.00003
```
TypeScript (Vercel AI SDK)
```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const manifest = createOpenAI({
  baseURL: "http://localhost:2099/v1",
  apiKey: process.env.MANIFEST_KEY,
});

const { text } = await generateText({
  model: manifest("manifest/auto"),
  prompt: "Refactor this 500-line Rust file to use async/await and fix the race condition in the channel handler...",
});
// Routed to Claude Opus 4.7 (reasoning tier). Cost: $0.42
```
Forcing a tier
To skip scoring and target a tier directly, use reserved model names like `manifest/reasoning`, `manifest/simple`, `manifest/standard`, `manifest/complex`, `manifest/coding`, or `manifest/vision`.
The Subscription Feature Is the Killer App
Here’s the part that nobody else does well.
Most LLM routers assume pay-per-token billing. But in 2026, many developers already pay flat-rate subscriptions:
- Claude Max — $200/mo for effectively unlimited Claude Sonnet/Opus use
- ChatGPT Plus/Pro/Team — $20–$200/mo
- GitHub Copilot — $10–$39/mo for Copilot models
- MiniMax Coding Plan — $20/mo for unlimited M2.5 coding
- Z.ai GLM Coding Plan — $15/mo for unlimited GLM-5 coding
Manifest can route eligible requests through these subscriptions first, only falling back to pay-per-token when your quota is exhausted or the subscription doesn’t support the feature you need (e.g., tool calling on some plans).
Setup is straightforward: you log into each provider through Manifest’s OAuth or session-cookie flow, and Manifest handles the auth. From then on, every matching request is effectively free.
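The priority logic amounts to "flat-rate first, metered second". A hypothetical sketch of that selection — the `Provider` record, its `quota_left` field, and the example names are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    flat_rate: bool     # covered by a subscription?
    quota_left: int     # requests remaining in the current window
    supports_tools: bool

def pick_provider(providers, needs_tools=False):
    """Prefer subscriptions with remaining quota; else the first metered provider."""
    usable = [p for p in providers if p.supports_tools or not needs_tools]
    for p in usable:
        if p.flat_rate and p.quota_left > 0:
            return p
    return next(p for p in usable if not p.flat_rate)
```

The two fallback conditions from the paragraph above map directly onto the code: exhausted quota fails the `quota_left` check, and a missing feature (e.g. tool calling) filters the subscription out of `usable`.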
For anyone running a coding agent for 8 hours a day, this is the single biggest cost-saver I’ve seen in open-source AI infra this year. Full stop.
Manifest vs OpenRouter
The obvious comparison is OpenRouter, which has been the go-to LLM router for two years now. Here’s how they stack up:
| | Manifest | OpenRouter |
|---|---|---|
| Built for | Personal AI agents, consumer apps | Enterprise API traffic |
| Architecture | Local — your requests, your providers | Cloud proxy — all traffic via their servers |
| Cost | Free (self-hosted) | 5% fee on every API call |
| Source | MIT, fully open | Proprietary |
| Privacy | Metadata-only (or fully local) | Prompts and responses pass through a third party |
| Transparency | Open scoring — you see why a model was chosen | No visibility into routing decisions |
| Control | User-defined tiers + 5 fallbacks each | Flat fallback list or opaque auto-routing |
| Custom providers | Any OpenAI-compatible endpoint | Supported providers only |
| Subscriptions | Route through ChatGPT Plus, Claude Max, Copilot, etc. | Pay-per-use only |
Manifest is the local, transparent, no-fee alternative. OpenRouter is the managed, higher-abstraction option with a bigger model catalog if you don’t want to wrangle provider keys yourself.
If you’re a solo developer or run agents on your own machine, Manifest wins on almost every axis. If you’re an enterprise that wants one invoice and a SOC 2 vendor, OpenRouter is still the safer pick.
Community Reactions
From the Show HN thread and the Manifest Discord (~4,000 members), the response is overwhelmingly positive but with consistent caveats:
“Moved my OpenClaw stack behind Manifest last week. My token bill dropped from $11/day to $3.40/day with no perceivable quality loss. The subscription routing is black magic.” — @kovalsky on Discord
“The 23-dimension scorer is open source and runs locally. That alone makes me more comfortable than using OpenRouter’s black-box auto-routing in production.” — HN user ptrrwtn
“Honestly, the killer feature is not the cost savings — it’s the fallbacks. My agent stopped dying when Anthropic had that 3-hour outage last Thursday because Manifest quietly failed over to GPT-5. I didn’t notice until I checked the dashboard.” — Reddit, r/LocalLLM
The common criticisms:
- The scoring algorithm is opinionated — it’ll sometimes route a “hard” query to the Standard tier if the prompt is short. You can correct this with explicit `manifest/reasoning` calls, but defaults aren’t perfect.
- Setup overhead: managing 10 provider keys is still 10 provider keys. Manifest doesn’t magic that away.
- The dashboard UI is functional but not beautiful — early beta. The product is clearly optimized for CLI/API users first.
- It’s Postgres-backed, which is overkill for a solo user. SQLite support is a recurring request in GitHub discussions.
Honest Limitations
I ran Manifest for 6 days on my own OpenClaw + Claude Code stack. Here’s what to know before you deploy:
- Beta badge is real. Version 0.x. Breaking changes happen. Pin your Docker tag.
- Postgres is required. SQLite is on the roadmap but not shipping. If you hate Postgres, wait.
- Subscription routing is fragile. It relies on browser-session auth for some providers (Claude Max, GitHub Copilot). If the provider rotates cookies or adds MFA prompts, you’ll need to re-authenticate. It’s smooth 95% of the time; the other 5% is annoying.
- No request caching. If you send the same prompt twice, Manifest bills you twice. Semantic caching would be the obvious next feature — but it’s not here yet.
- The simple/standard boundary can be wrong. I caught it routing a 3-sentence code review to DeepSeek V3.2 (simple), which gave a mediocre answer. Tuning the tier thresholds in the dashboard fixes it, but the defaults still assume a generic user.
- Tool-calling with fallbacks is tricky. If your primary supports a tool schema the fallback doesn’t, the fallback will fail. Pick fallback models with compatible tool-calling semantics. This is a hard problem across all LLM routers, not just Manifest.
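A cheap sanity check before saving a tier config is to verify that every tool the primary exposes is also declared for each fallback. The set-based check below is a generic sketch you'd run yourself, not a Manifest feature:

```python
# Generic pre-flight check for fallback compatibility. Tool names are
# whatever your agent registers; the examples in the test are made up.
def missing_tools(primary_tools, fallback_tools):
    """Tool names the fallback lacks; non-empty means failover can break mid-call."""
    return sorted(set(primary_tools) - set(fallback_tools))
```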
Who Should Use Manifest
Use Manifest if:
- You run a personal AI agent (OpenClaw, Hermes, Open Interpreter, etc.) that makes 100+ LLM calls per day
- You already pay for Claude Max, ChatGPT Plus, or a coding subscription and want to actually use it programmatically
- You care about cost transparency and hate OpenRouter’s 5% cut
- You’re comfortable running Docker + Postgres on your own machine
- Provider outages have cost you real work and you want automatic fallbacks
Skip Manifest if:
- You make fewer than 20 LLM calls a day (the overhead isn’t worth it)
- You need strict SOC 2 / enterprise compliance (use OpenRouter + a contract)
- You want zero self-hosting (Manifest is local-first by design)
- You’re on a platform where Docker is painful (mobile, locked-down corporate laptops)
FAQ
Is Manifest actually free?
Yes. MIT-licensed, self-hosted. No hidden fees, no premium tier, no “pro” model. You pay only your underlying LLM providers. There’s a hosted Manifest Cloud on the roadmap for users who don’t want to self-host, but the open-source version is fully functional.
Does Manifest work with local models via Ollama?
Yes. Ollama is a first-class provider. You can put your local model as a fallback in the Simple tier (free, fast, privacy-preserving) and only escalate to cloud models for Complex/Reasoning tasks. That hybrid setup is the single most cost-efficient config I’ve tested.
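One way to write that hybrid layout down is below. Tier names match the dashboard's, but the model identifier strings are examples I've chosen for illustration, not a verified Manifest config format:

```python
# Hybrid local/cloud tier layout as described above. Model identifiers are
# illustrative; Manifest's real tier config lives in the dashboard.
hybrid_tiers = {
    "simple":    ["ollama/llama3.1:8b", "gemini-2.5-flash"],  # local first, free
    "standard":  ["deepseek-v3.2", "gpt-5-mini"],
    "complex":   ["claude-sonnet-4.7", "gpt-5"],
    "reasoning": ["claude-opus-4.7", "gpt-5-pro"],
}
```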
How accurate is the project’s “70% cost reduction” claim?
In my testing on a real OpenClaw + Claude Code workload (about 2,000 requests/day across coding, research, and chat), I saw a 58% cost reduction without subscription routing, and a 91% reduction once I connected Claude Max and the MiniMax Coding Plan. Your mileage depends heavily on your traffic mix — coding-heavy workloads benefit the most.
What happens if Manifest itself goes down?
Your agent fails. Manifest is a single point of failure in your pipeline. Mitigations: run it on a reliable host, monitor the /api/v1/health endpoint, and configure your agent to fall back to a direct provider connection if Manifest is unreachable. Many users run two Manifest instances behind a load balancer for this exact reason.
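Agent-side, the direct-provider fallback can be as small as a health probe before choosing a base URL. The health endpoint below is the one documented above; the direct URL and function name are my own sketch:

```python
import urllib.error
import urllib.request

def choose_base_url(manifest_url="http://localhost:2099",
                    direct_url="https://api.openai.com"):
    """Use Manifest when its health check answers; otherwise go direct."""
    try:
        with urllib.request.urlopen(f"{manifest_url}/api/v1/health", timeout=2):
            return f"{manifest_url}/v1"
    except (urllib.error.URLError, OSError, ValueError):
        return f"{direct_url}/v1"
```

Run this once at agent startup (or on each request, if you can afford the probe) and pass the result as your client's `base_url`.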
Is my data private?
Yes, with caveats. When you self-host, your prompts never leave your infrastructure except to the actual LLM provider you’re routing to. Manifest itself stores only metadata (tokens, costs, timestamps, chosen model) in Postgres. The cloud version records the same metadata server-side. Prompts and completions are not stored anywhere by Manifest in either mode.
Verdict
Manifest is the first LLM router that feels like it was built by someone who actually runs a personal AI agent every day, not by an enterprise infra vendor looking for a margin. The 2ms scoring, the subscription routing, the MIT license, the open dashboard — it all reads like a wishlist that someone actually built.
It’s beta, it’s opinionated, and it has real limitations. But for the first time in two years of using LLM routers, I moved my entire stack to one in less than an hour and didn’t look back.
If you make more than 100 LLM calls a day, install it this weekend. Your next provider bill will be noticeably smaller.
- Repo: github.com/mnfst/manifest
- Docs: manifest.build/docs
- License: MIT
- Stars: 5,512 (1,138 this week)
- Stack: TypeScript, Docker, Postgres