What Is Sakana Fugu? Multi-Agent Orchestration (May 2026)
Sakana Fugu is a beta commercial product from Sakana AI that orchestrates GPT-5, Claude, and Gemini behind a single API — driven by a 7-billion-parameter routing model trained with reinforcement learning. Here’s what it is and when to use it.
Last verified: May 16, 2026
TL;DR
| Field | Detail |
|---|---|
| Product | Sakana Fugu |
| Maker | Sakana AI (Tokyo) |
| Status | Beta |
| API | OpenAI-compatible |
| Under the hood | RL Conductor — 7B routing model trained with RL |
| Worker models | GPT-5, Claude Sonnet 4, Gemini 2.5 Pro (and more) |
| Tiers | Fugu Mini (latency), Fugu Ultra (quality) |
| Benchmarks | LiveCodeBench 83.9%, GPQA-Diamond 87.5% |
| Paper | ICLR 2026, “Learning to Orchestrate” — April 27, 2026 |
What Fugu does
When you send a request to Fugu, the RL Conductor:
- Reads the task and decomposes it into subtasks.
- Picks the best worker LLM per subtask (cost-aware).
- Writes the prompt for that worker.
- Combines results — sometimes sequentially, sometimes in parallel, sometimes recursively.
- Returns the final answer plus a routing trace.
You don’t see the routing logic. You just get the answer and pay less than calling Opus 4.7 for everything.
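To make the loop above concrete, here is a toy sketch of a decompose/route/combine cycle. Everything in it (the keyword-based `route`, the `WORKERS` table) is purely illustrative; it is not Fugu's actual Conductor logic, which is a learned 7B policy, not hand-written rules.

```python
# Toy conductor loop mirroring the five steps above. Hypothetical names only.
WORKERS = {
    "cheap": lambda prompt: f"[cheap] {prompt}",
    "frontier": lambda prompt: f"[frontier] {prompt}",
}

def decompose(task: str) -> list[str]:
    # Step 1: split the task into subtasks (trivially, by sentence, here).
    return [s.strip() for s in task.split(".") if s.strip()]

def route(subtask: str) -> str:
    # Step 2: pick a worker per subtask (a crude stand-in for the learned policy).
    return "frontier" if "prove" in subtask.lower() else "cheap"

def orchestrate(task: str):
    results, trace = [], []
    for sub in decompose(task):
        worker = route(sub)                # step 2: cost-aware worker choice
        prompt = f"Solve: {sub}"           # step 3: prompt written for that worker
        results.append(WORKERS[worker](prompt))
        trace.append({"subtask": sub, "worker": worker})
    # Steps 4-5: combine results and return them with a routing trace.
    return " | ".join(results), trace

answer, trace = orchestrate("Summarize the data. Prove the bound.")
```

The real Conductor also runs workers in parallel or recursively; this sketch only shows the sequential case.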
How it was trained
The paper “Learning to Orchestrate” (Sakana AI, April 27, 2026; accepted to ICLR 2026) describes the setup:
- Base model: Qwen2.5-7B.
- Reinforcement learning with reward shaping on correctness + cost penalty.
- Worker pool: GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, and cheaper models.
- Training tasks: a mix of reasoning, coding, factual, and decomposable multi-step problems.
The Conductor learned non-obvious strategies:
- Use a cheap model for decomposition, then call a frontier model for the hard subtask.
- Run two workers in parallel and vote when uncertainty is high.
- Avoid the most expensive model unless absolutely necessary.
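The reward shaping described above (correctness plus a cost penalty) might be sketched as follows. The penalty weight is an assumption for illustration, not a hyperparameter from the paper:

```python
# Hedged sketch of a correctness-minus-cost reward. LAMBDA is an assumed
# cost-penalty weight, not Sakana's actual value.
LAMBDA = 0.01

def reward(correct: bool, api_cost_usd: float) -> float:
    """Reward = 1 if the final answer is correct, minus a penalty
    proportional to the total API spend of the workers invoked."""
    return (1.0 if correct else 0.0) - LAMBDA * api_cost_usd
```

Under a reward like this, the policy is pushed toward the strategies listed above: cheap models for easy subtasks, frontier models only where they change correctness.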
Reported wins
Sakana published these benchmark numbers:
- LiveCodeBench: 83.9% — beats GPT-5 solo.
- GPQA-Diamond: 87.5% — beats hand-designed multi-agent baselines.
- 30–60% fewer API calls than naive “always use Opus 4.7” pipelines.
Fugu Mini vs Fugu Ultra
| | Fugu Mini | Fugu Ultra |
|---|---|---|
| Optimized for | Latency | Quality |
| Worker pool | Smaller, faster models | Full frontier pool |
| Best for | Chatbots, real-time, simple agents | Hard reasoning, finance, defense |
| Cost | Lower per call | Higher per call |
| Latency | Sub-second to low-second | Multi-second |
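The table above can be folded into a simple tier picker. The thresholds below are illustrative assumptions, not Sakana guidance:

```python
# Illustrative tier choice based on the Mini vs Ultra table.
# The 5-second threshold is an assumed cutoff, not a documented one.
def pick_tier(latency_budget_s: float, needs_frontier_quality: bool) -> str:
    """Return "fugu-ultra" for hard-reasoning work that can tolerate
    multi-second latency, otherwise the faster "fugu-mini"."""
    if needs_frontier_quality and latency_budget_s >= 5.0:
        return "fugu-ultra"
    return "fugu-mini"
```

The returned string is the model name passed to the API call shown in the next section.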
How to use it
Fugu speaks the OpenAI Chat Completions format. Drop-in usage:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sakana.ai/v1",  # check current Fugu endpoint
    api_key="...",
)

response = client.chat.completions.create(
    model="fugu-ultra",  # or "fugu-mini"
    messages=[{"role": "user", "content": "Your task..."}],
)

print(response.choices[0].message.content)
```
That’s it; the Conductor handles the rest.
When to use Fugu
Use Fugu when:
- You don’t know in advance which model is best per subtask.
- You’re cost-sensitive on a pipeline that already does many model calls.
- Coding, reasoning, math, or research workloads dominate.
- You can tolerate black-box routing (you can’t see exactly why a worker was picked).
Don’t use Fugu when:
- You need fully auditable routing decisions (regulated finance, healthcare, insurance).
- You’re already running a hand-tuned LangGraph workflow that works.
- Your latency budget is under 100 ms (the routing step adds overhead).
- You’re locked into a single-vendor stack for governance reasons.
Risks and watch-outs
- Black-box routing — you cannot easily explain why a specific worker was chosen for a subtask.
- Worker availability — if GPT-5 or Claude APIs are down, Fugu’s quality drops.
- Pricing surprises — Conductor picks workers dynamically; per-request cost is harder to predict.
- Beta status — SLAs and uptime are not yet production-grade.
- Geographic coverage — Sakana is Tokyo-based; data residency for non-Japan customers is worth confirming.
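One way to contain the pricing risk: track reported token usage against a budget. OpenAI-compatible APIs return a `usage` field per response; the blended per-token price below is a placeholder, since actual cost varies with the workers the Conductor picks.

```python
# Minimal budget guard for dynamically-routed calls. The blended rate is
# a placeholder assumption, not Fugu pricing.
PRICE_PER_1K_TOKENS = 0.01

class BudgetGuard:
    """Accumulates estimated spend from token counts and flags exhaustion."""

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def record(self, total_tokens: int) -> None:
        # Feed this from response.usage.total_tokens after each call.
        self.spent_usd += total_tokens / 1000 * PRICE_PER_1K_TOKENS

    def exhausted(self) -> bool:
        return self.spent_usd >= self.budget_usd

guard = BudgetGuard(budget_usd=1.0)
guard.record(total_tokens=50_000)
```

Because routing is dynamic, an estimate like this is a ceiling check, not an invoice predictor; reconcile against Sakana's billing for real numbers.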
How it fits in the orchestration landscape
| Approach | Routing style | Best for |
|---|---|---|
| Sakana Fugu (RL Conductor) | Learned routing | Auto-routing unknown workloads |
| LangGraph | Hand-designed graph | Long-running stateful agents |
| CrewAI | Hand-designed agent roles | Multi-agent prototyping |
| OpenRouter | User-picks routing | Manual model selection by user |
| Anthropic managed agents | Outcome-routing | Claude-native workloads |
What’s coming next
- General availability — Sakana has not committed a GA date for Fugu.
- More workers — Mythos, GPT-5.5 Cyber, Gemini 3.1 Pro likely to be added.
- Customer-trained Conductor — Sakana hints at letting customers fine-tune the routing on their own task distribution.
- Self-hosted Conductor — likely; Qwen2.5-7B base means it’s technically deployable.
Why this matters
Most enterprise AI today still picks one model and prays. Sakana’s RL Conductor is the cleanest commercial demonstration that trained routing beats human routing on benchmarks, at lower cost. Expect Anthropic, Google, and OpenAI to ship their own trained routers within 12 months. Expect LangGraph and CrewAI to add trained-router nodes.
The orchestration layer is becoming a model itself.
Related reading
- Sakana RL Conductor vs LangGraph vs CrewAI (May 2026)
- Best AI Agent Control Planes (May 2026)
- Anthropic Dreaming vs LangGraph Memory vs OpenAI Memory (May 2026)
- Claude Managed Agents Outcomes vs LangGraph vs CrewAI (May 2026)
Sources: Sakana AI blog (sakana.ai/learning-to-orchestrate), VentureBeat, ICLR 2026 paper — April 27, 2026.