What Is Sakana Fugu? Multi-Agent Orchestration (May 2026)
Sakana Fugu is a beta commercial product from Sakana AI that orchestrates GPT-5, Claude, and Gemini behind a single API — driven by a 7-billion-parameter routing model trained with reinforcement learning. Here’s what it is and when to use it.
Last verified: May 16, 2026
TL;DR
| Field | Detail |
|---|---|
| Product | Sakana Fugu |
| Maker | Sakana AI (Tokyo) |
| Status | Beta |
| API | OpenAI-compatible |
| Under the hood | RL Conductor — 7B routing model trained with RL |
| Worker models | GPT-5, Claude Sonnet 4, Gemini 2.5 Pro (and more) |
| Tiers | Fugu Mini (latency), Fugu Ultra (quality) |
| Benchmarks | LiveCodeBench 83.9%, GPQA-Diamond 87.5% |
| Paper | ICLR 2026, “Learning to Orchestrate” — April 27, 2026 |
What Fugu does
When you send a request to Fugu, the RL Conductor:
- Reads the task and decomposes it into subtasks.
- Picks the best worker LLM per subtask (cost-aware).
- Writes the prompt for that worker.
- Combines results — sometimes sequentially, sometimes in parallel, sometimes recursively.
- Returns the final answer plus a routing trace.
You don’t see the routing logic. You just get the answer and pay less than calling Opus 4.7 for everything.
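To make the loop above concrete, here is a toy sketch of a decompose/route/combine cycle. Everything in it (the keyword-based `route`, the `WORKERS` table) is purely illustrative; it is not Fugu's actual Conductor logic, which is a learned 7B policy, not hand-written rules.

```python
# Toy conductor loop mirroring the five steps above. Hypothetical names only.
WORKERS = {
    "cheap": lambda prompt: f"[cheap] {prompt}",
    "frontier": lambda prompt: f"[frontier] {prompt}",
}

def decompose(task: str) -> list[str]:
    # Step 1: split the task into subtasks (trivially, by sentence, here).
    return [s.strip() for s in task.split(".") if s.strip()]

def route(subtask: str) -> str:
    # Step 2: pick a worker per subtask (a crude stand-in for the learned policy).
    return "frontier" if "prove" in subtask.lower() else "cheap"

def orchestrate(task: str):
    results, trace = [], []
    for sub in decompose(task):
        worker = route(sub)                # step 2: cost-aware worker choice
        prompt = f"Solve: {sub}"           # step 3: prompt written for that worker
        results.append(WORKERS[worker](prompt))
        trace.append({"subtask": sub, "worker": worker})
    # Steps 4-5: combine results and return them with a routing trace.
    return " | ".join(results), trace

answer, trace = orchestrate("Summarize the data. Prove the bound.")
```

The real Conductor also runs workers in parallel or recursively; this sketch only shows the sequential case.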
How it was trained
The paper “Learning to Orchestrate” (Sakana AI, April 27, 2026; accepted to ICLR 2026) describes the setup:
- Base model: Qwen2.5-7B.
- Reinforcement learning with reward shaping on correctness + cost penalty.
- Worker pool: GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, and cheaper models.
- Training tasks: a mix of reasoning, coding, factual, and decomposable multi-step problems.
The Conductor learned non-obvious strategies:
- Use a cheap model for decomposition, then call a frontier model for the hard subtask.
- Run two workers in parallel and vote when uncertainty is high.
- Avoid the most expensive model unless absolutely necessary.
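The reward shaping described above (correctness plus a cost penalty) might be sketched as follows. The penalty weight is an assumption for illustration, not a hyperparameter from the paper:

```python
# Hedged sketch of a correctness-minus-cost reward. LAMBDA is an assumed
# cost-penalty weight, not Sakana's actual value.
LAMBDA = 0.01

def reward(correct: bool, api_cost_usd: float) -> float:
    """Reward = 1 if the final answer is correct, minus a penalty
    proportional to the total API spend of the workers invoked."""
    return (1.0 if correct else 0.0) - LAMBDA * api_cost_usd
```

Under a reward like this, the policy is pushed toward the strategies listed above: cheap models for easy subtasks, frontier models only where they change correctness.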
Reported wins
Sakana published these benchmark numbers:
- LiveCodeBench: 83.9% — beats GPT-5 solo.
- GPQA-Diamond: 87.5% — beats hand-designed multi-agent baselines.
- 30–60% fewer API calls than naive “always use Opus 4.7” pipelines.
Fugu Mini vs Fugu Ultra
| | Fugu Mini | Fugu Ultra |
|---|---|---|
| Optimized for | Latency | Quality |
| Worker pool | Smaller, faster models | Full frontier pool |
| Best for | Chatbots, real-time, simple agents | Hard reasoning, finance, defense |
| Cost | Lower per call | Higher per call |
| Latency | Sub-second to low-second | Multi-second |
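The table above can be folded into a simple tier picker. The thresholds below are illustrative assumptions, not Sakana guidance:

```python
# Illustrative tier choice based on the Mini vs Ultra table.
# The 5-second threshold is an assumed cutoff, not a documented one.
def pick_tier(latency_budget_s: float, needs_frontier_quality: bool) -> str:
    """Return "fugu-ultra" for hard-reasoning work that can tolerate
    multi-second latency, otherwise the faster "fugu-mini"."""
    if needs_frontier_quality and latency_budget_s >= 5.0:
        return "fugu-ultra"
    return "fugu-mini"
```

The returned string is the model name passed to the API call shown in the next section.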
How to use it
Fugu speaks the OpenAI Chat Completions format. Drop-in usage:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sakana.ai/v1",  # check current Fugu endpoint
    api_key="...",
)

response = client.chat.completions.create(
    model="fugu-ultra",  # or "fugu-mini"
    messages=[{"role": "user", "content": "Your task..."}],
)

print(response.choices[0].message.content)
```
That’s it; the Conductor handles the rest.
When to use Fugu
Use Fugu when:
- You don’t know in advance which model is best per subtask.
- You’re cost-sensitive on a pipeline that already does many model calls.
- Coding, reasoning, math, or research workloads dominate.
- You can tolerate black-box routing (you can’t see exactly why a worker was picked).
Don’t use Fugu when:
- You need fully auditable routing decisions (regulated finance, healthcare, insurance).
- You’re already running a hand-tuned LangGraph workflow that works.
- Your latency budget is under 100 ms (the routing step adds overhead).
- You’re locked into a single-vendor stack for governance reasons.
Risks and watch-outs
- Black-box routing — you cannot easily explain why a specific worker was chosen for a subtask.
- Worker availability — if GPT-5 or Claude APIs are down, Fugu’s quality drops.
- Pricing surprises — Conductor picks workers dynamically; per-request cost is harder to predict.
- Beta status — SLAs and uptime are not yet production-grade.
- Geographic coverage — Sakana is Tokyo-based; data residency for non-Japan customers is worth confirming.
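One way to contain the pricing risk: track reported token usage against a budget. OpenAI-compatible APIs return a `usage` field per response; the blended per-token price below is a placeholder, since actual cost varies with the workers the Conductor picks.

```python
# Minimal budget guard for dynamically-routed calls. The blended rate is
# a placeholder assumption, not Fugu pricing.
PRICE_PER_1K_TOKENS = 0.01

class BudgetGuard:
    """Accumulates estimated spend from token counts and flags exhaustion."""

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def record(self, total_tokens: int) -> None:
        # Feed this from response.usage.total_tokens after each call.
        self.spent_usd += total_tokens / 1000 * PRICE_PER_1K_TOKENS

    def exhausted(self) -> bool:
        return self.spent_usd >= self.budget_usd

guard = BudgetGuard(budget_usd=1.0)
guard.record(total_tokens=50_000)
```

Because routing is dynamic, an estimate like this is a ceiling check, not an invoice predictor; reconcile against Sakana's billing for real numbers.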
How it fits in the orchestration landscape
| Approach | Routing style | Best for |
|---|---|---|
| Sakana Fugu (RL Conductor) | Learned routing | Auto-routing unknown workloads |
| LangGraph | Hand-designed graph | Long-running stateful agents |
| CrewAI | Hand-designed agent roles | Multi-agent prototyping |
| OpenRouter | User-picks routing | Manual model selection by user |
| Anthropic managed agents | Outcome-routing | Claude-native workloads |
What’s coming next
- General availability — Sakana has not committed a GA date for Fugu.
- More workers — Mythos, GPT-5.5 Cyber, Gemini 3.1 Pro likely to be added.
- Customer-trained Conductor — Sakana hints at letting customers fine-tune the routing on their own task distribution.
- Self-hosted Conductor — likely; Qwen2.5-7B base means it’s technically deployable.
Why this matters
Most enterprise AI today still picks one model and prays. Sakana’s RL Conductor is the cleanest commercial demonstration that trained routing beats human routing on benchmarks, at lower cost. Expect Anthropic, Google, and OpenAI to ship their own trained routers within 12 months. Expect LangGraph and CrewAI to add trained-router nodes.
The orchestration layer is becoming a model itself.
Related reading
- Sakana RL Conductor vs LangGraph vs CrewAI (May 2026)
- Best AI Agent Control Planes (May 2026)
- Anthropic Dreaming vs LangGraph Memory vs OpenAI Memory (May 2026)
- Claude Managed Agents Outcomes vs LangGraph vs CrewAI (May 2026)
Sources: Sakana AI blog (sakana.ai/learning-to-orchestrate), VentureBeat, ICLR 2026 paper — April 27, 2026.