
What Is Sakana Fugu? Multi-Agent Orchestration (May 2026)


Sakana Fugu is a beta commercial product from Sakana AI that orchestrates GPT-5, Claude, and Gemini behind a single API — driven by a 7-billion-parameter routing model trained with reinforcement learning. Here’s what it is and when to use it.

Last verified: May 16, 2026

TL;DR

| Field | Detail |
| --- | --- |
| Product | Sakana Fugu |
| Maker | Sakana AI (Tokyo) |
| Status | Beta |
| API | OpenAI-compatible |
| Under the hood | RL Conductor, a 7B routing model trained with RL |
| Worker models | GPT-5, Claude Sonnet 4, Gemini 2.5 Pro (and more) |
| Tiers | Fugu Mini (latency), Fugu Ultra (quality) |
| Benchmarks | LiveCodeBench 83.9%, GPQA-Diamond 87.5% |
| Paper | ICLR 2026, "Learning to Orchestrate" (April 27, 2026) |

What Fugu does

When you send a request to Fugu, the RL Conductor:

  1. Reads the task and decomposes it into subtasks.
  2. Picks the best worker LLM per subtask (cost-aware).
  3. Writes the prompt for that worker.
  4. Combines results — sometimes sequentially, sometimes in parallel, sometimes recursively.
  5. Returns the final answer plus a routing trace.

You don’t see the routing logic. You just get the answer and pay less than calling Opus 4.7 for everything.
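The decompose-route-combine loop above can be sketched in a few lines. Everything here is illustrative: the worker names, capability scores, and prices are made up for the example, not the real Conductor internals.

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    description: str
    difficulty: float  # Conductor's estimate, 0.0 (trivial) to 1.0 (frontier-hard)

def route(subtask: Subtask, workers: list[dict]) -> dict:
    """Pick the cheapest worker whose capability covers the estimated difficulty."""
    capable = [w for w in workers if w["capability"] >= subtask.difficulty]
    return min(capable, key=lambda w: w["cost_per_1k"])

# hypothetical worker pool with made-up capability/cost numbers
workers = [
    {"name": "gpt-5", "capability": 0.95, "cost_per_1k": 10.0},
    {"name": "claude-sonnet-4", "capability": 0.90, "cost_per_1k": 3.0},
    {"name": "gemini-2.5-pro", "capability": 0.90, "cost_per_1k": 2.5},
    {"name": "small-model", "capability": 0.50, "cost_per_1k": 0.2},
]

plan = [Subtask("outline the fix", 0.3), Subtask("write the patch", 0.9)]
trace = [(s.description, route(s, workers)["name"]) for s in plan]
# cheap model takes the easy subtask, a frontier model takes the hard one
```

Cost-aware routing falls out naturally: the easy subtask never touches a frontier model, which is where the savings over "always call the biggest model" come from.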

How it was trained

The paper Learning to Orchestrate (Sakana AI, April 27, 2026; accepted at ICLR 2026) describes the setup:

  • Base model: Qwen2.5-7B.
  • Reinforcement learning with reward shaping on correctness + cost penalty.
  • Worker pool: GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, and cheaper models.
  • Training tasks: a mix of reasoning, coding, factual, and decomposable multi-step problems.
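The shaped reward above, correctness minus a cost penalty, fits in one line. The penalty weight here is an illustrative guess, not Sakana's published value:

```python
def reward(correct: bool, cost_usd: float, cost_weight: float = 0.5) -> float:
    """Shaped reward: 1 for a correct answer, minus a penalty per dollar spent."""
    return (1.0 if correct else 0.0) - cost_weight * cost_usd

# a correct-but-expensive rollout scores below a correct-and-cheap one
expensive = reward(True, 1.20)  # 0.4
cheap = reward(True, 0.10)      # 0.95
```

Under a reward like this, the policy is pushed toward the cheapest routing that still gets the answer right, which is exactly the behavior reported below.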

The Conductor learned non-obvious strategies:

  • Use a cheap model for decomposition, then call a frontier model for the hard subtask.
  • Run two workers in parallel and vote when uncertainty is high.
  • Avoid the most expensive model unless absolutely necessary.
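The second strategy, two workers in parallel plus a vote, is straightforward to reproduce in application code. The worker stubs below stand in for real API calls:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def parallel_vote(calls, task: str) -> str:
    """Query every worker concurrently and return the majority answer."""
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        answers = list(pool.map(lambda call: call(task), calls))
    return Counter(answers).most_common(1)[0][0]

# stubs standing in for three model calls; two agree, one dissents
calls = [lambda t: "O(n log n)", lambda t: "O(n log n)", lambda t: "O(n^2)"]
```

Voting only pays for itself when uncertainty is high, which is why a learned router that can estimate uncertainty per subtask invokes it selectively rather than on every request.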

Reported wins

Sakana published these benchmark numbers:

  • LiveCodeBench: 83.9% — beats GPT-5 solo.
  • GPQA-Diamond: 87.5% — beats hand-designed multi-agent baselines.
  • 30–60% fewer API calls than naive “always use Opus 4.7” pipelines.

Fugu Mini vs Fugu Ultra

| | Fugu Mini | Fugu Ultra |
| --- | --- | --- |
| Optimized for | Latency | Quality |
| Worker pool | Smaller, faster models | Full frontier pool |
| Best for | Chatbots, real-time, simple agents | Hard reasoning, finance, defense |
| Cost | Lower per call | Higher per call |
| Latency | Sub-second to low-second | Multi-second |
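The table maps onto a simple default-tier heuristic. This is our reading of the trade-off, not an official recommendation:

```python
def pick_tier(needs_frontier_quality: bool, latency_budget_s: float) -> str:
    """Map the comparison above onto a default tier (illustrative heuristic)."""
    if needs_frontier_quality and latency_budget_s >= 2.0:
        return "fugu-ultra"  # full frontier pool, multi-second latency
    return "fugu-mini"       # smaller pool, sub-second to low-second
```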

How to use it

Fugu speaks the OpenAI Chat Completions format. Drop-in usage:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sakana.ai/v1",  # check current Fugu endpoint
    api_key="..."
)

response = client.chat.completions.create(
    model="fugu-ultra",  # or "fugu-mini"
    messages=[{"role": "user", "content": "Your task..."}]
)
```

That's it. The Conductor handles the rest.

When to use Fugu

Use Fugu when:

  • You don’t know in advance which model is best per subtask.
  • You’re cost-sensitive on a pipeline that already does many model calls.
  • Coding, reasoning, math, or research workloads dominate.
  • You can tolerate black-box routing (you can’t see exactly why a worker was picked).

Don’t use Fugu when:

  • You need fully auditable routing decisions (regulated finance, healthcare, insurance).
  • You’re already running a hand-tuned LangGraph workflow that works.
  • You need sub-100 ms latency (the routing step adds overhead).
  • You’re locked into a single-vendor stack for governance reasons.

Risks and watch-outs

  • Black-box routing — you cannot easily explain why a specific worker was chosen for a subtask.
  • Worker availability — if GPT-5 or Claude APIs are down, Fugu’s quality drops.
  • Pricing surprises — Conductor picks workers dynamically; per-request cost is harder to predict.
  • Beta status — SLAs and uptime are not yet production-grade.
  • Geographic coverage — Sakana is Tokyo-based; data residency for non-Japan customers is worth confirming.
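For the worker-availability and beta-status risks above, a thin fallback wrapper on the caller side is cheap insurance. This is a generic sketch; in practice `call` would wrap `client.chat.completions.create`, and the tier names come from the usage section above:

```python
def with_fallback(call, tiers=("fugu-ultra", "fugu-mini")):
    """Try each tier in order; surface the last error only if all fail."""
    last_err = None
    for tier in tiers:
        try:
            return call(tier)
        except Exception as err:
            last_err = err
    raise RuntimeError("all Fugu tiers failed") from last_err

# usage sketch:
# with_fallback(lambda tier: client.chat.completions.create(
#     model=tier, messages=messages))
```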

How it fits in the orchestration landscape

| Approach | Routing style | Best for |
| --- | --- | --- |
| Sakana Fugu (RL Conductor) | Learned routing | Auto-routing unknown workloads |
| LangGraph | Hand-designed graph | Long-running stateful agents |
| CrewAI | Hand-designed agent roles | Multi-agent prototyping |
| OpenRouter | User-picked routing | Manual model selection by user |
| Anthropic managed agents | Outcome-routing | Claude-native workloads |

What’s coming next

  • General availability — Sakana has not committed to a GA date for Fugu.
  • More workers — Mythos, GPT-5.5 Cyber, Gemini 3.1 Pro likely to be added.
  • Customer-trained Conductor — Sakana hints at letting customers fine-tune the routing on their own task distribution.
  • Self-hosted Conductor — likely; Qwen2.5-7B base means it’s technically deployable.

Why this matters

Most enterprise AI today still picks one model and prays. Sakana’s RL Conductor is the cleanest commercial demonstration that trained routing beats human routing on benchmarks, at lower cost. Expect Anthropic, Google, and OpenAI to ship their own trained routers within 12 months. Expect LangGraph and CrewAI to add trained-router nodes.

The orchestration layer is becoming a model itself.


Sources: Sakana AI blog (sakana.ai/learning-to-orchestrate), VentureBeat, ICLR 2026 paper — April 27, 2026.