How to Choose Claude Opus 4.8 Effort Level (June 2026)
How to Choose Claude Opus 4.8 Effort Level (June 2026)
Claude Opus 4.8 (released May 28, 2026) introduced user-adjustable effort levels — low, medium, and high. It’s one of the highest-impact decisions for your costs and quality in production agentic workflows. Here’s the decision framework with concrete examples.
Last verified: June 21, 2026.
TL;DR
- High effort (default): Use for complex reasoning, multi-step planning, hard coding, ambiguous tasks. Best quality, slowest, most expensive.
- Medium effort: Safe default when you don’t know task complexity. Balanced trade-off.
- Low effort (fast mode): Use for simple tasks, lookups, summaries. ~3x cheaper, ~2.5x faster than earlier Opus fast modes.
- Production tip: Mix levels within agentic loops — high for orchestrator, low for simple sub-tasks.
What effort level actually controls
Effort level adjusts how many internal reasoning tokens Claude Opus 4.8 spends before producing user-visible output. More reasoning tokens = more careful, considered answers. Fewer = faster cheaper answers that might miss nuance.
It’s not a different model — it’s the same Opus 4.8 model with different inference budget. This matters because the peak capability is the same; you’re choosing how much of that capability to spend on a given query.
The three settings
High effort (default)
- What it does: Full reasoning depth. Opus 4.8 thinks extensively before answering, including planning, self-critique, and verification.
- Speed: Slowest of the three.
- Cost: Highest (3x more than low for similar input/output lengths).
- Quality: Best. This is what scores 80.8% on SWE-bench and ~61 on Intelligence Index.
- Default: Yes, this is what you get if you don’t specify.
Medium effort
- What it does: Moderate reasoning. Some self-critique, less verification, faster than high.
- Speed: ~1.7x faster than high.
- Cost: ~1.5x cheaper than high.
- Quality: Slightly below high; most users won’t notice the difference on routine tasks.
- Best default: If you can’t predict task complexity ahead of time.
Low effort (fast mode)
- What it does: Minimal internal reasoning. Direct generation, light self-checking.
- Speed: ~2.5x faster than earlier Opus fast modes.
- Cost: ~3x cheaper than fast modes in earlier Opus versions; meaningfully cheaper than Opus 4.8 high.
- Quality: Significantly below high on complex tasks; comparable on simple tasks.
- Best for: Tasks where you’d otherwise route to Sonnet 4.6 or Haiku, but want Opus’s voice/style.
When to use each — decision rules
Use low effort when:
- Single-turn lookup (“what’s the syntax for X?”)
- Short summary of well-structured input
- Conversational reply that doesn’t require reasoning
- Bulk processing where throughput > per-item quality
- Code formatting, doc generation from clear spec, test scaffolding
- Any task where you’d consider routing to a smaller model
Use medium effort when:
- You don’t know how complex the task will be
- Production workflows that span varied query types
- Cost-sensitive but quality-meaningful customer-facing features
- General-purpose chatbot use
Use high effort when:
- Multi-step reasoning or planning
- Ambiguous requirements
- Long-context analysis (legal docs, large codebases)
- High-stakes outputs (medical, financial, legal, executive comms)
- Coding tasks that span multiple files or require architectural reasoning
- Final answer step in any agentic workflow
Cost-quality math: a concrete example
A 2K-token-in, 500-token-out coding query on a moderate-complexity bug fix:
| Effort | Approx cost | Approx latency | Accuracy on hard SWE-bench tasks |
|---|---|---|---|
| Low | ~$0.05 | ~2-4 sec | Significantly lower |
| Medium | ~$0.18 | ~6-10 sec | Mid-tier |
| High | ~$0.30+ | ~15-30 sec | 80.8% (the headline number) |
The naive cost optimization is “use low for everything.” The actual cost optimization accounts for re-runs: if a low-effort answer is wrong 40% of the time and you re-run those at high effort, your total spend is higher than if you just started at medium. The right strategy is per-task-class, not blanket.
Production patterns
Pattern 1: Tiered routing
A router (cheap model or rule-based) classifies each incoming query, then routes to the right Opus 4.8 effort level:
- Simple FAQs → Sonnet 4.6 (cheaper than Opus low).
- Moderate complexity → Opus 4.8 low.
- Hard reasoning → Opus 4.8 high.
This is the standard production approach in 2026.
Pattern 2: Effort escalation
Start every query at low effort. If the model expresses low confidence (Opus 4.8 is “more likely to flag uncertainty” per the release notes), re-run at high effort.
- Saves cost on the 60-70% of queries that work fine at low effort.
- Catches the hard cases without missing them.
- Adds latency on the escalated queries; not appropriate for real-time UIs.
Pattern 3: Orchestrator high, subagents low
In agentic workflows (especially Opus 4.8’s dynamic workflows that spawn hundreds of subagents):
- Orchestrator runs at high effort — it makes the planning decisions.
- Subagents run at low or medium effort — they execute well-defined sub-tasks.
This is the cost-optimal shape for the dynamic-workflow feature.
Pattern 4: Final-answer high, exploration low
For deep research or analytical workflows:
- Exploration/iteration steps at low or medium effort.
- Final synthesis or answer step at high effort.
This pattern preserves quality where it counts (the user-visible output) while keeping the search loop cheap.
How to set effort level
In claude.ai
The effort selector is in the conversation UI for Opus 4.8. Set it before sending a message.
In the Claude API
{
"model": "claude-opus-4-8",
"messages": [...],
"effort": "low" // or "medium" or "high"
}
The default is "high" if effort is omitted.
In Claude Code
Set it via the model selection prompt or in .claude/settings.json. Claude Code defaults to high for Opus 4.8.
Common mistakes to avoid
- Defaulting to low everywhere. You’ll have a bad time on the hard 20% of tasks.
- Defaulting to high everywhere. You’re overpaying by 3-6x on simple tasks.
- Ignoring the mid-conversation system message feature. Opus 4.8 lets you change instructions (including effort context) mid-task without restating the full system prompt — this preserves prompt cache hits.
- Forgetting effort affects latency. Real-time UIs feel sluggish at high effort; budget UI for the worst case.
When to use Sonnet 4.6 instead
If low effort feels right for a task class, also consider Sonnet 4.6:
- Sonnet 4.6 is cheaper than Opus 4.8 at low effort.
- Sonnet 4.6 has strong writing style and instruction-following.
- Sonnet 4.6 wins for high-volume, latency-sensitive customer-facing workloads.
The decision tree: simple task + need Opus voice → Opus low; simple task + voice doesn’t matter → Sonnet 4.6.
Sources
- Anthropic: “Claude Opus 4.8” release announcement, May 28, 2026
- Knowledge Hub Media: “Claude Opus 4.8 everything you need to know”
- Caylent: “Claude Opus 4.8 — what improved, what’s new”
- Anthropic platform docs: claude.com/docs (Opus 4.8 effort levels)
- Apidog: “Claude Opus 4.8 vs GPT-5.5 vs Gemini 3.5”
Published June 21, 2026 by andrew.ooo. See Claude Opus 4.8 vs Opus 4.7 and Claude Opus 4.8 fast mode vs GPT-5.5 vs Gemini 3.5 Flash cost.