What is Claude Opus 4.8 effort level?

Effort level is a user-adjustable control introduced with Claude Opus 4.8 (May 28, 2026) that lets you trade off between speed, cost, and reasoning depth. Three settings are exposed in claude.ai and via the API: low, medium, and high. High is the default for Opus 4.8 and uses the model's full reasoning depth — more internal thinking, more tokens, slower, higher cost, best quality. Low is the new 'fast mode' — 2.5x faster and 3x cheaper than earlier Opus fast modes, suitable for simple tasks. Medium sits in between. The setting changes how many internal reasoning tokens the model spends before producing output.

When should I use Opus 4.8 low effort vs high effort?

Use low effort for short tasks, simple lookups, quick summaries, conversational replies, simple coding fixes, and any workflow where you'd otherwise route to a smaller model (Sonnet 4.6, Haiku). Use high effort for complex reasoning, multi-step planning, long-context analysis, large refactors, ambiguous decisions, and anything where being right matters more than being fast. Use medium when you don't know — it's the safest default if you can't predict task complexity. In production agentic loops, mix: route simple sub-tasks to low and reserve high for the orchestrator or final-answer step.

How much cheaper is Opus 4.8 low effort compared to high?

Anthropic reports the fast mode (low effort) for Opus 4.8 is approximately 3x cheaper than fast modes in earlier Opus versions and 2.5x faster. Concretely, a query that costs $0.30 at high effort might cost $0.05-$0.10 at low effort, with most of the savings coming from fewer reasoning tokens. The exact per-query savings depend on task complexity — a low-effort response on a simple task is dramatically cheaper, but on a complex task that low effort can't solve correctly, you'll re-run at higher effort and end up paying more total.

Does low effort affect Opus 4.8's coding accuracy?

Yes, meaningfully on complex tasks. On simple code generation (single-file, one function, well-defined spec), low effort is fine and dramatically cheaper. On multi-file refactors, ambiguous bug fixes, or anything requiring architectural reasoning, low effort underperforms — Opus 4.8 at high effort is what gets the 80.8% SWE-bench score; low effort drops that meaningfully. In production: keep Opus 4.8 at high effort for the agentic coding workflow itself, optionally drop to medium for sub-tasks like code formatting, test generation from a spec, or doc generation.

Quick Answer

How to Choose Claude Opus 4.8 Effort Level (June 2026)

Published: June 21, 2026

How to Choose Claude Opus 4.8 Effort Level (June 2026)

Claude Opus 4.8 (released May 28, 2026) introduced user-adjustable effort levels — low, medium, and high. It’s one of the highest-impact decisions for your costs and quality in production agentic workflows. Here’s the decision framework with concrete examples.

Last verified: June 21, 2026.

TL;DR

High effort (default): Use for complex reasoning, multi-step planning, hard coding, ambiguous tasks. Best quality, slowest, most expensive.
Medium effort: Safe default when you don’t know task complexity. Balanced trade-off.
Low effort (fast mode): Use for simple tasks, lookups, summaries. ~3x cheaper, ~2.5x faster than earlier Opus fast modes.
Production tip: Mix levels within agentic loops — high for orchestrator, low for simple sub-tasks.

What effort level actually controls

Effort level adjusts how many internal reasoning tokens Claude Opus 4.8 spends before producing user-visible output. More reasoning tokens = more careful, considered answers. Fewer = faster cheaper answers that might miss nuance.

It’s not a different model — it’s the same Opus 4.8 model with different inference budget. This matters because the peak capability is the same; you’re choosing how much of that capability to spend on a given query.

The three settings

High effort (default)

What it does: Full reasoning depth. Opus 4.8 thinks extensively before answering, including planning, self-critique, and verification.
Speed: Slowest of the three.
Cost: Highest (3x more than low for similar input/output lengths).
Quality: Best. This is what scores 80.8% on SWE-bench and ~61 on Intelligence Index.
Default: Yes, this is what you get if you don’t specify.

Medium effort

What it does: Moderate reasoning. Some self-critique, less verification, faster than high.
Speed: ~1.7x faster than high.
Cost: ~1.5x cheaper than high.
Quality: Slightly below high; most users won’t notice the difference on routine tasks.
Best default: If you can’t predict task complexity ahead of time.

Low effort (fast mode)

What it does: Minimal internal reasoning. Direct generation, light self-checking.
Speed: ~2.5x faster than earlier Opus fast modes.
Cost: ~3x cheaper than fast modes in earlier Opus versions; meaningfully cheaper than Opus 4.8 high.
Quality: Significantly below high on complex tasks; comparable on simple tasks.
Best for: Tasks where you’d otherwise route to Sonnet 4.6 or Haiku, but want Opus’s voice/style.

When to use each — decision rules

Use low effort when:

Single-turn lookup (“what’s the syntax for X?”)
Short summary of well-structured input
Conversational reply that doesn’t require reasoning
Bulk processing where throughput > per-item quality
Code formatting, doc generation from clear spec, test scaffolding
Any task where you’d consider routing to a smaller model

Use medium effort when:

You don’t know how complex the task will be
Production workflows that span varied query types
Cost-sensitive but quality-meaningful customer-facing features
General-purpose chatbot use

Use high effort when:

Multi-step reasoning or planning
Ambiguous requirements
Long-context analysis (legal docs, large codebases)
High-stakes outputs (medical, financial, legal, executive comms)
Coding tasks that span multiple files or require architectural reasoning
Final answer step in any agentic workflow

Cost-quality math: a concrete example

A 2K-token-in, 500-token-out coding query on a moderate-complexity bug fix:

Effort	Approx cost	Approx latency	Accuracy on hard SWE-bench tasks
Low	~$0.05	~2-4 sec	Significantly lower
Medium	~$0.18	~6-10 sec	Mid-tier
High	~$0.30+	~15-30 sec	80.8% (the headline number)

The naive cost optimization is “use low for everything.” The actual cost optimization accounts for re-runs: if a low-effort answer is wrong 40% of the time and you re-run those at high effort, your total spend is higher than if you just started at medium. The right strategy is per-task-class, not blanket.

Production patterns

Pattern 1: Tiered routing

A router (cheap model or rule-based) classifies each incoming query, then routes to the right Opus 4.8 effort level:

Simple FAQs → Sonnet 4.6 (cheaper than Opus low).
Moderate complexity → Opus 4.8 low.
Hard reasoning → Opus 4.8 high.

This is the standard production approach in 2026.

Pattern 2: Effort escalation

Start every query at low effort. If the model expresses low confidence (Opus 4.8 is “more likely to flag uncertainty” per the release notes), re-run at high effort.

Saves cost on the 60-70% of queries that work fine at low effort.
Catches the hard cases without missing them.
Adds latency on the escalated queries; not appropriate for real-time UIs.

Pattern 3: Orchestrator high, subagents low

In agentic workflows (especially Opus 4.8’s dynamic workflows that spawn hundreds of subagents):

Orchestrator runs at high effort — it makes the planning decisions.
Subagents run at low or medium effort — they execute well-defined sub-tasks.

This is the cost-optimal shape for the dynamic-workflow feature.

Pattern 4: Final-answer high, exploration low

For deep research or analytical workflows:

Exploration/iteration steps at low or medium effort.
Final synthesis or answer step at high effort.

This pattern preserves quality where it counts (the user-visible output) while keeping the search loop cheap.

How to set effort level

In claude.ai

The effort selector is in the conversation UI for Opus 4.8. Set it before sending a message.

In the Claude API

{
  "model": "claude-opus-4-8",
  "messages": [...],
  "effort": "low" // or "medium" or "high"
}

The default is "high" if effort is omitted.

In Claude Code

Set it via the model selection prompt or in .claude/settings.json. Claude Code defaults to high for Opus 4.8.

Common mistakes to avoid

Defaulting to low everywhere. You’ll have a bad time on the hard 20% of tasks.
Defaulting to high everywhere. You’re overpaying by 3-6x on simple tasks.
Ignoring the mid-conversation system message feature. Opus 4.8 lets you change instructions (including effort context) mid-task without restating the full system prompt — this preserves prompt cache hits.
Forgetting effort affects latency. Real-time UIs feel sluggish at high effort; budget UI for the worst case.

When to use Sonnet 4.6 instead

If low effort feels right for a task class, also consider Sonnet 4.6:

Sonnet 4.6 is cheaper than Opus 4.8 at low effort.
Sonnet 4.6 has strong writing style and instruction-following.
Sonnet 4.6 wins for high-volume, latency-sensitive customer-facing workloads.

The decision tree: simple task + need Opus voice → Opus low; simple task + voice doesn’t matter → Sonnet 4.6.

Sources

Anthropic: “Claude Opus 4.8” release announcement, May 28, 2026
Knowledge Hub Media: “Claude Opus 4.8 everything you need to know”
Caylent: “Claude Opus 4.8 — what improved, what’s new”
Anthropic platform docs: claude.com/docs (Opus 4.8 effort levels)
Apidog: “Claude Opus 4.8 vs GPT-5.5 vs Gemini 3.5”

Published June 21, 2026 by andrew.ooo. See Claude Opus 4.8 vs Opus 4.7 and Claude Opus 4.8 fast mode vs GPT-5.5 vs Gemini 3.5 Flash cost.