AI agents · OpenClaw · self-hosting · automation

Quick Answer

How to Choose Claude Opus 4.8 Effort Level (June 2026)

Published:

How to Choose Claude Opus 4.8 Effort Level (June 2026)

Claude Opus 4.8 (released May 28, 2026) introduced user-adjustable effort levels — low, medium, and high. It’s one of the highest-impact decisions for your costs and quality in production agentic workflows. Here’s the decision framework with concrete examples.

Last verified: June 21, 2026.

TL;DR

  • High effort (default): Use for complex reasoning, multi-step planning, hard coding, ambiguous tasks. Best quality, slowest, most expensive.
  • Medium effort: Safe default when you don’t know task complexity. Balanced trade-off.
  • Low effort (fast mode): Use for simple tasks, lookups, summaries. ~3x cheaper, ~2.5x faster than earlier Opus fast modes.
  • Production tip: Mix levels within agentic loops — high for orchestrator, low for simple sub-tasks.

What effort level actually controls

Effort level adjusts how many internal reasoning tokens Claude Opus 4.8 spends before producing user-visible output. More reasoning tokens = more careful, considered answers. Fewer = faster cheaper answers that might miss nuance.

It’s not a different model — it’s the same Opus 4.8 model with different inference budget. This matters because the peak capability is the same; you’re choosing how much of that capability to spend on a given query.

The three settings

High effort (default)

  • What it does: Full reasoning depth. Opus 4.8 thinks extensively before answering, including planning, self-critique, and verification.
  • Speed: Slowest of the three.
  • Cost: Highest (3x more than low for similar input/output lengths).
  • Quality: Best. This is what scores 80.8% on SWE-bench and ~61 on Intelligence Index.
  • Default: Yes, this is what you get if you don’t specify.

Medium effort

  • What it does: Moderate reasoning. Some self-critique, less verification, faster than high.
  • Speed: ~1.7x faster than high.
  • Cost: ~1.5x cheaper than high.
  • Quality: Slightly below high; most users won’t notice the difference on routine tasks.
  • Best default: If you can’t predict task complexity ahead of time.

Low effort (fast mode)

  • What it does: Minimal internal reasoning. Direct generation, light self-checking.
  • Speed: ~2.5x faster than earlier Opus fast modes.
  • Cost: ~3x cheaper than fast modes in earlier Opus versions; meaningfully cheaper than Opus 4.8 high.
  • Quality: Significantly below high on complex tasks; comparable on simple tasks.
  • Best for: Tasks where you’d otherwise route to Sonnet 4.6 or Haiku, but want Opus’s voice/style.

When to use each — decision rules

Use low effort when:

  • Single-turn lookup (“what’s the syntax for X?”)
  • Short summary of well-structured input
  • Conversational reply that doesn’t require reasoning
  • Bulk processing where throughput > per-item quality
  • Code formatting, doc generation from clear spec, test scaffolding
  • Any task where you’d consider routing to a smaller model

Use medium effort when:

  • You don’t know how complex the task will be
  • Production workflows that span varied query types
  • Cost-sensitive but quality-meaningful customer-facing features
  • General-purpose chatbot use

Use high effort when:

  • Multi-step reasoning or planning
  • Ambiguous requirements
  • Long-context analysis (legal docs, large codebases)
  • High-stakes outputs (medical, financial, legal, executive comms)
  • Coding tasks that span multiple files or require architectural reasoning
  • Final answer step in any agentic workflow

Cost-quality math: a concrete example

A 2K-token-in, 500-token-out coding query on a moderate-complexity bug fix:

EffortApprox costApprox latencyAccuracy on hard SWE-bench tasks
Low~$0.05~2-4 secSignificantly lower
Medium~$0.18~6-10 secMid-tier
High~$0.30+~15-30 sec80.8% (the headline number)

The naive cost optimization is “use low for everything.” The actual cost optimization accounts for re-runs: if a low-effort answer is wrong 40% of the time and you re-run those at high effort, your total spend is higher than if you just started at medium. The right strategy is per-task-class, not blanket.

Production patterns

Pattern 1: Tiered routing

A router (cheap model or rule-based) classifies each incoming query, then routes to the right Opus 4.8 effort level:

  • Simple FAQs → Sonnet 4.6 (cheaper than Opus low).
  • Moderate complexity → Opus 4.8 low.
  • Hard reasoning → Opus 4.8 high.

This is the standard production approach in 2026.

Pattern 2: Effort escalation

Start every query at low effort. If the model expresses low confidence (Opus 4.8 is “more likely to flag uncertainty” per the release notes), re-run at high effort.

  • Saves cost on the 60-70% of queries that work fine at low effort.
  • Catches the hard cases without missing them.
  • Adds latency on the escalated queries; not appropriate for real-time UIs.

Pattern 3: Orchestrator high, subagents low

In agentic workflows (especially Opus 4.8’s dynamic workflows that spawn hundreds of subagents):

  • Orchestrator runs at high effort — it makes the planning decisions.
  • Subagents run at low or medium effort — they execute well-defined sub-tasks.

This is the cost-optimal shape for the dynamic-workflow feature.

Pattern 4: Final-answer high, exploration low

For deep research or analytical workflows:

  • Exploration/iteration steps at low or medium effort.
  • Final synthesis or answer step at high effort.

This pattern preserves quality where it counts (the user-visible output) while keeping the search loop cheap.

How to set effort level

In claude.ai

The effort selector is in the conversation UI for Opus 4.8. Set it before sending a message.

In the Claude API

{
  "model": "claude-opus-4-8",
  "messages": [...],
  "effort": "low" // or "medium" or "high"
}

The default is "high" if effort is omitted.

In Claude Code

Set it via the model selection prompt or in .claude/settings.json. Claude Code defaults to high for Opus 4.8.

Common mistakes to avoid

  • Defaulting to low everywhere. You’ll have a bad time on the hard 20% of tasks.
  • Defaulting to high everywhere. You’re overpaying by 3-6x on simple tasks.
  • Ignoring the mid-conversation system message feature. Opus 4.8 lets you change instructions (including effort context) mid-task without restating the full system prompt — this preserves prompt cache hits.
  • Forgetting effort affects latency. Real-time UIs feel sluggish at high effort; budget UI for the worst case.

When to use Sonnet 4.6 instead

If low effort feels right for a task class, also consider Sonnet 4.6:

  • Sonnet 4.6 is cheaper than Opus 4.8 at low effort.
  • Sonnet 4.6 has strong writing style and instruction-following.
  • Sonnet 4.6 wins for high-volume, latency-sensitive customer-facing workloads.

The decision tree: simple task + need Opus voice → Opus low; simple task + voice doesn’t matter → Sonnet 4.6.

Sources

  • Anthropic: “Claude Opus 4.8” release announcement, May 28, 2026
  • Knowledge Hub Media: “Claude Opus 4.8 everything you need to know”
  • Caylent: “Claude Opus 4.8 — what improved, what’s new”
  • Anthropic platform docs: claude.com/docs (Opus 4.8 effort levels)
  • Apidog: “Claude Opus 4.8 vs GPT-5.5 vs Gemini 3.5”

Published June 21, 2026 by andrew.ooo. See Claude Opus 4.8 vs Opus 4.7 and Claude Opus 4.8 fast mode vs GPT-5.5 vs Gemini 3.5 Flash cost.