What is the Cursor 3.6 Auto-review run mode?

Cursor 3.6 Auto-review (released May 29, 2026) is a new run mode that lets Cursor agents work longer with fewer approval prompts. It sorts Shell, MCP, and Fetch tool calls into three tiers: allowlisted calls (executed immediately), sandboxable calls (run inside a sandbox), and everything else (sent to a classifier subagent that allows the call, suggests a different approach, or asks you for approval). You configure it in Settings → Agents → Run Mode. Cursor explicitly labels Auto-review as best-effort, not a security guarantee.

How does Cursor 3.6 Auto-review compare to Claude Code permissions?

Claude Code uses a static permissions system: you configure allowed tools, denied tools, and prompt-on-each settings in ~/.claude.json or per-project settings. Each tool invocation runs (allowed), prompts you (default), or fails (denied). Cursor 3.6 Auto-review adds a classifier subagent that makes runtime decisions on calls that are neither allowlisted nor sandboxable. Claude Code is more predictable; Cursor 3.6 is more adaptive but introduces classifier-induced uncertainty. Claude Code added a /goal command and pinned background sessions in May 2026 patches, but its permission model remains static.

How does the OpenAI Codex sandbox compare to Cursor Auto-review?

OpenAI Codex (cloud and CLI variants in May 2026) sandboxes each task in an ephemeral container. Network, filesystem, and shell are constrained by the container itself, not by per-call approval. Cursor 3.6 Auto-review runs on your local machine and uses tiered classification per call. Codex's model: contain the blast radius via container boundaries. Cursor's model: classify each call and decide on the fly. Codex is safer by construction; Cursor 3.6 keeps you closer to local files but trusts a classifier.

Which should I use for long-running agent sessions?

For multi-hour autonomous work on isolated tasks: the Codex cloud sandbox is the safest bet, because the container limits damage. For local development where you want fewer prompts but still want to keep an eye on things: Cursor 3.6 Auto-review with a tight allowlist is a strong middle ground. For mission-critical refactors where you want predictable, audit-friendly behavior: Claude Code's static permissions plus dynamic workflows (Opus 4.8, capped at 1,000 subagents) give you orchestration without classifier surprises. Most serious teams use a mix.

Quick Answer

Cursor 3.6 Auto-review vs Claude Code Permissions vs Codex Sandbox (June 2026)

Published: June 1, 2026

Cursor 3.6 shipped Auto-review on May 29, 2026. The headline pitch from Cursor: “Auto-review is a new run mode that allows Cursor to work for longer with fewer approval prompts and safer execution.” That’s a direct response to approval fatigue — the thing that kills long agent sessions in every IDE that takes safety seriously.

But every major AI coding agent now has a different theory of how to keep agents safe at scale. Here’s the head-to-head.

Last verified: June 1, 2026.

TL;DR

	Cursor 3.6 Auto-review	Claude Code Permissions	OpenAI Codex Sandbox
Released	May 29, 2026	Iterative through May 2026	Cloud + CLI active May 2026
Model	Tiered classifier + sandbox	Static allow/deny lists	Ephemeral container per task
Where it runs	Local machine	Local machine	OpenAI cloud (Codex Cloud) or local (Codex CLI)
Per-call decision	Classifier subagent (runtime)	Static rules (configured ahead)	Container boundaries (set at start)
Approval prompts	Fewer, classifier-mediated	Prompt-per-tool by default	Rarely — container does the gating
Best for	Local dev, low-trust calls	Predictable, audit-friendly	Isolated long-running tasks

What Cursor 3.6 Auto-review actually does

The Cursor changelog entry (May 29, 2026) describes Auto-review as a run mode — a setting that applies to all tool calls during a session. It sits in Settings → Agents → Run Mode, with the option to provide custom instructions to steer the classifier.

Three tiers, applied per Shell, MCP, or Fetch call:

Allowlisted calls — execute immediately. You define these (e.g., git status, ls, npm test patterns).
Sandboxable calls — run inside a sandbox where the call can do work but is contained.
Everything else — sent to a classifier subagent. The classifier decides one of three outcomes: allow, suggest a different approach, or escalate to you for approval.
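
The three tiers above can be sketched as a small router. This is a toy model, not Cursor's implementation: the class name, the glob patterns, the idea that sandboxable calls are matched by pattern, and the `toy_classifier` heuristic are all assumptions made for illustration (the real classifier is a subagent, and Cursor does not document its internals here).

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase
from typing import Callable


class Decision(Enum):
    RUN = "run immediately"
    SANDBOX = "run in sandbox"
    ALLOW = "classifier: allow"
    RETRY = "classifier: suggest a different approach"
    ESCALATE = "classifier: ask the user"


@dataclass
class AutoReviewPolicy:
    allowlist: list[str]                 # glob patterns that run with no review
    sandboxable: list[str]               # glob patterns assumed safe to run contained
    classify: Callable[[str], Decision]  # stand-in for the classifier subagent

    def route(self, command: str) -> Decision:
        # Tier 1: allowlisted calls execute immediately.
        if any(fnmatchcase(command, p) for p in self.allowlist):
            return Decision.RUN
        # Tier 2: sandboxable calls run inside a sandbox.
        if any(fnmatchcase(command, p) for p in self.sandboxable):
            return Decision.SANDBOX
        # Tier 3: everything else goes to the classifier.
        return self.classify(command)


def toy_classifier(command: str) -> Decision:
    # Hypothetical heuristic standing in for an LLM judgment.
    if command.startswith("git push"):
        return Decision.ESCALATE
    if "rm -rf" in command:
        return Decision.RETRY
    return Decision.ALLOW


policy = AutoReviewPolicy(
    allowlist=["git status", "ls*", "npm test*", "rg *"],
    sandboxable=["python *", "node *"],
    classify=toy_classifier,
)

for cmd in ["git status", "python build.py", "sed -i s/a/b/ f.txt", "git push origin main"]:
    print(f"{cmd!r:32} -> {policy.route(cmd).value}")
```

Note that only the third tier is non-deterministic in practice. Tightening the allowlist shrinks the set of calls that ever reach the classifier, which is why a tight allowlist matters so much in this mode.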

Cursor explicitly says in its release notes: “Auto-review is best-effort and not a security guarantee.” The classifier can be bypassed — it’s an input gate before execution, not a hard constraint on what the underlying model can do. There’s no schema check on generated code or other artifacts.

The practical impact: in a typical Cursor 3.6 session with Auto-review on, the number of approval prompts drops sharply, and the classifier handles the long tail of one-off shell calls that used to require you to click “Run” 40 times in a row.

What Claude Code permissions do

Claude Code (now running on Opus 4.8 as of May 28, 2026, with Sonnet 4.6 as the cheaper default) uses a static permission model. The relevant config lives in ~/.claude.json and per-project .claude/settings.json:

allow lists — tools or tool patterns that run with no prompt
deny lists — tools that never run
default — prompts on each call (unless overridden)
edit allow/deny — separate rules for filesystem writes
shell allow/deny — patterns for shell commands

There’s no classifier. The rules are exactly what you wrote. If you didn’t allowlist rm -rf, Claude Code will prompt you (or fail, if you denied it). If you allowlisted everything under node_modules, those edits run silently.
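
To make "the rules are exactly what you wrote" concrete, here is a minimal sketch of static rule evaluation. It is not Claude Code's matcher or its config syntax: the `RULES` dictionary, the plain glob patterns, and the deny-before-allow ordering are assumptions chosen to mirror the behavior described above.

```python
from fnmatch import fnmatchcase

# Hypothetical rule set, written as plain glob patterns for illustration.
RULES = {
    "deny": ["git push*"],
    "allow": ["rg *", "npm test*"],
}


def decide(command: str, rules: dict[str, list[str]]) -> str:
    """Return 'deny', 'allow', or 'prompt' for a shell command."""
    # Deny is checked first so a broad allow pattern cannot override it
    # (an assumption of this sketch).
    if any(fnmatchcase(command, p) for p in rules.get("deny", [])):
        return "deny"
    if any(fnmatchcase(command, p) for p in rules.get("allow", [])):
        return "allow"
    # No matching rule: fall back to asking the user.
    return "prompt"


for cmd in ['rg "OrderService.process"', "rm -rf build", "npm test", "git push origin main"]:
    print(f"{cmd!r:32} -> {decide(cmd, RULES)}")
```

The same input always produces the same decision, and every decision traces back to one pattern in `RULES`. That determinism is the whole appeal of the static model.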

In May 2026, Claude Code shipped a related but separate set of features: Agent View, pinned background sessions, the /goal command, /code-review (replacing /simplify), fast mode on Opus 4.7 (now Opus 4.8), and worktree flexibility. None of that changed the underlying permissions model, which is still rule-based.

Why this matters: Claude Code is the most audit-friendly of the three. Every permission decision is traceable to a rule in a file you wrote. There’s no LLM in the loop deciding whether a curl call is safe. For regulated environments or paranoid engineers, that’s a feature.

What the OpenAI Codex sandbox does

As of May 2026, Codex ships in two flavors:

Codex Cloud — ephemeral cloud containers, each task isolated, network/fs gated by the container
Codex CLI — local, but each task can be configured to run inside a Docker container or restricted sandbox

The key difference: Codex’s safety story is container-shaped. You don’t gate individual calls. You gate the entire task: “this Codex task can read these files, write to this branch, talk to these APIs.” Inside the container, the agent does whatever it needs to.
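
A task-level gate like this can be pictured as one boundary object declared up front, rather than a decision per call. The sketch below is illustrative only: `TaskBoundary`, its fields, and the branch name are hypothetical and do not correspond to any Codex configuration format. A real container enforces these limits at the OS and network layer, not in application code.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TaskBoundary:
    readable_roots: tuple[str, ...]           # directories the task may read
    writable_branch: str                      # the only branch it may push to
    allowed_hosts: frozenset[str] = frozenset()  # network egress allowlist

    def can_read(self, path: str) -> bool:
        # Normalize first so "/repo/../etc" is not treated as inside "/repo".
        p = PurePosixPath(posixpath.normpath(path))
        return any(p.is_relative_to(root) for root in self.readable_roots)

    def can_push(self, branch: str) -> bool:
        return branch == self.writable_branch

    def can_connect(self, host: str) -> bool:
        return host in self.allowed_hosts


# Declared once, before the agent starts; nothing is decided per call.
boundary = TaskBoundary(
    readable_roots=("/repo",),
    writable_branch="codex/rename-process",   # hypothetical branch name
    allowed_hosts=frozenset({"registry.npmjs.org"}),
)

print(boundary.can_read("/repo/src/orders.ts"))
print(boundary.can_read("/repo/../etc/passwd"))
print(boundary.can_push("main"))
print(boundary.can_connect("registry.npmjs.org"))
```

The design point is that the boundary is fixed before the first tool call. Inside it, the agent never waits on you.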

This is more like the CI/CD model: trust the boundary, audit the artifact. Pull request comes out the other side, you review the diff, you merge or you don’t.

Codex Cloud added review tooling alongside Codex CLI’s terminal-native loop in May 2026. Combined with the upcoming GPT-5.6 (Polymarket prices an 80–89% probability of release by June 30, 2026), Codex Cloud is positioned for fully autonomous task runs that you supervise via PR rather than per-call prompts.

Side-by-side: a real session

Imagine you’re refactoring a service. You tell each agent: “Rename OrderService.process to OrderService.execute across the repo and update all callers.”

Cursor 3.6 Auto-review:

Agent runs rg "OrderService.process" → allowlisted (read pattern), executes
Agent runs sed -i ... to rename → classifier evaluates → “looks like a rename inside repo, allow”
Agent runs npm test → allowlisted, executes
Agent runs git push → classifier escalates (“push to remote, asking user”)
You approve, push happens

Claude Code:

Agent runs rg → allowed (configured)
Agent runs sed → prompts you, you allow once for this session
Agent runs npm test → allowed
Agent runs git push → denied (you set deny on push), agent prints the command for you to run manually

Codex Cloud:

You hand Codex the task in a branch-isolated container
Container has read access to repo, write access to its branch, no network except npm registry
Agent does everything in one shot inside the container
You get a PR with the diff, you review and merge
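
If it helps to see the three walkthroughs side by side, the outcomes can be tallied per agent. The data below is transcribed from the walkthroughs, not measured. The labels `auto`, `prompt`, and `denied` are this sketch's own, and treating every Codex Cloud step as `auto` is an assumption based on "does everything in one shot inside the container".

```python
from collections import Counter

# The four steps of the rename task, in order.
STEPS = ["rg", "sed -i", "npm test", "git push"]

# auto   = ran without involving you
# prompt = you were asked to approve
# denied = blocked by a rule and left for you to run manually
SESSION = {
    "Cursor 3.6 Auto-review": ["auto", "auto", "auto", "prompt"],
    "Claude Code": ["auto", "prompt", "auto", "denied"],
    "Codex Cloud": ["auto", "auto", "auto", "auto"],  # review happens once, on the PR
}


def tally(outcomes: list[str]) -> dict[str, int]:
    """Count how many steps ended in each outcome."""
    counts = Counter(outcomes)
    return {label: counts[label] for label in ("auto", "prompt", "denied")}


for agent, outcomes in SESSION.items():
    assert len(outcomes) == len(STEPS)
    print(f"{agent:24} {tally(outcomes)}")
```

The interruption counts alone do not say which model is safer. They only show where each model puts the human: at a classifier escalation, at a rule boundary, or at the final diff.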

Which to use, by scenario

Local development, low-trust calls — Cursor 3.6 Auto-review with a tight allowlist. The classifier handles boring approvals, you still see escalations.

Audit-heavy environments — Claude Code’s static permissions. Every decision is in a file you can grep. Add dynamic workflows (Opus 4.8) for orchestration without giving up the rule-based gates.

Isolated long-running tasks — Codex Cloud. The container does the work, you review the PR. No per-call decision needed.

Mixed teams — Most serious 2026 teams use all three: Cursor 3.6 for inline edits and live debugging, Claude Code for orchestrated multi-file refactors with audit trails, Codex Cloud for “run this overnight in a branch.”

What Cursor 3.6 Auto-review does NOT do

Worth restating because Cursor itself says it: Auto-review is not a security guarantee. The classifier subagent can be tricked. Code generated inside the agent can still do bad things at runtime. The schema of files the agent produces is not validated.

If you need real isolation, you need a real container — which is exactly what Codex Cloud provides and what Claude Code’s static deny rules try to approximate.

Sources

Cursor changelog (Auto-review, May 29, 2026) — official release notes for Cursor 3.6 Auto-review
Cursor changelog: auto-review detail page — tier explanation and configuration
Anthropic: Claude Code release notes (May 2026 patches) — Agent View, /goal, fast mode
Anthropic: Claude Opus 4.8 — dynamic workflows and orchestration

Bottom line

Three different theories of agent safety, all shipped within weeks of each other:

Cursor 3.6: trust a classifier, keep things local, accept best-effort.
Claude Code: trust a rulebook, keep things auditable, accept friction.
Codex: trust a container, keep things isolated, accept latency.

There’s no winner. Pick the one whose tradeoff matches your work. Most teams running serious AI coding in mid-2026 use all three for different parts of the pipeline.