Cursor 3.6 Auto-review vs Claude Code Permissions vs Codex Sandbox (June 2026)

Cursor 3.6 shipped Auto-review on May 29, 2026. The headline pitch from Cursor: “Auto-review is a new run mode that allows Cursor to work for longer with fewer approval prompts and safer execution.” That’s a direct response to approval fatigue — the thing that kills long agent sessions in every IDE that takes safety seriously.

But every major AI coding agent now has a different theory of how to keep agents safe at scale. Here’s the head-to-head.

Last verified: June 1, 2026.

TL;DR

| | Cursor 3.6 Auto-review | Claude Code Permissions | OpenAI Codex Sandbox |
| --- | --- | --- | --- |
| Released | May 29, 2026 | Iterative through May 2026 | Cloud + CLI active May 2026 |
| Safety model | Tiered classifier + sandbox | Static allow/deny lists | Ephemeral container per task |
| Where it runs | Local machine | Local machine | OpenAI cloud (Codex Cloud) or local (Codex CLI) |
| Per-call decision | Classifier subagent (runtime) | Static rules (configured ahead) | Container boundaries (set at start) |
| Approval prompts | Fewer, classifier-mediated | Prompt-per-tool by default | Rarely; the container does the gating |
| Best for | Local dev, low-trust calls | Predictable, audit-friendly | Isolated long-running tasks |

What Cursor 3.6 Auto-review actually does

The Cursor changelog entry (May 29, 2026) describes Auto-review as a run mode — a setting that applies to all tool calls during a session. It sits in Settings → Agents → Run Mode, with the option to provide custom instructions to steer the classifier.

Three tiers, applied per Shell, MCP, or Fetch call:

  1. Allowlisted calls — execute immediately. You define these (e.g., git status, ls, npm test patterns).
  2. Sandboxable calls — run inside a sandbox where the call can do work but is contained.
  3. Everything else — sent to a classifier subagent. The classifier decides one of three outcomes: allow, suggest a different approach, or escalate to you for approval.
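
The three tiers can be sketched as a small router. This is a minimal illustration, not Cursor's implementation: the glob-style pattern syntax, the `TieredPolicy` and `Outcome` names, and the `toy_classifier` heuristic are all assumptions made for the example. In the real feature the third tier is an LLM subagent, not a string check.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatch
from typing import Callable


class Outcome(Enum):
    RUN = "run"                # tier 1: allowlisted, execute immediately
    RUN_SANDBOXED = "sandbox"  # tier 2: execute inside a sandbox
    ALLOW = "allow"            # tier 3: classifier allows the call
    SUGGEST = "suggest"        # tier 3: classifier suggests a different approach
    ESCALATE = "escalate"      # tier 3: classifier asks the user to approve


@dataclass
class TieredPolicy:
    allowlist: list[str]                # hypothetical glob patterns
    sandboxable: list[str]              # hypothetical glob patterns
    classify: Callable[[str], Outcome]  # stand-in for the classifier subagent

    def route(self, command: str) -> Outcome:
        # Tiers are checked in order; the classifier only sees the remainder.
        if any(fnmatch(command, p) for p in self.allowlist):
            return Outcome.RUN
        if any(fnmatch(command, p) for p in self.sandboxable):
            return Outcome.RUN_SANDBOXED
        return self.classify(command)


def toy_classifier(command: str) -> Outcome:
    # Placeholder heuristic standing in for an LLM judgment.
    if command.startswith("git push"):
        return Outcome.ESCALATE
    if command.startswith("rm -rf"):
        return Outcome.SUGGEST
    return Outcome.ALLOW


policy = TieredPolicy(
    allowlist=["git status", "ls*", "npm test*"],
    sandboxable=["python *", "node *"],
    classify=toy_classifier,
)

for cmd in ["git status", "python scripts/migrate.py", "git push origin main"]:
    print(f"{cmd!r} -> {policy.route(cmd).value}")
```

The ordering is the point of the sketch: cheap, deterministic checks run first, so only calls that match neither list reach the classifier.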

Cursor explicitly says in its release notes: “Auto-review is best-effort and not a security guarantee.” The classifier can be bypassed — it’s an input gate before execution, not a hard constraint on what the underlying model can do. There’s no schema check on generated code or other artifacts.

The practical impact: in a typical Cursor 3.6 session with Auto-review on, the number of approval prompts drops sharply, and the classifier handles the long tail of one-off shell calls that used to require you to click “Run” 40 times in a row.

What Claude Code permissions do

Claude Code (running on Opus 4.8 as of May 28, 2026, with Sonnet 4.6 as the cheaper default) uses a static permission model. The relevant config lives in the user-level ~/.claude/settings.json and the per-project .claude/settings.json:

  • allow lists — tools or tool patterns that run with no prompt
  • deny lists — tools that never run
  • default — prompts on each call (unless overridden)
  • edit allow/deny — separate rules for filesystem writes
  • shell allow/deny — patterns for shell commands

There’s no classifier. The rules are exactly what you wrote. If you didn’t allowlist rm -rf, Claude Code will prompt you (or fail, if you denied it). If you allowlisted everything under node_modules, those edits run silently.
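
A rough sketch of how a static rulebook resolves a call. The `RULES` dictionary, the glob patterns, and the `decide` function are hypothetical and do not reproduce Claude Code's actual settings format; the sketch also assumes that deny rules take precedence over allow rules. It returns the matching rule with each decision, which is what makes this style of gating easy to trace.

```python
from fnmatch import fnmatch
from typing import Optional

# Hypothetical rule set; pattern syntax is illustrative only.
RULES = {
    "deny": ["git push*", "rm -rf*"],
    "allow": ["rg *", "npm test*", "git status"],
}


def decide(command: str, rules: dict[str, list[str]] = RULES) -> tuple[str, Optional[str]]:
    """Return (decision, matching_rule).

    Deny is checked before allow (an assumption of this sketch), and
    anything that matches no rule falls through to a user prompt.
    """
    for decision in ("deny", "allow"):
        for pattern in rules[decision]:
            if fnmatch(command, pattern):
                return decision, pattern
    return "prompt", None


for cmd in ['rg "OrderService.process"', "git push origin main", "sed -i s/a/b/ f.txt"]:
    print(cmd, "->", decide(cmd))
```

The same input always produces the same decision, and the second element of the tuple names the rule responsible.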

In May 2026, Claude Code shipped a related but separate set of features: Agent View, pinned background sessions, the /goal command, /code-review (replacing /simplify), fast mode on Opus 4.7 (now Opus 4.8), and worktree flexibility. None of that changed the underlying permissions model, which is still rule-based.

Why this matters: Claude Code is the most audit-friendly of the three. Every permission decision is traceable to a rule in a file you wrote. There’s no LLM in the loop deciding whether a curl call is safe. For regulated environments or paranoid engineers, that’s a feature.

What OpenAI Codex sandbox does

Codex in May 2026 ships in two flavors:

  • Codex Cloud — ephemeral cloud containers, each task isolated, network/fs gated by the container
  • Codex CLI — local, but each task can be configured to run inside a Docker container or restricted sandbox

The key difference: Codex’s safety story is container-shaped. You don’t gate individual calls. You gate the entire task: “this Codex task can read these files, write to this branch, talk to these APIs.” Inside the container, the agent does whatever it needs to.

This is more like the CI/CD model: trust the boundary, audit the artifact. Pull request comes out the other side, you review the diff, you merge or you don’t.
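
To make the contrast with per-call gating concrete, here is a sketch of a task-level boundary that is fixed once, before the agent starts. `TaskBoundary`, its fields, the branch name, and the paths are hypothetical and are not Codex's configuration schema. A real container enforces these limits at the OS and network level; the Python checks below only model the idea.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TaskBoundary:
    """Hypothetical task-level policy, immutable for the life of the task."""
    readable_roots: tuple[str, ...]
    writable_branch: str
    allowed_hosts: frozenset[str] = field(default_factory=frozenset)

    def can_read(self, path: str) -> bool:
        p = PurePosixPath(path)
        # Reject ".." outright: the comparison below is purely lexical.
        if ".." in p.parts:
            return False
        return any(p.is_relative_to(root) for root in self.readable_roots)

    def can_push(self, branch: str) -> bool:
        return branch == self.writable_branch

    def can_connect(self, host: str) -> bool:
        return host in self.allowed_hosts


boundary = TaskBoundary(
    readable_roots=("/repo",),
    writable_branch="codex/rename-order-service",   # made-up branch name
    allowed_hosts=frozenset({"registry.npmjs.org"}),
)

print(boundary.can_read("/repo/src/order_service.ts"))
print(boundary.can_push("main"))
print(boundary.can_connect("registry.npmjs.org"))
```

Nothing in this model inspects what the agent intends to do with a command. The only question is whether the resource it touches lies inside the boundary.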

In May 2026, Codex Cloud added review tooling alongside Codex CLI’s terminal-native loop. With GPT-5.6 expected soon (Polymarket prices an 80–89% probability of release by June 30, 2026), Codex Cloud is positioned for fully autonomous task runs that you supervise through PRs rather than per-call prompts.

Side-by-side: a real session

Imagine you’re refactoring a service. You tell each agent: “Rename OrderService.process to OrderService.execute across the repo and update all callers.”

Cursor 3.6 Auto-review:

  • Agent runs rg "OrderService.process" → allowlisted (read pattern), executes
  • Agent runs sed -i ... to rename → classifier evaluates → “looks like a rename inside repo, allow”
  • Agent runs npm test → allowlisted, executes
  • Agent runs git push → classifier escalates (“push to remote, asking user”)
  • You approve, push happens

Claude Code:

  • Agent runs rg → allowed (configured)
  • Agent runs sed → prompts you, you allow once for this session
  • Agent runs npm test → allowed
  • Agent runs git push → denied (you set deny on push), agent prints the command for you to run manually

Codex Cloud:

  • You hand Codex the task in a branch-isolated container
  • Container has read access to repo, write access to its branch, no network except npm registry
  • Agent does everything in one shot inside the container
  • You get a PR with the diff, you review and merge
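
The walkthrough can be tallied to show where a human gets pulled in under each approach. The outcome labels (`auto`, `prompt`, and so on) are this sketch's own shorthand for the steps described above, not terminology from any of the three tools. A denied push counts as a human step here because you run the command yourself.

```python
# Outcomes transcribed from the walkthrough; labels are shorthand for this sketch.
SESSION = {
    "Cursor 3.6 Auto-review": [
        ("rg", "auto"),
        ("sed -i", "classifier-allow"),
        ("npm test", "auto"),
        ("git push", "escalate"),
    ],
    "Claude Code": [
        ("rg", "auto"),
        ("sed -i", "prompt"),
        ("npm test", "auto"),
        ("git push", "deny"),
    ],
    "Codex Cloud": [
        ("whole task", "auto"),
        ("pull request", "review"),
    ],
}

# Outcomes that require a person to act.
HUMAN_STEPS = {"escalate", "prompt", "deny", "review"}


def human_touchpoints(steps: list[tuple[str, str]]) -> list[str]:
    """Return the steps in a session where a person had to act."""
    return [step for step, outcome in steps if outcome in HUMAN_STEPS]


for agent, steps in SESSION.items():
    print(f"{agent}: {human_touchpoints(steps)}")
```

In this scenario the count of interruptions is similar across the three; what differs is when they happen and what you are asked to judge: a single command, a rule hit, or a finished diff.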

Which to use, by scenario

Local development, low-trust calls — Cursor 3.6 Auto-review with a tight allowlist. The classifier handles boring approvals, you still see escalations.

Audit-heavy environments — Claude Code’s static permissions. Every decision is in a file you can grep. Add dynamic workflows (Opus 4.8) for orchestration without giving up the rule-based gates.

Isolated long-running tasks — Codex Cloud. The container does the work, you review the PR. No per-call decision needed.

Mixed teams — Most serious 2026 teams use all three: Cursor 3.6 for inline edits and live debugging, Claude Code for orchestrated multi-file refactors with audit trails, Codex Cloud for “run this overnight in a branch.”

What Cursor 3.6 Auto-review does NOT do

Worth restating because Cursor itself says it: Auto-review is not a security guarantee. The classifier subagent can be tricked. Code generated inside the agent can still do bad things at runtime. The schema of files the agent produces is not validated.

If you need real isolation, you need a real container — which is exactly what Codex Cloud provides and what Claude Code’s static deny rules try to approximate.

Bottom line

Three different theories of agent safety, all shipped within weeks of each other:

  • Cursor 3.6: trust a classifier, keep things local, accept best-effort.
  • Claude Code: trust a rulebook, keep things auditable, accept friction.
  • Codex: trust a container, keep things isolated, accept latency.

There’s no winner. Pick the one whose tradeoff matches your work. Most teams running serious AI coding in mid-2026 use all three for different parts of the pipeline.