Tilde.run Review: Versioned Filesystem for AI Agents

TL;DR

Tilde.run is a new agent sandbox that turns every AI agent run into a transaction you can roll back with one command. It mounts code from GitHub, data from S3, and documents from Google Drive as a single versioned ~/sandbox filesystem, audits every outbound network call, and atomically commits — or atomically discards — everything the agent did.

It hit Show HN on May 7, 2026 and pulled 197 points and 132 comments within 48 hours, a strong front-page result in a category that gets a new entrant almost weekly. What makes Tilde stand out from the dozen-or-so agent-sandbox launches I’ve covered this year:

Transactional commits. A run either fully commits or fully discards. No half-applied agent disasters.
One filesystem, three backends. GitHub repos, S3 buckets, and Google Drive folders show up as POSIX paths under ~/sandbox. Any tool, any language, no SDK required.
Network policy by default. Cloud metadata endpoints (169.254.169.254), private RFC1918 ranges, and unauthorized hosts are blocked unless explicitly allowed. Every outbound call is logged and tied to the agent that made it.
Built on lakeFS. Same versioning foundation that’s been running petabyte-scale data lakes since 2020 — so the rollback story isn’t theoretical.
Free private preview, install in one curl line, Python SDK and CLI both shipping at launch.

It’s also closed-source SaaS at the moment, which the HN comment thread is — fairly — not thrilled about. More on that below.

If you’re running coding agents, data agents, or any autonomous loop against real production data and are still using “I’ll watch the screen and Ctrl-C if it goes wrong” as your safety strategy, Tilde is the most production-shaped attempt at the rollback-everything pattern that’s landed this year. Here’s what it actually does, how to run it, what’s real, and what’s still hand-wavy.

Quick Reference

Field	Value
Site	tilde.run
Made by	Treeverse (the lakeFS team)
Pricing	Free private preview; consumption-based pricing planned
License	Closed source (managed SaaS)
Install	`curl -fsSL https://tilde.run/install | sh`
Interface	CLI, Python SDK, MCP-compatible (Claude works with it)
Backends	GitHub, S3, Google Drive (more planned)
Networking	Allow/deny/approve egress policies, full audit log
HN launch	197 points, 132 comments (May 7, 2026)

What Problem Tilde Actually Solves

Most agent sandboxes today fall into two camps:

Container isolation. Run the agent in Docker, wipe it after. Good for code execution, terrible for agents that need persistent state across runs.
Local snapshot. btrfs/ZFS snapshot before the run, roll back on failure. Works, but only on one box and only for the local filesystem — not S3, not GitHub, not Drive.

Tilde sits in a third spot: a managed sandbox where the unit of safety is the entire run as a transaction, and the storage being protected is not just /tmp but your actual production data sources.

The mental model the lakeFS team is reusing is git for data. lakeFS already does atomic, branched, conflict-detecting versioning over object storage at petabyte scale — Tilde wraps that in an agent runner with sandboxing and network policy on top. From a maintainer comment on HN:

Atomic commits are based on snapshotting done by lakeFS under the hood. Each sandbox run produces a new atomic commit to a hidden “main” branch. Updating that branch is optimistically concurrent, with lakeFS checking for conflicts — multiple writers updating the same object.

Optimistic concurrency with object-level conflict detection is exactly how you’d design this if you were serious about multiple agents touching the same data.
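To make the maintainer's description concrete, here is a deliberately simplified, single-process model of optimistic commits with object-level conflict detection. It is a sketch, not lakeFS code: the `Branch` class, `ConflictError`, and the rule "reject a run if any object it wrote was changed by a commit that landed after the run's base" are illustrative assumptions.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Another writer already committed a change to an object this run wrote."""


@dataclass
class Branch:
    """Toy stand-in for the hidden "main" branch (hypothetical, not lakeFS).

    Each commit stores (full object map, set of keys that commit changed).
    Commit 0 is an empty initial state.
    """

    commits: list = field(default_factory=lambda: [({}, frozenset())])

    @property
    def head(self) -> int:
        return len(self.commits) - 1

    def snapshot(self, commit_id: int) -> dict:
        return dict(self.commits[commit_id][0])

    def commit(self, base: int, writes: dict) -> int:
        """Optimistically commit `writes` that a run made starting from `base`."""
        # Every object changed by commits that landed after this run started.
        changed_since_base = set()
        for _, changed in self.commits[base + 1:]:
            changed_since_base |= changed
        overlap = changed_since_base & set(writes)
        if overlap:
            raise ConflictError(f"conflicting objects: {sorted(overlap)}")
        # Disjoint writes: apply on top of the current head, not the stale base.
        new_state = self.snapshot(self.head)
        new_state.update(writes)
        self.commits.append((new_state, frozenset(writes)))
        return self.head
```

Two runs that start from the same base and write disjoint objects both commit; a third run that rewrites an object one of them already changed raises `ConflictError` instead of silently overwriting it. No lock is held while the agents work, which is the point of the optimistic approach.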

How It Works (The Actual Workflow)

A Tilde run has three phases:

1. setup    →  Compose ~/sandbox from GitHub + S3 + Drive sources
2. execute  →  Agent runs in isolated container, all writes staged
3. decide   →  Approve & commit atomically, OR roll back & discard

The compose step is where it gets interesting. You point Tilde at a “repository” definition — really a manifest of source mounts — and it materialises a working directory:

~/sandbox
├── code/        ← github.com/acme/ml-pipeline (read-only by default)
├── data/        ← s3://acme-data/training/
├── docs/        ← gdrive://team-wiki/
└── output/      ← scratch space, fully writable
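A manifest like this boils down to a table from mount points to backends. A minimal sketch of resolving a sandbox path back to its source, assuming the `/sandbox/...` paths used in the CLI example; the `MOUNTS` table, its writable flags, and the `resolve` helper are hypothetical, not Tilde's manifest format.

```python
from pathlib import PurePosixPath

# Hypothetical manifest mirroring the tree above:
# mount point -> (backend URI, writable?)
MOUNTS = {
    "/sandbox/code": ("github.com/acme/ml-pipeline", False),  # read-only by default
    "/sandbox/data": ("s3://acme-data/training/", True),
    "/sandbox/docs": ("gdrive://team-wiki/", True),
    "/sandbox/output": (None, True),  # scratch space, no backing source
}


def resolve(path: str) -> tuple:
    """Return (mount point, backend URI, relative path, writable) for `path`."""
    p = PurePosixPath(path)
    # Longest mount point first, so nested mounts would resolve to the inner one.
    for mount in sorted(MOUNTS, key=len, reverse=True):
        m = PurePosixPath(mount)
        # Compare path components, not raw string prefixes.
        if p == m or m in p.parents:
            backend, writable = MOUNTS[mount]
            return mount, backend, str(p.relative_to(m)), writable
    raise FileNotFoundError(f"{path} is not under any mount")
```

Matching on path components rather than string prefixes keeps a path such as `/sandbox/database` from being treated as a file under `/sandbox/data`.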

The agent sees a normal POSIX filesystem. It can cat, grep, ls, write Python files, run pandas — the usual. Under the hood, every write is staged into a copy-on-write snapshot. When the run exits cleanly, the snapshot becomes a new commit on a hidden main branch and is pushed back to the source backends. If anything fails — the agent crashes, exceeds a budget, gets killed — the snapshot is dropped.
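The staging behaviour described above is essentially a copy-on-write overlay. A minimal in-memory sketch, assuming a flat path-to-bytes store; `StagedRun` and `run_agent` are illustrative names, not part of the Tilde SDK, and real staging happens at the storage layer rather than in a Python dict.

```python
from types import MappingProxyType

_DELETED = object()  # tombstone marking a path removed during the run


class StagedRun:
    """Copy-on-write overlay: reads fall through to base, writes stay staged."""

    def __init__(self, base: dict):
        self._base = MappingProxyType(base)  # read-only view of committed state
        self._staged = {}

    def read(self, path: str) -> bytes:
        value = self._staged.get(path, self._base.get(path))
        if value is None or value is _DELETED:
            raise FileNotFoundError(path)
        return value

    def write(self, path: str, data: bytes) -> None:
        self._staged[path] = data

    def delete(self, path: str) -> None:
        self._staged[path] = _DELETED

    def commit(self) -> dict:
        """Build the new committed state; the base mapping is never mutated."""
        new_state = dict(self._base)
        for path, value in self._staged.items():
            if value is _DELETED:
                new_state.pop(path, None)
            else:
                new_state[path] = value
        self._staged.clear()
        return new_state

    def discard(self) -> None:
        """Roll back by dropping every staged change."""
        self._staged.clear()


def run_agent(base: dict, agent) -> dict:
    """Commit if `agent` returns normally; discard and re-raise if it fails."""
    run = StagedRun(base)
    try:
        agent(run)
    except BaseException:
        run.discard()
        raise
    return run.commit()
```

Because `commit` builds a fresh mapping and `discard` only clears the staging area, a crashed or killed run leaves the committed state exactly as it was.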

Quickstart Code

Install:

curl -fsSL https://tilde.run/install | sh

CLI — one-shot agent run:

tilde exec my-team/documents \
  --image python:3.12 \
  -- /sandbox/code/agent.py --input /sandbox/data/reports
# sandbox running...
# sandbox completed. exit code: 0, commit id: c9d0e1f2
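If you drive `tilde exec` from CI, you will usually want the commit id for later inspection. A small parsing sketch, assuming the completion line keeps the exact `exit code: N, commit id: X` shape from the sample output above; that format is taken from the example, not from a documented contract, and `tilde_exec` is a hypothetical wrapper that only uses the flags shown.

```python
import re
import subprocess

# Matches the sample completion line, e.g.
# "sandbox completed. exit code: 0, commit id: c9d0e1f2"
_DONE = re.compile(r"sandbox completed\. exit code: (\d+), commit id: (\w+)")


def parse_completion(output: str) -> tuple:
    """Return (exit_code, commit_id) parsed from tilde exec output."""
    match = _DONE.search(output)
    if match is None:
        raise ValueError("no completion line found in tilde output")
    return int(match.group(1)), match.group(2)


def tilde_exec(repo: str, image: str, *cmd: str) -> tuple:
    """Run `tilde exec` with the flags from the example and parse the result."""
    proc = subprocess.run(
        ["tilde", "exec", repo, "--image", image, "--", *cmd],
        capture_output=True,
        text=True,
        check=False,
    )
    # The sample does not say which stream carries the status line, so search both.
    return parse_completion(proc.stdout + proc.stderr)
```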

CLI — interactive shell (for debugging):

tilde shell my-team/documents --image python:3.12
# root@sb-7f3a9c01:/sandbox$ _

Python SDK:

import tilde

repo = tilde.repository("my-team/documents")

# Interactive sandbox
with repo.shell(image="python:3.12") as sh:
    sh.run("pip install pandas")
    result = sh.run("python agent.py --input /sandbox/data")
    print(result.stdout.text())

# One-shot execution
result = repo.execute("python agent.py", image="python:3.12")
print(result.stdout.text())

# Walk the audit timeline
for commit in repo.timeline():
    print(commit.id[:8], commit.message)

The Python SDK is intentionally tiny — three primitives (repository, shell, execute) plus timeline for inspection. That’s a good sign. Agent-tooling APIs that ship with 40 classes on day one almost always need to be rewritten by month six.

Network Policy in Practice

The egress audit is the feature that surprised me most. Every HTTP/DNS call out of the sandbox gets logged with its timestamp, method, destination, and policy decision:

12:04:01  GET   api.openai.com/v1/completions     ALLOW
12:04:03  POST  api.anthropic.com/v1/messages     ALLOW
12:04:05  GET   pypi.org/simple/pandas            ALLOW
12:04:07  POST  evil-exfil.io/upload              DENY
12:04:08  GET   169.254.169.254/metadata          DENY
12:04:09  PUT   registry.npmjs.org/my-pkg         DENY

Default-deny on cloud metadata endpoints is the right call. AWS instance metadata exfiltration via prompt injection is a real attack class: a large share of the prompt-injection PoCs that landed in 2024–2025 ended in “and now the agent has your AWS keys.” Blocking 169.254.169.254 by default removes the easiest version of that bug for free.
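Rules like these are easy to express with Python's standard `ipaddress` module. A minimal sketch, assuming each destination has already been resolved to an IP; the allowlist contents and the `decide` helper are hypothetical, not Tilde's policy engine, and the approve path is left out.

```python
import ipaddress

# Hypothetical allowlist: the hosts ALLOWed in the sample log above.
ALLOWED_HOSTS = {"api.openai.com", "api.anthropic.com", "pypi.org"}

# 169.254.0.0/16 is the link-local block that contains 169.254.169.254.
METADATA_NET = ipaddress.ip_network("169.254.0.0/16")
RFC1918_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def decide(host: str, resolved_ip: str) -> tuple:
    """Return (decision, reason) for an outbound call to `host`.

    IP checks run before the host allowlist, so an allowlisted name that
    resolves to a metadata or private address is still denied.
    """
    addr = ipaddress.ip_address(resolved_ip)
    if addr in METADATA_NET:
        return "DENY", "cloud metadata / link-local"
    if any(addr in net for net in RFC1918_NETS):
        return "DENY", "private RFC1918 range"
    if host in ALLOWED_HOSTS:
        return "ALLOW", "host on allowlist"
    return "DENY", "host not on allowlist"
```

Checking the resolved address first is the design choice that matters: a name-only allowlist can be bypassed by any allowed hostname that resolves to an internal address.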

The RBAC DSL is similarly minimal:

analyst-policy:
  GetObject(path:"/data/*")               # ALLOW
  ?PutObject(path:"/reports/*")           # require human approval
  !PutObject(path:"/secrets/*")           # DENY

No prefix, ?, and ! map to allow, approve, and deny. Easy to read, easy to grep, easy to diff in PRs.
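To show how little machinery this syntax needs, here is a toy parser and evaluator. The grammar is inferred from the three example lines only, and two semantics are assumptions rather than documented behaviour: a request that matches no rule is denied, and when several rules match, deny beats approve beats allow. Paths are matched with `fnmatch`, so `*` also crosses `/`.

```python
import re
from fnmatch import fnmatchcase

# One rule per line: optional sigil, action name, path:"glob"; trailing comments ignored.
_RULE = re.compile(r'^\s*([?!]?)(\w+)\(path:"([^"]*)"\)')
_EFFECT = {"": "ALLOW", "?": "APPROVE", "!": "DENY"}
_PRECEDENCE = {"DENY": 2, "APPROVE": 1, "ALLOW": 0}


def parse_policy(text: str) -> list:
    """Parse policy lines into (effect, action, path_glob) tuples."""
    rules = []
    for line in text.splitlines():
        match = _RULE.match(line)
        if match:  # skips the "analyst-policy:" header and blank lines
            sigil, action, glob = match.groups()
            rules.append((_EFFECT[sigil], action, glob))
    return rules


def evaluate(rules: list, action: str, path: str) -> str:
    """Most restrictive matching rule wins; no match means DENY (assumed)."""
    effects = [
        effect
        for effect, rule_action, glob in rules
        if rule_action == action and fnmatchcase(path, glob)
    ]
    if not effects:
        return "DENY"
    return max(effects, key=_PRECEDENCE.__getitem__)
```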

Community Reactions (HN Thread Highlights)

The 132-comment thread is a useful corrective to the marketing site. A few representative voices:

On the demo video — the top comment is unusually harsh:

Less is more and the first impression matters a lot. We see a new agent sandbox tool on the front-page almost every day. Most have an AI-made landing page design, lots of animations, lots of words. This has become a bad sign for me.

Fair. The demo does spend ~80% of its runtime on “configure permissions” which is the boring part. The interesting part — atomic rollback in action — is a few seconds at the end.

On positioning — the thread converges on a sharp note: showing the bad run is more compelling than showing the good run. “Agent deleted prod, here’s tilde rollback, here’s prod restored” beats “agent obeyed permissions correctly” as a demo.

On closed source — the spiciest exchange:

I had to dig hard to find this is a SaaS sandbox offering, not an actual sandbox I can use locally. There are now at least 3 Apache 2 projects (smolmachines, microsandbox, boxlite) working on sandboxes and at least one of them should be ready for primetime soon.

This is the sharpest critique and it’s well-founded. Tilde’s competitors in OSS — microsandbox, boxlite, and the smolmachines effort — don’t yet match Tilde’s storage-versioning UX, but they’re real. If Tilde stays closed source forever, the sandbox-as-fundamental-building-block argument is going to bite.

On persistence — a user articulates the actual gap Tilde fills:

I want my agent to have persistent storage that stays forever. Like a human with a computer. When the agent spins up again, it has access to the computer with the same files.

This is the killer use case. Most container sandboxes are ephemeral by design. Tilde’s “the sandbox commits back to your real storage” model means the agent’s files survive across runs, and every state is rollback-able. That’s hard to get with Docker + S3 yourself without rebuilding most of what lakeFS already does.

Honest Limitations

Closed source SaaS. This is the biggest one. For sandboxes — the trust boundary in agent systems — running closed binaries is a real concession. The lakeFS team has earned trust on the data-versioning side, but a self-hosted or open-core option will eventually be table stakes.
No pricing yet. Maintainers say “consumption-based, competitive with similar solutions.” Translation: budget unclear, lock-in risk medium until pricing lands. Don’t migrate critical workloads yet.
Atomic commits only cover filesystem state. API calls the agent makes (Stripe charges, sent emails, Slack messages) are not transactional. The HN thread asks about this explicitly and gets no clean answer, because there isn’t one: if your agent sends an email mid-run and you roll back, the email has still been sent.
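Metadata blocking needs checking beyond one IPv4 address. The documented default targets 169.254.169.254, which GCP and Azure also use for instance metadata, so that rule is not AWS-only; hostname aliases such as GCP’s metadata.google.internal and AWS’s IPv6 endpoint (fd00:ec2::254) are worth confirming against your own policy.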
Conflict resolution is “pick a side.” Multi-agent merges work at the file level (lakeFS semantics) but there’s no smart 3-way merge for source code. If two agents touch the same .py file, you choose one and rerun the other.
Image bring-your-own. You pass a Docker image (python:3.12, analyst:latest); you’re responsible for keeping that image trusted. Tilde isolates the run, not the image supply chain.
Private preview. Access is gated. Plan for some lead time before a real eval.

When Tilde Is the Right Tool

Strong fit:

Long-running data agents that touch S3 + GitHub + Drive and need atomic rollback (BI/research agents, data labelers, ETL agents).
Coding agents in YOLO mode against shared repos where “agent deleted half the codebase” is a real failure mode you’ve seen.
Any agent flow that needs human-in-the-loop approval gates with auditable per-action policies.
Teams already on lakeFS for data versioning — the mental model carries directly over.

Probably overkill:

Single-developer coding agents on a laptop. git + Claude Code’s built-in approval prompts are enough.
Pure code-execution sandboxes (run Python from chat, throw away). Microsandbox / E2B are simpler.
Air-gapped environments. Closed SaaS doesn’t fit.

Watch this space:

If Tilde ships a self-hosted edition (or open-cores the runner the way lakeFS open-cored its versioning engine), the calculus changes a lot.

How It Compares to Alternatives

Tool	Versioned FS	Multi-source mount	Net policy	Open source	Persistent state
Tilde.run	✅ atomic	✅ GH+S3+Drive	✅ default-deny	❌ closed SaaS	✅
E2B	❌	partial	basic	partial	partial
Microsandbox	❌	❌	basic	✅ Apache 2	❌
SlicerVM	snapshots	❌	✅	❌ paid	✅
Docker + btrfs (DIY)	✅ snapshots	❌	manual	✅	✅
InstaVM	❌	partial	basic	❌ paid	✅

Tilde’s unique slot is the multi-source versioned mount — the GitHub + S3 + Drive composition into one filesystem. Nothing else on the list does that today.

FAQ

Is Tilde open source? No. It’s a managed SaaS in private preview. The maintainers have not announced an open-source or self-hosted edition. The underlying versioning engine (lakeFS) is Apache 2.0, but the Tilde sandbox runner is not.

Does Tilde work with Claude Code / Claude Agent Skills? Yes. The marketing site shows a Claude integration where you tell Claude in plain English to spin up a sandbox and run the agent. Under the hood Claude calls the Tilde CLI (or the SDK via MCP). Any agent framework that can shell out can drive Tilde.

How does atomic commit really work for non-S3 backends like Google Drive? Tilde uses lakeFS as the consistent layer. Writes during the run go into a lakeFS branch; on commit, lakeFS publishes the new state and Tilde’s adapters push the deltas back to GitHub (as a branch + PR), S3 (as object writes), or Drive (as file replaces). Optimistic concurrency catches conflicts at the object level. There’s no global cross-backend two-phase commit — if a Drive write succeeds and an S3 write later fails on the same commit, the run is marked failed and the lakeFS branch is dropped. The Drive write is then orphaned and visible in audit, but won’t be referenced from any committed state.
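The failure path described in this answer amounts to sequential per-backend pushes with no global two-phase commit. A minimal sketch of that behaviour; the `publish` function, the adapter callables, and the audit-record shape are all hypothetical illustrations, not Tilde internals.

```python
from typing import Callable

# A hypothetical adapter pushes one run's delta to one backend and raises on failure.
Adapter = Callable[[dict], None]


def publish(deltas: dict, adapters: dict[str, Adapter], audit: list) -> bool:
    """Push each backend's delta in order; stop at the first failure.

    Writes that already succeeded are NOT undone (there is no 2PC); they are
    recorded in the audit log as orphaned instead.
    """
    pushed = []
    for backend, delta in deltas.items():
        try:
            adapters[backend](delta)
        except Exception as exc:
            audit.append({"event": "run_failed", "backend": backend, "error": str(exc)})
            for done in pushed:
                audit.append({"event": "orphaned_write", "backend": done})
            return False  # caller drops the branch, so no commit references these writes
        pushed.append(backend)
    audit.append({"event": "run_committed", "backends": pushed})
    return True
```

In this model, a Drive push followed by a failing S3 push returns `False` and leaves an `orphaned_write` entry for Drive, matching the scenario in the answer above.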

Can I roll back API side effects (emails, Stripe charges)? No. Only filesystem state is transactional. Side effects through the network (HTTP POSTs that aren’t to your storage backends) are logged but not reversible. This is the same limitation every sandbox in this category has — distributed transactions across third-party APIs aren’t a solved problem.

How is this different from just using git? Three things. (1) git is per-repo; Tilde versions code + data + docs + scratch as one transaction. (2) git doesn’t do egress policy; Tilde blocks unauthorized network calls before they exfiltrate data. (3) git has no notion of “agent runs” as first-class objects with audit identity, approval gates, or RBAC. You could build all of this on top of git, but you’d be reimplementing lakeFS.

What does it cost? Free during the private preview. The maintainers say final pricing will be consumption-based and “competitive with similar solutions” but haven’t committed to numbers. Don’t move critical workloads until pricing is public.

Should I wait for the open-source competitors? Depends on your timeline. If you need the multi-source versioned filesystem today, Tilde is the only thing that does it. If you can wait six months and don’t need cross-source atomicity, microsandbox plus a self-hosted lakeFS plus a network policy daemon will get you 80% of the way there with no license fees.

Bottom Line

Tilde.run is the first agent sandbox that takes the transactional part of “transactional sandbox” seriously, and it does it by reusing battle-tested infra (lakeFS) instead of inventing new versioning primitives. The closed-source-SaaS posture is a real concern for a category where trust matters, and the demo undersells the genuinely interesting capability — but the underlying design is sound and the API is small enough to integrate in an afternoon.

If you’re already living the “agent ate prod data” nightmare and your current safety story is “Ctrl-C and pray,” Tilde is worth the private-preview signup. If you’re building a sandbox-the-world platform play, watch closely — and watch even more closely if and when an OSS edition lands.

Show HN thread: news.ycombinator.com/item?id=48037724
Site: tilde.run
Built by: the lakeFS team at Treeverse