lab · 2026-06-01

An Agent Needs Its Own Computer

#agents #sandboxes #harnesses

A model in a chat window can tell you how to fix the bug. It cannot fix it. It has no disk, no shell, nothing that survives the end of the message. Give it tools and it grows hands, and the only question left is where those hands land. Claude Code lands them on your machine, and that feels like a gift: the agent inherits your repo, your shell, your logged-in credentials, ready to go. The inheritance is the problem. It can also reach your SSH keys, your other repos, and the half-finished work in your downloads folder, and a wrong command there is wrong on your real disk. So it stops and asks before each step, and you sit in the chair waving each one through. Fine for pair programming. Useless for the agents I wanted to build.

I built Ralph and Piper to run without me there. Ralph takes a Linear ticket and comes back with a pull request. Piper takes a text from a contractor who has never opened a terminal and comes back with a PDF invoice. Neither can stop to ask permission, because no one is watching, and neither can run on my laptop, because I am asleep or the contractor is in a driveway. One thing makes both work: a sandbox, a disposable computer the agent owns, with nothing of mine inside it.

The cage is the unlock

A sandbox looks like a safety cage, a box that keeps the agent from wrecking the host. Run it for a while and the cage turns out to be the feature. The box holds the blast radius of any command, so you can hand the agent every tool unlocked, raw shell included, and skip the confirmations. Inside Ralph's sandbox the delegated Claude Code runs with no permission prompts at all. The box is the boundary, so a dialog in front of each command would guard a door that opens onto nothing. Drop the prompts and the agent works the way you do, trying a command, reading the error, trying the next, for an hour, with no one to wave it through.
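Here is what "no permission prompts" looks like on the command side of a harness. This is a minimal Python sketch with loud assumptions. `run_in_sandbox` and `CommandResult` are hypothetical names and are not part of Claude Code or any sandbox product. A plain temporary directory stands in for the box, and a temporary directory isolates nothing; in a real setup the command would execute inside a container or VM. The sketch shows the shape of the loop: there is no approval step, and a failing command comes back as data (exit code, stderr) for the agent to read instead of halting the run.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Everything the agent gets back from one command (hypothetical type)."""
    exit_code: int
    stdout: str
    stderr: str
    timed_out: bool = False


def _as_text(data: bytes | str | None) -> str:
    # On timeout, subprocess may return partial output as bytes even with text=True.
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def run_in_sandbox(argv: list[str], workdir: str, timeout_s: float = 60.0) -> CommandResult:
    """Run one command in the box with no approval step (hypothetical helper).

    ASSUMPTION: `workdir` already lives inside an isolated sandbox. This function
    adds no isolation of its own; it only runs the command and reports back.
    A non-zero exit is not an exception: the exit code and stderr are returned
    so the agent can read the error and choose its next command.
    """
    try:
        proc = subprocess.run(
            argv, cwd=workdir, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # The box bounds what a command can damage, not how long it can hang.
        return CommandResult(-1, _as_text(exc.stdout), _as_text(exc.stderr), timed_out=True)
    return CommandResult(proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    # A temp directory stands in for the sandbox here; it isolates nothing.
    with tempfile.TemporaryDirectory() as box:
        result = run_in_sandbox([sys.executable, "-c", "import sys; sys.exit('boom')"], box)
        print(result.exit_code, result.stderr.strip())
```

The timeout stays even after the prompts are gone. The box limits what a command can reach, but a hung process still blocks the agent from trying its next idea.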

The box is cattle, not a pet, the same line Anthropic reaches for to describe the containers behind its managed agents. You do not name it or nurse it when it dies; you throw it away and start another. Each conversation gets its own, so ten run at once without colliding, triggered from a Slack message, a GitHub comment, a text. That is what changes the shape of the work. Harvey's engineers, who moved their coding agents off laptops and into the cloud for the same reason, put the payoff plainly: you stop watching a terminal and wake up to reviewable pull requests.
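The rule that each conversation gets its own box can be sketched as a small registry. This assumes a hypothetical `SandboxPool` keyed by conversation ID, such as a Slack thread or a GitHub issue ID. As before, temporary directories stand in for real containers. A box is created the first time a conversation needs one. When the conversation ends, the box is deleted, not cleaned up for reuse.

```python
from __future__ import annotations

import shutil
import tempfile
import threading
from pathlib import Path


class SandboxPool:
    """One disposable box per conversation (hypothetical; temp dirs stand in for containers).

    Nothing is shared or reused between conversations, so parallel runs cannot
    collide on files, and a finished or broken box is deleted outright.
    """

    def __init__(self) -> None:
        self._boxes: dict[str, Path] = {}
        self._lock = threading.Lock()  # triggers may arrive concurrently

    def acquire(self, conversation_id: str) -> Path:
        """Return this conversation's box, creating a fresh one on first use."""
        with self._lock:
            box = self._boxes.get(conversation_id)
            if box is None:
                box = Path(tempfile.mkdtemp(prefix="sandbox-"))
                self._boxes[conversation_id] = box
            return box

    def discard(self, conversation_id: str) -> None:
        """Throw the box away. Cattle, not a pet: there is nothing to repair."""
        with self._lock:
            box = self._boxes.pop(conversation_id, None)
        if box is not None:
            shutil.rmtree(box, ignore_errors=True)

    def active(self) -> int:
        """How many boxes are currently alive."""
        with self._lock:
            return len(self._boxes)
```

If a conversation is picked up again after its box was discarded, it gets a fresh, empty box. Anything that needs to persist, like the repo or the ticket, has to be fetched again from outside the box.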

A real computer, not a code runner

A sandbox sounds like a place to run code. Piper uses it to make things. A contractor texts "finished the Henderson attic, $1,800, Venmo," and Piper turns that line into a record and a file: it reads the photo of a handwritten estimate, does the math instead of guessing it, renders a real PDF invoice, and texts it back as a link that expires in an hour. The contractor never sees the sandbox, only the invoice. A chat model can describe what belongs on that invoice. Producing it and sending it takes a computer.
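Two steps of that pipeline are simple enough to sketch. The first is doing the math rather than guessing it, using exact decimal arithmetic rounded once to cents. The second is a link that expires in an hour. Here it is built as an HMAC-signed URL with an expiry timestamp. That is a common technique, but it is an assumption about how Piper actually does it. `parse_amount`, `invoice_total`, `sign_link`, `verify_link`, and the `files.example.com` host are all hypothetical. PDF rendering and sending the text are left out.

```python
from __future__ import annotations

import hashlib
import hmac
import time
from decimal import Decimal, ROUND_HALF_UP
from urllib.parse import parse_qs, urlencode, urlsplit

CENT = Decimal("0.01")


def parse_amount(text: str) -> Decimal:
    """Turn a texted amount like '$1,800' into an exact Decimal (hypothetical helper)."""
    return Decimal(text.replace("$", "").replace(",", "").strip()).quantize(CENT, ROUND_HALF_UP)


def invoice_total(lines: list[tuple[str, Decimal, Decimal]]) -> Decimal:
    """Sum (description, quantity, unit_price) lines exactly, rounding once to cents."""
    total = sum((qty * price for _, qty, price in lines), Decimal("0"))
    return total.quantize(CENT, ROUND_HALF_UP)


def _signature(secret: bytes, path: str, expires: int) -> str:
    # The signature covers both the file path and the expiry, so neither can be edited.
    return hmac.new(secret, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_link(base_url: str, path: str, secret: bytes, ttl_s: int = 3600,
              now: float | None = None) -> str:
    """Build a link that stops working ttl_s seconds from now (default: one hour)."""
    expires = int((time.time() if now is None else now) + ttl_s)
    query = urlencode({"expires": expires, "sig": _signature(secret, path, expires)})
    return f"{base_url}{path}?{query}"


def verify_link(url: str, secret: bytes, now: float | None = None) -> bool:
    """Accept the link only if the signature matches and the expiry has not passed."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    current = time.time() if now is None else now
    return hmac.compare_digest(sig, _signature(secret, parts.path, expires)) and current < expires
```

Using `Decimal` keeps every amount exact. The half cent in 3 × $12.345 = $37.035 rounds the way an invoice expects, up to $37.04, and does not depend on how binary floats represent those numbers.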

Same box, opposite jobs

Both agents run on the same kind of machine, with the same shell, internet, and filesystem inside. What each one guards is the opposite. Ralph's box guards a codebase from a confident mistake: the manager reads the git diff as ground truth instead of trusting what Claude Code claims it changed, boots the dev server, and drives a headless browser at it, because a clean compile says nothing about whether the thing works. Piper's box guards a customer's trust in a file: the contractor will see the invoice, so the machine exists to get one document right. It is the same computer either way.
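The check that treats the diff as ground truth fits in a few lines. This sketch assumes the delegated agent commits its work and that the manager knows `base`, the commit the sandbox started from. `changed_files` asks git directly by running `git diff --name-only base head`. `audit_claim` is a hypothetical helper that compares the result against the files the agent says it changed. It flags files the agent touched without mentioning them, and files it mentioned without touching them.

```python
from __future__ import annotations

import subprocess


def changed_files(repo: str, base: str, head: str = "HEAD") -> set[str]:
    """Paths that differ between two commits, according to git rather than the agent."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # a bad ref should fail loudly, not look like "no changes"
    ).stdout
    return {line for line in out.splitlines() if line.strip()}


def audit_claim(claimed: set[str], actual: set[str]) -> dict[str, set[str]]:
    """Compare the agent's own summary with the diff (hypothetical helper)."""
    return {
        "unreported": actual - claimed,  # changed, but never mentioned
        "phantom": claimed - actual,     # mentioned, but absent from the diff
    }
```

`git diff --name-only` between two commits ignores uncommitted and untracked files. That is why the sketch assumes the agent commits its work. A manager that inspects a dirty working tree would parse `git status --porcelain` instead.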

The last step

The model brings the intelligence. The sandbox brings a body and somewhere to use it out of your sight. Line the three up: a chat model advises, Claude Code acts while you watch every move, a sandboxed agent acts while you do nothing. A better model will not carry you across that last gap; a disposable computer will, one that holds nothing of yours, locked down enough to leave alone and cheap enough to throw away. Build it on purpose, and you hand the agent a task and walk away.

References

ZongZi Wang and Gabe Pereyra, "Building Spectre", Harvey, 2026.
"Scaling Managed Agents: Decoupling the Brain from the Hands", Anthropic, 2026.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
