| 1 | +# Agent Sandboxes |
| 2 | + |
| 3 | +Status: Proposal |
| 4 | +Audience: Product, platform operators, prospective customers |
| 5 | +Date: 2026-04-08 |
| 6 | + |
| 7 | +## What we're building |
| 8 | + |
| 9 | +A new Datum compute experience purpose-built for **AI agents that need |
| 10 | +their own isolated, ready-to-use environment**. Instead of asking users to |
| 11 | +assemble a workload, pick regions, and configure scaling, we're shipping a |
| 12 | +small set of resources that let anyone — or any agent — say: |
| 13 | + |
| 14 | +> "Give me a copy of the Python data-science sandbox, just for this session." |
| 15 | + |
| 16 | +…and get a fully initialized, isolated environment back in the time it takes |
| 17 | +to load a web page. |
| 18 | + |
| 19 | +We call the new capability **Agent Sandboxes**. |
| 20 | + |
| 21 | +## Why we're building it |
| 22 | + |
| 23 | +The teams building AI agents today are stuck between two bad options: |
| 24 | + |
| 25 | +- **Run everything in one shared container.** Cheap and fast, but anything |
| 26 | + the agent does — installing packages, executing model-generated code, |
| 27 | + touching files — leaks into the next session. One bad command can break |
| 28 | + the environment for everyone. |
| 29 | +- **Spin up a full cloud workload per session.** Properly isolated, but |
| 30 | + slow to start, expensive to keep idle, and wildly over-engineered for |
| 31 | + "I just need a Python process for the next 20 minutes." Users have to |
| 32 | + learn deployment concepts that have nothing to do with their problem. |
| 33 | + |
| 34 | +Neither option fits how agents actually work. An agent platform typically |
| 35 | +needs **many small, short-lived, strongly isolated environments**, created |
| 36 | +on demand, often hundreds or thousands per day, with state that survives a |
| 37 | +pause and disappears when the session ends. |
| 38 | + |
| 39 | +This is also where the broader ecosystem is heading. The Kubernetes |
| 40 | +community has started a project — [agent-sandbox][upstream] — specifically |
| 41 | +to standardize this shape of compute. Datum is well-positioned to offer a |
| 42 | +best-in-class version of it: our underlying infrastructure (Unikraft-based |
| 43 | +microVMs with snapshot/restore) makes "instant, isolated, stateful" the |
| 44 | +default rather than the exception. |
| 45 | + |
| 46 | +The product opportunity is to turn agent sandboxes into a **catalog Datum |
| 47 | +ships and curates**, the way cloud providers ship machine images — except |
| 48 | +allocation is measured in milliseconds and idle copies cost almost nothing. |
| 49 | + |
| 50 | +## What becomes available |
| 51 | + |
| 52 | +Three new resources, layered so that the simple case stays simple and the |
| 53 | +advanced case stays possible. |
| 54 | + |
| 55 | +### 1. `Sandbox` — one isolated environment |
| 56 | + |
| 57 | +The core building block. A `Sandbox` represents a single, isolated, stateful |
| 58 | +environment running one image. It has a stable name, stable network address, |
| 59 | +and persistent storage that survives pause and resume. |
| 60 | + |
| 61 | +This is the lowest-level resource. Most users never touch it directly. |
| 62 | + |
| 63 | +### 2. `SandboxTemplate` — a reusable, curated environment definition |
| 64 | + |
| 65 | +A **named, versioned blueprint** for a kind of sandbox: which image to run, |
| 66 | +how much CPU/memory, which ports to expose, how long to keep it warm, what |
| 67 | +isolation level to use. Datum ships and maintains a catalog of these out |
| 68 | +of the box — `python-agent-runtime`, `node-agent-runtime`, `code-interpreter`, |
| 69 | +`headless-browser`, `jupyter-datascience`, and so on. Customers and partners |
| 70 | +can publish their own templates into their own namespaces using the same |
| 71 | +mechanism. |
| 72 | + |
| 73 | +Templates are the product surface. They are what users browse, pick, and |
| 74 | +build against. |
| 75 | + |
| 76 | +### 3. `SandboxClaim` — "give me a copy of that template" |
| 77 | + |
| 78 | +The user-facing request. A `SandboxClaim` says "I want a fresh sandbox based |
| 79 | +on template X, with these small overrides." The platform produces a per-claim |
| 80 | +`Sandbox` that is a fully independent copy — its own identity, its own |
| 81 | +storage, its own lifecycle. |
| 82 | + |
| 83 | +A claim is typically 5–10 lines of YAML, short enough for an agent to write |
| 84 | +without consulting documentation. It is the resource an agent platform |
| 85 | +creates per session, per user, or per task. |
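
As a rough illustration of how small a claim could be, the sketch below builds one as a Python dictionary and prints it as JSON. The proposal does not define a schema, so the API group `compute.example.invalid/v1alpha1`, the field names `templateRef` and `overrides`, the `idleTimeout` override, and the helper `build_claim` are all assumptions made for this example.

```python
import json
from typing import Optional


def build_claim(name: str, template: str, overrides: Optional[dict] = None) -> dict:
    """Build a hypothetical SandboxClaim manifest; every field name is assumed."""
    spec = {"templateRef": {"name": template}}
    if overrides:
        # Small per-claim overrides; everything else is inherited from the template.
        spec["overrides"] = dict(overrides)
    return {
        "apiVersion": "compute.example.invalid/v1alpha1",  # placeholder, not a real API group
        "kind": "SandboxClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


claim = build_claim("session-1234", "python-agent-runtime", {"idleTimeout": "15m"})
print(json.dumps(claim, indent=2))
```

The point of the sketch is the shape: a template reference plus a handful of overrides, with sizing, ports, and isolation left to the template.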
| 86 | + |
| 87 | +### Behind the scenes: warm pools |
| 88 | + |
| 89 | +Each `SandboxTemplate` can keep a pool of pre-initialized copies ready to |
| 90 | +go. When a claim arrives, the platform hands out a warm copy and refills |
| 91 | +the pool in the background. The user sees allocation in milliseconds, not a |
| 92 | +cold start; the operator sees a single tunable pool size on the template. |
| 93 | + |
| 94 | +Warm pools are not a separate resource the user or operator has to manage — |
| 95 | +they're a property of the template. |
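
The hand-out-and-refill behavior described above can be modeled in a few lines. This is a toy sketch, not the platform's implementation: the class `WarmPool`, its methods, and the generated sandbox names are hypothetical, and the refill is called explicitly here where a real system would run it in the background.

```python
from collections import deque
from itertools import count
from typing import Tuple


class WarmPool:
    """Toy model of a per-template warm pool (names are illustrative only)."""

    def __init__(self, template: str, size: int) -> None:
        self.template = template
        self.size = size  # the tunable pool size on the template
        self._ids = count(1)
        self._warm = deque()
        self.refill()

    def _boot(self) -> str:
        # Stand-in for booting and initializing a real sandbox.
        return f"{self.template}-{next(self._ids)}"

    def refill(self) -> None:
        # Top the pool back up to its configured size.
        while len(self._warm) < self.size:
            self._warm.append(self._boot())

    def available(self) -> int:
        return len(self._warm)

    def claim(self) -> Tuple[str, bool]:
        """Return (sandbox_name, served_from_warm_pool)."""
        if self._warm:
            return self._warm.popleft(), True
        return self._boot(), False  # pool ran dry: this claim pays a cold start


pool = WarmPool("headless-browser", size=2)
first, first_warm = pool.claim()
second, second_warm = pool.claim()
third, third_warm = pool.claim()  # pool is empty until refill() runs
pool.refill()
print(first, first_warm, third, third_warm, pool.available())
```

The third claim shows the failure mode the operator tunes against: when claims outpace refills, the pool runs dry and that request falls back to a cold start.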
| 96 | + |
| 97 | +## How this fits the existing platform |
| 98 | + |
| 99 | +Datum already has a `Workload` resource for declarative, multi-region, |
| 100 | +horizontally scaled applications. `Workload` is and remains the right tool |
| 101 | +for production services. Agent sandboxes are a *different* shape of compute: |
| 102 | + |
| 103 | +| | **Workload** | **Agent Sandbox** | |
| 104 | +|---|---|---| |
| 105 | +| Cardinality | Many replicas across regions | One environment per session | |
| 106 | +| Lifetime | Long-running | Minutes to hours, then gone | |
| 107 | +| Scaling | Horizontal, automatic | None — each sandbox is its own unit | |
| 108 | +| State | Usually external (DB, cache) | Local, persistent across pause | |
| 109 | +| Allocation time | Seconds to minutes | Milliseconds (from warm pool) | |
| 110 | +| Who creates it | A human, once | An agent, thousands of times | |
| 111 | + |
| 112 | +The two live side-by-side. We are not replacing `Workload`; we are adding |
| 113 | +the right primitive for the use case it was never designed for. |
| 114 | + |
| 115 | +As part of this work, the underlying repository is being renamed from |
| 116 | +`workload-operator` to **`compute`** to reflect that it now owns |
| 117 | +more than one top-level concept on the Datum compute platform. |
| 118 | + |
| 119 | +--- |
| 120 | + |
| 121 | +## User journeys |
| 122 | + |
| 123 | +### Journey A — The agent platform (consumer) |
| 124 | + |
| 125 | +**Persona.** Maya is building an AI coding assistant. When a user asks her |
| 126 | +agent to "analyze this CSV and plot the results," the agent needs to write |
| 127 | +and execute Python in a fresh, isolated environment, then throw it away. |
| 128 | + |
| 129 | +**Today, without agent sandboxes.** Maya stands up a Kubernetes cluster, |
| 130 | +writes a custom controller that creates pods per session, figures out how |
| 131 | +to give each pod its own storage, builds a queue of pre-warmed pods to |
| 132 | +hide cold starts, writes a janitor to clean up dead sessions, and worries |
| 133 | +constantly about whether one user's `pip install` can affect another's. |
| 134 | +Months of work before her agent runs its first line of code in production. |
| 135 | + |
| 136 | +**With agent sandboxes.** |
| 137 | + |
| 138 | +1. Maya browses the Datum sandbox catalog and picks `python-data-science`. |
| 139 | + She reads the one-page description: Python 3.12, pandas, numpy, |
| 140 | + matplotlib pre-installed, 2 GB RAM, 10 GB scratch disk, isolated per copy. |
| 141 | +2. In her agent code, when a session starts, she creates a `SandboxClaim` |
| 142 | + referencing that template. Five lines of YAML, or one API call. |
| 143 | +3. Within tens of milliseconds, the claim reports `Ready` with an endpoint |
| 144 | + her agent can connect to. The environment is fully initialized — the |
| 145 | + Python interpreter is warm, libraries are loaded, ready for the first |
| 146 | + command. |
| 147 | +4. Her agent uses the sandbox: writing files, executing code, generating |
| 148 | + plots. Everything stays inside that one copy. |
| 149 | +5. When the user's session ends — or after 15 minutes of inactivity — |
| 150 | + the sandbox is deleted. Storage goes with it. No cleanup code on |
| 151 | + Maya's side. |
| 152 | +6. If Maya wants something not in the catalog, she pushes her own image |
| 153 | + and Datum builds a custom template for her in her own namespace. The |
| 154 | + per-claim experience is identical. |
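
The inactivity rule in step 5 is the part Maya no longer writes herself. A minimal sketch of what such a platform-side rule might look like follows; the names `LiveSandbox` and `reap_idle` are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List

IDLE_TIMEOUT_SECONDS = 15 * 60  # the 15 minutes of inactivity from step 5


@dataclass
class LiveSandbox:
    name: str
    last_activity: float  # timestamp in seconds


def reap_idle(sandboxes: Dict[str, LiveSandbox], now: float) -> List[str]:
    """Delete sandboxes idle for at least the timeout; return the deleted names."""
    expired = [
        name
        for name, sandbox in sandboxes.items()
        if now - sandbox.last_activity >= IDLE_TIMEOUT_SECONDS
    ]
    for name in expired:
        del sandboxes[name]  # stand-in for deleting the sandbox and its storage
    return expired


live = {
    "session-a": LiveSandbox("session-a", last_activity=0.0),
    "session-b": LiveSandbox("session-b", last_activity=600.0),
}
deleted = reap_idle(live, now=900.0)  # session-a idle 900 s, session-b idle 300 s
print(deleted, sorted(live))
```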
| 155 | + |
| 156 | +**What Maya never has to think about:** regions, scaling, image building, |
| 157 | +warm pools, cluster sizing, isolation backends, snapshot management, |
| 158 | +garbage collection, or the difference between a "container" and a "VM." |
| 159 | + |
| 160 | +### Journey B — The internal team (operator) |
| 161 | + |
| 162 | +**Persona.** Devon is on the Datum platform team. He owns the catalog of |
| 163 | +sandbox templates Datum ships to customers. A new team has asked for a |
| 164 | +`headless-browser` sandbox for agents that need to scrape and screenshot |
| 165 | +web pages. |
| 166 | + |
| 167 | +**With agent sandboxes.** |
| 168 | + |
| 169 | +1. Devon writes a Dockerfile for the headless-browser environment: |
| 170 | + Chromium, Playwright, a small HTTP wrapper. Standard stuff. |
| 171 | +2. He creates a `SandboxTemplate` in the Datum catalog namespace pointing |
| 172 | + at that image. He sets resource sizing, the ports to expose, a default |
| 173 | + idle timeout of 10 minutes, and a warm pool size of 10. |
| 174 | +3. Datum's build pipeline picks up the new template, builds the image |
| 175 | + for the appropriate isolation backend, validates it, and starts the |
| 176 | + warm pool. Devon watches the template's status go from `Building` to |
| 177 | + `Ready`. |
| 178 | +4. Devon runs a few test claims against it, confirms the browser works, |
| 179 | + sets the template to `Published`. It now appears in the customer-facing |
| 180 | + catalog. |
| 181 | +5. A week later, traffic has grown. Devon raises the warm pool from 10 |
| 182 | + to 50 by editing one field on the template. No customer change needed. |
| 183 | +6. A security advisory drops for Chromium. Devon publishes |
| 184 | + `headless-browser:1.1` as a new template version. New claims get the |
| 185 | + patched version automatically; existing live sandboxes keep running on |
| 186 | + the old version until their sessions end. No fleet-wide restart. |
| 187 | +7. Datum's billing and observability surfaces show per-template usage: |
| 188 | + how many claims, how long they live, how often the warm pool runs dry, |
| 189 | + how much storage they consume. Devon uses this to right-size the pool |
| 190 | + and report ROI. |
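
The rollout behavior in step 6 (new claims resolve to the newest published version, live sandboxes stay pinned) can be sketched as follows. The `Catalog` class and its methods are hypothetical, and the sketch assumes purely numeric dotted versions such as `1.0` and `1.1`.

```python
from typing import Dict, List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Compare versions numerically so that 1.10 sorts after 1.9."""
    return tuple(int(part) for part in version.split("."))


class Catalog:
    """Toy catalog: published versions per template, plus pinned live claims."""

    def __init__(self) -> None:
        self.published: Dict[str, List[str]] = {}
        self.live: Dict[str, str] = {}  # claim name -> "template:version"

    def publish(self, template: str, version: str) -> None:
        self.published.setdefault(template, []).append(version)

    def claim(self, claim_name: str, template: str) -> str:
        # Resolve the version once, at claim time; the claim stays pinned to it.
        latest = max(self.published[template], key=version_key)
        self.live[claim_name] = f"{template}:{latest}"
        return self.live[claim_name]


catalog = Catalog()
catalog.publish("headless-browser", "1.0")
old = catalog.claim("session-a", "headless-browser")
catalog.publish("headless-browser", "1.1")  # the patched version
new = catalog.claim("session-b", "headless-browser")
print(old, new, catalog.live["session-a"])
```

Resolving the version at claim time is what avoids a fleet-wide restart: nothing already running is touched, and the old version drains as sessions end.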
| 191 | + |
| 192 | +**What Devon never has to think about:** writing a controller, managing |
| 193 | +pods, hand-rolling a warm-pool scheduler, building a per-copy storage |
| 194 | +system, or coordinating rollouts across regions. |
| 195 | + |
| 196 | +### Journey C — The end customer of the agent (incidental) |
| 197 | + |
| 198 | +**Persona.** Priya is using Maya's coding assistant. She doesn't know what |
| 199 | +Datum is and never will. |
| 200 | + |
| 201 | +What she experiences: she asks the agent to do something. The agent |
| 202 | +responds in roughly the same time it would take any chatbot. Behind the |
| 203 | +scenes, a sandbox was claimed, used, paused, and cleaned up — but to |
| 204 | +Priya, it just felt like the assistant worked. Her data didn't leak into |
| 205 | +anyone else's session, and the assistant didn't get slower as more people |
| 206 | +used it. |
| 207 | + |
| 208 | +That invisible reliability is the actual product. |
| 209 | + |
| 210 | +--- |
| 211 | + |
| 212 | +## What success looks like |
| 213 | + |
| 214 | +- **Time-to-first-sandbox** for a new agent platform: under one hour from |
| 215 | + signup, with no infrastructure code written. |
| 216 | +- **Claim-to-ready latency** against a catalog template: under 50 ms at |
| 217 | + the 95th percentile. |
| 218 | +- **Idle cost** of a paused sandbox: an order of magnitude lower than |
| 219 | + a comparable always-on container. |
| 220 | +- **Catalog breadth**: Datum ships at least the top 5 agent runtimes |
| 221 | + (Python, Node, code interpreter, headless browser, notebook) in the |
| 222 | + initial release, with a clear path for customer-published templates. |
| 223 | +- **Operator ergonomics**: a new sandbox template can be added to the |
| 224 | + Datum catalog by one engineer in under a day. |
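
For the claim-to-ready target, one way to check a batch of measurements is a nearest-rank percentile. This is only a sketch: the proposal does not say how the 95th percentile will be computed, the function names are hypothetical, and the latency values are made up for illustration.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_target(latencies_ms: Sequence[float], target_ms: float = 50.0) -> bool:
    """True when the 95th-percentile latency is under the target."""
    return percentile_nearest_rank(latencies_ms, 95) < target_ms


# Illustrative numbers only: mostly warm claims plus one cold start.
mostly_warm = [12.0] * 19 + [400.0]
too_many_cold = [12.0] * 9 + [400.0]
print(meets_latency_target(mostly_warm), meets_latency_target(too_many_cold))
```

A percentile target tolerates the occasional cold start, which is why how often the warm pool runs dry matters more than the worst single claim.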
| 225 | + |
| 226 | +## Open product questions |
| 227 | + |
| 228 | +- Which templates ship in the launch catalog, and in what order? |
| 229 | +- What is the pricing shape — per claim, per active sandbox-minute, |
| 230 | + per warm-pool slot, or some combination? |
| 231 | +- Do we expose customer-published templates in v1, or hold them for v2? |
| 232 | +- How do we surface template versioning and deprecation to consumers |
| 233 | + who may have thousands of live claims at any moment? |
| 234 | + |
| 235 | +## What's *not* in scope |
| 236 | + |
| 237 | +- Replacing `Workload` for long-running, multi-region production services. |
| 238 | +- A general-purpose VM or container product. Agent sandboxes are |
| 239 | + opinionated on purpose: one image, one copy, one session. |
| 240 | +- A development IDE or notebook UI. Datum provides the runtime; the |
| 241 | + agent platform or developer tool provides the experience on top. |
| 242 | + |
| 243 | +[upstream]: https://github.com/kubernetes-sigs/agent-sandbox |