Sandbox

Run agent code in an isolated Linux virtual machine — safely, locally, and with full dev environment capabilities.

The Sandbox is a shared Linux container powered by Apple's Containerization framework. It gives every Osaurus agent access to a real Linux environment with shell, package managers, compilers, and file system access — all running natively on Apple Silicon with zero risk to your Mac.

Why Sandbox?

Safe Execution

Agents can run arbitrary code, install packages, and modify files inside the VM without touching the host macOS system. The host directories shared with the VM are the VirtioFS-mounted `workspace/` and `output/` folders under `~/.osaurus/container/`. The VM is disposable and resettable: if something goes wrong, reset the container and start fresh.

Real Dev Environment

Agents gain a full Linux environment with shell access, Python (pip), Node.js (npm), system packages (apk), compilers, and standard POSIX tools. This far exceeds what macOS-sandboxed tools can offer, enabling agents to build, test, and run real software.

Multi-Agent Isolation

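Each agent gets its own Linux user and home directory. One agent's files, processes, and user-level packages (for example, those installed with `pip install --user`) cannot interfere with another's; system packages installed via `apk` run as root and are shared by the whole container. Run multiple specialized agents simultaneously (a Python data analyst, a Node.js web developer, and a system administration agent) without them overwriting each other's work.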

Lightweight Plugin Ecosystem

Sandbox plugins are simple JSON recipes. No compiled dylibs, no Xcode, no code signing required. Anyone can write, share, and import plugins that install dependencies, seed files, and define custom tools — dramatically lowering the barrier to extending agent capabilities.

Local-First

Everything runs on-device using Apple's Virtualization framework. No Docker, no cloud VMs, no network dependency. The container boots in seconds and runs with native performance on Apple Silicon.

Seamless Host Bridge

Despite running in isolation, agents inside the VM retain full access to Osaurus services — inference, memory, secrets, agent dispatch, and events — via a vsock bridge. The sandbox is isolated but not disconnected.

Requirements

macOS 26+ (Tahoe) — required for Apple's Containerization framework
Apple Silicon (M1 or newer)

Getting Started

1. Open the Sandbox Tab

Open the Management window (⌘ Shift M) → Sandbox.

2. Provision the Container

Click Provision to download the Linux kernel and initial filesystem, then boot the container. This is a one-time setup that takes about a minute.

3. Start Using Sandbox Tools

Once the container is running, sandbox tools are automatically registered for the active agent. The agent can now execute commands, read/write files, install packages, and more — all inside the VM.

4. Install Plugins (Optional)

Switch to the Plugins tab to browse, import, or create sandbox plugins that extend your agents with custom tools.

Architecture

┌──────────────────────────────────────────────────────────────┐
│                        macOS Host                            │
│                                                              │
│  ┌──────────────┐     ┌──────────────────────────────┐       │
│  │   Osaurus    │     │   Linux VM (Alpine)          │       │
│  │              │     │                              │       │
│  │  SandboxMgr ─┼─────┤→ /workspace (VirtioFS)       │       │
│  │              │     │→ /output    (VirtioFS)       │       │
│  │  HostAPI  ←──┼─vsock─→ /run/osaurus-bridge.sock   │       │
│  │  Bridge      │     │                              │       │
│  │              │     │  agent-alice  (Linux user)   │       │
│  │  ToolReg  ←──┼─────┤  agent-bob    (Linux user)   │       │
│  │              │     │  ...                         │       │
│  └──────────────┘     └──────────────────────────────┘       │
└──────────────────────────────────────────────────────────────┘

Key components:

Component	Description
Linux VM	Alpine Linux with Kata Containers 3.17.0 ARM64 kernel, 8 GiB root filesystem
VirtioFS Mounts	`/workspace` maps to `~/.osaurus/container/workspace/`, `/output` maps to `~/.osaurus/container/output/`
NAT Networking	Container gets `10.0.2.15/24` via `VZNATNetworkDeviceAttachment`
Vsock Bridge	Unix socket relayed via vsock connects the container to the Host API Bridge server
Per-Agent Users	Each agent gets a Linux user `agent-{name}` with home at `/workspace/agents/{name}/`
Host API Bridge	HTTP server on the host, accessible from the container via `osaurus-host` CLI shim
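
To find a sandbox file from the Mac side, the VirtioFS mapping in the table can be applied by hand. The following is a minimal sketch that assumes the two mounts listed above are the only ones; `to_host_path` is a hypothetical helper for illustration, not part of Osaurus.

```python
from pathlib import PurePosixPath

# Mount table copied from the "VirtioFS Mounts" row above.
# Host paths are left with "~" unexpanded.
MOUNTS = {
    PurePosixPath("/workspace"): PurePosixPath("~/.osaurus/container/workspace"),
    PurePosixPath("/output"): PurePosixPath("~/.osaurus/container/output"),
}


def to_host_path(container_path: str) -> PurePosixPath:
    """Translate a path inside the VM to its host location (hypothetical helper)."""
    p = PurePosixPath(container_path)
    if ".." in p.parts:
        raise ValueError("paths containing '..' are not translated")
    for mount, host_dir in MOUNTS.items():
        if p == mount or mount in p.parents:
            return host_dir / p.relative_to(mount)
    raise ValueError(f"{container_path} is not on a VirtioFS mount")


print(to_host_path("/workspace/agents/alice/notes.txt"))
# ~/.osaurus/container/workspace/agents/alice/notes.txt
```

Pass the result through `os.path.expanduser` before opening it on the host. Paths outside `/workspace` and `/output` live in the VM's root filesystem and have no host-side equivalent in this mapping.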

Configuration

Configure the container via the Management window → Sandbox → Container tab → Resources section.

Setting	Range	Default	Description
CPUs	1–8	2	Virtual CPU cores allocated to the VM
Memory	1–8 GB	2 GB	RAM allocated to the VM
Network	outbound / none	outbound	NAT networking for outbound internet access
Auto-Start	on / off	on	Automatically start the container when Osaurus launches

Changes require a container restart to take effect.

Config file: ~/.osaurus/config/sandbox.json

{
  "autoStart": true,
  "cpus": 2,
  "memoryGB": 2,
  "network": "outbound"
}
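
Before editing the file by hand, it can help to check values against the ranges in the table. The sketch below is illustrative only: `validate_sandbox_config` is a hypothetical helper, and it assumes that missing keys fall back to the documented defaults and that `cpus` and `memoryGB` are whole numbers.

```python
import json

# Defaults and allowed values copied from the Configuration table above.
DEFAULTS = {"autoStart": True, "cpus": 2, "memoryGB": 2, "network": "outbound"}


def validate_sandbox_config(raw: str) -> dict:
    """Parse sandbox.json text and check it against the documented ranges."""
    cfg = {**DEFAULTS, **json.loads(raw)}  # assumption: missing keys use defaults
    if not isinstance(cfg["autoStart"], bool):
        raise ValueError("autoStart must be true or false")
    for key in ("cpus", "memoryGB"):
        value = cfg[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 8:
            raise ValueError(f"{key} must be an integer from 1 to 8")
    if cfg["network"] not in ("outbound", "none"):
        raise ValueError('network must be "outbound" or "none"')
    return cfg


print(validate_sandbox_config('{"cpus": 4, "memoryGB": 8}'))
```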

Built-in Tools

When the container is running, sandbox tools are automatically registered for the active agent. Read-only tools are always available. Write and execution tools require `autonomous_exec` to be enabled on the agent.

Always Available (Read-Only)

Tool	Description
`sandbox_read_file`	Read a file's contents from the sandbox (supports line ranges, tail, char cap)
`sandbox_list_directory`	List files and directories (supports recursive listing via `tree`)
`sandbox_search_files`	Search file contents with ripgrep (regex, glob filters, context lines, case-insensitive)
`sandbox_find_files`	Find files by name glob pattern (e.g. `*.py`, `test_*`)

Requires Autonomous Exec

Tool	Description
`sandbox_write_file`	Write content to a file (creates parent directories)
`sandbox_edit_file`	Edit a file by exact string replacement — `old_string` must match exactly once
`sandbox_move`	Move or rename files and directories
`sandbox_delete`	Delete files or directories
`sandbox_exec`	Run a shell command (configurable timeout, max 300s)
`sandbox_exec_background`	Start a background process with log file output
`sandbox_exec_kill`	Kill a background process by PID
`sandbox_install`	Install system packages via `apk` (runs as root)
`sandbox_pip_install`	Install Python packages via `pip install --user`
`sandbox_npm_install`	Install Node.js packages via `npm install`
`sandbox_run_script`	Run a script file (auto-detects Python, Node, Bash, etc.)
`sandbox_whoami`	Get agent identity, home directory, installed plugins, and disk usage
`sandbox_processes`	List running processes for this agent
`share_artifact`	Share a file as a downloadable artifact
`sandbox_secret_check`	Check whether a secret exists for this agent (never reveals the value)
`sandbox_secret_set`	Store a secret securely — pass `value` directly or omit to prompt the user
`sandbox_plugin_register`	Register an agent-created plugin (requires `pluginCreate` permission)

All file paths are validated on the host side before container execution. Path traversal attacks are blocked by SandboxPathSanitizer.
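
The exactly-once rule for `sandbox_edit_file` is easy to misread, so here is a small sketch of the semantics described in the table. It is not the tool's actual implementation, and the error behavior shown is an assumption.

```python
def edit_exactly_once(text: str, old_string: str, new_string: str) -> str:
    """Replace old_string with new_string, but only if it occurs exactly once."""
    if not old_string:
        raise ValueError("old_string must not be empty")
    count = text.count(old_string)  # counts non-overlapping occurrences
    if count != 1:
        raise ValueError(f"old_string matched {count} times, expected exactly 1")
    return text.replace(old_string, new_string, 1)


source = "retries = 3\ntimeout = 30\n"
print(edit_exactly_once(source, "retries = 3", "retries = 5"))
```

In this model, an `old_string` of `"3"` would be rejected because it matches twice (in `3` and in `30`). Including enough surrounding context to make the match unique is the usual way to satisfy the rule.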

Sandbox Plugins

Sandbox plugins are JSON recipes that extend agent capabilities inside the container. They can install system dependencies, seed files, define custom tools, and configure secrets — all without compiling code.

Plugin Format

{
  "name": "Python Data Tools",
  "description": "Data analysis toolkit with pandas and matplotlib",
  "version": "1.0.0",
  "author": "your-name",
  "dependencies": ["python3", "py3-pip"],
  "setup": "pip install --user pandas matplotlib seaborn",
  "files": {
    "helpers.py": "import pandas as pd\nimport matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt\n"
  },
  "tools": [
    {
      "id": "analyze_csv",
      "description": "Load a CSV file and return summary statistics",
      "parameters": {
        "file": {
          "type": "string",
          "description": "Path to the CSV file"
        }
      },
      "run": "cd $HOME/plugins/python-data-tools && python3 -c \"import pandas as pd; df = pd.read_csv('$PARAM_FILE'); print(df.describe().to_string())\""
    }
  ],
  "secrets": ["OPENAI_API_KEY"],
  "permissions": {
    "network": "outbound",
    "inference": true
  }
}

Plugin Properties

Property	Type	Required	Description
`name`	string	Yes	Display name
`description`	string	Yes	Brief description
`version`	string	No	Semantic version
`author`	string	No	Author name
`source`	string	No	Source URL (e.g., GitHub repo)
`dependencies`	string[]	No	System packages installed via `apk add` (runs as root)
`setup`	string	No	Setup command run as the agent's Linux user
`files`	object	No	Files seeded into the plugin folder (key = relative path, value = contents)
`tools`	SandboxToolSpec[]	No	Custom tool definitions
`secrets`	string[]	No	Secret names the plugin requires (user prompted on install)
`permissions`	object	No	Network policy and inference access
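
A recipe can be checked against this table before importing it. The sketch below covers only what the table states (two required properties, string-list fields, relative paths as `files` keys); `check_plugin` is a hypothetical helper, and the real validation performed by Osaurus may differ.

```python
import json
from pathlib import PurePosixPath

REQUIRED = ("name", "description")  # the two properties marked "Yes" above


def check_plugin(raw: str) -> list[str]:
    """Return a list of problems found in a plugin recipe; empty means none found."""
    plugin = json.loads(raw)
    problems = []
    for key in REQUIRED:
        if not isinstance(plugin.get(key), str) or not plugin[key].strip():
            problems.append(f"missing required string property: {key}")
    for key in ("dependencies", "secrets"):
        value = plugin.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} must be a list of strings")
    files = plugin.get("files", {})
    if not isinstance(files, dict):
        problems.append("files must be an object")
    else:
        for rel in files:
            path = PurePosixPath(rel)
            # Keys are relative paths inside the plugin folder.
            if path.is_absolute() or ".." in path.parts:
                problems.append(f"files key must be a relative path without '..': {rel}")
    return problems


print(check_plugin('{"name": "Python Data Tools", "description": "Data analysis toolkit"}'))
```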

Per-Agent Installation

Plugins are installed per agent. Each agent can have a different set of plugins installed, and each installation is isolated in its own directory within the agent's workspace.

Install flow:

1. Validate plugin file paths
2. Start the container (if not running)
3. Create the agent's Linux user
4. Install system dependencies via `apk`
5. Create plugin directory and seed files via VirtioFS
6. Configure secrets from Keychain
7. Run the setup command
8. Register plugin tools

Managing plugins:

Open Management window → Sandbox → Plugins tab
Import plugins from JSON files, URLs, or GitHub repos
Create new plugins with the built-in editor
Install plugins to specific agents
Export and duplicate plugins for sharing

Plugin Tools

Each tool in a plugin's `tools` array becomes an AI-callable tool. The tool name is `{pluginId}_{toolId}`.

Parameters are passed as environment variables: the parameter name is uppercased and prefixed with `PARAM_`:

Parameter Name	Environment Variable
`file`	`$PARAM_FILE`
`query`	`$PARAM_QUERY`
`output_format`	`$PARAM_OUTPUT_FORMAT`

The `run` field is a shell command executed as the agent's Linux user with the working directory set to the plugin folder.
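
The naming rules above can be written out in a few lines. This is a sketch of the mapping shown in the table, not Osaurus code: `param_env` and `tool_name` are hypothetical helpers, and both the identifier check and the use of `str()` for non-string values are assumptions.

```python
import re


def param_env(params: dict) -> dict:
    """Map tool parameters to PARAM_* environment variables, as in the table above."""
    env = {}
    for name, value in params.items():
        # Assumption: only identifier-like parameter names are accepted.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unsupported parameter name: {name!r}")
        env[f"PARAM_{name.upper()}"] = str(value)
    return env


def tool_name(plugin_id: str, tool_id: str) -> str:
    """Compose the AI-callable tool name, {pluginId}_{toolId}."""
    return f"{plugin_id}_{tool_id}"


print(param_env({"file": "data.csv", "output_format": "json"}))
# {'PARAM_FILE': 'data.csv', 'PARAM_OUTPUT_FORMAT': 'json'}
```

Because values arrive as environment variables, a script started by `run` can read them directly (for example `os.environ["PARAM_FILE"]` in Python) instead of splicing `$PARAM_FILE` into source text. That keeps a file name containing quotes from changing the meaning of the command.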

Secret Management

Agents can check for and store secrets (API keys, tokens) using sandbox_secret_check and sandbox_secret_set. Secrets are stored in the macOS Keychain, scoped per agent.

Two Storage Paths

Path	When	How
Direct	Agent already has the value (e.g., received via Host API or Telegram bot)	Pass `value` parameter to `sandbox_secret_set`
Prompt	Agent needs the user to provide the value (Chat or Work UI)	Omit `value` — a secure overlay appears with `SecureField` input

The prompt path keeps secret values out of the conversation history and LLM context entirely. The execution loop pauses via withCheckedContinuation until the user submits or cancels.

Prompt Flow

1. Agent calls `sandbox_secret_set` without `value`
2. Tool returns a `secret_prompt` marker (JSON with key, description, instructions)
3. The execution loop (Chat or Work) intercepts the marker and shows `SecretPromptOverlay`
4. User enters the secret value in a `SecureField` and submits (or cancels via button/ESC)
5. On submit, the value is stored in Keychain and the tool result is rewritten to `{"stored": true, "key": "..."}`; on cancel, it is rewritten to a cancelled result instead
6. Execution resumes with the sanitized result, so the LLM never sees the secret
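
The intercept-and-rewrite part of this flow can be sketched as a single function. The code below is an illustration, not the Osaurus implementation: the marker layout (`{"secret_prompt": {...}}`), the shape of the cancelled result, and the `ask_user` and `store` callbacks are all assumptions.

```python
import json


def resolve_secret_prompt(tool_result: str, ask_user, store) -> str:
    """Intercept a secret_prompt marker and return a result that is safe to show the LLM.

    ask_user(key, description) returns the secret, or None if the user cancels.
    store(key, value) persists the secret (the Keychain, in Osaurus).
    """
    try:
        payload = json.loads(tool_result)
    except json.JSONDecodeError:
        return tool_result  # ordinary tool output, passed through unchanged
    if not isinstance(payload, dict) or "secret_prompt" not in payload:
        return tool_result
    marker = payload["secret_prompt"]
    key = marker["key"]
    value = ask_user(key, marker.get("description", ""))
    if value is None:
        return json.dumps({"cancelled": True, "key": key})  # assumed shape
    store(key, value)
    # Only the key name goes back into the conversation, never the value.
    return json.dumps({"stored": True, "key": key})


vault = {}
marker = json.dumps({"secret_prompt": {"key": "OPENAI_API_KEY", "description": "API key"}})
print(resolve_secret_prompt(marker, lambda key, desc: "sk-example", vault.__setitem__))
```

The point of the design is visible in the return values: the secret flows from `ask_user` to `store` and never appears in the string handed back to the model.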

Robustness

SecretPromptState tracks a resolved flag, making submit() and cancel() idempotent
onDisappear on the overlay calls cancel() as a safety net if the view is dismissed unexpectedly
All session reset paths (cancelExecution, finishExecution, etc.) dismiss pending prompts before clearing state
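
The `resolved` flag matters because a checked continuation must be resumed exactly once, while `submit()`, `cancel()`, and `onDisappear` can all fire for the same prompt. The sketch below is a toy Python model of that guard, not the Swift type itself; the `on_resolve` callback stands in for resuming the continuation.

```python
class SecretPromptState:
    """Toy model of a prompt whose submit() and cancel() are idempotent."""

    def __init__(self, on_resolve):
        self._on_resolve = on_resolve  # called once with the value, or None on cancel
        self.resolved = False

    def _resolve(self, value):
        if self.resolved:
            return  # a second submit/cancel is a no-op
        self.resolved = True
        self._on_resolve(value)

    def submit(self, value):
        self._resolve(value)

    def cancel(self):
        self._resolve(None)


calls = []
state = SecretPromptState(calls.append)
state.submit("s3cret")
state.cancel()  # e.g. fired by the overlay disappearing after submit
print(calls)  # ['s3cret']
```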

Agent-Created Plugins

Agents can create, package, and register new plugins at runtime via the sandbox_plugin_register tool. This enables agents to extend their own capabilities during a session and share plugins for future use.

Requirements

autonomous_exec.enabled must be true on the agent
pluginCreate must be true in the agent's autonomous exec config

Workflow

1. Agent writes script files to `~/plugins/{plugin-id}/scripts/` (or any subdirectory)
2. Agent writes a `plugin.json` manifest defining the plugin name, description, tools, and dependencies
3. Agent calls `sandbox_plugin_register` with the `plugin_id`
4. The tool reads `plugin.json`, auto-packages all files in the directory into the plugin's `files` map, validates the plugin, and installs it
5. Plugin tools are hot-registered into the active session via `CapabilityLoadBuffer`, so they are immediately usable without a restart
6. A non-blocking toast notifies the user with a "Remove" action for later review

File Auto-Packaging

When sandbox_plugin_register loads a plugin directory, it recursively collects all files (excluding plugin.json itself) and merges them into the plugin's files map. Files explicitly defined in plugin.json take precedence over auto-discovered ones. This means agents only need to write files to disk and provide a minimal plugin.json — no manual files map is needed.
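
The merge rule can be sketched in a few lines. This is an illustration of the behavior described above rather than the actual implementation: `package_files` is a hypothetical helper, and it assumes all files are UTF-8 text and that only the top-level `plugin.json` is excluded.

```python
from pathlib import Path


def package_files(plugin_dir: Path, manifest: dict) -> dict:
    """Merge files found on disk into the manifest's files map."""
    discovered = {}
    for path in sorted(plugin_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(plugin_dir).as_posix()
        if rel == "plugin.json":
            continue  # the manifest itself is not packaged
        # Assumption: files are UTF-8 text.
        discovered[rel] = path.read_text(encoding="utf-8")
    # Explicit entries in plugin.json win over auto-discovered ones.
    return {**discovered, **manifest.get("files", {})}
```

For a directory containing `plugin.json` and `scripts/run.sh`, a manifest with no `files` key yields a map with the single key `scripts/run.sh`.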

Plugin Persistence

Registered plugins are saved to the SandboxPluginLibrary and survive app restarts. They can be managed, exported, or removed from the Sandbox → Plugins tab.

Host API Bridge

The Host API Bridge connects the container to Osaurus services on the host. Inside the container, the osaurus-host CLI communicates with the bridge server over a vsock-relayed Unix socket.

Command	Description
`osaurus-host secrets get <name>`	Read a secret from the macOS Keychain
`osaurus-host config get <key>`	Read a plugin config value
`osaurus-host config set <key> <value>`	Write a plugin config value
`osaurus-host inference chat -m <message>`	Run a chat completion through Osaurus
`osaurus-host agent dispatch <id> <task>`	Dispatch a task to an agent
`osaurus-host agent memory query <text>`	Search agent memory
`osaurus-host agent memory store <text>`	Store a memory entry
`osaurus-host events emit <type> [payload]`	Emit a cross-plugin event
`osaurus-host plugin create`	Create a plugin from stdin JSON
`osaurus-host log <message>`	Append to the sandbox log buffer

All requests include the calling Linux username for identity verification.
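
A script running inside the container can call the CLI through `subprocess`. The sketch below uses only the `secrets get` command from the table; how the CLI reports its result is not documented here, so the stdout and exit-code handling in `get_secret` is an assumption, and both function names are hypothetical.

```python
import subprocess


def host_argv(*args: str) -> list[str]:
    """Build an argv list for osaurus-host; a list avoids shell quoting issues."""
    if not args:
        raise ValueError("a subcommand is required")
    return ["osaurus-host", *args]


def get_secret(name: str) -> str:
    """Run `osaurus-host secrets get <name>` inside the container.

    Assumption: the CLI prints the value on stdout and exits non-zero on failure.
    """
    result = subprocess.run(
        host_argv("secrets", "get", name),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


print(host_argv("secrets", "get", "OPENAI_API_KEY"))
# ['osaurus-host', 'secrets', 'get', 'OPENAI_API_KEY']
```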

Security

Path Sanitization

All file paths from tool arguments are validated by SandboxPathSanitizer before any container execution. Directory traversal attempts (..) are rejected, and paths are resolved relative to the agent's home directory.
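
The two rules stated here (reject `..`, resolve against the agent's home) can be sketched as follows. This is not the actual `SandboxPathSanitizer`: `sanitize_path` is a hypothetical stand-in, the NUL-byte check is an extra assumption, and absolute paths are simply passed through because the text above does not say how they are handled.

```python
import posixpath


def sanitize_path(agent_home: str, requested: str) -> str:
    """Resolve a tool path against the agent's home and reject traversal."""
    if "\x00" in requested:
        raise ValueError("path contains a NUL byte")
    if ".." in requested.split("/"):
        raise ValueError("directory traversal ('..') is not allowed")
    # posixpath.join discards agent_home when requested is absolute.
    return posixpath.normpath(posixpath.join(agent_home, requested))


print(sanitize_path("/workspace/agents/alice", "src/main.py"))
# /workspace/agents/alice/src/main.py
```

Checking path components rather than searching for the substring `..` avoids rejecting legitimate names such as `notes..txt`.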

Per-Agent Isolation

Each agent runs as a separate Linux user (agent-{name}). Standard Unix file permissions prevent agents from accessing each other's files and processes.

Network Policy

Container networking can be set to outbound (NAT with internet access) or none (completely isolated). Plugins can declare their own network requirements in the permissions field.

Rate Limiting

SandboxExecLimiter — Limits the number of commands an agent can run per conversation turn
SandboxRateLimiter — General rate limiting for sandbox operations and Host API bridge calls
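
A per-turn command budget of the kind `SandboxExecLimiter` describes can be modeled with a counter that resets at the start of each turn. The class below is a toy model under that assumption; the actual limit and reset behavior are not documented here, and the value 3 is arbitrary.

```python
class ExecLimiter:
    """Per-turn command budget (toy model, not the real SandboxExecLimiter)."""

    def __init__(self, max_per_turn: int):
        self.max_per_turn = max_per_turn
        self.used = 0

    def start_turn(self) -> None:
        self.used = 0  # each conversation turn gets a fresh budget

    def try_acquire(self) -> bool:
        if self.used >= self.max_per_turn:
            return False
        self.used += 1
        return True


limiter = ExecLimiter(max_per_turn=3)  # 3 is an arbitrary illustrative value
print([limiter.try_acquire() for _ in range(4)])  # [True, True, True, False]
```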

Diagnostics

The Sandbox UI includes built-in diagnostic checks accessible from the Container tab. Click Run Diagnostics to verify the container is functioning correctly.

Check	What It Verifies
Exec	Can execute commands in the container
NAT	Outbound network connectivity
Agent User	Agent's Linux user exists and can run commands
APK	Package manager is functional
Vsock Bridge	Host API bridge is reachable from the container

Container Management

Start / Stop

Start — Boots the container (provisions first if needed)
Stop — Gracefully shuts down the container

Reset

Removes the container and re-provisions from scratch. All agent workspaces and installed plugins are preserved (they live in the VirtioFS-mounted /workspace).

Remove

Completely removes the container and all associated assets (kernel, init filesystem). Agent workspaces are preserved.

Access these operations from the Container tab → Danger Zone section.

Storage Paths

Path	Description
`~/.osaurus/container/`	Container root directory
`~/.osaurus/container/kernel/vmlinux`	Linux kernel
`~/.osaurus/container/initfs.ext4`	Initial filesystem
`~/.osaurus/container/workspace/`	Mounted as `/workspace` in the VM
`~/.osaurus/container/workspace/agents/{name}/`	Per-agent home directory
`~/.osaurus/container/output/`	Mounted as `/output` in the VM
`~/.osaurus/sandbox-plugins/`	Plugin library (JSON recipes)
`~/.osaurus/agents/{agentId}/sandbox-plugins/installed.json`	Per-agent installed plugin records
`~/.osaurus/config/sandbox.json`	Sandbox configuration
`~/.osaurus/config/sandbox-agent-map.json`	Linux username to agent UUID mapping
