feat: add sandbox annotations and volume mounts #8058

Merged
rubenfiszel merged 63 commits into main from sandbox-volumes on Mar 5, 2026

rubenfiszel commented Feb 23, 2026

Summary

Adds sandbox annotations and persistent volumes for Windmill scripts, with a first-class Claude Code sandbox template that combines both features.


Sandbox Annotations

Scripts can opt into nsjail/restricted execution with a single annotation in the script header:

Python:

# sandbox
def main():
    ...

TypeScript (Bun/Deno):

// sandbox
export async function main() { ... }

  • Python/Bun: Forces nsjail sandboxing even when global sandboxing is disabled. The worker checks nsjail availability and errors clearly if unavailable.
  • Deno: Forces restricted permissions (no --allow-all).
  • Bun sandbox fix: Runs as non-root inside nsjail so tools like Claude Code that check process.getuid() work correctly.
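
As a rough illustration of the opt-in check, the sketch below scans the leading comment block of a script for a bare sandbox annotation. It is not the worker's implementation (the review further down says the real parsing goes through the #[annotations] macro in Rust); the function name and the rule that the header ends at the first non-comment line are assumptions made for this example.

```python
def has_sandbox_annotation(source: str, comment_prefix: str) -> bool:
    """Return True if the leading comment block contains a bare `sandbox` annotation.

    Assumption for this sketch: the header is the run of comment lines at the
    top of the script, and it ends at the first non-blank, non-comment line.
    """
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not end the header
        if not stripped.startswith(comment_prefix):
            break  # first line of real code ends the header
        if stripped[len(comment_prefix):].strip() == "sandbox":
            return True
    return False
```

For example, has_sandbox_annotation("# sandbox\ndef main(): ...", "#") is true, while a "# sandbox" comment inside the function body is ignored.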

Docker: Claude Code CLI

All three Dockerfiles (full, slim, slim-ee) now install the Claude Code CLI at /usr/bin/claude, accessible inside nsjail sandboxes (which mount /usr but not /root).


Volumes

Volumes provide persistent, workspace-scoped storage that survives across job executions. Files are stored in the workspace's configured S3/object storage and synced to/from the worker filesystem at job start/end.

Annotation syntax

Python:

# volume: mydata /tmp/data
# volume: models /opt/models
def main():
    # /tmp/data and /opt/models are populated from S3 and synced back after execution
    ...

TypeScript:

// volume: agent-memory .claude
export async function main() {
    // .claude/ directory persists across runs
}
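
A minimal sketch of how annotations in this shape could be parsed, written in Python for brevity. The real parser lives in the Rust crate windmill-worker-volumes; the VolumeMount tuple here only borrows the type name from the PR, and the handling of malformed lines (silently skipped) is an assumption.

```python
from typing import List, NamedTuple


class VolumeMount(NamedTuple):
    name: str    # volume name, possibly still containing $workspace / $args[...]
    target: str  # mount path inside the job


def parse_volume_annotations(source: str, comment_prefix: str) -> List[VolumeMount]:
    """Collect `volume: <name> <target>` annotations from the header comment block."""
    mounts = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not stripped.startswith(comment_prefix):
            break  # header ends at the first line of code (assumption)
        body = stripped[len(comment_prefix):].strip()
        if not body.startswith("volume:"):
            continue
        # Split once so the target may contain spaces (assumption).
        parts = body[len("volume:"):].strip().split(None, 1)
        if len(parts) == 2:
            mounts.append(VolumeMount(parts[0], parts[1].strip()))
    return mounts
```

Validation (name format, target safety, the 10-volume limit, duplicates) would run on the returned list as a separate step.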

Dynamic volume names

Volume names support interpolation for dynamic per-input or per-workspace volumes:

  • $workspace → replaced with the current workspace ID
  • $args[param_name] → replaced with the value of a script input parameter
  • $args[config.env] → supports nested object access

Example: // volume: cache-$args[env] .cache with input env = "prod" → volume name cache-prod
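
The interpolation rules can be sketched as follows. The regular expression is the one quoted in the review of interpolate_volume_name further down; everything else (the Python function, raising KeyError on a missing argument, replacing $workspace before $args) is an assumption for illustration.

```python
import re
from typing import Any, Dict

# Same pattern as quoted in the review: $args[a] or $args[a.b.c]
ARGS_PATTERN = re.compile(r"\$args\[((?:\w+\.)*\w+)\]")


def interpolate_volume_name(template: str, workspace: str, args: Dict[str, Any]) -> str:
    """Resolve $workspace and $args[...] placeholders in a volume name."""

    def lookup(match: "re.Match[str]") -> str:
        value: Any = args
        for key in match.group(1).split("."):  # walk nested objects
            if not isinstance(value, dict) or key not in value:
                # Assumption: a missing argument is an error, not an empty string.
                raise KeyError(f"missing argument: {match.group(1)}")
            value = value[key]
        return str(value)

    return ARGS_PATTERN.sub(lookup, template.replace("$workspace", workspace))
```

With env = "prod", interpolate_volume_name("cache-$args[env]", "acme", {"env": "prod"}) yields the cache-prod name from the example above.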

How it works

  1. Parse: Volume annotations are extracted from the script header comment block
  2. Validate: Name format, target path safety (no .., restricted absolute prefixes), max 10 volumes per job, no duplicate names/targets
  3. Download: Files are fetched from S3 (volumes/{workspace_id}/{volume_name}/) with:
    • Local filesystem cache on workers with LRU eviction (configurable max size)
    • MD5-based change detection to skip unchanged files
    • Symlink preservation (stored as JSON metadata in S3)
    • Parallel downloads (8 concurrent) for both SQL and HTTP agent worker paths
  4. Mount: For nsjail sandboxes, volumes are bind-mounted read-write into the sandbox. For non-sandboxed execution, files are placed directly at the target path.
  5. Sync back: After job execution, changed/new files are uploaded, deleted files are removed from S3, and the volume metadata (size, file count, timestamps) is updated in the database.
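
To make the MD5-based change detection and the sync-back step concrete, here is a small sketch that hashes the local volume directory and diffs it against a remote manifest. The manifest shape (relative path mapped to MD5 hex digest) and both function names are assumptions; symlink handling, the worker cache, and the actual S3 calls are left out.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def local_manifest(root: str) -> Dict[str, str]:
    """Map each file under root (as a /-separated relative path) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as handle:
                manifest[rel] = hashlib.md5(handle.read()).hexdigest()
    return manifest


def plan_sync_back(remote: Dict[str, str], local: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (paths to upload, paths to delete remotely) after a job has run."""
    uploads = sorted(p for p, digest in local.items() if remote.get(p) != digest)
    deletions = sorted(p for p in remote if p not in local)
    return uploads, deletions
```

Unchanged files appear in neither list, which is what lets the sync skip them.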

Lease system

Volumes use an exclusive lease to prevent concurrent writes:

  • Lease acquired atomically via INSERT ON CONFLICT with 60-second expiry
  • Worker renews lease every 10 seconds during execution
  • Lease released on commit (successful sync-back) or expiry (worker crash)
  • Permission check happens before lease acquisition to prevent unauthorized retries from blocking legitimate users
  • HTTP agent workers retry lease acquisition up to 120 times (1s interval) with job cancellation checks
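
The acquire/renew/release cycle can be illustrated with an upsert against an in-memory SQLite table. This is only a model of the idea: the production table is the Postgres volume table shown below, and the exact query, the reduced three-column schema, and the helper names here are assumptions. The 60-second expiry is the value stated above.

```python
import sqlite3

LEASE_SECONDS = 60  # expiry stated in the PR description


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Reduced stand-in for the real volume table (assumption).
    db.execute("CREATE TABLE volume (name TEXT PRIMARY KEY, leased_by TEXT, lease_until REAL)")
    return db


def try_acquire(db: sqlite3.Connection, name: str, worker: str, now: float) -> bool:
    """Acquire or renew the lease in one statement; return True if `worker` holds it."""
    db.execute(
        """
        INSERT INTO volume (name, leased_by, lease_until) VALUES (?, ?, ?)
        ON CONFLICT (name) DO UPDATE
            SET leased_by = excluded.leased_by, lease_until = excluded.lease_until
            WHERE volume.lease_until IS NULL
               OR volume.lease_until < ?
               OR volume.leased_by = excluded.leased_by
        """,
        (name, worker, now + LEASE_SECONDS, now),
    )
    db.commit()
    holder = db.execute("SELECT leased_by FROM volume WHERE name = ?", (name,)).fetchone()[0]
    return holder == worker


def release(db: sqlite3.Connection, name: str, worker: str) -> None:
    """Drop the lease, but only if `worker` is the current holder."""
    db.execute(
        "UPDATE volume SET leased_by = NULL, lease_until = NULL WHERE name = ? AND leased_by = ?",
        (name, worker),
    )
    db.commit()
```

A second worker calling try_acquire while the lease is live gets False; once the holder stops renewing and the expiry passes, the same call succeeds, which is how a crashed worker's lease is recovered.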

Two execution paths

  • SQL workers (direct DB/S3 access): Use download_volume and sync_volume_back directly with the workspace's S3 client
  • HTTP agent workers (proxy through API): Call volume REST endpoints (POST /begin, GET /file/*path, PUT /file/*path, POST /commit) which proxy to S3 through the API server with JWT-authenticated worker identity

Database schema

CREATE TABLE volume (
    workspace_id VARCHAR(50) NOT NULL REFERENCES workspace(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    size_bytes BIGINT NOT NULL DEFAULT 0,
    file_count INTEGER NOT NULL DEFAULT 0,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    created_by VARCHAR(255) NOT NULL,
    updated_at TIMESTAMPTZ,
    updated_by VARCHAR(255),
    description TEXT NOT NULL DEFAULT '',
    lease_until TIMESTAMPTZ,
    leased_by VARCHAR(255),
    last_used_at TIMESTAMPTZ,
    extra_perms JSONB NOT NULL DEFAULT '{}',
    PRIMARY KEY (workspace_id, name)
);

Volumes are also registered as an asset kind (AssetKind::Volume) for the unified asset system, with granular permissions support via extra_perms.

API endpoints

Under /api/w/:workspace/volumes:

  • GET /list — List all volumes in the workspace
  • DELETE /:name — Delete a volume (checks for active lease, removes S3 objects)
  • POST /:name/begin — Acquire lease, return manifest + symlinks
  • GET /:name/file/*path — Download a file (lease required)
  • PUT /:name/file/*path — Upload a file (lease required)
  • POST /:name/commit — Release lease, update metadata, handle deletions + symlinks

Agent worker endpoints are mounted under /api/w/:workspace/agent_workers/volumes with JWT authentication.
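
For orientation, the sketch below assembles the request sequence an HTTP agent worker would issue for one volume over a job's lifetime, using only the routes listed above. It performs no network calls; the helper names are hypothetical, and in practice the download list would come from the manifest returned by /begin.

```python
from typing import List, Tuple
from urllib.parse import quote


def volume_url(workspace: str, name: str, suffix: str, agent_worker: bool = True) -> str:
    """Build a volume route under the workspace (agent worker mount by default)."""
    root = "agent_workers/volumes" if agent_worker else "volumes"
    return f"/api/w/{quote(workspace, safe='')}/{root}/{quote(name, safe='')}/{suffix}"


def job_request_plan(
    workspace: str, name: str, downloads: List[str], uploads: List[str]
) -> List[Tuple[str, str]]:
    """Return (method, path) pairs: begin, file downloads, file uploads, commit."""
    plan = [("POST", volume_url(workspace, name, "begin"))]
    plan += [("GET", volume_url(workspace, name, "file/" + quote(p))) for p in downloads]
    plan += [("PUT", volume_url(workspace, name, "file/" + quote(p))) for p in uploads]
    plan.append(("POST", volume_url(workspace, name, "commit")))
    return plan
```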

Security

  • Workspace-namespaced S3 paths: volumes/{workspace_id}/{name}/ prevents cross-workspace collisions when sharing a bucket
  • Symlink validation: Rejects absolute targets, .. traversal, and targets that resolve outside the volume directory
  • Lease ownership verification: Download/upload endpoints verify the requesting worker holds the active lease
  • Path validation: Volume file paths reject null bytes, .. segments, and absolute paths
  • Volume name validation: 2-255 chars, alphanumeric start/end, no path separators or traversal
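
The name and path rules above translate into checks along these lines. The set of characters allowed inside a name (letters, digits, dot, underscore, hyphen) is an assumption; the description only fixes the length bounds, the alphanumeric start and end, and the ban on separators and traversal.

```python
import re

# Alphanumeric first and last character; inner characters are an assumption.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*[A-Za-z0-9]")


def is_valid_volume_name(name: str) -> bool:
    """2-255 chars, alphanumeric start/end, no path separators or traversal."""
    if not 2 <= len(name) <= 255 or ".." in name:
        return False
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_file_path(path: str) -> bool:
    """Reject empty paths, null bytes, absolute paths, and `..` segments."""
    if not path or "\x00" in path or path.startswith("/"):
        return False
    return ".." not in path.split("/")
```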

Claude Code Sandbox Template

A new "Claude Sandbox" script template that combines both features:

// sandbox
// volume: claude .claude

import { query } from "@anthropic-ai/claude-agent-sdk";

This creates a sandboxed environment where:

  • // sandbox ensures nsjail isolation — Claude Code runs in a restricted filesystem with no network access to internal services
  • // volume: claude .claude persists the .claude/ directory (session state, CLAUDE.md, skills) across runs, enabling:
    • Session resumption: Claude Code sessions survive job completion and can be resumed in subsequent runs
    • Persistent memory: Agent instructions, learned context, and configuration carry over
    • Token tracking: Usage counters accumulate across sessions

The template accepts an AgentInstructions input (a Record<string, string> of file paths to content) for injecting CLAUDE.md and skill files at runtime, making it composable with Windmill's fileset resources.


Frontend

  • Volume asset UI: New VolumeDetailDrawer and VolumesDrawer components for browsing, managing permissions, and deleting volumes in the asset explorer
  • Claude icon: New ClaudeIcon component used in the script template picker
  • Asset explorer: Volumes appear alongside other asset types with size/file count display
  • Share modal: Volume permission management via the standard ShareModal
  • Script editor: "Claude Sandbox" template option in the new script picker

Files changed

New crate: windmill-worker-volumes

  • lib.rs — Types (VolumeMount, VolumeState, FileEntry, SyncStats), annotation parser, name/target validation, dynamic name interpolation, 30+ unit tests
  • volume_oss.rs — OSS stubs for download_volume, sync_volume_back, volume_nsjail_mount
  • volume_ee.rs (EE) — Full implementation: S3 sync, filesystem cache with LRU eviction, MD5 change detection, parallel downloads/uploads, symlink handling, nsjail mount generation

Backend modifications

  • windmill-worker/src/worker.rs — Volume lifecycle integration in run_language_executor: parse → validate → download → mount → execute → sync back. Both SQL and HTTP agent worker paths.
  • windmill-worker/src/python_executor.rs: # sandbox annotation parsing + nsjail enforcement
  • windmill-worker/src/bun_executor.rs: // sandbox annotation + nsjail enforcement + non-root fix
  • windmill-worker/src/deno_executor.rs: // sandbox annotation + restricted permissions
  • windmill-api/src/volumes_oss.rs — Volume REST API with EE switch
  • windmill-api/src/volumes_ee.rs (EE) — Full endpoint implementations
  • windmill-api/src/lib.rs — Agent worker volume routing + inject_agent_authed middleware
  • windmill-api-agent-workers/src/lib.rs: extract_worker_name on OSS AgentCache
  • windmill-api-agent-workers/src/ee.rs (EE) — extract_worker_name on EE AgentCache
  • Migration 20260226000000_add_volumes: volume table + AssetKind::volume

Frontend

  • VolumeDetailDrawer.svelte, VolumesDrawer.svelte — Volume management UI
  • ClaudeIcon.svelte — Claude logo icon
  • claude_sandbox.ts.template — Claude Code sandbox script template
  • Asset explorer, share modal, script editor integrations

Docker

  • Dockerfile, DockerfileSlim, DockerfileSlimEe — Claude Code CLI installation

Test plan

  • cargo check passes (CE and EE)
  • cargo test -p windmill-worker-volumes — 30+ unit tests pass
  • Integration tests for volume S3 roundtrip (SQL worker path)
  • Integration tests for agent worker HTTP volume endpoints
  • Manual test: # sandbox in Python script → nsjail used
  • Manual test: // volume: test /tmp/data → files persist across runs
  • Manual test: Claude sandbox template → session resumes across runs

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cloudflare-workers-and-pages bot commented Feb 23, 2026

Deploying windmill with Cloudflare Pages

Latest commit: c149262
Status: ✅  Deploy successful!
Preview URL: https://a057d48f.windmill.pages.dev
Branch Preview URL: https://sandbox-volumes.windmill.pages.dev

rubenfiszel and others added 28 commits February 24, 2026 08:22
- Add volume as a recognized asset kind in openflow spec and asset parser
- Add file_count column to volume table with migration
- Track file count during volume sync-back operations
- List volumes from both volume table and asset table (UNION query)
- Add volumes button on assets page with drawer showing volume list
- Add explore button in volumes drawer to open S3 file picker at volume prefix
- Fix S3 file picker to dynamically set rootPath for folder navigation
- Fix JobAssetsViewer to fetch code by script_hash when raw_code is missing
- Add "sandbox mode (nsjail)" log message to bun, python, and deno executors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restore create_wm_deployers_group to its original 20260224000000 timestamp
from main, and move add_volumes to 20260226000000 to avoid collision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Volume names in annotations can now use $workspace and $args[key]
placeholders (same syntax as tag interpolation), resolved at runtime
from job args.

Examples:
  # volume: $workspace-data /tmp/data
  # volume: cache-$args[env] /tmp/cache

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add allowDelete prop to S3FilePicker that enables the delete button
even in readOnlyMode. Enabled on the assets page volume explorer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ndbox template

- Add AI generation button (ResourceGen) in resource editor and ApiConnectForm
  to generate resource values from a prompt, with fileset-aware mode
- Fix FilesetEditor not updating when args change externally (e.g. from AI gen)
- Fix folder deletion in fileset editor by ensuring intermediate folder nodes
  always get trailing / in their path
- Error instead of silently skipping volume mounts when no workspace S3 storage
  is configured
- Update claude sandbox template with agent_instructions support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…enterprise

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	backend/.sqlx/query-08f288d2781d823e109a9e5b8848234ca7d1efeee9661f3901f298da375e73f7.json
#	backend/ee-repo-ref.txt
#	frontend/package-lock.json
Comment on lines +4216 to +4228
let volume_mounts = {
    let comment_prefix = match language {
        ScriptLang::Python3
        | ScriptLang::Bash
        | ScriptLang::Powershell
        | ScriptLang::Ansible
        | ScriptLang::Ruby => "#",
        ScriptLang::Deno
        | ScriptLang::Bun
        | ScriptLang::Bunnative
        | ScriptLang::Nativets
        | ScriptLang::Go => "//",
        _ => "",

Question: The comment prefix mapping here doesn't include all languages that have annotations. For example, ScriptLang::Ruby uses "#" but Ruby doesn't seem to have sandbox annotations defined. Also, if a language like PHP (which isn't listed) ever needs volume support, the empty string "" from the catch-all would make parse_volume_annotations parse nothing since no line starts with an empty prefix. Consider adding a comment to clarify which languages are intentionally supported for volumes.

"What is the fastest OSS workflow engine?" :
"What did I ask you before?"

process.env.ANTHROPIC_API_KEY = anthropic.apiKey;

Security consideration: The template sets process.env.ANTHROPIC_API_KEY = anthropic.apiKey as an environment variable. Because this runs inside an nsjail sandbox, exposure is somewhat contained, but any child process spawned by the Claude agent also inherits the key. The permissionMode: 'bypassPermissions' and allowDangerouslySkipPermissions: true settings on lines 44-45 also give the Claude agent unrestricted access within the sandbox. This is expected for the use case, but make sure the nsjail config is tight enough (for example, no network path for exfiltrating the key). The nsjail config currently sets iface_no_lo: true and clone_newnet: false, which allows network access, so the agent could in principle exfiltrate the API key over the network.

@rubenfiszel

/ai-fast re-review this

claude bot commented Mar 3, 2026

Claude finished @rubenfiszel's task in 2m 52s


Re-review: feat: add sandbox annotations and volume mounts

  • Gather context and understand current state of the branch
  • Review backend changes (Rust) — volumes subcrate, worker integration, API
  • Review frontend changes (Svelte) — volume UI, Claude sandbox template, ResourceGen
  • Review database migrations and tests
  • Post detailed review findings

Overall Assessment

Well-structured PR adding per-script sandbox forcing and volume mounts via annotations. The architecture follows existing codebase patterns: new windmill-worker-volumes subcrate with OSS/EE split, clean annotation parsing via the #[annotations] macro, DB migration, API routing, and frontend UI. Since the previous review, the volume_settings column was removed from the migration (good cleanup), and EE-only volume code was refactored into volume_ee.rs.

What looks good

  • OSS/EE separation follows the standard *_oss.rs / *_ee.rs pattern with #[cfg(feature = "private")]
  • Annotation parsing via #[annotations] macro for sandbox: bool is clean and minimal
  • Volume lifecycle is well-designed: parse → interpolate → acquire lease → download → mount → execute → sync back → release
  • Lease mechanism (30s TTL, 10s renewal, cleanup on failure) handles concurrency correctly
  • Test coverage: 13+ unit tests for parsing with good edge-case coverage (malformed input, blank lines, wrong prefix, paths with spaces)
  • Frontend volume UI (VolumesDrawer, VolumeDetailDrawer) is functional and consistent with existing patterns
  • $args[...] interpolation with nested object traversal for dynamic volume naming is a nice touch
  • ResourceGen component for AI-assisted resource generation is a useful addition

Issues

1. Performance — Regex compiled on every call (windmill-worker-volumes/src/lib.rs:49)

interpolate_volume_name compiles a new Regex per invocation. This runs per-volume per-job. Use LazyLock:

static RE: std::sync::LazyLock<regex::Regex> = std::sync::LazyLock::new(|| {
    regex::Regex::new(r#"\$args\[((?:\w+\.)*\w+)\]"#).unwrap()
});
let re = &*RE;

2. Security — path traversal on volume target (worker.rs:4401-4409)

When sandboxing is disabled, volume.target from user annotations is joined to job_dir without validation. A target like ../../etc/something could escape the job directory:

let resolved = std::path::Path::new(job_dir).join(&volume.target);
// No validation that resolved stays within job_dir
std::os::unix::fs::symlink(&state.local_dir, &resolved).ok();

Consider canonicalizing and validating the resolved path stays within job_dir, or restricting targets to simple relative paths (no ..).
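
The containment check suggested here can be sketched as follows, in Python rather than the worker's Rust. The function name is hypothetical, and the sketch covers only targets that are meant to resolve under job_dir; it does not model the separate allowance for restricted absolute prefixes mentioned in the PR description.

```python
import os


def resolve_target(job_dir: str, target: str) -> str:
    """Join target onto job_dir and refuse any result that escapes job_dir."""
    base = os.path.realpath(job_dir)
    # realpath collapses `..` segments and resolves symlinks that exist on disk.
    resolved = os.path.realpath(os.path.join(base, target))
    if os.path.commonpath([base, resolved]) != base:
        raise ValueError(f"volume target escapes job directory: {target!r}")
    return resolved
```

A target of .claude resolves inside the job directory, while ../../etc/something or an absolute path such as /etc/passwd raises ValueError before any symlink is created.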

3. Reliability — 120-second busy-wait blocks worker (worker.rs:4296-4337)

The lease acquisition loop sleeps for up to 120 seconds with no way to detect job cancellation. Consider checking canceled_by between iterations so cancellation isn't delayed by 2 minutes.
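
A cancellation-aware version of such a loop might look like the sketch below. The callables try_acquire and is_canceled stand in for the lease query and the canceled_by check; they, the return values, and the injectable sleep are assumptions made so the example is self-contained and testable.

```python
import time
from typing import Callable


def acquire_with_retry(
    try_acquire: Callable[[], bool],
    is_canceled: Callable[[], bool],
    attempts: int = 120,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry lease acquisition, checking for cancellation before every attempt."""
    for attempt in range(attempts):
        if is_canceled():
            return "canceled"  # stop immediately instead of waiting out the loop
        if try_acquire():
            return "acquired"
        if attempt + 1 < attempts:
            sleep(interval)
    return "timeout"
```

With this shape, a canceled job waits at most one interval before the loop exits, rather than the full retry budget.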

4. No validation on volume names (lib.rs:84-108)

The parser accepts any non-whitespace as a volume name. After interpolation, names are used in S3 paths (volumes/{name}/...), env var names (WM_VOLUME_{NAME}), and filesystem paths. Consider restricting to [a-zA-Z0-9_.$\[\]-] to prevent issues with special characters.

Minor observations

  • Down migration doesn't remove the volume enum value from asset_kind (Postgres limitation, but worth noting)
  • VolumeDetailDrawer.svelte:26 fetches all volumes then filters client-side — fine for initial impl but could be optimized with a dedicated endpoint later
  • Claude sandbox template exposes ANTHROPIC_API_KEY as env var with permissionMode: 'bypassPermissions' and allowDangerouslySkipPermissions: true inside a sandbox that allows network access (clone_newnet: false). This is intentional for the use case but the API key could theoretically be exfiltrated by the Claude agent via network
  • common.rs change for FilesystemStorage (lines 948-953): the early return for filesystem storage changes behavior — it now returns an ObjectStoreResource::Filesystem directly instead of going through the S3 resource construction path. This is correct for the volume use case but verify existing callers of get_workspace_s3_resource_path handle the new return type

New additions since last review

  • ResourceGen.svelte: Clean AI-assisted resource/fileset generation component using the copilot API. Good UX with the popover pattern and abort support. Minor note: the $effect with setTimeout for auto-focus (line 117-119) could use onMount instead, but it works.
  • FilesetEditor.svelte changes: Bidirectional sync between args and files with lastWrittenArgs tracking to detect external changes — solid approach to avoid infinite loops.
  • S3FilePickerInner.svelte: The allowDelete prop and rootPath refactoring to support volume browsing is well done. The [...new Set(displayedFileKeys)].sort() dedup is a good fix.

rubenfiszel and others added 13 commits March 4, 2026 06:56
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…absolute mount paths

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# Conflicts:
#	frontend/src/routes/(root)/(logged)/assets/+page.svelte
EE files (_ee.rs) are gitignored and should only be tracked in
windmill-ee-private. The symlink was accidentally committed,
causing CI to fail with "not writing through dangling symlink".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add extract_worker_name() stub to OSS AgentCache so lib.rs doesn't
need #[cfg(feature = "private")] gates. Updates ee-repo-ref.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit updates the EE repository reference after PR #433 was merged in windmill-ee-private.

Previous ee-repo-ref: fee39059a9fc290754aea7ab18f2adff0c81242c

New ee-repo-ref: bd5e01ced1b319d11c7b24556285996e48fe793b

Automated by sync-ee-ref workflow.
@windmill-internal-app

🤖 Updated ee-repo-ref.txt to bd5e01ced1b319d11c7b24556285996e48fe793b after windmill-ee-private PR #433 was merged.

rubenfiszel and others added 9 commits March 4, 2026 22:53
…overflow

The run_language_executor async function grew ~1000 lines with volume
handling code, making the generated future struct too large for the
default stack. Extract volume setup and sync-back into 4 separate
async functions (setup_volumes_sql_worker, setup_volumes_http_worker,
sync_volumes_sql_worker, sync_volumes_http_worker) so each has its
own future struct, reducing the parent function's stack frame.

Fixes test_workflow_as_code stack overflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add POST /volumes/create endpoint with CLOUD_HOSTED limit
- Add GET /volumes/storage endpoint for volume storage name (non-admin)
- Add "New volume" popover button in VolumesDrawer
- Fix volume explore to use volumes/{workspace}/{name}/ prefix
- Fix volume explore to use correct secondary storage
- Keep VolumesDrawer open when exploring (stacked drawers)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rubenfiszel rubenfiszel merged commit 5f0ef93 into main Mar 5, 2026
6 of 8 checks passed
@rubenfiszel rubenfiszel deleted the sandbox-volumes branch March 5, 2026 06:19
@github-actions github-actions bot locked and limited conversation to collaborators Mar 5, 2026