
feat: deepbash #229

Open
hntrl wants to merge 29 commits into main from hunter/deepbash

hntrl commented on Feb 14, 2026

This PR introduces deepbash, an in-process WASM-based bash execution backend for deepagents. Instead of shelling out to Docker containers, remote sandboxes, or the host OS, agents execute bash commands inside a WebAssembly sandbox with a virtual filesystem, all within a single Node.js process.

The Problem

We need to give models access to a shell. The current approaches all have fundamental tradeoffs:

| Approach | Startup | Security | Portability | State Introspection |
| --- | --- | --- | --- | --- |
| Docker containers | ~2-5s cold start | Good (isolation) | Needs Docker daemon | None (opaque) |
| Remote sandboxes (E2B, Daytona, Modal) | Network latency + cold start | Good (isolation) | Needs credentials + network | None |
| Host bash (unsandboxed) | Instant | Dangerous | OS-dependent | None |
| deepbash (this PR) | ~200ms | Sandbox (WASM) | Anywhere Node.js runs | Full |

The core tension: virtual backends (in-memory filesystem) can't expose their state to child processes spawned via execute(). A real bash process can't see your virtual FS. Containers solve this but add latency, infrastructure dependencies, and opacity. We wanted something that runs in-process, gives subprocess access to a shared virtual filesystem, and lets the host introspect every file operation.

The Solution: WASIX in WebAssembly

WASIX extends the WASI (WebAssembly System Interface) spec with POSIX features that standard WASI lacks — critically, subprocess spawning (proc_spawn) and Berkeley sockets (sock_open, sock_connect). Processes inside a WASIX sandbox share a virtual filesystem, meaning bash -c "python script.py && cat output.json" works because both bash and python see the same virtual files.

We forked the wasmer-js runtime (~5K lines of Rust) into a purpose-built package that strips networking, the Wasmer registry client, and deployment infrastructure, replacing them with fully offline package resolution from locally bundled .webc assets. The result is a self-contained WASM module (~5.4 MB) that boots a complete bash + coreutils environment with zero network calls.

Note

This is a fairly exploratory effort. I think it is promising, but it has a fundamental limitation: we can only bring binaries into the sandbox that can run in WASM. We can run Python, but we can't run V8, Rust, Go, and a lot of other tooling, which might make this unsuitable for general-purpose coding work.

Something I want to explore next is the same syscall introspection using https://github.com/butter-dot-dev/bvisor. It's a little heavier, but it could give us the same backend introspection without the process limits we have here.

Architecture

flowchart TB
    subgraph Harness["Agent Harness (Node.js)"]
        direction TB
        subgraph Top[" "]
            direction LR
            AgentLoop["Agent Loop<br>(LLM, tools, langgraph)"]
            Mounts["BackendProtocol Mounts<br>/work → StateBackend<br>/data → FilesystemBackend"]
        end

        subgraph Backend["DeepbashBackend (extends BaseSandbox)"]
            VFS["In-memory virtual FS<br>(Map‹string, Uint8Array›)"]
            MountSys["Composable mount system"]
            SnapDiff["Snapshot/diff sync"]
            RPC["RPC-based subagent spawning"]
        end

        subgraph Runtime["deepbash-runtime (Rust → WASM, forked wasmer-js)"]
            WASIX["WASIX syscall layer<br>(wasmer-wasix 0.601.0)"]
            Workers["Multi-threaded execution<br>via Web Workers"]
            Offline["Offline package resolution<br>(InMemorySource)"]
            Assets["Bundled assets:<br>bash.webc + coreutils.webc"]
        end

        AgentLoop --> Backend
        Mounts --> Backend
        Backend --> Runtime
    end

    subgraph Executables["Bundled WASIX Executables"]
        Bash["bash (1.8 MB .webc)<br>GNU Bash 5.2"]
        Coreutils["coreutils (4.7 MB .webc)<br>100+ POSIX utilities"]
        Subagent["subagent (110 KB .wasm)<br>RPC CLI for agent spawning"]
    end

    Runtime --> Executables

Execution Flow

When an agent calls execute("grep -r TODO src/ | wc -l"):

sequenceDiagram
    participant Agent as Agent Loop
    participant DB as DeepbashBackend
    participant Mounts as Mounted Backends
    participant WASIX as WASIX Runtime (WASM)
    participant RPC as /.rpc/requests/

    Agent->>DB: execute("grep -r TODO src/ | wc -l")

    rect rgb(240, 248, 255)
        Note over DB,Mounts: Step 1-2: Snapshot
        DB->>Mounts: globInfo("**/*") for each mount
        Mounts-->>DB: File listings
        DB->>Mounts: downloadFiles([...])
        Mounts-->>DB: File contents (Uint8Array)
        DB->>DB: Populate WASIX Directory objects
    end

    rect rgb(240, 255, 240)
        Note over DB,RPC: Step 3: Mount subagent infra
        DB->>DB: /usr/local/sbin/subagent → subagent.wasm
        DB->>DB: /.rpc/requests/ → empty spool dir
    end

    rect rgb(255, 248, 240)
        Note over DB,WASIX: Step 4-5: Execute in WASM
        DB->>WASIX: entrypoint.run({ args: ["-c", cmd], mount: {...} })
        WASIX->>WASIX: bash forks grep → forks wc
        Note right of WASIX: All subprocesses share virtual FS
        WASIX->>RPC: subagent spawn "task" → writes JSON
        WASIX-->>DB: { stdout, stderr, code }
    end

    rect rgb(248, 240, 255)
        Note over DB,Mounts: Step 7: Diff & sync back
        DB->>DB: Walk Directory post-execution
        DB->>DB: Byte-level diff vs pre-snapshot
        DB->>Mounts: uploadFiles(changed/new files)
    end

    rect rgb(255, 240, 245)
        Note over DB,RPC: Step 8: Collect spawn requests
        DB->>RPC: Read *.json files
        RPC-->>DB: SpawnRequest objects
    end

    DB-->>Agent: { output, exitCode, truncated, spawnRequests }
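Step 7 in the diagram above can be pictured as a comparison of two path-to-bytes maps. What follows is a minimal sketch under stated assumptions, not the PR's implementation: `Snapshot`, `bytesEqual`, and `diffSnapshots` are hypothetical names, and deleted files are ignored because the diagram only mentions uploading changed and new files.

```typescript
// Hypothetical model of a filesystem snapshot: absolute path -> file contents.
type Snapshot = Map<string, Uint8Array>;

// Byte-for-byte comparison of two buffers.
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Returns the files that are new, or whose bytes changed, between the
// pre-execution snapshot and the post-execution snapshot.
function diffSnapshots(before: Snapshot, after: Snapshot): Snapshot {
  const changed: Snapshot = new Map();
  after.forEach((content, path) => {
    const previous = before.get(path);
    if (previous === undefined || !bytesEqual(previous, content)) {
      changed.set(path, content);
    }
  });
  return changed;
}

const before: Snapshot = new Map([
  ["/work/a.txt", new Uint8Array([1, 2, 3])],
  ["/work/b.txt", new Uint8Array([4, 5])],
]);
const after: Snapshot = new Map([
  ["/work/a.txt", new Uint8Array([1, 2, 3])], // unchanged, not uploaded
  ["/work/b.txt", new Uint8Array([4, 6])], // modified
  ["/work/c.txt", new Uint8Array([7])], // new
]);

// Only b.txt and c.txt would be handed to uploadFiles().
const toUpload = diffSnapshots(before, after);
console.log(Array.from(toUpload.keys()));
```

Comparing lengths first keeps the common "file grew or shrank" case cheap; a content hash per file would be an alternative if snapshots were large.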

Key Design Decisions

1. Why fork wasmer-js instead of using @wasmer/sdk?

The @wasmer/sdk npm package works, but it has problems for our use case:

  • Network dependency on cold start: bash.webc dependencies (coreutils) are resolved by querying registry.wasmer.io via GraphQL and downloading ~4.5 MB from cdn.wasmer.io on every cold start
  • Opaque internals: The SDK's Directory class is non-extensible, so you can't inject custom filesystem handlers. (This isn't implemented yet, but the idea is to pass BackendProtocol methods directly to the runtime.)
  • Unused bulk: Networking gateway, registry client, app deployment code, WebSocket support — none of which we need

The fork (deepbash-runtime) preserves the battle-tested execution machinery (Command.run() → task_dedicated → run_command) while replacing the registry/networking layer with InMemorySource + a registerLocalPackage() API. The hot path is now:

flowchart LR
    subgraph Before["BEFORE (@wasmer/sdk)"]
        direction LR
        B1[bash.webc] --> B2[GraphQL query] --> B3[HTTP download] --> B4[parse Container]
    end
    subgraph After["AFTER (deepbash-runtime)"]
        direction LR
        A1[bash.webc] --> A2[InMemorySource lookup] --> A3[local bytes]
    end

    style After fill:#e6ffe6,stroke:#4caf50
    style Before fill:#fff3e0,stroke:#ff9800
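The "AFTER" path amounts to a lookup in memory that never falls back to the network. The sketch below only models that idea in TypeScript: `LocalPackageStore`, `register`, and `resolve` are hypothetical names, and this is not the signature of registerLocalPackage(), which this description does not show.

```typescript
// Hypothetical stand-in for an in-memory package source. The real
// InMemorySource lives in the Rust runtime and is not reproduced here.
class LocalPackageStore {
  private readonly packages = new Map<string, Uint8Array>();

  // Registers the bytes of a locally bundled .webc file under a package name.
  register(name: string, webcBytes: Uint8Array): void {
    this.packages.set(name, webcBytes);
  }

  // Resolves from memory only: an unknown package is an error, never a fetch.
  resolve(name: string): Uint8Array {
    const bytes = this.packages.get(name);
    if (bytes === undefined) {
      throw new Error(`package "${name}" is not registered locally`);
    }
    return bytes;
  }
}

const store = new LocalPackageStore();
// Placeholder bytes; real code would load assets/bash.webc and
// assets/coreutils.webc from disk.
store.register("bash", new Uint8Array([1, 2, 3, 4, 5]));
store.register("coreutils", new Uint8Array([6, 7, 8]));
console.log(store.resolve("bash").length);
```

Failing loudly on an unregistered package is what keeps the sandbox offline: a missing dependency surfaces as an error at resolution time instead of a silent download.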

2. Why WASIX instead of standard WASI?

WASIX is Wasmer's extension of WASI that adds features standard WASI lacks:

| Feature | WASI 0.2 (standard) | WASIX |
| --- | --- | --- |
| proc_spawn (subprocesses) | No | Yes |
| sock_* (Berkeley sockets) | Partial | Full |
| Threads (pthread_create) | No | Yes |
| fd_pipe (inter-process pipes) | No | Yes |
| proc_signal (signals) | No | Yes |

The subprocess story is the dealbreaker. Without proc_spawn, bash -c "grep foo *.py | wc -l" can't work — bash needs to fork child processes that share the filesystem. WASIX is the only WASM standard that supports this, and Wasmer is the only runtime that implements it. Both are MIT licensed.

3. Subagent spawning: filesystem RPC, not stdout markers

Instead of parsing stdout for magic markers (fragile, can collide with real output), we use a filesystem-based RPC protocol:

Inside the sandbox, the subagent CLI (a 110 KB Rust binary compiled to WASM) writes JSON request files:

# Agent runs this inside WASIX bash:
subagent spawn "analyze the auth module for security vulnerabilities"
# Writes to /.rpc/requests/spawn-0.json:
{
  "id": "spawn-0",
  "method": "spawn",
  "args": { "task": "analyze the auth module for security vulnerabilities" },
  "timestamp": "0"
}

On the host side, after command execution completes, DeepbashBackend reads /.rpc/requests/*.json, parses the SpawnRequest objects, and returns them alongside the command output. The deepagents middleware then creates actual subagents from these requests.
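The host-side collection step could look roughly like the following. This is a sketch, not the code in src/backend.ts: `collectSpawnRequests` is a hypothetical helper, the `SpawnRequest` shape is inferred from the JSON example above, and de-duplicating by `id` is an assumption about how the IDs are used.

```typescript
// Shape inferred from the JSON request example; the real type may differ.
interface SpawnRequest {
  id: string;
  method: string;
  args: { task: string };
  timestamp: string;
}

// Hypothetical helper. `files` maps sandbox paths to file contents as text,
// for example the result of walking the sandbox after execution.
function collectSpawnRequests(files: Map<string, string>): SpawnRequest[] {
  const requests: SpawnRequest[] = [];
  const seen = new Set<string>();
  files.forEach((text, path) => {
    // Only JSON files in the spool directory are considered.
    if (!path.startsWith("/.rpc/requests/") || !path.endsWith(".json")) return;
    let parsed: unknown;
    try {
      parsed = JSON.parse(text);
    } catch (err) {
      return; // skip malformed request files instead of failing the command
    }
    if (typeof parsed !== "object" || parsed === null) return;
    const candidate = parsed as Partial<SpawnRequest>;
    const id = candidate.id;
    const task = candidate.args ? candidate.args.task : undefined;
    if (typeof id !== "string" || typeof task !== "string") return;
    if (candidate.method !== "spawn") return; // other methods are not handled here
    if (seen.has(id)) return; // assumption: one request per ID
    seen.add(id);
    requests.push({
      id,
      method: "spawn",
      args: { task },
      timestamp: String(candidate.timestamp ?? ""),
    });
  });
  return requests;
}

const spool = new Map<string, string>([
  ["/.rpc/requests/spawn-0.json", '{"id":"spawn-0","method":"spawn","args":{"task":"analyze the auth module"},"timestamp":"0"}'],
  ["/.rpc/requests/broken.json", "{not json"],
  ["/work/notes.json", '{"id":"x","method":"spawn","args":{"task":"ignored"}}'],
]);
console.log(collectSpawnRequests(spool));
```

Skipping malformed files keeps one bad write inside the sandbox from discarding the command's output along with it.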

Why this approach:

  • No stdout pollution — spawn requests travel through the filesystem, not stdout
  • Language-agnostic — any WASI-compiled program can write to /.rpc/requests/
  • Idempotent — monotonic IDs prevent double-processing
  • Inspectable — you can ls /.rpc/requests/ to see pending spawns
  • Extensible — the method field supports future RPC operations beyond spawn

4. Composable filesystem mounts via BackendProtocol

Rather than a monolithic virtual filesystem, deepbash supports mounting any BackendProtocol implementation into the WASIX sandbox at arbitrary paths:

const backend = await DeepbashBackend.create({
  mounts: {
    "/work": stateBackend,        // In-memory state (ephemeral)
    "/data": filesystemBackend,   // Real disk (persistent)
    "/memories": storeBackend,    // LangGraph Store (database-backed)
  }
});

Before execution, files are downloaded from each mounted backend and populated into WASIX Directory objects. After execution, a byte-level diff detects changes and uploads modified/new files back to the original backend. The WASIX process sees a unified filesystem while each mount path is backed by a different storage system.

This means an agent can cat /data/config.json, edit it, and cp it to /work/ — the change to /data/config.json flows back to the real filesystem, while the copy in /work/ stays in ephemeral state.
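Routing a changed file back to its backend requires mapping a sandbox path to the mount that owns it. One way to do that is longest-prefix matching, sketched below. `resolveMount` and `MountMatch` are hypothetical names, and longest-prefix matching is an assumption; this description does not say how nested or overlapping mounts are resolved.

```typescript
// Hypothetical result type: which mount owns a path, and the path inside it.
interface MountMatch {
  mountPath: string;
  relativePath: string;
}

// Picks the most specific mount whose path is a directory prefix of filePath.
function resolveMount(mountPaths: string[], filePath: string): MountMatch | undefined {
  let best: string | undefined;
  for (const mountPath of mountPaths) {
    // Compare against "/data/" rather than "/data" so "/database" never matches.
    const prefix = mountPath.endsWith("/") ? mountPath : mountPath + "/";
    const owns = filePath === mountPath || filePath.startsWith(prefix);
    if (owns && (best === undefined || mountPath.length > best.length)) {
      best = mountPath;
    }
  }
  if (best === undefined) return undefined;
  const relativePath = filePath.slice(best.length).replace(/^\/+/, "");
  return { mountPath: best, relativePath };
}

const mountPaths = ["/work", "/data", "/memories"];
console.log(resolveMount(mountPaths, "/data/config.json"));
console.log(resolveMount(mountPaths, "/tmp/scratch.txt"));
```

In the example from the paragraph above, a write to /data/config.json would resolve to the "/data" mount with relative path "config.json", while the copy under /work/ would resolve to the "/work" mount.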

5. Interactive shell mode (daemon + attach)

Beyond batch execute() calls, deepbash supports persistent interactive shell sessions:

const session = await backend.shell();

// Stream-based I/O
session.stdout.pipeTo(process.stdout);
await session.writeLine("echo hello");
await session.writeLine("cd /work && ls -la");

// When done, sync changes and get exit code
const { exitCode } = await session.wait();

This enables the daemon/attach architecture demonstrated in examples/: a background process owns the WASIX sandbox, and multiple clients can attach to it for interactive shell access. The shell process persists across commands, maintaining environment variables, working directory, and process state.

The Rust Layer

deepbash contains two Rust crates:

rust/runtime/ — The WASIX runtime (~5K lines, forked from wasmer-js)

Core dependencies:

  • wasmer 6.1.0 — WASM engine (JS backend)
  • wasmer-wasix 0.601.0 — WASIX syscall implementation
  • webc 10.0.1 — WEBC container parser
  • virtual-fs 0.601.0 — Virtual filesystem traits

Key modifications from upstream:

  • src/lib.rs — Added registerLocalPackage() + setSdkUrl() APIs
  • src/package_loader.rs — Checks local GLOBAL_CONTAINERS store before HTTP
  • src/runtime.rs — InMemorySource replaces BackendSource for dependency resolution
  • src/net.rs — Gutted (empty module)
  • Removed: src/registry/, wasmer-backend-api dep, bincode dep, WebSocket support

rust/subagent-cli/ — The subagent RPC binary (~165 lines)

A minimal Rust program compiled to wasm32-wasip1 that provides the subagent spawn <task> command inside the WASIX environment. It writes JSON files to /.rpc/requests/ using only POSIX filesystem operations. Compiled output: 110 KB.

How DeepbashBackend Fits the Framework

deepbash implements SandboxBackendProtocol from the deepagents framework by extending BaseSandbox:

classDiagram
    class BackendProtocol {
        <<interface>>
        read()
        write()
        edit()
        grep()
        glob()
    }
    class SandboxBackendProtocol {
        <<interface>>
        execute()
        id
    }
    class BaseSandbox {
        <<abstract>>
        read/write/grep/glob via shell
        execute()*
        uploadFiles()*
        downloadFiles()*
    }
    class DaytonaSandbox {
        remote container
    }
    class ModalSandbox {
        remote container
    }
    class DenoSandbox {
        Deno Deploy
    }
    class DeepbashBackend {
        in-process WASM ← this PR
    }

    BackendProtocol <|-- SandboxBackendProtocol
    SandboxBackendProtocol <|-- BaseSandbox
    BaseSandbox <|-- DaytonaSandbox
    BaseSandbox <|-- ModalSandbox
    BaseSandbox <|-- DenoSandbox
    BaseSandbox <|-- DeepbashBackend

BaseSandbox provides default implementations for all file operations (read, write, edit, grep, glob) by composing pure POSIX shell commands and routing them through execute(). This means DeepbashBackend only needs to implement three methods: execute(), uploadFiles(), downloadFiles(). Everything else — the entire BackendProtocol surface — works automatically through the inherited shell-based implementations.
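To make "composing pure POSIX shell commands" concrete, here is a rough sketch of how a file operation can be reduced to a command string that is then passed to execute(). It is not the actual BaseSandbox code: `shellQuote`, `buildReadCommand`, and `buildGrepCommand` are hypothetical helpers, and the exact commands and flags BaseSandbox uses are not shown in this description.

```typescript
// Quotes a value for POSIX sh: wrap it in single quotes and rewrite each
// embedded single quote as '\'' (close quote, escaped quote, reopen quote).
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical: a read() built on cat. "--" stops option parsing so a path
// that starts with "-" is still treated as a file name.
function buildReadCommand(path: string): string {
  return `cat -- ${shellQuote(path)}`;
}

// Hypothetical: a grep() built on recursive grep with line numbers.
// "-e" keeps a pattern that starts with "-" from being read as an option.
function buildGrepCommand(pattern: string, dir: string): string {
  return `grep -rn -e ${shellQuote(pattern)} -- ${shellQuote(dir)}`;
}

console.log(buildReadCommand("/work/it's here.txt"));
console.log(buildGrepCommand("TODO", "src/"));
```

Because every operation becomes a plain string, a subclass gets the whole file API by supplying only a working execute(); the cost is that correctness depends on careful quoting of every user-supplied value.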

The integration with createDeepAgent is a one-liner:

import { createDeepAgent } from "deepagents";
import { DeepbashBackend } from "deepbash";

const agent = createDeepAgent({
  backend: await DeepbashBackend.create(),
  model: "claude-sonnet-4-5-20250929",
});

// The agent now has execute(), read(), write(), edit(), grep(), glob()
// all running against the in-process WASIX sandbox
await agent.invoke({
  messages: [{ role: "user", content: "Find all TODO comments in the codebase" }]
});

File Changes Summary

New package: libs/deepbash/ (~15,000 lines added)

| Area | Files | Description |
| --- | --- | --- |
| TypeScript | src/backend.ts, src/types.ts, src/index.ts, src/node.ts | Backend class, types, exports |
| Rust runtime | rust/runtime/src/** (~50 files) | Forked wasmer-js execution engine |
| Rust subagent CLI | rust/subagent-cli/src/main.rs | RPC command binary |
| Assets | assets/bash.webc, assets/coreutils.webc, assets/subagent.wasm | Bundled WASIX executables |
| Tests | tests/backend.test.ts, tests/backend.int.test.ts, tests/mounts.int.test.ts, tests/spawn.int.test.ts | Unit + integration tests |
| Examples | examples/daemon.ts, examples/attach.ts, examples/test.ts | Interactive shell demos |
| Build | rollup.config.mjs, scripts/download-assets.sh, package.json | Build pipeline + asset download |

Verification

# TypeScript typecheck
pnpm --filter deepbash typecheck

# Unit tests (no WASM runtime needed)
pnpm --filter deepbash test

# Integration tests (requires WASM assets)
pnpm --filter deepbash test:int

# Build (TypeScript + Rollup)
pnpm --filter deepbash build

# Build Rust → WASM (requires nightly toolchain + wasm-pack)
cd libs/deepbash && bash scripts/build-wasm.sh

# Download bundled assets (bash.webc, coreutils.webc)
cd libs/deepbash && bash scripts/download-assets.sh

# Full monorepo check
pnpm test && pnpm build

What's Next

  • Path-level permissions: The custom FileSystem trait enables read-only mounts, write-blocklists, etc. (architecture supports it, not yet implemented)
  • Escape-to-real-bash: For tools that can't run in WASM (npm, cargo, gcc), an opt-in escape hatch to the host shell
  • Real-time FS introspection: Replace snapshot/diff with direct Rust FileSystem trait callbacks for streaming file change events during execution
  • stack_checkpoint / stack_restore: WASIX syscalls for pausing and resuming execution mid-command
  • Python bindings: So that the Python version of the harness can use this as well

hntrl added 29 commits February 13, 2026 17:47

changeset-bot bot commented Feb 14, 2026

⚠️ No Changeset found

Latest commit: 457c06e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types
