Designed features not yet implemented. Each links to its design spec. Create a plan file in this directory before starting implementation.
Based on parallel agents research.
Add a `yoloai batch` command (or similar) that creates multiple sandboxes from a task list. Input could be a file with one prompt per line, a markdown file with structured specs, or inline arguments. Each task gets its own sandbox against the same workdir. All sandboxes start in parallel.
Example: `yoloai batch ./project tasks.md` creates N sandboxes, one per task in the file.
Design considerations:
- Naming: auto-generate names from task index or allow a prefix (`--prefix feat-`)
- Prompt delivery: each sandbox gets its task as `--prompt-file` or `--prompt`
- Options: inherit shared flags (agent, model, profile, aux dirs) from the batch command
- Output: summary table of created sandboxes
Currently all status writers (agent hooks, status-monitor.py, sandbox-setup.py) share a single agent-status.json, using the source field to distinguish writes. A cleaner approach: each mechanism writes to its own status file (hook-status.json, monitor-status.json, etc.), and the status getter reads them in priority order, preferring the most recently updated. This eliminates the source field hack and makes the IPC contract explicit per writer. Not part of the structured logging change — tracked separately.
Current implementation works but is fragile. See idle detection research for full audit, external research, and architecture proposal for a pluggable detector framework.
Replace the current test agent (plain bash) with a proper test harness process that simulates real agent workflows: startup sequence, accepting input, simulating work, transitioning to idle, and controllable exit. Should support mimicking different detection strategies (hook-based, pattern-based, context signals) via environment variables or commands, enabling integration testing of the full idle detection pipeline. Spec TBD.
Chain sandboxes sequentially so the output of one becomes the input of the next. Each stage runs an agent with its own prompt on the workdir as modified by prior stages.
Example: `yoloai chain ./project pipeline.yaml` runs stages in order, applying each stage's changes before starting the next.
Pipeline definition (YAML or similar) specifies an ordered list of stages, each with:
- Prompt or prompt file
- Agent and model (optional, inherit from defaults)
- Whether to pause for user review between stages (`--step` flag for interactive; default is unattended)
Data flow: stage N's workdir changes are applied (auto-apply) to produce stage N+1's starting state. Intermediate diffs are preserved for inspection. If a stage's agent exits with an error or the user rejects a stage's diff in --step mode, the pipeline stops.
Design considerations:
- Compose with batch: independent pipelines could run in parallel
- Resume: if a pipeline stops mid-way, allow resuming from the failed stage
- Naming: sandboxes could be named `<pipeline>-stage-1`, `<pipeline>-stage-2`, etc.
- Keep intermediate sandboxes around for inspection, or clean up on success (`--cleanup`)
Enrich yoloai ls output for multi-sandbox workflows:
- Agent type and model
- Runtime duration (how long the sandbox has been running)
- Workdir dirty state (has uncommitted changes)
Keep default output concise; add a `--long` or `-l` flag for the full dashboard view.
Block until the agent in a named sandbox exits, then return the agent's exit code. Useful for CI/CD pipelines and scripting. Without `wait`, polling `yoloai list --json` is the only way to detect completion.
`yoloai wait <name> [--timeout <duration>]`
- Blocks until the sandbox's tmux pane is dead (agent has exited)
- Returns the agent's exit code as yoloai's exit code (0 = done, non-zero = failed)
- `--timeout`: fail with exit code 124 (matching `timeout(1)`) if the agent hasn't exited within the duration
- Related to the deferred `yoloai run` (#56 in OPEN_QUESTIONS) — `run` would be sugar on top of `wait`
See OPEN_QUESTIONS.md §77.
Add and remove bind mounts on a running sandbox without tearing it down. Preserves agent context when a mid-conversation need for an additional directory is discovered.
See spec in commands.md under `### yoloai sandbox <name> mount`.
Mechanism: `nsenter --mount --target <container-pid>` to enter the container's mount namespace and bind mount without a restart. Requires root. Docker/Podman on Linux only (Tart and Seatbelt cannot support this structurally).
Persistence: added to `live_mounts` in meta.json (new `DirMeta` slice field); applied as regular Docker mounts on the next start.
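The mechanism only needs the right argv; a sketch that builds (but does not run) the privileged command, since executing it requires root — the helper function name is illustrative:

```go
package main

import (
	"fmt"
	"os/exec"
)

// nsenterMountCmd builds the command that bind-mounts hostDir at
// targetDir inside the container's mount namespace. containerPID is
// the container init process's PID on the host; actually running the
// command requires root.
func nsenterMountCmd(containerPID int, hostDir, targetDir string) *exec.Cmd {
	return exec.Command("nsenter",
		"--mount", "--target", fmt.Sprint(containerPID),
		"mount", "--bind", hostDir, targetDir)
}

func main() {
	cmd := nsenterMountCmd(12345, "/home/me/extra", "/workdir/extra")
	fmt.Println(cmd.Args)
}
```

Unmounting would be the symmetric `nsenter ... umount <targetDir>`.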
Allow users to override the auto-resolved detector stack via profile-level config. A `detectors` list in the profile's config.yaml would replace the automatically computed stack, letting users disable noisy detectors or change priority order. No CLI flag — config file only.
See idle detection research §3.9 Q1.
When applying changes, also fetch any new git tags from the sandbox's copy of the workdir so that tags created by the agent (e.g. version bumps, release tags) land on the host. Currently apply syncs file changes but does not transfer tags.
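A hedged sketch of the missing step, assuming apply can address the sandbox's workdir copy by host path; the explicit refspec copies all tags without touching branches, and the function name is illustrative:

```go
package main

import (
	"fmt"
	"os/exec"
)

// fetchTagsCmd builds the git invocation that transfers any tags
// created in the sandbox's copy of the workdir back to the host repo.
// The refs/tags/* refspec fetches tags only, leaving branches alone.
func fetchTagsCmd(hostRepo, sandboxRepo string) *exec.Cmd {
	return exec.Command("git", "-C", hostRepo,
		"fetch", sandboxRepo, "refs/tags/*:refs/tags/*")
}

func main() {
	cmd := fetchTagsCmd("/home/me/project", "/path/to/sandbox/workdir")
	fmt.Println(cmd.Args)
}
```

A plain `refs/tags/*:refs/tags/*` refspec will refuse to clobber a host tag the agent rewrote; whether that should be forced is a design question, not answered here.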
Document what filesystem paths, network endpoints, and IPC each supported agent tries to access under sandboxing. Use this to improve SBPL profile generation and agent definitions. Agent Safehouse publishes per-agent investigation reports that could serve as a starting reference.
See competitors research §9.
Allow profile config to declare named Docker volumes for package manager caches (npm, pip, cargo, etc.) that persist across sandboxes. Currently each sandbox starts with a cold cache. Shared volumes would avoid re-downloading dependencies when creating new sandboxes with the same profile.
Inspired by amazing-sandbox, which mounts ~15 named volumes for various package manager caches.
Design considerations:
- Profile config syntax: e.g. `cache_volumes: {npm: /root/.npm, pip: /root/.cache/pip, cargo: /usr/local/cargo}`
- Volumes are named per-profile to avoid cross-profile conflicts (e.g. `yoloai-base-npm`)
- Optional: `yoloai prune --caches` to clean up cache volumes
- Consider whether the base profile should ship with sensible defaults for common caches
- Read-write mount; acceptable since these are caches, not project files
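The per-profile naming scheme above maps mechanically onto `docker run -v` flags; a sketch (function name illustrative, output sorted for determinism):

```go
package main

import (
	"fmt"
	"sort"
)

// volumeMountArgs turns a profile's cache_volumes map (cache name ->
// container path) into docker run -v flags, namespacing each volume
// per profile (e.g. yoloai-base-npm) to avoid cross-profile conflicts.
func volumeMountArgs(profile string, caches map[string]string) []string {
	keys := make([]string, 0, len(caches))
	for k := range caches {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var args []string
	for _, k := range keys {
		args = append(args, "-v", fmt.Sprintf("yoloai-%s-%s:%s", profile, k, caches[k]))
	}
	return args
}

func main() {
	fmt.Println(volumeMountArgs("base", map[string]string{"npm": "/root/.npm"}))
}
```

Docker creates a named volume on first use, so no separate provisioning step is needed; `yoloai prune --caches` would just enumerate and remove volumes with the `yoloai-` prefix.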
The containerd backend currently requires running the entire yoloai binary as root because CNI network namespace creation (`netns.NewNamed`, bridge plugin, IPAM) requires CAP_SYS_ADMIN + CAP_NET_ADMIN. This is terrible UX — users shouldn't need sudo for the main binary.
Fix: extract CNI/netns operations into a small privileged helper binary (`yoloai-netsetup` or similar). The main binary calls it via exec, passing the namespace name and config path. The helper is either setuid root or granted file capabilities (`setcap cap_net_admin,cap_sys_admin+ep`). This follows the same pattern Podman uses for `newuidmap`/`newgidmap`.
The helper should handle:
- `setup <nsname> <containerName> <cniConfDir>` — create netns, run CNI ADD, return JSON state
- `teardown <nsname> <containerName> <cniConfDir>` — run CNI DEL, delete netns
The main binary retains ownership of sandbox directories (written as the calling user), so yoloai destroy and git ops work without permission errors.
See linux-vm-backends research for full analysis.
Several architectural issues cause friction when adding new backends or agents. Each is independent. See plan for full spec.
Summary of issues:
- `meta.Backend` string comparisons (`== "seatbelt"`) scattered outside the dispatch layer — should use `meta.HostFilesystem` (a `BackendCaps`-derived field stored at creation time)
- Agent-specific switch statements in `sandbox/create.go` — should use an `ApplySettings` function field on `agent.Definition`
- Exit-code typed errors in `internal/cli/` — nearly all CLI errors exit 1 via plain `fmt.Errorf`; the typed error system (`sandbox/errors.go`) exists but is unused where it matters
- Several sentinel errors in `sandbox/errors.go` (`ErrDockerUnavailable`, `ErrMissingAPIKey`, `ErrContainerNotRunning`, `ErrNoChanges`) appear unused
- (See plan for issues 5–10)
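The `ApplySettings` refactor could be sketched as below; `Settings` and the Gemini env var are illustrative stand-ins for whatever create.go actually mutates per agent:

```go
package main

import "fmt"

// Settings is a stand-in for the sandbox settings create.go mutates.
type Settings struct {
	Env map[string]string
}

// Definition sketches the proposed extension to agent.Definition:
// each agent carries its own ApplySettings hook, so create.go no
// longer switches on agent name.
type Definition struct {
	Name          string
	ApplySettings func(*Settings) // nil means no agent-specific tweaks
}

func main() {
	gemini := Definition{
		Name: "gemini",
		ApplySettings: func(s *Settings) {
			s.Env["GEMINI_SANDBOX"] = "false" // illustrative setting
		},
	}
	s := &Settings{Env: map[string]string{}}
	if gemini.ApplySettings != nil {
		gemini.ApplySettings(s)
	}
	fmt.Println(s.Env["GEMINI_SANDBOX"])
}
```

Adding a new agent then means filling in one struct field instead of touching the create path.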
log.txt in the sandbox directory grows unbounded. There is no rotation or size cap. For long-running sandboxes or sessions that produce a lot of output, this can accumulate gigabytes of log data.
Options: size-based rotation (cap at N MB, keep last N files), integration with logrotate, or a --max-log-size config key. Low priority but worth addressing before GA.
No concurrency controls exist. Multiple simultaneous yoloai new calls with the same sandbox name, or concurrent yoloai start/destroy on the same sandbox, are not guarded. Could result in corrupted meta.json, double container creation, or partial state.
Fix: file-based lock per sandbox directory (e.g., meta.lock), held during operations that mutate sandbox state. Low priority for single-user CLIs but worth doing before any CI/CD integration.
All agents need a systematic audit of actual network traffic: capture traffic during full sessions (startup, auth, operation, token refresh, telemetry) and verify the allowlist covers everything. Gemini was missing oauth2.googleapis.com for OAuth token refresh; other agents likely have similar gaps.
Most important for --network-isolated mode where missing domains cause silent failures.
See OPEN_QUESTIONS.md §97.
Model aliases drift as providers release new models. Gemini's aliases already drifted once. Need a process to stay current: periodic manual review cadence, automated checks against provider APIs/docs, or pinning to stable -latest identifiers where available.
See OPEN_QUESTIONS.md §98.
Three unresolved questions needed before Codex network isolation is production-ready:
- Proxy support (#37): whether Codex's static Rust binary honors `HTTP_PROXY`/`HTTPS_PROXY` env vars is unverified. Critical for `--network-isolated` mode — if it ignores proxy env vars, iptables-only enforcement is the only option.
- Required network domains (#38): only `api.openai.com` is confirmed. Additional domains (telemetry, model downloads) may be required. Needs traffic capture during a full Codex session.
- TUI behavior in tmux (#39): interactive mode (`codex --yolo` without `exec`) behavior inside tmux is unverified. May affect idle detection and prompt delivery.
See OPEN_QUESTIONS.md §37–39.