feat: Docker containerization + parallel multi-agent execution by mtgibbs · Pull Request #103 · snarktank/ralph

mtgibbs · 2026-02-13T19:03:42Z

Hey! I really like this paradigm and have been playing with it a bit. I wanted to add containerization for some safety and define a Dockerfile.ralph convention to help projects onboard. This way we can sandbox our AI assistants so they can't write to or delete from our file system with little oversight while they work independently. The worst case is that an agent nukes its own container in the process of doing work. Thanks for taking a look! Below are the Claude-assisted changes and some information on how it works:

What's Added

Container sandbox (docker/) — 4 new files

Dockerfile — Base image: node:20-slim + Claude Code, non-root agent user (UID 1001), iptables for firewall
agent-loop.sh — Container entrypoint: initializes firewall, copies auth, clones from bare repo, claims stories via git atomic push, runs Claude in a loop, pushes results
init-firewall-builder.sh — iptables whitelist: Claude API + user-specified domains via --allow-domain. Everything else is denied.
init-firewall-researcher.sh — Full internet access for research-role agents

Parallel orchestrator (parallel/) — 10 new files

ralph-parallel.sh — Host-side orchestrator: builds image (auto-detecting Dockerfile.ralph), creates Docker networks, launches N containers, monitors health, recovers stale story claims, detects PRD completion and shuts down
stop.sh / status.sh — Graceful shutdown and live dashboard
CLAUDE-parallel.md — Parallel-aware prompt guiding agents through the claim/implement/push cycle
lib/ — Auth (env var > file > 1Password), Docker helpers, network setup, logging

Existing file changes — 3 files touched

.gitignore — Added .ralph/, agent_logs/, per-agent progress files
AGENTS.md / README.md — Documented parallel mode, CLI options, quick start

Key Design Decisions

Git as the coordination layer — A shared bare repo + atomic push for story claiming. No external database, no lock server. If two agents race to claim the same story, one push wins and the other retries with a different story.
Per-agent progress files — Each agent writes progress-agent-N.txt instead of all appending to one progress.txt, avoiding merge conflicts.
Dockerfile.ralph convention — Projects declare their runtime needs by adding a Dockerfile.ralph that extends the base image. Ralph auto-detects and builds it. Resolution: --image flag > Dockerfile.ralph > default base.
Configurable firewall via --allow-domain — No hardcoded package registries. Users whitelist what their project needs (registry.npmjs.org, pypi.org, etc.). Only api.anthropic.com and statsig.anthropic.com are always-allowed.
Volume-based auth — Claude credentials live in a Docker volume (ralph-claude-auth), populated once via claude login. Agents copy credentials at startup — no host token files mounted into containers.
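The image resolution order from the Dockerfile.ralph decision above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `resolve_image` and its two arguments are hypothetical, while the `ralph-agent:latest` and `ralph-agent-<project>:latest` tags are taken from the PR text. The sketch only picks a tag and leaves out the build step the orchestrator is described as performing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide which image the agent containers should run.
# Precedence (from the PR description): --image flag > Dockerfile.ralph > default base.
resolve_image() {
    local image_flag="$1"    # value passed via --image, may be empty
    local project_dir="$2"   # project root to inspect for Dockerfile.ralph

    if [ -n "$image_flag" ]; then
        # An explicit --image always wins.
        echo "$image_flag"
    elif [ -f "$project_dir/Dockerfile.ralph" ]; then
        # Project-specific tag; the real script would also build this image.
        echo "ralph-agent-$(basename "$project_dir"):latest"
    else
        # Fall back to the default base image.
        echo "ralph-agent:latest"
    fi
}
```

For example, `resolve_image "" /path/to/my-project` would print the project-specific tag only if `/path/to/my-project/Dockerfile.ralph` exists.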

Usage

# One-time auth setup
docker run -it --entrypoint bash \
  -v ralph-claude-auth:/home/agent/.claude \
  ralph-agent:latest
# Inside: claude login && exit

# Run 3 agents against a project
./parallel/ralph-parallel.sh \
  --project /path/to/my-project \
  --allow-domain registry.npmjs.org \
  --agents 3

# Monitor / stop
./parallel/status.sh --project /path/to/my-project
./parallel/stop.sh --project /path/to/my-project

🤖 Generated with Claude Code

Add parallel mode that runs N containerized Claude Code agents simultaneously against the same PRD, with network sandboxing, resource limits, and git-based story claiming.

New directories:
- docker/: Dockerfile, container entrypoint, iptables firewall scripts
- parallel/: orchestrator, stop, status, parallel prompt, lib helpers

Upstream ralph.sh and all existing files are untouched.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Switch from env-var token passing to Docker volume-based auth:
- Mount ralph-claude-auth volume at /claude-auth:ro
- agent-loop.sh copies credentials to writable ~/.claude/
- Add check_auth_volume() to verify volume before launch
- Remove CLAUDE_CODE_OAUTH_TOKEN env var requirement

Add --project DIR flag to orchestrator, status, and stop scripts so ralph can target external project directories.

Bug fixes discovered during smoke test:
- Fix UID 1000 conflict in Dockerfile (node:20-slim uses 1000)
- Fix macOS seq counting down when count=0 (guard with -le 0)
- Fix PARALLEL_PROMPT path resolution for external projects

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Allow projects to specify a custom Docker image via --image flag, enabling project-specific tooling (e.g., Deno, Python) without modifying the base ralph-agent image. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Redirect all git output in claim_story() to stderr so only the story ID goes to stdout (prevents garbage in CLAIMED_STORY)
- Wrap claim_story call in if-statement to prevent set -e from killing the script when claim returns non-zero
- Fix setup_workspace to reset to current branch on restart, not hard-coded origin/main (preserves feature branches)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
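The first two fixes in this commit message can be illustrated with a small sketch. The `claim_story` below is a hypothetical stub, not the PR's implementation: it "claims" whatever ID it is given and fails when given none. The wrapper name `try_claim` is also an assumption. What it shows is the pattern: diagnostics go to stderr so that command substitution captures only the story ID, and the call sits inside an `if` so that a non-zero return does not trip `set -e`.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for the real claim logic.
claim_story() {
    local candidate="${1:-}"
    echo "attempting claim..." >&2    # noise goes to stderr, not stdout
    if [ -z "$candidate" ]; then
        return 1                      # nothing left to claim
    fi
    echo "$candidate"                 # the only line written to stdout
}

# Calling claim_story as an `if` condition disables errexit for that call,
# so a failed claim is handled here instead of killing the script.
try_claim() {
    local story
    if story=$(claim_story "$@"); then
        echo "claimed:$story"
    else
        echo "none"
    fi
}
```

Without the stderr redirect, any progress text printed during the claim would end up inside the captured variable alongside the story ID.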

Deno projects need access to jsr.io (Deno's package registry) for dependency resolution and type checking. Without this, agents can't run `deno task check` inside builder containers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…city
- Mount only stop_requested file instead of entire .ralph/ directory, preventing agents from reading plaintext auth tokens
- Switch stop signal from file-existence to file-content (-s not -f) since the file must exist for Docker bind-mount
- Remove hardcoded --platform linux/arm64 so builds work on any arch
- Replace hardcoded npm/jsr/deno firewall whitelist with --allow-domain flag, making the firewall language-agnostic
- Use treeless bare clone (--filter=blob:none) to avoid exposing old file content that may contain secrets
- Add SETENV to sudoers so RALPH_EXTRA_DOMAINS passes through sudo
- Document custom image contract and extension pattern in README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
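The switch from `-f` to `-s` for the stop signal can be sketched as below. The function names `init_stop_file`, `request_stop`, and `stop_requested` are hypothetical, and the PR does not say what content is written, so the `stop` string is an assumption. Because the file has to exist before it can be bind-mounted, an existence test would always succeed; testing for non-empty content gives the signal meaning.

```shell
#!/usr/bin/env bash
# Hypothetical host-side helper: create the (empty) file so it can be bind-mounted.
init_stop_file() {
    : > "$1"
}

# Hypothetical host-side helper: signal a stop by giving the file content.
request_stop() {
    echo "stop" > "$1"
}

# Agent-side check: true only when the file exists AND is non-empty (-s).
# A plain -f test would be true from the moment the container starts.
stop_requested() {
    [ -s "$1" ]
}
```

An agent loop could then call `stop_requested /path/to/stop_requested` between iterations and exit cleanly once it returns success.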

- Add push.autoSetupRemote to git config so first push on a new branch automatically sets up tracking
- Skip git pull --rebase when remote branch doesn't exist yet (new branch from prd.json branchName)
- Use file:// prefix for bare clone so --filter=blob:none takes effect (git ignores filters on local path clones)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Projects can now be "ralph-ready" by adding a Dockerfile.ralph to their root. When detected, ralph automatically builds a project-specific image (tagged ralph-agent-<project>:latest) without needing --image. Resolution order: --image flag > Dockerfile.ralph > default base image. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps · 2026-02-13T19:08:14Z

Greptile Overview

Greptile Summary

This PR adds Docker containerization and parallel multi-agent execution to Ralph, enabling multiple Claude Code agents to work simultaneously on PRD stories in isolated sandboxes. The implementation uses git atomic push for story claiming coordination, iptables firewall for network restrictions, and per-agent progress files to avoid merge conflicts.

Key additions:

Docker image based on node:20-slim with Claude Code, non-root user, and firewall capabilities
Agent loop script that clones from bare repo, claims stories atomically, runs Claude, and pushes results
Orchestrator that launches N containers, monitors health, recovers stale claims, and detects completion
Network firewall restricting builder agents to Claude API + whitelisted domains (configurable via --allow-domain)
Dockerfile.ralph convention for project-specific runtime requirements
Comprehensive documentation with examples, auth setup, and debugging guide

Issues found:

Date parsing incompatibility: ralph-parallel.sh:357-361 uses macOS-specific date -j syntax that will fail on Linux (the actual container platform)
Incomplete token refresh feature: documented in README but check_token_refresh_file() never called from orchestrator
Static DNS resolution in firewall may cause connectivity loss if IPs change
--dangerously-skip-permissions flag used with autonomous agents (security trade-off documented in PR description)

Architecture highlights:

Git as coordination layer eliminates need for external lock server or database
Bare repo (.ralph/repo.git) enables reliable multi-agent pushing without conflicts on checked-out branches
Resource limits (--memory, --cpus, --pids-limit) prevent runaway containers
Graceful shutdown with 120s timeout before force-kill
Automatic stale claim recovery (30min threshold)
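As a rough illustration of the stale-claim check mentioned in the list above, the comparison could look like the sketch below. Only the 30-minute threshold comes from this summary. The function name `is_claim_stale`, the variable name, and the strict greater-than comparison are assumptions.

```shell
#!/usr/bin/env bash
# 30-minute threshold, as stated in the summary (expressed in seconds).
STALE_THRESHOLD_SECONDS=$((30 * 60))

# Hypothetical helper: succeed when a claim is older than the threshold.
#   $1 = epoch seconds at which the story was claimed
#   $2 = current epoch seconds (defaults to now; injectable for testing)
is_claim_stale() {
    local claimed_epoch="$1"
    local now_epoch="${2:-$(date +%s)}"
    [ $((now_epoch - claimed_epoch)) -gt "$STALE_THRESHOLD_SECONDS" ]
}
```

Passing the current time as a second argument keeps the check deterministic, which makes it easy to exercise without waiting half an hour.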

Confidence Score: 4/5

Safe to merge with one critical fix needed for Linux compatibility
The implementation is well-architected with solid error handling, security sandboxing, and comprehensive documentation. The git-based coordination layer is elegant and the Docker containerization achieves the stated security goals. However, the date parsing bug in ralph-parallel.sh will cause runtime failures on Linux (the primary target platform), and the token refresh feature is incomplete. These issues are fixable but prevent a score of 5.
Pay close attention to parallel/ralph-parallel.sh (date parsing bug on line 357) and verify token refresh implementation if that feature is needed

Important Files Changed

Filename	Overview
docker/Dockerfile	Clean base image setup with proper non-root user, minimal dependencies, and secure sudo configuration
docker/agent-loop.sh	Comprehensive agent loop with git-based story claiming, firewall init, and robust error handling; minor git config concerns
docker/init-firewall-builder.sh	Solid iptables firewall with DNS whitelisting; DNS resolution happens before lockdown
parallel/ralph-parallel.sh	Feature-rich orchestrator with agent management, stale claim recovery, auto image building; date compatibility issue exists
parallel/lib/docker-helpers.sh	Clean container lifecycle management with proper volume mounts and resource limits
parallel/CLAUDE-parallel.md	Excellent parallel-aware prompt with clear claim protocol, conflict resolution, and push protocol

Sequence Diagram

sequenceDiagram
    participant H as Host (ralph-parallel.sh)
    participant D as Docker
    participant A1 as Agent Container 1
    participant A2 as Agent Container 2
    participant BR as Bare Repo (.ralph/repo.git)
    participant Claude as Claude API

    H->>D: Build ralph-agent image
    H->>D: Create networks (builder/researcher)
    H->>D: Launch agent containers
    D->>A1: Start agent-loop.sh
    D->>A2: Start agent-loop.sh
    
    A1->>A1: Init firewall (iptables)
    A2->>A2: Init firewall (iptables)
    A1->>A1: Copy Claude credentials
    A2->>A2: Copy Claude credentials
    
    A1->>BR: git clone
    A2->>BR: git clone
    
    loop Agent Work Loop
        A1->>BR: git pull --rebase
        A1->>A1: Find unclaimed story in prd.json
        A1->>A1: Set claimed_by field
        A1->>BR: git push (atomic claim)
        
        alt Push succeeds
            A1->>Claude: Run claude with story
            Claude-->>A1: Implementation
            A1->>A1: Run tests
            A1->>A1: Update prd.json (passes: true)
            A1->>BR: git push (with retries)
            A1->>A1: Append to progress-agent-1.txt
        else Push fails (concurrent claim)
            A1->>A1: git reset --hard HEAD~1
            A1->>BR: git pull --rebase
            A1->>A1: Retry with different story
        end
        
        A2->>BR: git pull --rebase
        A2->>A2: Claim different story
        A2->>BR: git push (atomic claim)
        A2->>Claude: Run claude with story
        Claude-->>A2: Implementation
        A2->>BR: git push results
        
        H->>BR: Check if all stories complete
        H->>H: Recover stale claims (>30min)
        H->>D: Health check containers
    end
    
    H->>BR: All stories have passes: true
    H->>D: Stop all containers
    H->>BR: Sync bare repo to project dir

Last reviewed commit: 2c82c08

greptile-apps

16 files reviewed, 6 comments


greptile-apps · 2026-02-13T19:08:17Z

parallel/ralph-parallel.sh

+        # Parse claimed_at timestamp (macOS date -j, fallback to GNU date -d)
+        local claimed_epoch
+        claimed_epoch=$(date -j -f "%Y-%m-%dT%H:%M:%SZ" "$claimed_at" +%s 2>/dev/null \
+            || date -d "$claimed_at" +%s 2>/dev/null \
+            || echo "0")


date -j is macOS-specific BSD date syntax. Will fail on Linux containers (the actual target platform for this code).

Suggested change

-        # Parse claimed_at timestamp (macOS date -j, fallback to GNU date -d)
-        local claimed_epoch
-        claimed_epoch=$(date -j -f "%Y-%m-%dT%H:%M:%SZ" "$claimed_at" +%s 2>/dev/null \
-            || date -d "$claimed_at" +%s 2>/dev/null \
-            || echo "0")
+        # Parse claimed_at timestamp (use GNU date in Linux containers)
+        local claimed_epoch
+        claimed_epoch=$(date -d "$claimed_at" +%s 2>/dev/null || echo "0")

Thanks! That was an oversight on my part! I'll do better next time!

greptile-apps · 2026-02-13T19:08:18Z

docker/agent-loop.sh

+# --- Step 4: Set git identity ---
+setup_git_identity() {
+    git config user.name "$AGENT_ID"
+    git config user.email "${AGENT_ID}@ralph-agent.local"
+    git config pull.rebase true
+    git config push.autoSetupRemote true
+}


Git config pull.rebase and push.autoSetupRemote set globally for agent user affects all repos. If agent ever works with multiple repos or submodules, this could cause unexpected behavior.

greptile-apps · 2026-02-13T19:08:20Z

docker/init-firewall-builder.sh

+# Resolve and allow each whitelisted domain
+for domain in "${ALLOWED_DOMAINS[@]}"; do
+    # Resolve all IPs for the domain
+    ips=$(dig +short "$domain" 2>/dev/null | grep -E '^[0-9]+\.' || true)
+    for ip in $ips; do
+        iptables -A OUTPUT -p tcp -d "$ip" --dport 443 -j ACCEPT
+        echo "[firewall] Allowed: $domain -> $ip:443"
+    done
+
+    # Also resolve CNAME targets (CDNs etc)
+    cnames=$(dig +short "$domain" 2>/dev/null | grep -v -E '^[0-9]+\.' || true)
+    for cname in $cnames; do
+        cname_ips=$(dig +short "$cname" 2>/dev/null | grep -E '^[0-9]+\.' || true)
+        for ip in $cname_ips; do
+            iptables -A OUTPUT -p tcp -d "$ip" --dport 443 -j ACCEPT
+            echo "[firewall] Allowed: $domain (via $cname) -> $ip:443"
+        done
+    done
+done


DNS resolution is static at firewall init time. If IPs change after containers start (CDN rotation, DNS updates), agents lose access until container restart.

greptile-apps · 2026-02-13T19:08:20Z

parallel/ralph-parallel.sh

+if [ ! -d "$BARE_REPO" ]; then
+    log_info "Creating bare repo for agent coordination..."
+    mkdir -p "$PROJECT_DIR/.ralph"
+    git clone --bare --filter=blob:none "file://$PROJECT_DIR" "$BARE_REPO"


Using --filter=blob:none for bare repo may cause issues if agents need to access file content from history. Consider implications for projects with large binary assets.


greptile-apps · 2026-02-13T19:08:21Z

docker/agent-loop.sh

+    claude --dangerously-skip-permissions \
+        --print \
+        --model "$CLAUDE_MODEL" \
+        -p "$PROMPT" \
+        &> "$LOGFILE" || {
+        echo "[$AGENT_ID] Claude exited with error (code: $?). Check log: $LOGFILE"
+    }


Using --dangerously-skip-permissions with AI agents operating autonomously is risky. While sandboxing provides some protection, agents can still execute arbitrary commands within container.

I used this flag for the POC, but I think there's probably a middle ground using --permission-mode dontAsk (Claude Docs - Permissions). If this containerization is something we want to explore further, I'll define some conventions around which permissions each agent can have. @snarktank

parallel/README.md

- Swap date parsing order to try GNU date -d first, macOS date -j as fallback (orchestrator runs on the host which could be either OS)
- Remove dead token_refresh docs and unused check_token_refresh_file() function; auth model is now volume-based, not token-file-based

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mtgibbs · 2026-02-13T19:20:51Z

Thanks for the review! Addressed the actionable items and wanted to document our reasoning on the rest:

Fixed

date -j is macOS-specific — Good catch. Swapped the order so GNU date -d is tried first with macOS date -j as fallback. The orchestrator runs on the host (could be either OS), so both paths are needed. Fixed in 991e2b9.
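A minimal sketch of the swapped order is shown below. It is not the code from 991e2b9. The function name `iso_to_epoch` is hypothetical, and the `-u` flag is an addition not shown in the PR's snippet, included on the assumption that the BSD branch would otherwise interpret the timestamp in local time (the trailing `Z` is only a literal in the format string).

```shell
#!/usr/bin/env bash
# Convert an ISO-8601 UTC timestamp (e.g. 2026-02-13T19:03:42Z) to epoch seconds.
# Tries GNU `date -d` first, then BSD/macOS `date -j -f`; prints 0 if both fail.
iso_to_epoch() {
    local ts="$1"
    date -u -d "$ts" +%s 2>/dev/null \
        || date -u -j -f "%Y-%m-%dT%H:%M:%SZ" "$ts" +%s 2>/dev/null \
        || echo "0"
}
```

One consequence of the `echo "0"` fallback worth keeping in mind: an unparseable timestamp becomes epoch 0, so any age calculation built on it would see the claim as extremely old.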

Token refresh documented but not wired up — Correct. This was leftover from an earlier token-file auth model. We've since moved to volume-based auth (claude login into a Docker volume), so the token_refresh mechanism doesn't apply. Removed the dead function and docs in 991e2b9.

Acknowledged (acceptable as-is)

Git config set globally for agent user — Each container runs exactly one user working on exactly one repo. There are no submodules or secondary repos in this pattern. The container is ephemeral and torn down after use, so global git config has no side effects.

DNS resolution is static at firewall init — True. If CDN IPs rotate mid-session, the agent would lose access until the container restarts. In practice, agent iterations are short-lived (minutes) and the orchestrator auto-restarts crashed containers, which re-resolves DNS. We considered re-resolving periodically but it adds complexity for a very unlikely failure mode.

--filter=blob:none implications — This is intentional. Agents work on HEAD and don't need old file blobs — the treeless clone gives them commit history and branch refs for orientation without exposing file content from old commits (which could contain leaked secrets). Projects with large binary assets that agents need to reference could use --image with a custom clone strategy, but for PRD-based feature work this is the right tradeoff.

--dangerously-skip-permissions is risky — Agreed, and this is exactly why the containerization exists. The flag is required for headless operation (no human to approve tool calls). The container is the mitigation: no host filesystem access, restricted network, resource limits, non-root user. Worst case is an agent damages its own container, which is ephemeral and disposable.

Previously both agent-loop.sh AND Claude claimed stories. The script would claim US-001, then Claude would read CLAUDE-parallel.md's claim protocol and grab US-002 and US-003 before doing any work.

Now:
- agent-loop.sh injects {{CLAIMED_STORY}} into the prompt via sed
- CLAUDE-parallel.md tells Claude which story is pre-assigned
- Claude is explicitly told not to claim additional stories
- Claiming is solely the script's responsibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
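The placeholder injection described in this commit could look something like the sketch below. The function name `render_prompt` and the ID validation are assumptions added for illustration. Only the `{{CLAIMED_STORY}}` placeholder and the use of sed come from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a prompt template with {{CLAIMED_STORY}} filled in.
#   $1 = path to the template file
#   $2 = story ID claimed by the script (e.g. US-001)
render_prompt() {
    local template_file="$1" story_id="$2"

    # Reject empty IDs and anything that could act as a sed metacharacter.
    case "$story_id" in
        ''|*[!A-Za-z0-9_-]*)
            echo "invalid story id: $story_id" >&2
            return 1
            ;;
    esac

    # Braces are literal in basic regular expressions, so no escaping is needed.
    sed "s/{{CLAIMED_STORY}}/${story_id}/g" "$template_file"
}
```

The result could then be captured with something like `PROMPT=$(render_prompt CLAUDE-parallel.md "$CLAIMED_STORY")` before invoking Claude, so the prompt names exactly one pre-assigned story.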

mtgibbs and others added 8 commits February 12, 2026 23:41

feat: add --image flag for custom Docker images

0c4418b

mtgibbs wants to merge 10 commits into snarktank:main from mtgibbs:main
