docs: detail multi-OS testing

zah · zah · commit 63908ef1074e · 2025-08-25T19:36:49.000+03:00
diff --git a/docs/agent-time-travel.md b/docs/agent-time-travel.md
@@ -4,11 +4,15 @@
 
 Agent Time-Travel lets a user review an agent’s coding session and jump back to precise moments in time to intervene by inserting a new chat message. Seeking to a timestamp restores the corresponding filesystem state using filesystem snapshots (FsSnapshots). The feature integrates across CLI, TUI, WebUI, and REST, and builds on the snapshot provider model referenced by other docs (see `docs/fs-snapshots/overview.md`).
 
+### Implementation Phasing
+
+The initial implementation will focus on supporting regular FsSnapshots on copy-on-write (CoW) Linux filesystems (such as ZFS, Btrfs, and NILFS2), using a session recorder based on Claude Code hooks. An end-to-end prototype will be developed for the entire Agent Time-Travel system, including session recording, timeline navigation, and snapshot/seek/branch operations, to validate the core workflow and user experience. Once this prototype is functional, we will incrementally add support for additional recording and snapshotting mechanisms, including user-space overlay filesystems for macOS and Windows, and advanced recording integrations.
+
 ### Goals
 
 - Enable scrubbing through an agent session with exact visual terminal playback and consistent filesystem state.
 - Allow the user to pause at any moment, inspect the workspace at that time, and create a new SessionBranch with an injected instruction.
-- Provide first-class support for ZFS/Btrfs/NILFS2 where available; offer robust fallbacks on APFS (macOS), VSS (Windows), and non‑CoW Linux.
+- Provide first-class support for ZFS/Btrfs/NILFS2 where available; offer robust fallbacks on non‑CoW Linux, macOS, and Windows through user-space CoW overlay filesystems.
 - Expose a consistent API and UX across WebUI, TUI, and CLI.
 
 ### Non-Goals
@@ -24,15 +28,15 @@ Agent Time-Travel lets a user review an agent’s coding session and jump back t
 - **FsSnapshot**: A SessionMoment that has an associated filesystem snapshot reference (snapshot created near‑synchronously with the moment).
 - **SessionFrame**: A visual state at a specific timestamp; the player can seek and render the SessionFrame.
 - **SessionTimeline**: The ordered set of events (logs, SessionMoments, FsSnapshots, resizes) across a session.
-- **SessionBranch**: A new session created from an FsSnapshot’s filesystem state with an injected chat message.
+- **SessionBranch**: A new session created from a SessionMoment, starting from the filesystem state of its associated FsSnapshot, with an injected chat message.
 
 ### Architecture Overview
 
-- **Recorder**: Captures terminal output as an asciinema session recording (preferred) or ttyrec; emits SessionMoments at logical boundaries (e.g., per-command).
-- **FsSnapshot Manager**: Creates and tracks filesystem snapshots; maintains mapping {timestamp → snapshotId}.
-- **Snapshot Provider Abstraction**: Chooses provider per host (ZFS → Btrfs → APFS/VSS → NILFS2/Overlay → copy; FSKit/WinFsp overlays on macOS/Windows). See Provider Matrix below.
-- **SessionTimeline Service (REST)**: Lists FsSnapshots/SessionMoments, seeks, and creates SessionBranches; streams session timeline events via SSE.
-- **Players (WebUI/TUI)**: Embed the session recording; render SessionMoments; orchestrate seek/SessionBranch actions.
+- **Recorder**: Captures terminal output as an asciinema session recording (preferred) or ttyrec; emits SessionMoments at logical boundaries (e.g., per-command). The initial prototype will use a recorder based on Claude Code hooks.
+- **FsSnapshot Manager**: Creates and tracks filesystem snapshots; maintains mapping {moment → snapshotId}.
+- **Snapshot Provider Abstraction**: Chooses provider per host (ZFS → Btrfs → NILFS2 → Overlay → copy; FSKit/WinFsp overlays on macOS/Windows). See Provider Matrix below.
+- **SessionTimeline Service (REST)**: Lists FsSnapshots/SessionMoments, seeks, and creates SessionBranches; streams session recording events via SSE.
+- **Players (WebUI/TUI)**: Embed the session recording; render streaming SessionRecordings in real time and allow seeking to arbitrary SessionFrames; orchestrate SessionBranch actions.
 - **Workspace Manager**: Mounts read-only snapshots for inspection and prepares writable clones/upper layers for SessionBranches.
 
 ### SessionRecording and SessionTimeline Model
@@ -57,12 +61,12 @@ Agent Time-Travel lets a user review an agent’s coding session and jump back t
     - Overlay fallback: lower = base tree, upper/work on fast storage (tmpfs or RAM-backed NILFS2/zram/brd) for ephemeral SessionBranches.
     - Copy fallback: `cp --reflink=auto` when possible; otherwise deep copy (last resort).
   - macOS:
-    - APFS snapshots: read-only, instantaneous; mountable for inspection. For SessionBranch, create an overlay-style writable workspace using a read-only snapshot as lower with a writable upper (FSKit backend when available) or fast copy-on-write file clones where feasible.
+    - User-space overlay: Use FSKit to provide a copy-on-write overlay filesystem for both inspection and SessionBranching, as APFS snapshots are not fast enough for our needs.
   - Windows:
-    - VSS shadow copies: read-only snapshots at volume level; expose snapshot content for inspection. For SessionBranch, materialize a writable workspace via differencing VHD(X) layered over the snapshot materialization or by copying-on-write using a WinFsp-backed overlay.
+    - User-space overlay: Use WinFsp to provide a copy-on-write overlay filesystem for both inspection and SessionBranching, as VSS snapshots are not fast enough for our needs.
 
 - **SessionBranch Semantics**:
-  - Writable clones are native on ZFS/Btrfs. On APFS/VSS, SessionBranching is emulated via overlay or virtual disk differencing over the read-only snapshot view.
+  - Writable clones are native on ZFS/Btrfs. On macOS and Windows, SessionBranching is implemented via user-space overlay filesystems (FSKit/WinFsp) rather than native snapshotting.
   - SessionBranches are isolated workspaces; original session remains immutable.
 
 ### User‑Space Filesystem Overlay (macOS and Windows)
@@ -78,6 +82,7 @@ Agent Time-Travel lets a user review an agent’s coding session and jump back t
   - bash: `trap DEBUG` + `PROMPT_COMMAND` pair to delimit commands.
   - fish: `fish_preexec`/`fish_postexec` equivalents.
 - **Runtime Integration**: The runner emits session timeline events (SSE) at milestones; the snapshot manager aligns nearest FsSnapshot ≤ timestamp.
+- **Multi‑OS Sync Fence**: When multi‑OS testing is enabled, each execution cycle performs `fs_snapshot_and_sync` on the leader (create FsSnapshot, then fence Mutagen sessions to followers) before invoking `run_everywhere`. See `docs/multi-os-testing.md`.
 - **Advanced (future)**: eBPF capture of PTY I/O and/or FS mutations; rr-based post‑facto reconstruction of session recordings; out of scope for v1 but compatible with this model.
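+
+The alignment rule above (nearest FsSnapshot ≤ timestamp) can be sketched as a binary search over sorted snapshot timestamps. This is a minimal illustration under stated assumptions, not the project's implementation: `SnapshotIndex`, its method names, and the snapshot ids are hypothetical.
+
+```python
+import bisect
+from typing import List, Optional
+
+
+class SnapshotIndex:
+    """Hypothetical in-memory index from timestamps to snapshot ids."""
+
+    def __init__(self) -> None:
+        self._timestamps: List[float] = []  # kept sorted ascending
+        self._ids: List[str] = []           # parallel to _timestamps
+
+    def add(self, timestamp: float, snapshot_id: str) -> None:
+        # Insert at the sorted position so lookups can use binary search.
+        pos = bisect.bisect_right(self._timestamps, timestamp)
+        self._timestamps.insert(pos, timestamp)
+        self._ids.insert(pos, snapshot_id)
+
+    def nearest_at_or_before(self, timestamp: float) -> Optional[str]:
+        # bisect_right returns how many stored timestamps are <= timestamp.
+        pos = bisect.bisect_right(self._timestamps, timestamp)
+        return self._ids[pos - 1] if pos > 0 else None
+```
+
+A seek to a time earlier than the first snapshot returns `None` in this sketch; how the real snapshot manager handles that case is not specified here.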
 
 ### REST API Extensions
@@ -130,7 +135,7 @@ Agent Time-Travel lets a user review an agent’s coding session and jump back t
 
 ### WebUI UX
 
-- **Player Panel**: Embed `<asciinema-player>` with `poster`, SessionMoments, and a scrubber. Time cursor shows nearest FsSnapshot and label.
+- **Player Panel**: Embed `<asciinema-player>` with SessionMoments and a scrubber. Time cursor shows nearest FsSnapshot and label.
 - **Pause & Intervene**: On pause, surface “Inspect snapshot” and “SessionBranch from here”.
 - **Inspect Snapshot**: Mounts read‑only view; open a lightweight file browser and offer “Open IDE at this point”.
 - **SessionBranch From Here**: Dialog to enter an injected message and name; creates a new session (SessionBranch); link both sessions for side‑by‑side comparison.
@@ -156,17 +161,17 @@ Agent Time-Travel lets a user review an agent’s coding session and jump back t
 
 - **Keystrokes**: If input capture is enabled, redact known password prompts (heuristics based on ECHO off and common prompts). Make input capture opt‑in.
 - **Access Control**: SessionTimeline/seek/SessionBranch require the same permissions as session access; snapshot mounts use least‑privilege read‑only where applicable.
-- **Data Retention**: Separate retention for recordings vs snapshots; defaults minimize data exposure. Encrypt at rest when stored remotely.
+- **Data Retention**: Separate retention for session recordings vs snapshots; defaults minimize data exposure. Encrypt at rest when stored remotely.
 
 ### Performance, Retention, and Limits
 
 - **Snapshot Rate Limits**: Min interval between FsSnapshots; coalesce within a small window (e.g., 250–500 ms) to avoid bursty commands creating many snapshots.
 - **Retention**: Policies by count/age/size. Prune unreferenced checkpoints (e.g., NILFS2) and expired provider snapshots.
-- **Storage**: Cast files compressed; offload to object storage. Mounts are short‑lived and garbage‑collected.
+- **Storage**: Session recording files compressed; offload to object storage. Mounts are short‑lived and garbage‑collected.
 
 ### Failure Modes and Recovery
 
-- **Snapshot Creation Fails**: Create a SessionMoment with `fsSnapshot=false` and reason; continue recording; allow manual retry.
+- **Snapshot Creation Fails**: Create a SessionMoment with `fsSnapshot=false` and reason; continue session recording; allow manual retry.
 - **Seek Failure**: Report provider error and suggest nearest valid FsSnapshot.
 - **Provider Degraded**: Fall back per provider preference, with explicit event logged to the session timeline.
 
@@ -175,14 +180,14 @@ Agent Time-Travel lets a user review an agent’s coding session and jump back t
 - **ZFS**: Snapshots and clones — ideal for FsSnapshots and SessionBranches.
 - **Btrfs**: Subvolume snapshots — ideal for FsSnapshots and SessionBranches.
 - **NILFS2**: Continuous checkpoints; promote to snapshots; mount via `cp=<cno>`; SessionBranch via overlay.
-- **APFS**: Read‑only snapshots; SessionBranch via overlay or file clones (no native writable clone of snapshot).
-- **VSS**: Read‑only shadow copies; SessionBranch via differencing VHD/overlay.
+- **APFS**: Not targeted; APFS snapshots are not fast enough for our needs. Use FSKit overlay instead.
+- **VSS**: Not targeted; VSS snapshots are not fast enough for our needs. Use WinFsp overlay instead.
 - **Overlay/Copy**: Universal fallbacks when CoW is unavailable.
 
 ### Open Issues and Future Work
 
 - eBPF PTY and FS hooks for automatic, runner‑independent capture.
-- rr‑based post‑facto reconstruction of casts and fine‑grained FsSnapshots.
+- rr‑based post‑facto reconstruction of session recordings and fine‑grained FsSnapshots.
 - IPBT integration for advanced session timeline browsing on ttyrec recordings.
 - FSKit backend maturation on macOS for robust overlay SessionBranching without kexts.
 - Windows containers integration to provide stronger per‑session isolation when SessionBranching.
diff --git a/docs/cli-spec.md b/docs/cli-spec.md
@@ -128,6 +128,12 @@ Mirrors `docs/configuration.md` including provenance, precedence, and Windows be
   - Serves the WebUI for local use; in `--local` it binds to `127.0.0.1` and hides admin features.
 
 #### 9) Utilities
+#### 10) Followers and Multi‑OS
+
+- `aw followers list` — List configured follower hosts and tags.
+- `aw followers sync-fence [--timeout <sec>] [--tag <k=v>]... [--host <name>]... [--all]` — Perform a synchronization fence, ensuring followers match the leader workspace state.
+- `aw run-everywhere <action> [args...] [--tag <k=v>]... [--host <name>]... [--all]` — Invoke project’s `.agents/run_everywhere` on selected followers.
+
 
 - `aw doctor` — Environment diagnostics (snapshot providers, multiplexer availability, docker/devcontainer, git).
 - `aw completion [bash|zsh|fish|pwsh]` — Shell completions.
diff --git a/docs/devcontainer-design.md b/docs/devcontainer-design.md
@@ -124,6 +124,7 @@ Each agent’s exact mapping is captured in `docs/agents/<tool>.md` and validate
 - Cold/warm build benchmarks with and without caches.
 - Credential probes for each agent (non‑destructive): `gh auth status`, short `curl` to model/provider endpoints when keys present.
 - Time‑travel hook smoke tests: run a few commands and verify SessionMoments are emitted.
+- Multi‑OS smoke tests: verify Mutagen sessions, fence latency, and `run_everywhere` execution on tagged followers.
 - Cross‑platform matrix: Linux, macOS (Docker Desktop), Windows (WSL2/Hyper‑V).
 
 ### Migration Plan
diff --git a/docs/multi-os-testing.md b/docs/multi-os-testing.md
@@ -0,0 +1,120 @@
+## Multi‑OS Testing — Leader/Followers, Sync, and run_everywhere
+
+### Summary
+
+Enable agents to validate builds and tests across multiple operating systems in parallel with a simple, reliable flow:
+
+- The Linux host acts as the leader workspace (preferred for CoW FsSnapshots and orchestration).
+- One or more follower workspaces (macOS, Windows, Linux) mirror the leader via Mutagen high‑speed file sync.
+- Each execution cycle fences the filesystem state (FsSnapshot + sync) and then invokes project‑defined commands everywhere via `run_everywhere`.
+
+### Goals
+
+- Deterministic, low‑latency propagation of file changes from leader to followers.
+- Atomic test execution view based on a consistent leader FsSnapshot.
+- Simple project integration via a single `run_everywhere` entrypoint and tagging.
+- Minimal OS‑specific logic inside agents; orchestration handled by the runner.
+- Avoid the complexity of filesystem snapshots on followers. The snapshots of the leader are sufficient to restore any filesystem state on the followers as well.
+
+### Terminology
+
+- **Leader**: The primary workspace on Linux (snapshot‑enabled when possible).
+- **Followers**: Secondary workspaces on other OSes, receiving file updates via Mutagen.
+- **Sync Fence**: An explicit operation ensuring all follower file trees match the leader FsSnapshot before execution.
+- **run_everywhere**: Project command that runs an action (e.g., build/test) on selected hosts and returns the command output to the agent running on the leader.
+
+### Architecture
+
+1) Workspace Topology
+   - Leader path (e.g., `/workspaces/proj`) is the source of truth.
+   - Mutagen sessions map leader→follower working directories with optimized ignores.
+   - Followers are prepared using container/VM/native shells; Windows may still use the `S:` drive mapping even when not using the WinFsp overlay (which is not required in a follower configuration).
+
+2) Execution Cycle
+   - Agent edits files on the leader.
+   - Runner executes `fs_snapshot_and_sync`:
+     - Create a leader FsSnapshot (native CoW when available; FSKit/WinFsp overlay fallback otherwise).
+     - Issue a sync fence: wait until Mutagen confirms followers are in sync with the leader snapshot content.
+   - The agent then invokes `run_everywhere` with appropriate selectors, as directed by the agent instructions that agents-workflow inserts automatically.
+
+3) Selectors
+   - `--host <name>`: run on a single follower by host name.
+   - `--tag <k=v>`: run on all followers carrying the tag `<k=v>` (e.g., `os=windows`, `gpu=nvidia`).
+   - `--all`: run on all configured followers. This is also the default when no selector is supplied.
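+
+The selector rules above can be sketched as a small filter over the follower catalog. This is an illustration only: the `Follower` type and `select_followers` function are hypothetical, and combining several selectors as a union is an assumption, since this document does not specify how multiple selectors interact.
+
+```python
+from dataclasses import dataclass, field
+from typing import Dict, List, Sequence
+
+
+@dataclass
+class Follower:
+    """Hypothetical catalog entry for a follower host."""
+    name: str
+    tags: Dict[str, str] = field(default_factory=dict)
+
+
+def select_followers(
+    followers: Sequence[Follower],
+    hosts: Sequence[str] = (),
+    tags: Sequence[str] = (),
+) -> List[Follower]:
+    # No selector given: run on every configured follower (the default).
+    if not hosts and not tags:
+        return list(followers)
+    # Parse "k=v" tag selectors; a bare "k" is treated as k with an empty value.
+    wanted = []
+    for tag in tags:
+        key, _, value = tag.partition("=")
+        wanted.append((key, value))
+    selected = []
+    for follower in followers:
+        by_host = follower.name in hosts
+        by_tag = any(follower.tags.get(k) == v for k, v in wanted)
+        # Assumption: a follower matches if ANY selector matches (union).
+        if by_host or by_tag:
+            selected.append(follower)
+    return selected
+```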
+
+### Snapshot Strategy
+
+- Leader on CoW FS (ZFS/Btrfs/NILFS2):
+  - Only the leader creates FsSnapshots; followers rely on sync fence to reflect that exact state.
+- Leader without CoW (Windows‑only/macOS‑only projects):
+  - Use user‑space overlay (FSKit/WinFsp) for the leader to provide efficient CoW behavior.
+  - Followers still rely on sync fence; no follower snapshots required.
+
+### Mutagen Integration
+
+- Use Mutagen to establish persistent, resilient sync sessions that are one‑way (leader→followers only; bidirectional sync is disabled).
+- Sync ignores: `node_modules`, `.venv`, `target`, `build`, large caches unless explicitly needed; per‑project config via `.agents/mutagen.yml`.
+- Sync fence API: wait for `watchState == consistent` across all selected followers with a timeout and backoff.
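+
+The fence wait described above (poll until every selected follower is consistent, with a timeout and backoff) can be sketched as follows. This is a hedged illustration, not the runner's implementation: `sync_fence` and the `get_state` callback are hypothetical, and the sketch deliberately does not call Mutagen itself; how the state is obtained from Mutagen is left to the caller.
+
+```python
+import time
+from typing import Callable, List, Sequence
+
+
+def sync_fence(
+    followers: Sequence[str],
+    get_state: Callable[[str], str],
+    timeout: float = 30.0,
+    initial_delay: float = 0.1,
+    max_delay: float = 2.0,
+    clock: Callable[[], float] = time.monotonic,
+    sleep: Callable[[float], None] = time.sleep,
+) -> List[str]:
+    """Wait for all followers to report "consistent".
+
+    Returns the followers still lagging when the timeout expires
+    (an empty list means the fence succeeded).
+    """
+    deadline = clock() + timeout
+    delay = initial_delay
+    pending = list(followers)
+    while True:
+        # Re-check only the followers that have not yet caught up.
+        pending = [f for f in pending if get_state(f) != "consistent"]
+        if not pending:
+            return []
+        remaining = deadline - clock()
+        if remaining <= 0:
+            return pending
+        # Exponential backoff, capped by max_delay and the time left.
+        sleep(min(delay, remaining))
+        delay = min(delay * 2, max_delay)
+```
+
+Returning the lagging followers rather than raising lets the caller report them, which matches the fence-timeout handling described under Failure Modes.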
+
+### Project Contract: run_everywhere
+
+The `run_everywhere` command is available in the project's dev environment as part of the agents-workflow suite, which is pre-installed in the base Docker images provided by agents-workflow (see `devcontainer-design.md`). It is responsible for:
+
+- Parameter parsing for `--host`, `--tag`, `--all`, and pass‑through of the command/action (e.g., `build`, `test`, `lint`).
+- Host catalog discovery (local file `.agents/hosts.json`, REST query, or env).
+- Per‑host command adapters:
+  - Linux: bash/zsh; container or native.
+  - macOS: zsh; FSKit overlay mount path.
+  - Windows: PowerShell or bash in MSYS; WinFsp overlay under `S:`.
+- Exit code aggregation: return non‑zero if any selected host fails.
+
+Illustrative usage:
+
+```bash
+# Run tests on all followers
+.agents/run_everywhere test --all
+
+# Run build only on Windows hosts
+.agents/run_everywhere build --tag os=windows
+
+# Run lint on a specific host
+.agents/run_everywhere lint --host win-12
+```
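+
+The exit code aggregation rule from the contract above (non‑zero if any selected host fails) can be sketched as below. The function name and the shape of `results` (host name mapped to exit code) are assumptions made for illustration; returning exactly `1` on failure is also an assumption, since the contract only requires a non‑zero value.
+
+```python
+from typing import Dict
+
+
+def aggregate_exit_codes(results: Dict[str, int]) -> int:
+    """Return 0 only if every host exited with 0, otherwise 1."""
+    failed = sorted(host for host, code in results.items() if code != 0)
+    for host in failed:
+        # Report each failing host so the agent can see where to look.
+        print(f"host {host} failed with exit code {results[host]}")
+    return 1 if failed else 0
+```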
+
+### REST Extensions (high‑level)
+
+- `GET /api/v1/followers` → list configured followers (host, os, tags, status).
+- `POST /api/v1/followers/sync-fence` → perform sync fence; returns states per follower.
+- `POST /api/v1/run-everywhere` → body: { action, args, selectors }; streams per‑host logs via SSE.
+
+### CLI Additions (high‑level)
+
+- `aw followers list` — show followers and status.
+- `aw followers sync-fence [--timeout <sec>] [--tag ... | --host ... | --all]`
+- `aw run-everywhere <action> [args...] [--tag ... | --host ... | --all]`
+
+### Time‑Travel Integration
+
+- The leader’s `fs_snapshot_and_sync` is inserted between edit operations and tool execution.
+- SessionMoments are emitted before/after the fence; the FsSnapshot id is linked to the post‑fence SessionMoment.
+- Seeking to that SessionMoment restores the leader FsSnapshot; followers are re‑synced by issuing a fence before re‑execution.
+
+### Devcontainer/Runner Notes
+
+- Followers can be provisioned via devcontainers or native shells with the same project devshell.
+- Credentials and environment normalization follow the base image’s credential propagation rules.
+- Health checks verify Mutagen sessions and per‑host readiness before execution.
+
+### Failure Modes
+
+- Fence timeout: abort `run_everywhere`; report lagging followers and suggest narrowing selectors.
+- Partial host failure: aggregate failures and return non‑zero; provide per‑host logs and artifacts.
+- Sync divergence: force rescan/rebuild of stale directories; optionally clear ignores for critical paths.
+
+### Open Questions
+
+- Artifact collection and centralization strategy across followers.
+- Test sharding and orchestration policies (e.g., split tests by tag or runtime).
+- Security posture for follower access (SSH, certificates, RBAC via REST).
+
+
diff --git a/docs/prompt-engineering.md b/docs/prompt-engineering.md
diff --git a/docs/rest-service.md b/docs/rest-service.md