metacraft-labs
diff --git a/.agents/codex-setup b/.agents/codex-setup
Lines changed: 3 additions & 1 deletion
diff --git a/.agents/tasks/2025/07/01-1700-fs-snapshots-step1 b/.agents/tasks/2025/07/01-1700-fs-snapshots-step1
Lines changed: 3 additions & 0 deletions
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 4 additions & 3 deletions
diff --git a/agent-task.gemspec b/agent-task.gemspec
Lines changed: 1 addition & 1 deletion
diff --git a/docs/fs-snapshots-workflow.md b/docs/fs-snapshots-workflow.md
Lines changed: 0 additions & 100 deletions
diff --git a/docs/fs-snapshots/milestone_1.md b/docs/fs-snapshots/milestone_1.md
Lines changed: 19 additions & 0 deletions
diff --git a/docs/fs-snapshots/milestone_2.md b/docs/fs-snapshots/milestone_2.md
Lines changed: 9 additions & 0 deletions
diff --git a/docs/fs-snapshots/milestone_3.md b/docs/fs-snapshots/milestone_3.md
Lines changed: 13 additions & 0 deletions
diff --git a/docs/fs-snapshots/milestone_4.md b/docs/fs-snapshots/milestone_4.md
Lines changed: 9 additions & 0 deletions
diff --git a/docs/fs-snapshots/milestone_5.md b/docs/fs-snapshots/milestone_5.md
Lines changed: 13 additions & 0 deletions
@@ -8,6 +8,8 @@ apt-get install -y --no-install-recommends \
   fossil \
   mercurial \
   just \
-  ruby-full
+  ruby-full \
+  zfsutils-linux \
+  btrfs-progs
 
 sudo gem install rubocop
@@ -0,0 +1,3 @@
+Carry out the first step of the plan in docs/fs-snapshots-workflow.md
+
+If you find that any required software is missing, add it to .agents/codex-setup and the Nix flake. Make sure the flake loads by running any command under `nix develop` (you'll have to install Nix first).
@@ -6,9 +6,10 @@
 
 ## 🧪 Testing Tips
 
-- **Fix individual tests first**: `ruby -Itest test/test_file.rb -n test_method_name`
-- **Parent directory tests**: Require repos to be on agent task branches (not main)
-- **Full logs**: Available in `test/logs/` (path shown after test runs)
+When the test suite fails and you want to test potential fixes, try running only the affected
+tests first:
+
+`ruby -Itest test/test_file.rb -n test_method_name`
 
 ## Keeping notes
 
 
@@ -8,6 +8,6 @@ Gem::Specification.new do |spec|
   spec.files         = Dir['bin/*', 'lib/**/*.rb', 'LICENSE', 'README.md', 'codex-setup']
   spec.executables   = Dir['bin/*'].select { |f| File.file?(f) }.map { |f| File.basename(f) }
   spec.require_paths = ['lib']
-  spec.required_ruby_version = '>= 3.0.0'
+  spec.required_ruby_version = '>= 2.6.0'
   spec.metadata['rubygems_mfa_required'] = 'true'
 end
@@ -0,0 +1,19 @@
+**Milestone 1: Core Filesystem Abstraction Layer**
+*Implementation:* Build fundamental filesystem operation primitives as the foundation. Create a `SnapshotProvider` abstraction with concrete implementations for each supported method:
+
+* **Detection Logic:** Implement filesystem type detection by examining `/proc/mounts`, checking whether the ZFS/Btrfs tools are installed, and falling back gracefully through the hierarchy (ZFS → Btrfs → OverlayFS → Copy).
+* **ZFS Provider:** Implement `zfs snapshot` and `zfs clone` operations with proper dataset path resolution and cleanup. Handle permissions and error cases (e.g., insufficient privileges, quota limits).
+* **Btrfs Provider:** Implement `btrfs subvolume snapshot` with automatic subvolume creation if needed. Handle the case where the repository is not yet a subvolume.
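+* **OverlayFS Provider:** Create overlay mounts with the proper `lowerdir`, `upperdir`, and `workdir` structure. Mounting overlayfs requires `CAP_SYS_ADMIN` (in practice root or sudo) unless it is done inside a user namespace, so handle sudo requirements and privilege escalation gracefully.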
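+* **Copy Provider:** Implement fast copying that uses reflinks where available (`cp --reflink=auto`), falling back to hard links and finally to regular copying. Note that `cp --reflink=auto` already falls back to a regular copy by itself, and that hard-linked files share data with the original, so in-place writes through a hard link would leak across workspaces.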
+
+*Testing Strategy:* Create real filesystems inside image files attached through loop devices. This exercises genuine filesystem behavior without requiring pre-configured test machines:
+
+* **ZFS Testing:** Create ZFS pools backed by image files with `zpool create test_pool /path/to/file.img` (ZFS accepts an absolute file path as a vdev, so a loop device is not strictly required). Create datasets, test snapshot/clone operations, verify CoW behavior, and test error conditions such as insufficient space or permissions.
+* **Btrfs Testing:** Create Btrfs filesystems in files with `mkfs.btrfs /path/to/file.img`, mount via loop devices, create subvolumes, and test snapshot operations. Verify that non-subvolume directories are automatically converted when needed.
+* **OverlayFS Testing:** Create multiple loop-mounted filesystems to test overlay mounting with different combinations of lower/upper/work directories. Test with both writable and read-only lower layers.
+* **Copy Testing:** Test on various filesystem types (ext4, XFS, etc.) created on loop devices to verify reflink support detection and fallback behavior.
+* **Error Condition Testing:** Test quota limits, permission errors, disk full scenarios, and concurrent access patterns using the loop device filesystems.
+* **Performance Testing:** Measure snapshot creation/deletion times and space usage with real filesystems to establish baseline performance characteristics.
+
+*CI Integration:* The test suite will create temporary filesystem images during test runs, eliminating the need for pre-configured CI environments with specific filesystems. Tests can run on any Linux system with loop device support and root (or passwordless sudo) access for mounting; the ZFS tests additionally need the ZFS kernel module and userland tools.
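The detection hierarchy and the loop-image test setup described in Milestone 1 could look roughly like the Ruby sketch below. Every name in it (`mount_for`, `select_provider`, `FsImageFixture`) is hypothetical, and it assumes the mounts and filesystem lists are passed in as text (normally read from `/proc/mounts` and `/proc/filesystems`) so the selection logic can be tested without root. The fixture only plans commands; running them needs root.

```ruby
# Hypothetical sketch for Milestone 1; names are illustrative, not from the codebase.

# Returns [mount_point, fstype] of the longest /proc/mounts entry containing `path`.
def mount_for(path, mounts_text)
  best = nil
  mounts_text.each_line do |line|
    _device, mount_point, fstype = line.split(' ', 4)
    next if fstype.nil?
    mount_point = mount_point.gsub('\\040', ' ') # /proc/mounts escapes spaces as \040
    prefix = mount_point == '/' ? '/' : "#{mount_point}/"
    next unless path == mount_point || path.start_with?(prefix)
    best = [mount_point, fstype] if best.nil? || mount_point.length > best[0].length
  end
  best
end

def command_available?(name)
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Walks the ZFS -> Btrfs -> OverlayFS -> Copy hierarchy.
def select_provider(path, mounts_text, filesystems_text, tool_check: method(:command_available?))
  fstype = (mount_for(path, mounts_text) || [])[1]
  return :zfs if fstype == 'zfs' && tool_check.call('zfs')
  return :btrfs if fstype == 'btrfs' && tool_check.call('btrfs')
  return :overlayfs if filesystems_text.each_line.any? { |l| l.split.last == 'overlay' }
  :copy
end

# Plans (but does not run) the commands that create and destroy a file-backed
# test filesystem.
class FsImageFixture
  MKFS = {
    'btrfs' => %w[mkfs.btrfs -q],
    'ext4'  => %w[mkfs.ext4 -q -F],
    'xfs'   => %w[mkfs.xfs -q]
  }.freeze

  def initialize(image:, mount_point:, fstype:, size_mb: 512)
    @image = File.expand_path(image) # zpool needs an absolute path for file vdevs
    @mount_point = mount_point
    @fstype = fstype
    @size_mb = size_mb
  end

  def pool_name
    "test_pool_#{File.basename(@image, '.img')}"
  end

  def setup_commands
    create = [['truncate', '-s', "#{@size_mb}M", @image]]
    # ZFS uses the image file directly as a vdev; -m sets the pool's mount point.
    return create + [['zpool', 'create', '-m', @mount_point, pool_name, @image]] if @fstype == 'zfs'

    mkfs = MKFS.fetch(@fstype) { raise ArgumentError, "unsupported fstype: #{@fstype}" }
    create + [mkfs + [@image], ['mkdir', '-p', @mount_point],
              ['mount', '-o', 'loop', @image, @mount_point]]
  end

  def teardown_commands
    detach = @fstype == 'zfs' ? ['zpool', 'destroy', pool_name] : ['umount', @mount_point]
    [detach, ['rm', '-f', @image]]
  end

  def run(commands)
    commands.each { |cmd| system(*cmd, exception: true) }
  end
end
```

Passing `tool_check` as a parameter lets tests force each branch of the hierarchy; in real use it would be called as `select_provider(Dir.pwd, File.read('/proc/mounts'), File.read('/proc/filesystems'))`.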
@@ -0,0 +1,9 @@
+**Milestone 2: Mock Agent Integration Testing**
+*Implementation:* Create a realistic mock agent that simulates actual AI agent behavior without requiring real AI services:
+
+* **Mock Agent Behavior:** The mock agent will perform realistic file operations (reading source files, creating output files, modifying existing files), include configurable work duration with sleep calls, and generate logs of its activities. It will simulate the patterns of real agents like Codex or Goose.
+* **Docker Test Environment:** Build a minimal Alpine Linux Docker image containing the mock agent and a Ruby runtime. This image will serve as a controlled environment for testing isolation and concurrency.
+* **Parallel Execution Tests:** Write integration tests that launch multiple mock agents simultaneously in separate isolated workspaces. Verify that agents cannot see each other's changes and that all file modifications remain properly isolated.
+* **Performance Testing:** Measure snapshot creation time, workspace cleanup time, and resource usage under concurrent load to ensure the system scales appropriately.
+
+*Integration Testing:* These tests will use real filesystem operations in controlled environments. Test on multiple filesystems (ext4, btrfs) in CI. Verify isolation guarantees and measure performance characteristics.
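A minimal Ruby sketch of the Milestone 2 mock agent and a parallel isolation check follows. `MockAgent`, `run_isolated_agents`, and `isolated?` are hypothetical names. A plain `FileUtils.cp_r` copy stands in for whichever snapshot provider Milestone 1 selects, and the sketch assumes each agent only touches files under its own workspace.

```ruby
require 'fileutils'
require 'logger'

# Hypothetical mock agent: reads sources, edits one file, creates an output
# file, sleeps to simulate work, and logs each step.
class MockAgent
  def initialize(workspace, id:, duration: 0.0, log_io: $stderr)
    @workspace = workspace
    @id = id
    @duration = duration
    @logger = Logger.new(log_io, progname: "mock-agent-#{id}")
  end

  def run
    sources = Dir.glob(File.join(@workspace, '**', '*.rb')).sort
    sources.each { |path| File.read(path) }
    @logger.info("read #{sources.size} source files")

    sleep(@duration) # configurable "thinking" time

    target = sources.first
    if target
      File.open(target, 'a') { |f| f.puts("# edited by agent #{@id}") }
      @logger.info("modified #{File.basename(target)}")
    end

    output = File.join(@workspace, "agent-#{@id}-output.txt")
    File.write(output, "result from agent #{@id}\n")
    @logger.info("wrote #{File.basename(output)}")
  end
end

# Runs `count` agents concurrently, each in its own copy of `source_repo`.
def run_isolated_agents(source_repo, count, root, duration: 0.0)
  workspaces = (1..count).map do |i|
    ws = File.join(root, "ws#{i}")
    FileUtils.cp_r(source_repo, ws) # stand-in for a real snapshot/clone
    ws
  end
  threads = workspaces.each_with_index.map do |ws, i|
    Thread.new { MockAgent.new(ws, id: i + 1, duration: duration, log_io: File::NULL).run }
  end
  threads.each(&:join)
  workspaces
end

# True when no workspace contains another agent's output file.
def isolated?(workspaces)
  workspaces.each_with_index.all? do |ws, i|
    others = (1..workspaces.size).reject { |n| n == i + 1 }
    others.none? { |n| File.exist?(File.join(ws, "agent-#{n}-output.txt")) }
  end
end
```

Threads are enough for this sketch because the mock agent spends its time in `sleep` and file I/O, which release Ruby's global VM lock. The real tests would launch agents as separate processes or containers.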
@@ -0,0 +1,13 @@
+**Milestone 3: Credential Management System**
+*Implementation:* Based on research into how existing AI agent tools store and read their credentials, implement a comprehensive credential management system:
+
+* **Credential Pattern Detection:** Support for different agent types with their specific credential requirements:
+  - **Codex:** `OPENAI_API_KEY` environment variable, `~/.codex/auth.json` and config files
+  - **GitHub Copilot:** `GITHUB_TOKEN` environment variable, `~/.config/gh/hosts.yml` for GitHub CLI auth
+  - **Goose:** `OPENAI_API_KEY`, `ANTHROPIC_API_KEY` environment variables
+  - **Gemini:** `GEMINI_API_KEY` environment variable
+  - **Claude:** `ANTHROPIC_API_KEY` environment variable
+* **Secure Mounting:** For containerized execution, mount credential files read-only and propagate environment variables securely. For VM execution, sync credential files safely while preserving file permissions (0600 for auth files).
+* **Colima/VM Support:** Research confirms that Colima and similar VM solutions support both environment variable propagation and bind mounts for credential files. The system will use Docker's `--env-file` and `--mount` options for secure credential injection.
+
+*Testing:* Test credential mounting with dummy credentials in controlled environments. Verify that credentials are accessible to agents. Test both file-based and environment variable credentials.
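The credential table in Milestone 3 maps naturally onto a small registry that produces `docker run` arguments. The Ruby sketch below is hypothetical. `CREDENTIALS` and `docker_credential_args` are illustrative names, only `~/.codex/auth.json` is listed for Codex (other config files would be added the same way), and the container home directory `/root` is an assumption about the agent image. It relies on Docker's `--env NAME` form, which copies the value from the calling environment so the secret never appears on the command line, and on read-only bind mounts via `--mount`.

```ruby
# Hypothetical registry built from the credential list above.
CREDENTIALS = {
  codex:   { env: %w[OPENAI_API_KEY], files: %w[.codex/auth.json] },
  copilot: { env: %w[GITHUB_TOKEN], files: %w[.config/gh/hosts.yml] },
  goose:   { env: %w[OPENAI_API_KEY ANTHROPIC_API_KEY], files: [] },
  gemini:  { env: %w[GEMINI_API_KEY], files: [] },
  claude:  { env: %w[ANTHROPIC_API_KEY], files: [] }
}.freeze

# Builds `docker run` arguments for one agent type. Only variables that are
# set and files that exist are forwarded. `container_home` is an assumption.
def docker_credential_args(agent, env: ENV, home: Dir.home, container_home: '/root')
  spec = CREDENTIALS.fetch(agent)
  args = []
  spec[:env].each do |name|
    # `--env NAME` without a value makes Docker read NAME from its own
    # environment, keeping the secret out of argv and process listings.
    args.push('--env', name) if env.key?(name)
  end
  spec[:files].each do |rel|
    src = File.join(home, rel)
    next unless File.file?(src)
    target = File.join(container_home, rel)
    args.push('--mount', "type=bind,source=#{src},target=#{target},readonly")
  end
  args
end
```

When launching the container, the same environment has to reach the Docker CLI, e.g. `system(env, 'docker', 'run', *args, image)`, so that the bare `--env NAME` flags resolve to the intended values.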
@@ -0,0 +1,9 @@
+**Milestone 4: SSH/Remote Execution Framework**
+*Implementation:* Build remote execution capabilities using Docker containers with SSH servers for realistic testing:
+
+* **SSH Test Infrastructure:** Extend the Docker test image from Milestone 2 with a configured SSH server. Use it to test remote execution without requiring actual remote machines.
+* **File Synchronization:** Implement both one-shot sync (using `rsync`) and persistent sync (using Mutagen) approaches. Handle edge cases like network interruptions and large file transfers.
+* **Remote Filesystem Detection:** Extend the filesystem detection logic to work over SSH connections. Cache detection results to avoid repeated SSH calls.
+* **Error Handling:** Implement comprehensive error handling for network failures, authentication issues, insufficient remote permissions, and missing remote tools.
+
+*Integration Testing:* Use Docker containers as SSH targets to test the complete remote workflow. Launch a container, establish an SSH connection, sync files, create snapshots remotely, execute agents, and retrieve results. Test concurrent remote executions and verify isolation.
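Remote filesystem detection with caching, plus the one-shot `rsync` sync from Milestone 4, might be sketched in Ruby as below. `RemoteFsDetector` and `rsync_command` are hypothetical names. The sketch assumes the remote host runs Linux with GNU coreutils, whose `stat -f -c %T` prints the filesystem type (for example `btrfs` or `zfs`; ext4 is reported as `ext2/ext3`). The `runner` parameter lets tests substitute a fake for the real SSH call.

```ruby
require 'open3'
require 'shellwords'

# Hypothetical detector: asks the remote host for a path's filesystem type
# once and caches the answer per (host, path).
class RemoteFsDetector
  # `runner` takes (host, argv) and returns the remote command's stdout.
  def initialize(runner: nil)
    @runner = runner || method(:ssh_capture)
    @cache = {}
  end

  def fstype(host, path)
    @cache[[host, path]] ||= @runner.call(host, ['stat', '-f', '-c', '%T', path]).strip
  end

  private

  def ssh_capture(host, argv)
    # ssh passes its command to the remote shell as one string, so quote it.
    out, status = Open3.capture2('ssh', host, Shellwords.join(argv))
    raise "ssh #{host} exited with #{status.exitstatus}" unless status.success?
    out
  end
end

# One-shot rsync over SSH; the trailing slash on the source copies the
# directory's contents rather than the directory itself.
def rsync_command(src, host, dest, port: 22)
  ['rsync', '-a', '--delete', '-e', "ssh -p #{Integer(port)}",
   "#{src.chomp('/')}/", "#{host}:#{dest}"]
end
```

Caching by `(host, path)` keeps repeated workspace creations from paying an SSH round trip each time; a cache entry would need invalidating if the remote mount layout changes.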
@@ -0,0 +1,13 @@
+**Milestone 5: Full Integration and CI/CD Pipeline**
+*Implementation:* Integrate all components and establish comprehensive CI testing:
+
+* **CI Matrix Enhancement:** Add test jobs for different OS/filesystem combinations:
+  - Ubuntu with btrfs support
+  - Ubuntu with overlay-only (simulating basic ext4 systems)
+  - macOS with Docker/Colima simulation
+  - Windows with WSL2/Docker simulation
+* **End-to-End Testing:** Test complete workflows from `agent-task` CLI invocation through workspace creation, agent execution, and cleanup.
+* **Performance Monitoring:** Add benchmarks for snapshot creation/destruction, file sync performance, and concurrent agent execution. Set performance regression thresholds.
+* **Documentation and Examples:** Complete user documentation with setup instructions for each platform, credential configuration guides, and troubleshooting sections.
+
+*Integration Testing:* The CI pipeline will run the full test suite across the matrix of supported platforms and configurations. This includes both unit tests of individual components and integration tests of complete workflows. All tests will run against real filesystem operations and network conditions to catch issues that mocks might miss.
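One way to express the performance regression thresholds from Milestone 5 is to time an operation several times and compare the median against a stored baseline plus a tolerance. The Ruby sketch below is hypothetical: `median_seconds` and `check_regression` are illustrative names, and the baseline values and the 25% default tolerance are placeholders rather than figures from this plan.

```ruby
# Times the given block `runs` times with a monotonic clock and returns the
# median duration in seconds.
def median_seconds(runs: 5)
  samples = Array.new(runs) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end.sort
  mid = samples.size / 2
  samples.size.odd? ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2.0
end

# Compares the measured median against baseline * (1 + tolerance).
def check_regression(name, baseline_s, tolerance: 0.25, runs: 5, &operation)
  measured = median_seconds(runs: runs, &operation)
  limit = baseline_s * (1 + tolerance)
  { name: name, measured: measured, limit: limit, ok: measured <= limit }
end
```

The median is used instead of the mean so that a single slow run on a noisy CI machine does not fail the job; a snapshot benchmark would wrap the provider's create/destroy calls in the block.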