Add --fail-fast flag for early cancellation on test failure#111
Add --fail-fast flag for early cancellation on test failure#111DanverImbue merged 18 commits intomainfrom
Conversation
There was a problem hiding this comment.
Vet found 1 issue.
[commit_message_mismatch] (severity 3/5) (confidence 0.92)
The diff includes test output artifacts (test-results-fail-fast/logs/batch-*.log) that appear to be generated from a manual test run and should not be committed to the repository. These contain machine-specific paths (e.g., /Users/jacobkirmayer/imbue/offload/.claude/worktrees/fail-fast) and are transient build artifacts.
| @@ -236,7 +245,7 @@ mod tests { | |||
| let fw = PytestFramework::new(config)?; | |||
| let record = TestRecord::new("tests/test_a.py::test_one", "test-group"); | |||
| let tests = vec![TestInstance::new(&record)]; | |||
There was a problem hiding this comment.
[test_coverage] (severity 2/5) (confidence 0.85)
The diff adds fail-fast flag support to the framework execution commands (pytest -x, cargo nextest --fail-fast, vitest --bail), but there are no unit tests verifying that these flags are correctly added to the command when fail_fast: true is passed. The existing tests in pytest.rs, vitest.rs only pass false for the new parameter. Tests should verify the fail_fast: true path produces the correct command arguments.
| @@ -236,7 +245,7 @@ mod tests { | |||
| let fw = PytestFramework::new(config)?; | |||
| let record = TestRecord::new("tests/test_a.py::test_one", "test-group"); | |||
| let tests = vec![TestInstance::new(&record)]; | |||
There was a problem hiding this comment.
[test_coverage] (severity 3/5) (confidence 0.90)
The diff adds fail-fast flag threading through multiple framework implementations (pytest -x, cargo nextest --fail-fast, vitest --bail), but no tests verify that these framework-specific flags are correctly added to the command when fail_fast=true. The existing tests were only updated to pass false. Tests for fail_fast=true should be added for each framework.
67f342c to
b933322
Compare
b933322 to
b1e4350
Compare
There was a problem hiding this comment.
Vet found 1 issue.
[test_coverage] (severity 3/5) (confidence 0.80)
The user request mentions '4 new unit tests for MasterJunitReport::has_any_failures()' but the diff does not include any tests for has_any_failures(). The diff adds tests for framework-level fail-fast flags (pytest -x, nextest --fail-fast, vitest --bail) but the has_any_failures() method tests mentioned in the test plan are missing from the diff.
…e-106) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add --fail-fast bool flag to the Run CLI command and thread it through the full call chain: run_tests() → dispatch_framework() → run_all_tests() → Orchestrator::new() → SpawnConfig. The flag is plumbed but not yet active; cancellation logic follows in the next commit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cation If --fail-fast works, the run finishes in seconds. Without it, 10 minutes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When --fail-fast is enabled, inject the framework's native stop-on-failure flag into the sandbox command: pytest gets -x, cargo nextest gets --fail-fast (replacing --no-fail-fast), vitest gets --bail. The default framework is unchanged since the user controls run_command directly. Also bump ratchets budget for examples/tests_fail time.sleep usage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
b1e4350 to
2b6d941
Compare
There was a problem hiding this comment.
Vet found 1 issue.
[commit_message_mismatch] (severity 3/5) (confidence 0.90)
The user request explicitly mentions '4 new unit tests for MasterJunitReport::has_any_failures() (empty, all-pass, with-failure, flaky-not-failure)' in the test plan, and issue code-105 describes adding a 'has_any_failures() method to MasterJunitReport in junit.rs'. However, the diff does not contain any changes to junit.rs - neither the has_any_failures() method implementation nor the 4 unit tests for it. The diff only contains framework-level tests for the fail-fast flag in command generation. The fail-fast cancellation logic in spawn.rs relies on BatchOutcome::Failure matching, but there are no integration-level tests verifying the fail-fast cancellation behavior in spawn_task either.
Summary
--fail-fastCLI flag tooffload runthat cancels all remaining batches and terminates sandboxes when a test failure is detectedCancellationTokeninfrastructure — in-flight executions abort viaselect!againsttoken.cancelled(), queued batches are skipped at pull timeMasterJunitReportbefore cancellation triggers, so the failing test's results are always captured in the finaljunit.xmlTest plan
MasterJunitReport::has_any_failures()(empty, all-pass, with-failure, flaky-not-failure)cargo fmt --checkpassescargo clippy --all-targets --all-featurespassescargo nextest runpasses (132/132)offload run --fail-fastwith a failing test confirms early cancellation and correctjunit.xmloutput🤖 Generated with Claude Code