diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl index 9827e796..cc480beb 100644 --- a/.beads/issues.jsonl +++ b/.beads/issues.jsonl @@ -4,6 +4,17 @@ {"id":"code-101","title":"Investigate and instrument slow acceptance tests for optimization","description":"## Directive\n\nIMPORTANT: Before doing ANY git or VCS operations, you MUST activate the jujutsu skill by running /jujutsu. This is a jujutsu-managed repository. Using raw git commands will corrupt data.\n\nWhen this bead is complete, mark the final revision with a branch: danver/investigate-slow-acceptance-tests\n\n## Problem\n\nFive acceptance tests individually take 47-68 seconds each, forming the hard floor on execution time. No amount of parallelism or scheduling improvement can reduce the wall clock below the duration of the slowest single test. These tests are the dominant bottleneck in Run 3.\n\n## Evidence from Trace Analysis\n\nFrom Run 3 (26,210 tests, 200 sandboxes):\n\n| Batch | Tests | Duration | Sandbox |\n|-------|-------|----------|---------|\n| batch_3 | 1 | 67.6s | 17 |\n| batch_1 | 1 | 58.1s | 2 |\n| batch_0 | 1 | 48.0s | 0 |\n| batch_4 | 1 | 48.0s | 32 |\n| batch_2 | 1 | 47.6s | 3 |\n\nThese 5 tests consume 269 sandbox-seconds. The longest (67.6s) is 72% of the entire 93.8s execution window. Even a 2x improvement on just the slowest test would save ~34s off the critical path.\n\n## Context\n\nThese tests are in the `mng` repository (the repo that uses offload to run its tests), not in the offload repository itself. The offload tool runs whatever tests it discovers -- it does not control their content. However, we can instrument offload to help identify what makes these tests slow.\n\n## Required Changes\n\n### 1. Add per-test timing to batch output\n\nCurrently, offload knows the total batch duration but not individual test durations within a batch. 
For single-test batches this is fine, but for multi-test batches the per-test breakdown is invisible.\n\nIn `src/orchestrator/runner.rs`, after downloading the JUnit XML results, parse the `time` attribute from each `\u003ctestcase\u003e` element and log the top-N slowest tests:\n\n```rust\n// After downloading junit.xml, log slowest tests\nlet mut test_times: Vec\u003c(\u0026str, f64)\u003e = Vec::new();\n// Parse \u003ctestcase name=\"...\" time=\"...\"\u003e elements\n// Sort by time descending\n// Log top 5 slowest\nfor (name, time) in test_times.iter().take(5) {\n info!(\"[SLOW TEST] {}: {:.1}s\", name, time);\n}\n```\n\n### 2. Add a `--slow-test-threshold` CLI flag\n\nAdd a `--slow-test-threshold` flag (default: 30s) that causes offload to emit a warning for any test exceeding the threshold:\n\n```\nWARNING: Test 'test_full_acceptance_flow' took 67.6s (threshold: 30s)\n```\n\nThis makes slow tests visible in CI output without requiring trace analysis.\n\n### 3. Add slow test data to the Perfetto trace\n\nIn the trace output, add per-test duration events. Currently the trace has batch-level events (`exec_batch`, `download_results`). Add individual test events within the exec thread:\n\n```rust\n// For each testcase in the junit XML:\ntracer.complete_event(\n test_name,\n \"test\",\n sandbox_pid,\n TID_EXEC,\n test_start_us,\n test_duration_us,\n);\n```\n\nThis requires parsing the JUnit XML for individual test times and mapping them back to the trace timeline. The start time can be approximated (batch_start + cumulative_previous_test_times).\n\n### 4. Add a summary section to the run output\n\nAfter the existing summary (passed/failed/flaky counts), add a \"Slowest Tests\" section:\n\n```\nSlowest tests:\n 1. test_full_acceptance_flow 67.6s\n 2. test_end_to_end_pipeline 58.1s\n 3. test_modal_integration 48.0s\n ...\n```\n\nUse the JUnit XML `time` attributes as the source of truth.\n\n### 5. 
Write tests\n\n- Test that the slow test warning is emitted when a test exceeds the threshold\n- Test that the slow test summary is correctly sorted and limited to top N\n- Test that per-test trace events are emitted correctly\n\n## Expected Impact\n\n- No direct wall-clock improvement (this is instrumentation)\n- Enables the mng team to identify and profile the specific slow tests\n- The slow test warnings in CI output will create visibility and pressure to fix them\n- Per-test trace events enable deeper analysis in Perfetto UI\n\n## Files to Modify\n- src/orchestrator/runner.rs (add per-test timing extraction from JUnit XML)\n- src/main.rs (add --slow-test-threshold flag)\n- src/report.rs or src/report/junit.rs (add slow test summary to output)\n- src/trace.rs (possibly add per-test trace events)","status":"open","priority":2,"issue_type":"task","created_at":"2026-03-05T22:07:16.863726-08:00","created_by":"danver","updated_at":"2026-03-05T22:07:16.863726-08:00"} {"id":"code-102","title":"Add vitest duplicate test name check to onboarding skill","description":"Update SKILL.md to detect vitest framework and check for duplicate space-separated test IDs during onboarding. If duplicates are found, the agent must stop and ask the user if they want the agent to deduplicate them by renaming tests more verbosely. Convey that this is a blocking requirement for using Offload.","status":"done","priority":0,"issue_type":"task","owner":"jacob.kirmayer@imbue.com","created_at":"2026-03-16T10:16:55.275063-07:00","created_by":"Jacob Kirmayer","updated_at":"2026-03-16T10:22:43.276909-07:00"} {"id":"code-103","title":"Add offload collect verification step to onboarding skill","description":"Update SKILL.md Step 10 (Run Offload Locally and Verify) to instruct agents to use 'offload collect' first to verify discovery works before running full 'offload run'. 
The agent should iterate on offload collect until discovery succeeds before attempting execution.","status":"done","priority":0,"issue_type":"task","owner":"jacob.kirmayer@imbue.com","created_at":"2026-03-16T10:54:33.111537-07:00","created_by":"Jacob Kirmayer","updated_at":"2026-03-16T10:56:07.309012-07:00"} +{"id":"code-102","title":"Strip pytest framework config to bare minimum","description":"Remove python, extra_args, markers fields from PytestFrameworkConfig. Make command required (not Option). Keep test_id_format as internal constant. Update all related code: schema.rs, pytest.rs, main.rs init template, example TOML configs, tests, README.","status":"done","priority":1,"issue_type":"task","created_at":"2026-03-10T11:56:28.4364-07:00","created_by":"Jacob Kirmayer","updated_at":"2026-03-10T12:03:22.088725-07:00"} +{"id":"code-103","title":"Add CostEstimate struct to provider.rs with cpu_seconds and estimated_cost_usd fields. Include Display impl. The struct should be Clone, Debug, Default.","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:05.719648-07:00","created_by":"danver","updated_at":"2026-03-11T11:27:36.278936-07:00"} +{"id":"code-104","title":"Track sandbox creation time in DefaultSandbox by adding a created_at: Instant field, set in DefaultSandbox::new. Update DefaultProvider::create_sandbox to pass Instant::now().","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:10.902597-07:00","created_by":"danver","updated_at":"2026-03-11T11:30:36.149893-07:00"} +{"id":"code-105","title":"Add cost_estimate() -> CostEstimate method to Sandbox trait. Implement in DefaultSandbox using elapsed time from created_at and Modal pricing ($0.00003942/core/sec). 
LocalSandbox returns CostEstimate::default().","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:15.537292-07:00","created_by":"danver","updated_at":"2026-03-11T11:33:51.975659-07:00"} +{"id":"code-106","title":"Add estimated_cost: CostEstimate field to RunResult. Aggregate costs from sandboxes during cleanup in orchestrator.rs. Update print_summary to accept optional show_cost bool and display cost when true.","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:20.133117-07:00","created_by":"danver","updated_at":"2026-03-11T11:37:48.727278-07:00"} +{"id":"code-107","title":"Add --show-estimated-cost flag to Commands::Run in main.rs. Help text: 'Show estimated sandbox cost after run. Note: This is calculated client-side using simple formulas and may not reflect actual billing, discounts, or pricing adjustments.'","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:27.889734-07:00","created_by":"danver","updated_at":"2026-03-11T11:40:21.818977-07:00"} +{"id":"code-108","title":"Wire --show-estimated-cost through run_tests -> dispatch_framework -> run_all_tests -> orchestrator. Pass show_cost to print_summary. Only display cost line when flag is set and cost > 0.","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:32.467886-07:00","created_by":"danver","updated_at":"2026-03-11T11:44:20.743176-07:00"} +{"id":"code-109","title":"Add cpu_cores field to DefaultProviderConfig and ModalProviderConfig with default 0.125. Pass cpu_cores to DefaultSandbox for cost calculation. Update cost_estimate() to multiply by cpu_cores. The cpu_cores should also be injectable into command templates via {cpu_cores} placeholder.","description":"Add cpu_cores field to ModalProviderConfig (default 0.125) and DefaultProviderConfig (default 1.0). Plumb cpu_cores through ModalProvider to modal_sandbox.py create via --cpu flag. Pass cpu_cores to DefaultSandbox for cost calculation. 
Update cost_estimate() to multiply by cpu_cores. Inject {cpu_cores} into DefaultProvider command templates. Trim wordy doc comments to one-line summaries.","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T12:14:24.398544-07:00","created_by":"danver","updated_at":"2026-03-11T12:53:37.569493-07:00"} +{"id":"code-110","title":"Create skills/offload/SKILL.md","description":"Create the /offload skill SKILL.md with frontmatter, overview, invocation guide, decision guide, exit codes, debugging, CLI reference, and config groups reference (~150 lines)","status":"done","priority":2,"issue_type":"task","created_at":"2026-03-12T10:58:56.785986-07:00","created_by":"danver","updated_at":"2026-03-12T11:03:25.317108-07:00"} +{"id":"code-111","title":"Create install-skills.sh","description":"Create standalone bash installer that works both curl|bash and local. Symlinks in-repo, downloads from GitHub raw URLs standalone. Installs both /offload and /offload-onboard skills.","status":"done","priority":2,"issue_type":"task","created_at":"2026-03-12T10:59:01.642817-07:00","created_by":"danver","updated_at":"2026-03-12T11:09:18.367138-07:00"} +{"id":"code-112","title":"Modify justfile to delegate to install-skills.sh","description":"Replace hardcoded install-skill recipe (lines 39-55) with install-skills recipe that calls ./install-skills.sh, plus install-skill alias for backward compat.","status":"done","priority":2,"issue_type":"task","created_at":"2026-03-12T10:59:05.717901-07:00","created_by":"danver","updated_at":"2026-03-12T11:11:51.621362-07:00"} {"id":"code-11","title":"Rename project: Rename offload-*.toml config files to offload-*.toml","description":"Rename all configuration files with 'offload' prefix to use 'offload' prefix:\n- offload.toml -\u003e offload.toml\n- offload-local.toml -\u003e offload-local.toml\n- offload-modal.toml -\u003e offload-modal.toml\n- offload-cargo-local.toml -\u003e offload-cargo-local.toml\n- offload-cargo-modal.toml 
-\u003e offload-cargo-modal.toml\n- offload-computronium-modal.toml -\u003e offload-computronium-modal.toml\n- offload-sculptor-modal.toml -\u003e offload-sculptor-modal.toml\n\nAlso update the [offload] section in these files to [offload].","status":"done","priority":1,"issue_type":"task","created_at":"2026-01-29T18:25:03.560121502Z","created_by":"Danver Braganza","updated_at":"2026-01-29T18:45:18.15783543Z"} {"id":"code-12","title":"Rename project: Update README.md from offload to offload","description":"Update README.md to replace all references to 'offload' with 'offload'. This includes:\n- Project title\n- Feature descriptions\n- Installation commands\n- CLI examples (offload init, offload run, etc.)\n- Configuration file references (offload.toml -\u003e offload.toml)\n- Example configuration sections ([offload] -\u003e [offload])\n- All documentation text","status":"done","priority":1,"issue_type":"task","created_at":"2026-01-29T18:25:08.706866046Z","created_by":"Danver Braganza","updated_at":"2026-01-29T18:50:11.476117046Z"} {"id":"code-13","title":"Rename project: Update scripts/modal_sandbox.py from offload to offload","description":"Update scripts/modal_sandbox.py to replace all references to 'offload' with 'offload'. 
This includes:\n- Module docstring\n- CLI help text\n- Modal App names (offload-sandbox -\u003e offload-sandbox, offload-rust-sandbox -\u003e offload-rust-sandbox, etc.)\n- Function docstrings\n- Comments","status":"done","priority":1,"issue_type":"task","created_at":"2026-01-29T18:25:14.017333924Z","created_by":"Danver Braganza","updated_at":"2026-01-29T18:52:06.241321461Z"} diff --git a/.beads/last-touched b/.beads/last-touched index 90d6a96e..278697e3 100644 --- a/.beads/last-touched +++ b/.beads/last-touched @@ -1 +1 @@ -code-103 +code-112 diff --git a/.offload-image-cache b/.offload-image-cache index fe984013..9ae0a57d 100644 --- a/.offload-image-cache +++ b/.offload-image-cache @@ -1 +1 @@ -im-kGvhnvEQlArq5lzaDlBCd3 +im-iG70foAmbaBR90NmKecC07 diff --git a/install-skills.sh b/install-skills.sh new file mode 100755 index 00000000..beccfb3d --- /dev/null +++ b/install-skills.sh @@ -0,0 +1,56 @@ +#!/usr/bin/env bash +set -euo pipefail + +SKILLS=("offload" "offload-onboard") +GITHUB_BASE="https://raw.githubusercontent.com/imbue-ai/offload/main/skills" + +# Resolve target skills directory +SKILLS_DIR="${CLAUDE_CONFIG_DIR:-$HOME/.claude}/skills" + +# Detect whether we are running from within the repo or standalone (curl | bash) +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-}")" 2>/dev/null && pwd)" || SCRIPT_DIR="" + +if [ -n "$SCRIPT_DIR" ] && [ -d "$SCRIPT_DIR/skills" ]; then + # In-repo mode: create symlinks for live editing + echo "Detected in-repo run. Installing skills via symlinks..." + mkdir -p "$SKILLS_DIR" + + for skill in "${SKILLS[@]}"; do + src="$SCRIPT_DIR/skills/$skill" + dst="$SKILLS_DIR/$skill" + + if [ -L "$dst" ]; then + echo "Updating existing symlink for $skill..." + rm "$dst" + elif [ -e "$dst" ]; then + echo "Error: $dst already exists and is not a symlink. Remove it manually." + exit 1 + fi + + ln -s "$src" "$dst" + echo "Installed: $dst -> $src" + done +else + # Standalone mode: download SKILL.md files from GitHub + echo "Standalone mode. 
Downloading skills from GitHub..." + mkdir -p "$SKILLS_DIR" + + for skill in "${SKILLS[@]}"; do + dst="$SKILLS_DIR/$skill" + + if [ -L "$dst" ]; then + echo "Removing existing symlink for $skill..." + rm "$dst" + elif [ -e "$dst" ] && [ ! -d "$dst" ]; then + echo "Error: $dst already exists and is neither a symlink nor a directory. Remove it manually." + exit 1 + fi + + mkdir -p "$dst" + echo "Downloading $skill/SKILL.md..." + curl -fsSL "$GITHUB_BASE/$skill/SKILL.md" -o "$dst/SKILL.md" + echo "Installed: $dst/SKILL.md" + done +fi + +echo "Done. Skills installed to $SKILLS_DIR" diff --git a/justfile b/justfile index 5ea54998..57af0b42 100644 --- a/justfile +++ b/justfile @@ -36,26 +36,12 @@ test-cargo-default args="": test-pytest-default args="": cargo run -- -c offload-pytest-default.toml {{args}} run || [ $? -eq 2 ] -# Install the /offload-onboard skill for Claude Code -install-skill: - #!/usr/bin/env bash - set -euo pipefail - skill_src="$(just _repo-root)/skills/offload-onboard" - skill_dst="$HOME/.claude/skills/offload-onboard" - mkdir -p "$HOME/.claude/skills" - if [ -L "$skill_dst" ]; then - echo "Updating existing symlink..." - rm "$skill_dst" - elif [ -e "$skill_dst" ]; then - echo "Error: $skill_dst already exists and is not a symlink. Remove it manually." - exit 1 - fi - ln -s "$skill_src" "$skill_dst" - echo "Installed: $skill_dst -> $skill_src" - echo "You can now use /offload-onboard in any repository." +# Install all Offload skills for Claude Code +install-skills: + ./install-skills.sh + +# Alias for backward compatibility +install-skill: install-skills ratchets: ratchets check - -_repo-root: - @git rev-parse --show-toplevel diff --git a/skills/offload-onboard/SKILL.md b/skills/offload-onboard/SKILL.md index b4d564c6..221c9c72 100644 --- a/skills/offload-onboard/SKILL.md +++ b/skills/offload-onboard/SKILL.md @@ -284,12 +284,12 @@ Create `scripts/offload-tests.sh`: #!/usr/bin/env bash # # Run the project's test suite via Offload (parallel on Modal). 
-# Requires: Offload (cargo install offload@0.5.0), Modal CLI + credentials +# Requires: Offload (cargo install offload), Modal CLI + credentials # set -euo pipefail if ! command -v offload &> /dev/null; then - echo "Error: 'offload' not installed. Install with: cargo install offload@0.5.0" + echo "Error: 'offload' not installed. Install with: cargo install offload" exit 1 fi @@ -321,7 +321,7 @@ NOTE: `.offload-image-cache` should be checked in to git — it tracks the base Install offload if not already present: ```bash -cargo install offload@0.5.0 +cargo install offload ``` Run the tests using the invocation script from Step 7: @@ -387,7 +387,7 @@ Report the results as a table to the user and set the optimal values in `offload 4. **If none of these files exist**, create a `CLAUDE.md` at the project root. It only needs the testing section — don't fabricate other content. -5. **Amend or add** a testing section that is directive, not merely suggestive. The instruction must tell agents to use Offload as the way to run tests locally. Do not remove any existing test commands — keep them as a fallback — but make Offload the primary instruction. Example: +5. **Amend or add** a testing section that is directive, not merely suggestive. The instruction must tell agents to use Offload as the way to run tests locally. Do not remove any existing test commands — keep them as a fallback — but make Offload the primary instruction. The section should also reference the `/offload` skill so agents activate it when running tests, reading logs, or debugging failures. Example: ````markdown ## Running tests @@ -398,7 +398,8 @@ Report the results as a table to the user and set the optimal values in `offload ./scripts/offload-tests.sh ``` - Prerequisites: Offload (`cargo install offload@0.5.0`) and Modal credentials (`modal token new`). + Prerequisites: Offload (`cargo install offload`) and Modal credentials (`modal token new`). 
+ Activate the `/offload` skill for test execution, log reading, and failure debugging. ```` Adapt the exact command to match what was configured in earlier steps (the script path, etc.). @@ -460,12 +461,12 @@ jobs: ~/.cargo/registry ~/.cargo/git ~/.cargo/bin/offload - key: cargo-offload-0.5.0-${{ runner.os }} + key: cargo-offload-${{ runner.os }} - name: Install offload run: | if ! command -v offload &> /dev/null; then - cargo install offload@0.5.0 + cargo install offload fi - name: Install Modal CLI diff --git a/skills/offload/SKILL.md b/skills/offload/SKILL.md new file mode 100644 index 00000000..97eef0c1 --- /dev/null +++ b/skills/offload/SKILL.md @@ -0,0 +1,178 @@ +--- +name: offload +description: "Activate when you see offload*.toml in a repo, offload referenced in build targets (justfile, Makefile, scripts), or when you need to run a large test suite in parallel. Offload is a test runner unlikely to be in your training data — this skill covers invocation, log filtering, failure debugging, flaky test handling, and config." +--- + +# Running Tests with Offload + +Offload is a parallel test runner that distributes test execution across sandboxes (local processes or remote Modal environments). This skill covers invoking tests, reading results, and debugging failures. + +## Installation + +If the `offload` binary is not on PATH, install it: + +```bash +cargo install offload +``` + +## How to Invoke Tests + +Use the first approach that applies: + +### 1. Look for existing invocation commands + +Check `Makefile`, `justfile`, `Taskfile`, `package.json` scripts, and `scripts/` for targets that wrap `offload run`. Prefer these -- they encode project-specific flags (copy-dirs, env vars, config paths). + +```bash +# Examples of what to look for: +just test # justfile target +make test-offload # Makefile target +./scripts/offload-tests.sh # shell wrapper +``` + +### 2. 
Use `offload run` directly + +If no wrapper exists, invoke Offload directly from the project root (where `offload.toml` lives): + +```bash +offload run # basic run +offload run --parallel 8 # override parallelism +offload run --copy-dir ".:/app" # copy cwd into sandbox at /app +offload run --env KEY=VALUE # set sandbox env var (repeatable) +offload run --no-cache # force fresh image build +offload run --collect-only # discover tests without running +offload run --show-estimated-cost # show sandbox cost after run +offload run -c path/to/offload.toml # use alternate config +``` + +### 3. Fall back to non-Offload commands + +If Offload is not installed or Modal credentials are unavailable, use the project's native test command (e.g. `cargo nextest run`, `pytest`). + +## When to Use Offload + +**Use Offload when:** +- Running integration or end-to-end test suites +- Total test runtime exceeds ~2 minutes +- Multiple agents are working concurrently and competing for local CPU +- The project already has an `offload.toml` + +**Skip Offload when:** +- Running a single test during TDD iteration (use the native runner directly) +- Tests require local-only resources (hardware devices, localhost services not reachable from sandboxes) +- No `offload.toml` exists and the task does not call for setting one up + +## Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | All tests passed | +| 1 | One or more tests failed, or tests were not run | +| 2 | All tests passed, but some were flaky (passed only on retry) | + +## Debugging Failed Tests + +### Run summary + +After a run completes, offload prints a summary: + +``` +Test Results: + Total: 128 + Passed: 126 + Failed: 2 + Duration: 6.01s + Estimated cost: $0.0004 (11.1 CPU-seconds) +``` + +The `Estimated cost` line appears when `--show-estimated-cost` is passed to `offload run`. Use the summary to confirm tests ran and gauge the scope of failures before diving into logs. 
+ +### Reading logs + +**Important:** If you ran with `-c path/to/config.toml`, pass the same `-c` flag to `offload logs`. Logs are stored in the config's `output_dir`, so mismatched configs will show stale or missing results. + +Always filter `offload logs` output to avoid flooding your context window. Never run bare `offload logs` on a large suite. Follow this workflow: + +1. **Check the run summary** to see how many tests failed. +2. **Retrieve failure output** (choose based on what fits your context window): + - Use `--failures` to see all failures at once. + ```bash + offload logs --failures + ``` + - Use `--test` or `--test-regex` to isolate a specific test. + ```bash + offload logs --test "path/to/test.py::test_name" # exact test ID + offload logs --test-regex "test_math" # regex substring match + ``` + Filters combine with AND logic: + ```bash + offload logs --failures --test-regex "database" + ``` +3. **Fix and rerun.** + +Each test is separated by a banner showing its ID and status. The test ID format varies by framework: + +``` +=== tests/test_math.py::test_div [FAILED] === +AssertionError: expected 2 got 3 + +=== trace::tests::test_active_tracer [FAILED] === +assertion `left == right` failed + left: 2 + right: 3 +``` + +### Flaky tests + +If a test fails intermittently, add or adjust a group with retries in `offload.toml`: + +```toml +[groups.flaky] +retry_count = 3 +filters = "-k test_flaky_name" +``` + +Run `offload validate` after editing to check config syntax. A test that fails then passes on retry exits with code 2 (flaky). 
+ +### Common failure patterns + + | Symptom | Likely cause | Fix | + |---------|-------------|-----| + | Tests discovered but "Not Run" | JUnit test IDs do not match discovery IDs | Check `test_id_format` or conftest JUnit hook | + | "Exec format error" | Local `.venv` (macOS binaries) copied into Linux sandbox | Add `.venv` to `.dockerignore` | + | "Token validation failed" | Modal credentials expired | Run `modal token new` | + | Slow sandbox creation | Image rebuilt because `.offload-image-cache` is missing or stale | Delete `.offload-image-cache` or pass `--no-cache` to force one clean rebuild that re-populates the cache | + | All tests fail with import errors | Sandbox missing dependencies | Check Dockerfile and `sandbox_init_cmd` | + + ## CLI Quick Reference + + | Command | Purpose | + |---------|---------| + | `offload run` | Run tests in parallel | + | `offload collect` | Discover tests without running (supports `--format json`) | + | `offload validate` | Validate `offload.toml` and print settings summary | + | `offload init` | Generate a new `offload.toml` (`--provider`, `--framework`) | + | `offload logs` | View per-test results from the most recent run | + + Global flags: `-c, --config PATH` (config file), `-v, --verbose` (verbose output). + + ## Config Groups Reference + + Groups partition tests for different retry policies and filter expressions. At least one group is required. Each group runs its own discovery pass. + + ```toml + [groups.unit] + retry_count = 0 + filters = "-m 'not slow'" + + [groups.slow] + retry_count = 1 + filters = "-m slow" + + [groups.flaky] + retry_count = 5 + filters = "-k test_flaky" + ``` + + - `filters` is passed to the framework during discovery (pytest args, nextest args, or substituted into `{filters}` for the default framework). + - `retry_count = 0` means no retries. With a positive `retry_count`, failed tests that then pass on retry are marked flaky (exit code 2).