beads: mark code-104 done

jacobkirmayer-imbue · claude · jacobkirmayer-imbue · commit f5dad401bb4d · 2026-03-17T12:07:04.000-07:00
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl
@@ -4,7 +4,7 @@
 {"id":"code-101","title":"Investigate and instrument slow acceptance tests for optimization","description":"## Directive\n\nIMPORTANT: Before doing ANY git or VCS operations, you MUST activate the jujutsu skill by running /jujutsu. This is a jujutsu-managed repository. Using raw git commands will corrupt data.\n\nWhen this bead is complete, mark the final revision with a branch: danver/investigate-slow-acceptance-tests\n\n## Problem\n\nFive acceptance tests individually take 47-68 seconds each, forming the hard floor on execution time. No amount of parallelism or scheduling improvement can reduce the wall clock below the duration of the slowest single test. These tests are the dominant bottleneck in Run 3.\n\n## Evidence from Trace Analysis\n\nFrom Run 3 (26,210 tests, 200 sandboxes):\n\n| Batch | Tests | Duration | Sandbox |\n|-------|-------|----------|---------|\n| batch_3 | 1 | 67.6s | 17 |\n| batch_1 | 1 | 58.1s | 2 |\n| batch_0 | 1 | 48.0s | 0 |\n| batch_4 | 1 | 48.0s | 32 |\n| batch_2 | 1 | 47.6s | 3 |\n\nThese 5 tests consume 269 sandbox-seconds. The longest (67.6s) is 72% of the entire 93.8s execution window. Even a 2x improvement on just the slowest test would save ~34s off the critical path.\n\n## Context\n\nThese tests are in the `mng` repository (the repo that uses offload to run its tests), not in the offload repository itself. The offload tool runs whatever tests it discovers -- it does not control their content. However, we can instrument offload to help identify what makes these tests slow.\n\n## Required Changes\n\n### 1. Add per-test timing to batch output\n\nCurrently, offload knows the total batch duration but not individual test durations within a batch. For single-test batches this is fine, but for multi-test batches the per-test breakdown is invisible.\n\nIn `src/orchestrator/runner.rs`, after downloading the JUnit XML results, parse the `time` attribute from each `\u003ctestcase\u003e` element and log the top-N slowest tests:\n\n```rust\n// After downloading junit.xml, log slowest tests\nlet mut test_times: Vec\u003c(\u0026str, f64)\u003e = Vec::new();\n// Parse \u003ctestcase name=\"...\" time=\"...\"\u003e elements\n// Sort by time descending\n// Log top 5 slowest\nfor (name, time) in test_times.iter().take(5) {\n    info!(\"[SLOW TEST] {}: {:.1}s\", name, time);\n}\n```\n\n### 2. Add a `--slow-test-threshold` CLI flag\n\nAdd a `--slow-test-threshold` flag (default: 30s) that causes offload to emit a warning for any test exceeding the threshold:\n\n```\nWARNING: Test 'test_full_acceptance_flow' took 67.6s (threshold: 30s)\n```\n\nThis makes slow tests visible in CI output without requiring trace analysis.\n\n### 3. Add slow test data to the Perfetto trace\n\nIn the trace output, add per-test duration events. Currently the trace has batch-level events (`exec_batch`, `download_results`). Add individual test events within the exec thread:\n\n```rust\n// For each testcase in the junit XML:\ntracer.complete_event(\n    test_name,\n    \"test\",\n    sandbox_pid,\n    TID_EXEC,\n    test_start_us,\n    test_duration_us,\n);\n```\n\nThis requires parsing the JUnit XML for individual test times and mapping them back to the trace timeline. The start time can be approximated (batch_start + cumulative_previous_test_times).\n\n### 4. Add a summary section to the run output\n\nAfter the existing summary (passed/failed/flaky counts), add a \"Slowest Tests\" section:\n\n```\nSlowest tests:\n  1. test_full_acceptance_flow  67.6s\n  2. test_end_to_end_pipeline   58.1s\n  3. test_modal_integration     48.0s\n  ...\n```\n\nUse the JUnit XML `time` attributes as the source of truth.\n\n### 5. Write tests\n\n- Test that the slow test warning is emitted when a test exceeds the threshold\n- Test that the slow test summary is correctly sorted and limited to top N\n- Test that per-test trace events are emitted correctly\n\n## Expected Impact\n\n- No direct wall-clock improvement (this is instrumentation)\n- Enables the mng team to identify and profile the specific slow tests\n- The slow test warnings in CI output will create visibility and pressure to fix them\n- Per-test trace events enable deeper analysis in Perfetto UI\n\n## Files to Modify\n- src/orchestrator/runner.rs (add per-test timing extraction from JUnit XML)\n- src/main.rs (add --slow-test-threshold flag)\n- src/report.rs or src/report/junit.rs (add slow test summary to output)\n- src/trace.rs (possibly add per-test trace events)","status":"open","priority":2,"issue_type":"task","created_at":"2026-03-05T22:07:16.863726-08:00","created_by":"danver","updated_at":"2026-03-05T22:07:16.863726-08:00"}
 {"id":"code-102","title":"Add vitest duplicate test name check to onboarding skill","description":"Update SKILL.md to detect vitest framework and check for duplicate space-separated test IDs during onboarding. If duplicates are found, the agent must stop and ask the user if they want the agent to deduplicate them by renaming tests more verbosely. Convey that this is a blocking requirement for using Offload.","status":"done","priority":0,"issue_type":"task","owner":"jacob.kirmayer@imbue.com","created_at":"2026-03-16T10:16:55.275063-07:00","created_by":"Jacob Kirmayer","updated_at":"2026-03-16T10:22:43.276909-07:00"}
 {"id":"code-103","title":"Add offload collect verification step to onboarding skill","description":"Update SKILL.md Step 10 (Run Offload Locally and Verify) to instruct agents to use 'offload collect' first to verify discovery works before running full 'offload run'. The agent should iterate on offload collect until discovery succeeds before attempting execution.","status":"done","priority":0,"issue_type":"task","owner":"jacob.kirmayer@imbue.com","created_at":"2026-03-16T10:54:33.111537-07:00","created_by":"Jacob Kirmayer","updated_at":"2026-03-16T10:56:07.309012-07:00"}
-{"id":"code-104","title":"Replace flat default_duration with per-group average in scheduler","description":"Replace the hardcoded 1s default_duration in schedule_lpt with per-group average durations computed from historical data. Steps: (1) Add group() accessor to TestInstance. (2) Change schedule_lpt signature to accept HashMap\u003cString, Duration\u003e for group defaults instead of a single Duration. (3) In orchestrator.rs, compute per-group averages from the durations map + test records, and pass that mapping to schedule_lpt. (4) Update all scheduler tests. Keep the 1s fallback only when a group has zero historical data.","status":"open","priority":1,"issue_type":"task","owner":"jacob.kirmayer@imbue.com","created_at":"2026-03-17T12:02:13.11179-07:00","created_by":"Jacob Kirmayer","updated_at":"2026-03-17T12:02:13.11179-07:00"}
+{"id":"code-104","title":"Replace flat default_duration with per-group average in scheduler","description":"Replace the hardcoded 1s default_duration in schedule_lpt with per-group average durations computed from historical data. Steps: (1) Add group() accessor to TestInstance. (2) Change schedule_lpt signature to accept HashMap\u003cString, Duration\u003e for group defaults instead of a single Duration. (3) In orchestrator.rs, compute per-group averages from the durations map + test records, and pass that mapping to schedule_lpt. (4) Update all scheduler tests. Keep the 1s fallback only when a group has zero historical data.","status":"done","priority":1,"issue_type":"task","owner":"jacob.kirmayer@imbue.com","created_at":"2026-03-17T12:02:13.11179-07:00","created_by":"Jacob Kirmayer","updated_at":"2026-03-17T12:07:04.910208-07:00"}
 {"id":"code-11","title":"Rename project: Rename offload-*.toml config files to offload-*.toml","description":"Rename all configuration files with 'offload' prefix to use 'offload' prefix:\n- offload.toml -\u003e offload.toml\n- offload-local.toml -\u003e offload-local.toml\n- offload-modal.toml -\u003e offload-modal.toml\n- offload-cargo-local.toml -\u003e offload-cargo-local.toml\n- offload-cargo-modal.toml -\u003e offload-cargo-modal.toml\n- offload-computronium-modal.toml -\u003e offload-computronium-modal.toml\n- offload-sculptor-modal.toml -\u003e offload-sculptor-modal.toml\n\nAlso update the [offload] section in these files to [offload].","status":"done","priority":1,"issue_type":"task","created_at":"2026-01-29T18:25:03.560121502Z","created_by":"Danver Braganza","updated_at":"2026-01-29T18:45:18.15783543Z"}
 {"id":"code-12","title":"Rename project: Update README.md from offload to offload","description":"Update README.md to replace all references to 'offload' with 'offload'. This includes:\n- Project title\n- Feature descriptions\n- Installation commands\n- CLI examples (offload init, offload run, etc.)\n- Configuration file references (offload.toml -\u003e offload.toml)\n- Example configuration sections ([offload] -\u003e [offload])\n- All documentation text","status":"done","priority":1,"issue_type":"task","created_at":"2026-01-29T18:25:08.706866046Z","created_by":"Danver Braganza","updated_at":"2026-01-29T18:50:11.476117046Z"}
 {"id":"code-13","title":"Rename project: Update scripts/modal_sandbox.py from offload to offload","description":"Update scripts/modal_sandbox.py to replace all references to 'offload' with 'offload'. This includes:\n- Module docstring\n- CLI help text\n- Modal App names (offload-sandbox -\u003e offload-sandbox, offload-rust-sandbox -\u003e offload-rust-sandbox, etc.)\n- Function docstrings\n- Comments","status":"done","priority":1,"issue_type":"task","created_at":"2026-01-29T18:25:14.017333924Z","created_by":"Danver Braganza","updated_at":"2026-01-29T18:52:06.241321461Z"}