Merged
11 changes: 11 additions & 0 deletions .beads/issues.jsonl
@@ -4,6 +4,17 @@
{"id":"code-101","title":"Investigate and instrument slow acceptance tests for optimization","description":"## Directive\n\nIMPORTANT: Before doing ANY git or VCS operations, you MUST activate the jujutsu skill by running /jujutsu. This is a jujutsu-managed repository. Using raw git commands will corrupt data.\n\nWhen this bead is complete, mark the final revision with a branch: danver/investigate-slow-acceptance-tests\n\n## Problem\n\nFive acceptance tests individually take 47-68 seconds each, forming the hard floor on execution time. No amount of parallelism or scheduling improvement can reduce the wall clock below the duration of the slowest single test. These tests are the dominant bottleneck in Run 3.\n\n## Evidence from Trace Analysis\n\nFrom Run 3 (26,210 tests, 200 sandboxes):\n\n| Batch | Tests | Duration | Sandbox |\n|-------|-------|----------|---------|\n| batch_3 | 1 | 67.6s | 17 |\n| batch_1 | 1 | 58.1s | 2 |\n| batch_0 | 1 | 48.0s | 0 |\n| batch_4 | 1 | 48.0s | 32 |\n| batch_2 | 1 | 47.6s | 3 |\n\nThese 5 tests consume 269 sandbox-seconds. The longest (67.6s) is 72% of the entire 93.8s execution window. Even a 2x improvement on just the slowest test would save ~34s off the critical path.\n\n## Context\n\nThese tests are in the `mng` repository (the repo that uses offload to run its tests), not in the offload repository itself. The offload tool runs whatever tests it discovers -- it does not control their content. However, we can instrument offload to help identify what makes these tests slow.\n\n## Required Changes\n\n### 1. Add per-test timing to batch output\n\nCurrently, offload knows the total batch duration but not individual test durations within a batch. 
For single-test batches this is fine, but for multi-test batches the per-test breakdown is invisible.\n\nIn `src/orchestrator/runner.rs`, after downloading the JUnit XML results, parse the `time` attribute from each `\u003ctestcase\u003e` element and log the top-N slowest tests:\n\n```rust\n// After downloading junit.xml, log slowest tests\nlet mut test_times: Vec\u003c(\u0026str, f64)\u003e = Vec::new();\n// Parse \u003ctestcase name=\"...\" time=\"...\"\u003e elements\n// Sort by time descending\n// Log top 5 slowest\nfor (name, time) in test_times.iter().take(5) {\n info!(\"[SLOW TEST] {}: {:.1}s\", name, time);\n}\n```\n\n### 2. Add a `--slow-test-threshold` CLI flag\n\nAdd a `--slow-test-threshold` flag (default: 30s) that causes offload to emit a warning for any test exceeding the threshold:\n\n```\nWARNING: Test 'test_full_acceptance_flow' took 67.6s (threshold: 30s)\n```\n\nThis makes slow tests visible in CI output without requiring trace analysis.\n\n### 3. Add slow test data to the Perfetto trace\n\nIn the trace output, add per-test duration events. Currently the trace has batch-level events (`exec_batch`, `download_results`). Add individual test events within the exec thread:\n\n```rust\n// For each testcase in the junit XML:\ntracer.complete_event(\n test_name,\n \"test\",\n sandbox_pid,\n TID_EXEC,\n test_start_us,\n test_duration_us,\n);\n```\n\nThis requires parsing the JUnit XML for individual test times and mapping them back to the trace timeline. The start time can be approximated (batch_start + cumulative_previous_test_times).\n\n### 4. Add a summary section to the run output\n\nAfter the existing summary (passed/failed/flaky counts), add a \"Slowest Tests\" section:\n\n```\nSlowest tests:\n 1. test_full_acceptance_flow 67.6s\n 2. test_end_to_end_pipeline 58.1s\n 3. test_modal_integration 48.0s\n ...\n```\n\nUse the JUnit XML `time` attributes as the source of truth.\n\n### 5. 
Write tests\n\n- Test that the slow test warning is emitted when a test exceeds the threshold\n- Test that the slow test summary is correctly sorted and limited to top N\n- Test that per-test trace events are emitted correctly\n\n## Expected Impact\n\n- No direct wall-clock improvement (this is instrumentation)\n- Enables the mng team to identify and profile the specific slow tests\n- The slow test warnings in CI output will create visibility and pressure to fix them\n- Per-test trace events enable deeper analysis in Perfetto UI\n\n## Files to Modify\n- src/orchestrator/runner.rs (add per-test timing extraction from JUnit XML)\n- src/main.rs (add --slow-test-threshold flag)\n- src/report.rs or src/report/junit.rs (add slow test summary to output)\n- src/trace.rs (possibly add per-test trace events)","status":"open","priority":2,"issue_type":"task","created_at":"2026-03-05T22:07:16.863726-08:00","created_by":"danver","updated_at":"2026-03-05T22:07:16.863726-08:00"}
{"id":"code-102","title":"Add vitest duplicate test name check to onboarding skill","description":"Update SKILL.md to detect vitest framework and check for duplicate space-separated test IDs during onboarding. If duplicates are found, the agent must stop and ask the user if they want the agent to deduplicate them by renaming tests more verbosely. Convey that this is a blocking requirement for using Offload.","status":"done","priority":0,"issue_type":"task","owner":"jacob.kirmayer@imbue.com","created_at":"2026-03-16T10:16:55.275063-07:00","created_by":"Jacob Kirmayer","updated_at":"2026-03-16T10:22:43.276909-07:00"}
{"id":"code-103","title":"Add offload collect verification step to onboarding skill","description":"Update SKILL.md Step 10 (Run Offload Locally and Verify) to instruct agents to use 'offload collect' first to verify discovery works before running full 'offload run'. The agent should iterate on offload collect until discovery succeeds before attempting execution.","status":"done","priority":0,"issue_type":"task","owner":"jacob.kirmayer@imbue.com","created_at":"2026-03-16T10:54:33.111537-07:00","created_by":"Jacob Kirmayer","updated_at":"2026-03-16T10:56:07.309012-07:00"}
{"id":"code-102","title":"Strip pytest framework config to bare minimum","description":"Remove python, extra_args, markers fields from PytestFrameworkConfig. Make command required (not Option). Keep test_id_format as internal constant. Update all related code: schema.rs, pytest.rs, main.rs init template, example TOML configs, tests, README.","status":"done","priority":1,"issue_type":"task","created_at":"2026-03-10T11:56:28.4364-07:00","created_by":"Jacob Kirmayer","updated_at":"2026-03-10T12:03:22.088725-07:00"}
{"id":"code-103","title":"Add CostEstimate struct to provider.rs with cpu_seconds and estimated_cost_usd fields. Include Display impl. The struct should be Clone, Debug, Default.","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:05.719648-07:00","created_by":"danver","updated_at":"2026-03-11T11:27:36.278936-07:00"}
{"id":"code-104","title":"Track sandbox creation time in DefaultSandbox by adding a created_at: Instant field, set in DefaultSandbox::new. Update DefaultProvider::create_sandbox to pass Instant::now().","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:10.902597-07:00","created_by":"danver","updated_at":"2026-03-11T11:30:36.149893-07:00"}
{"id":"code-105","title":"Add cost_estimate() -> CostEstimate method to Sandbox trait. Implement in DefaultSandbox using elapsed time from created_at and Modal pricing ($0.00003942/core/sec). LocalSandbox returns CostEstimate::default().","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:15.537292-07:00","created_by":"danver","updated_at":"2026-03-11T11:33:51.975659-07:00"}
{"id":"code-106","title":"Add estimated_cost: CostEstimate field to RunResult. Aggregate costs from sandboxes during cleanup in orchestrator.rs. Update print_summary to accept optional show_cost bool and display cost when true.","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:20.133117-07:00","created_by":"danver","updated_at":"2026-03-11T11:37:48.727278-07:00"}
{"id":"code-107","title":"Add --show-estimated-cost flag to Commands::Run in main.rs. Help text: 'Show estimated sandbox cost after run. Note: This is calculated client-side using simple formulas and may not reflect actual billing, discounts, or pricing adjustments.'","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:27.889734-07:00","created_by":"danver","updated_at":"2026-03-11T11:40:21.818977-07:00"}
{"id":"code-108","title":"Wire --show-estimated-cost through run_tests -> dispatch_framework -> run_all_tests -> orchestrator. Pass show_cost to print_summary. Only display cost line when flag is set and cost > 0.","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T11:24:32.467886-07:00","created_by":"danver","updated_at":"2026-03-11T11:44:20.743176-07:00"}
{"id":"code-109","title":"Add cpu_cores field to DefaultProviderConfig and ModalProviderConfig with default 0.125. Pass cpu_cores to DefaultSandbox for cost calculation. Update cost_estimate() to multiply by cpu_cores. The cpu_cores should also be injectable into command templates via {cpu_cores} placeholder.","description":"Add cpu_cores field to ModalProviderConfig (default 0.125) and DefaultProviderConfig (default 1.0). Plumb cpu_cores through ModalProvider to modal_sandbox.py create via --cpu flag. Pass cpu_cores to DefaultSandbox for cost calculation. Update cost_estimate() to multiply by cpu_cores. Inject {cpu_cores} into DefaultProvider command templates. Trim wordy doc comments to one-line summaries.","status":"done","priority":0,"issue_type":"task","created_at":"2026-03-11T12:14:24.398544-07:00","created_by":"danver","updated_at":"2026-03-11T12:53:37.569493-07:00"}
{"id":"code-110","title":"Create skills/offload/SKILL.md","description":"Create the /offload skill SKILL.md with frontmatter, overview, invocation guide, decision guide, exit codes, debugging, CLI reference, and config groups reference (~150 lines)","status":"done","priority":2,"issue_type":"task","created_at":"2026-03-12T10:58:56.785986-07:00","created_by":"danver","updated_at":"2026-03-12T11:03:25.317108-07:00"}
{"id":"code-111","title":"Create install-skills.sh","description":"Create standalone bash installer that works both curl|bash and local. Symlinks in-repo, downloads from GitHub raw URLs standalone. Installs both /offload and /offload-onboard skills.","status":"done","priority":2,"issue_type":"task","created_at":"2026-03-12T10:59:01.642817-07:00","created_by":"danver","updated_at":"2026-03-12T11:09:18.367138-07:00"}
{"id":"code-112","title":"Modify justfile to delegate to install-skills.sh","description":"Replace hardcoded install-skill recipe (lines 39-55) with install-skills recipe that calls ./install-skills.sh, plus install-skill alias for backward compat.","status":"done","priority":2,"issue_type":"task","created_at":"2026-03-12T10:59:05.717901-07:00","created_by":"danver","updated_at":"2026-03-12T11:11:51.621362-07:00"}
{"id":"code-11","title":"Rename project: Rename offload-*.toml config files to offload-*.toml","description":"Rename all configuration files with 'offload' prefix to use 'offload' prefix:\n- offload.toml -\u003e offload.toml\n- offload-local.toml -\u003e offload-local.toml\n- offload-modal.toml -\u003e offload-modal.toml\n- offload-cargo-local.toml -\u003e offload-cargo-local.toml\n- offload-cargo-modal.toml -\u003e offload-cargo-modal.toml\n- offload-computronium-modal.toml -\u003e offload-computronium-modal.toml\n- offload-sculptor-modal.toml -\u003e offload-sculptor-modal.toml\n\nAlso update the [offload] section in these files to [offload].","status":"done","priority":1,"issue_type":"task","created_at":"2026-01-29T18:25:03.560121502Z","created_by":"Danver Braganza","updated_at":"2026-01-29T18:45:18.15783543Z"}
{"id":"code-12","title":"Rename project: Update README.md from offload to offload","description":"Update README.md to replace all references to 'offload' with 'offload'. This includes:\n- Project title\n- Feature descriptions\n- Installation commands\n- CLI examples (offload init, offload run, etc.)\n- Configuration file references (offload.toml -\u003e offload.toml)\n- Example configuration sections ([offload] -\u003e [offload])\n- All documentation text","status":"done","priority":1,"issue_type":"task","created_at":"2026-01-29T18:25:08.706866046Z","created_by":"Danver Braganza","updated_at":"2026-01-29T18:50:11.476117046Z"}
{"id":"code-13","title":"Rename project: Update scripts/modal_sandbox.py from offload to offload","description":"Update scripts/modal_sandbox.py to replace all references to 'offload' with 'offload'. This includes:\n- Module docstring\n- CLI help text\n- Modal App names (offload-sandbox -\u003e offload-sandbox, offload-rust-sandbox -\u003e offload-rust-sandbox, etc.)\n- Function docstrings\n- Comments","status":"done","priority":1,"issue_type":"task","created_at":"2026-01-29T18:25:14.017333924Z","created_by":"Danver Braganza","updated_at":"2026-01-29T18:52:06.241321461Z"}
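A minimal, std-only Rust sketch of the per-test timing that code-101 asks for (JUnit `time` extraction, top-N slowest, threshold warnings, approximate per-test start offsets). It assumes JUnit reports whose `<testcase>` start tags carry double-quoted `name` and `time` attributes. Every function name here (`parse_testcase_times`, `slowest`, `threshold_warnings`, `approx_start_offsets_us`) is hypothetical and is not offload's actual code; the string scan is deliberately naive (no entity decoding, no `>` inside attribute values), so a real implementation would more likely use an XML parser crate.

```rust
use std::time::Duration;

/// One `<testcase>` entry pulled from a JUnit XML report.
#[derive(Debug, Clone, PartialEq)]
struct TestTiming {
    name: String,
    seconds: f64,
}

/// Returns the value of `attr="..."` inside a single start tag, if present.
/// The leading space keeps `name` from matching inside `classname`.
fn attr_value(tag: &str, attr: &str) -> Option<String> {
    let needle = format!(" {attr}=\"");
    let start = tag.find(&needle)? + needle.len();
    let end = start + tag[start..].find('"')?;
    Some(tag[start..end].to_string())
}

/// Naive scan of `<testcase ...>` start tags; skips entries whose `time` does not parse.
fn parse_testcase_times(xml: &str) -> Vec<TestTiming> {
    let mut out = Vec::new();
    let mut rest = xml;
    while let Some(pos) = rest.find("<testcase") {
        let after = &rest[pos..];
        let Some(close) = after.find('>') else { break };
        let tag = &after[..close];
        if let (Some(name), Some(time)) = (attr_value(tag, "name"), attr_value(tag, "time")) {
            if let Ok(seconds) = time.parse::<f64>() {
                out.push(TestTiming { name, seconds });
            }
        }
        // Continue scanning after this tag's closing '>'.
        rest = &after[close..];
    }
    out
}

/// Returns the `n` slowest tests, slowest first.
fn slowest(mut tests: Vec<TestTiming>, n: usize) -> Vec<TestTiming> {
    tests.sort_by(|a, b| b.seconds.total_cmp(&a.seconds));
    tests.truncate(n);
    tests
}

/// One warning line per test strictly over `threshold`, in report order.
fn threshold_warnings(tests: &[TestTiming], threshold: Duration) -> Vec<String> {
    let limit = threshold.as_secs_f64();
    tests
        .iter()
        .filter(|t| t.seconds > limit)
        .map(|t| {
            format!(
                "WARNING: Test '{}' took {:.1}s (threshold: {}s)",
                t.name,
                t.seconds,
                threshold.as_secs()
            )
        })
        .collect()
}

/// Approximates each test's start offset within its batch, in microseconds,
/// as the sum of the durations of the tests before it (assumes sequential runs).
fn approx_start_offsets_us(tests: &[TestTiming]) -> Vec<u64> {
    let mut acc = 0.0;
    tests
        .iter()
        .map(|t| {
            let start = (acc * 1_000_000.0) as u64;
            acc += t.seconds;
            start
        })
        .collect()
}

fn main() {
    // Illustrative report; the names and times are made up.
    let xml = r#"<testsuite name="batch_0">
  <testcase name="test_a" time="12.5"/>
  <testcase name="test_b" time="41.0"/>
  <testcase name="test_c" time="0.3"/>
</testsuite>"#;
    let tests = parse_testcase_times(xml);

    for warning in threshold_warnings(&tests, Duration::from_secs(30)) {
        println!("{warning}");
    }
    let offsets = approx_start_offsets_us(&tests);
    for (t, start_us) in tests.iter().zip(&offsets) {
        println!("[TRACE] {} starts ~{} us into the batch", t.name, start_us);
    }
    println!("Slowest tests:");
    for (i, t) in slowest(tests, 5).iter().enumerate() {
        println!(" {}. {} {:.1}s", i + 1, t.name, t.seconds);
    }
}
```

Sorting with `f64::total_cmp` keeps the ordering total even if a report contains a NaN time. The start offsets follow the bead's own approximation (batch start plus the cumulative durations of earlier tests), so they are only meaningful if tests within a batch ran sequentially in report order.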
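The cost beads (code-103 through code-109) describe a small calculation: CPU-seconds are sandbox lifetime times allocated cores, multiplied by a per-core-second rate, summed across sandboxes during cleanup, and printed only when `--show-estimated-cost` is set and the total is positive. The sketch below models that in plain Rust under those assumptions. Only the struct name, its two fields, the derives, the `{cpu_cores}` placeholder, the 0.125-core default, and the rate come from the beads; `from_elapsed`, `cost_line`, and `render_command` are hypothetical helpers, not offload's API.

```rust
use std::fmt;
use std::ops::Add;
use std::time::Duration;

// Per-core-second rate quoted in code-105; an assumption, not current Modal pricing.
const PRICE_PER_CORE_SECOND_USD: f64 = 0.00003942;

/// Mirrors the fields named in code-103.
#[derive(Clone, Debug, Default, PartialEq)]
struct CostEstimate {
    cpu_seconds: f64,
    estimated_cost_usd: f64,
}

impl CostEstimate {
    /// Hypothetical: sandbox lifetime (elapsed since `created_at`) times allocated cores.
    fn from_elapsed(elapsed: Duration, cpu_cores: f64) -> Self {
        let cpu_seconds = elapsed.as_secs_f64() * cpu_cores;
        CostEstimate {
            cpu_seconds,
            estimated_cost_usd: cpu_seconds * PRICE_PER_CORE_SECOND_USD,
        }
    }
}

/// Lets per-sandbox estimates be summed into a run total.
impl Add for CostEstimate {
    type Output = CostEstimate;
    fn add(self, other: CostEstimate) -> CostEstimate {
        CostEstimate {
            cpu_seconds: self.cpu_seconds + other.cpu_seconds,
            estimated_cost_usd: self.estimated_cost_usd + other.estimated_cost_usd,
        }
    }
}

impl fmt::Display for CostEstimate {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:.1} CPU-seconds (~${:.4})", self.cpu_seconds, self.estimated_cost_usd)
    }
}

/// Hypothetical: the summary line, only when the flag is set and the cost is positive.
fn cost_line(cost: &CostEstimate, show_cost: bool) -> Option<String> {
    (show_cost && cost.estimated_cost_usd > 0.0).then(|| format!("Estimated cost: {cost}"))
}

/// Hypothetical: substitutes the `{cpu_cores}` placeholder in a command template.
fn render_command(template: &str, cpu_cores: f64) -> String {
    template.replace("{cpu_cores}", &cpu_cores.to_string())
}

fn main() {
    // Two made-up sandboxes (90 s and 45 s lifetimes) at the 0.125-core default.
    let per_sandbox = vec![
        CostEstimate::from_elapsed(Duration::from_secs(90), 0.125),
        CostEstimate::from_elapsed(Duration::from_secs(45), 0.125),
    ];
    let total = per_sandbox
        .into_iter()
        .fold(CostEstimate::default(), |acc, c| acc + c);
    if let Some(line) = cost_line(&total, true) {
        println!("{line}");
    }
    println!("{}", render_command("my-provider create --cpu {cpu_cores}", 0.125));
}
```

A local sandbox that returns `CostEstimate::default()` adds zero to the fold, which is why the positive-cost check suppresses the line for purely local runs.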
2 changes: 1 addition & 1 deletion .beads/last-touched
@@ -1 +1 @@
-code-103
+code-112
2 changes: 1 addition & 1 deletion .offload-image-cache
@@ -1 +1 @@
-im-kGvhnvEQlArq5lzaDlBCd3
+im-iG70foAmbaBR90NmKecC07
56 changes: 56 additions & 0 deletions install-skills.sh
@@ -0,0 +1,56 @@
#!/usr/bin/env bash
set -euo pipefail

SKILLS=("offload" "offload-onboard")
GITHUB_BASE="https://raw.githubusercontent.com/imbue-ai/offload/main/skills"

# Resolve target skills directory
SKILLS_DIR="${CLAUDE_CONFIG_DIR:-$HOME/.claude}/skills"

# Detect whether we are running from within the repo or standalone (curl | bash)
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-}")" 2>/dev/null && pwd)" || SCRIPT_DIR=""

if [ -n "$SCRIPT_DIR" ] && [ -d "$SCRIPT_DIR/skills" ]; then
  # In-repo mode: create symlinks for live editing
  echo "Detected in-repo run. Installing skills via symlinks..."
  mkdir -p "$SKILLS_DIR"

  for skill in "${SKILLS[@]}"; do
    src="$SCRIPT_DIR/skills/$skill"
    dst="$SKILLS_DIR/$skill"

    if [ -L "$dst" ]; then
      echo "Updating existing symlink for $skill..."
      rm "$dst"
    elif [ -e "$dst" ]; then
      echo "Error: $dst already exists and is not a symlink. Remove it manually."
      exit 1
    fi

    ln -s "$src" "$dst"
    echo "Installed: $dst -> $src"
  done
else
  # Standalone mode: download SKILL.md files from GitHub
  echo "Standalone mode. Downloading skills from GitHub..."
  mkdir -p "$SKILLS_DIR"

  for skill in "${SKILLS[@]}"; do
    dst="$SKILLS_DIR/$skill"

    if [ -L "$dst" ]; then
      echo "Removing existing symlink for $skill..."
      rm "$dst"
    elif [ -e "$dst" ]; then
      echo "Error: $dst already exists and is not a symlink. Remove it manually."
      exit 1
    fi

    mkdir -p "$dst"
    echo "Downloading $skill/SKILL.md..."
    curl -fsSL "$GITHUB_BASE/$skill/SKILL.md" -o "$dst/SKILL.md"
    echo "Installed: $dst/SKILL.md"
  done
fi

echo "Done. Skills installed to $SKILLS_DIR"
26 changes: 6 additions & 20 deletions justfile
@@ -36,26 +36,12 @@ test-cargo-default args="":
test-pytest-default args="":
    cargo run -- -c offload-pytest-default.toml {{args}} run || [ $? -eq 2 ]

-# Install the /offload-onboard skill for Claude Code
-install-skill:
-    #!/usr/bin/env bash
-    set -euo pipefail
-    skill_src="$(just _repo-root)/skills/offload-onboard"
-    skill_dst="$HOME/.claude/skills/offload-onboard"
-    mkdir -p "$HOME/.claude/skills"
-    if [ -L "$skill_dst" ]; then
-        echo "Updating existing symlink..."
-        rm "$skill_dst"
-    elif [ -e "$skill_dst" ]; then
-        echo "Error: $skill_dst already exists and is not a symlink. Remove it manually."
-        exit 1
-    fi
-    ln -s "$skill_src" "$skill_dst"
-    echo "Installed: $skill_dst -> $skill_src"
-    echo "You can now use /offload-onboard in any repository."
+# Install all Offload skills for Claude Code
+install-skills:
+    ./install-skills.sh
+
+# Alias for backward compatibility
+install-skill: install-skills

ratchets:
    ratchets check

_repo-root:
    @git rev-parse --show-toplevel
15 changes: 8 additions & 7 deletions skills/offload-onboard/SKILL.md
@@ -284,12 +284,12 @@ Create `scripts/offload-tests.sh`:
#!/usr/bin/env bash
#
# Run the project's test suite via Offload (parallel on Modal).
-# Requires: Offload (cargo install offload@0.5.0), Modal CLI + credentials
+# Requires: Offload (cargo install offload), Modal CLI + credentials
#
set -euo pipefail

if ! command -v offload &> /dev/null; then
-    echo "Error: 'offload' not installed. Install with: cargo install offload@0.5.0"
+    echo "Error: 'offload' not installed. Install with: cargo install offload"
    exit 1
fi

@@ -321,7 +321,7 @@ NOTE: `.offload-image-cache` should be checked in to git — it tracks the base
Install offload if not already present:

```bash
-cargo install offload@0.5.0
+cargo install offload
```

Run the tests using the invocation script from Step 7:
@@ -387,7 +387,7 @@ Report the results as a table to the user and set the optimal values in `offload

4. **If none of these files exist**, create a `CLAUDE.md` at the project root. It only needs the testing section — don't fabricate other content.

-5. **Amend or add** a testing section that is directive, not merely suggestive. The instruction must tell agents to use Offload as the way to run tests locally. Do not remove any existing test commands — keep them as a fallback — but make Offload the primary instruction. Example:
+5. **Amend or add** a testing section that is directive, not merely suggestive. The instruction must tell agents to use Offload as the way to run tests locally. Do not remove any existing test commands — keep them as a fallback — but make Offload the primary instruction. The section should also reference the `/offload` skill so agents activate it when running tests, reading logs, or debugging failures. Example:

````markdown
## Running tests
@@ -398,7 +398,8 @@ Report the results as a table to the user and set the optimal values in `offload
./scripts/offload-tests.sh
```

-Prerequisites: Offload (`cargo install offload@0.5.0`) and Modal credentials (`modal token new`).
+Prerequisites: Offload (`cargo install offload`) and Modal credentials (`modal token new`).
+Activate the `/offload` skill for test execution, log reading, and failure debugging.
````

Adapt the exact command to match what was configured in earlier steps (the script path, etc.).
@@ -460,12 +461,12 @@ jobs:
            ~/.cargo/registry
            ~/.cargo/git
            ~/.cargo/bin/offload
-          key: cargo-offload-0.5.0-${{ runner.os }}
+          key: cargo-offload-${{ runner.os }}

      - name: Install offload
        run: |
          if ! command -v offload &> /dev/null; then
-            cargo install offload@0.5.0
+            cargo install offload
          fi

      - name: Install Modal CLI