feat: build lightweight benchmark images by default by simonrosenberg · Pull Request #549 · OpenHands/benchmarks

simonrosenberg · 2026-03-20T20:01:59Z

Summary

Add --agent-type CLI flag to SWE-bench and SWT-bench image build scripts
Thread extra_build_args through the full call chain: build_all_images → _build_with_logging → build_image → SDK BuildOptions
Both workflows default agent-type to "default", which skips ACP and boto3 installation
Can be toggled via workflow dispatch input (agent-type: acp-claude or acp-codex) when ACP images are needed
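The threading described above can be sketched roughly as follows. This is an illustration, not the code from this PR: the function bodies, the `BuildOptions` stand-in dataclass, and its `image` field are assumptions. Only the function names and the `extra_build_args` parameter come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class BuildOptions:
    """Hypothetical stand-in for the SDK's BuildOptions (real fields are not shown here)."""

    image: str
    extra_build_args: dict[str, str] = field(default_factory=dict)


def build_image(image: str, extra_build_args: dict[str, str]) -> BuildOptions:
    # Innermost step: hand the build args to the SDK options object.
    return BuildOptions(image=image, extra_build_args=dict(extra_build_args))


def _build_with_logging(image: str, extra_build_args: dict[str, str]) -> BuildOptions:
    # Logging wrapper: forwards the args unchanged.
    print(f"building {image} with build args {extra_build_args}")
    return build_image(image, extra_build_args)


def build_all_images(
    images: list[str], extra_build_args: dict[str, str]
) -> list[BuildOptions]:
    # Entry point: the same args apply to every image in the batch.
    return [_build_with_logging(img, extra_build_args) for img in images]


options = build_all_images(["img-a", "img-b"], {"INSTALL_ACP": "false"})
```

Each layer only forwards the dict, so the choice made at the CLI reaches the SDK without any layer needing to know what the individual build args mean.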

Changes

benchmarks/utils/build_utils.py: LIGHTWEIGHT_BUILD_ARGS / ACP_BUILD_ARGS constants, --agent-type CLI flag, build_args_for_agent_type() helper, extra_build_args parameter threaded through all build functions
benchmarks/swebench/build_images.py: Pass lightweight build args to build_all_images
benchmarks/swtbench/build_images.py: Same
.github/workflows/build-swebench-images.yml: agent-type input (default: "default"), passed as the --agent-type flag
.github/workflows/build-swtbench-images.yml: Same
vendor/software-agent-sdk: Updated to SDK main (62c2e7cf) which includes extra_build_args support (#2541), optional ACP (#2535), and optional boto3 (#2536)
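For orientation, the constants and helper listed for build_utils.py might look something like the sketch below. The dict contents are assumptions: the `INSTALL_ACP` and `INSTALL_BOTO3` names come from the SDK PRs referenced in this description, the string values `"true"`/`"false"` are guessed from the `INSTALL_BROWSER: "false"` example, and whether ACP builds also keep boto3 is not stated anywhere in this PR.

```python
# Assumed contents; the real constants live in benchmarks/utils/build_utils.py.
LIGHTWEIGHT_BUILD_ARGS = {"INSTALL_ACP": "false", "INSTALL_BOTO3": "false"}
ACP_BUILD_ARGS = {"INSTALL_ACP": "true", "INSTALL_BOTO3": "true"}  # assumption


def build_args_for_agent_type(agent_type: str) -> dict[str, str]:
    """Map an --agent-type value to Docker build args (sketch only)."""
    # Prefix match so "acp-claude", "acp-codex", and future "acp-*" types
    # all select the ACP-enabled build.
    if agent_type.startswith("acp"):
        return dict(ACP_BUILD_ARGS)
    return dict(LIGHTWEIGHT_BUILD_ARGS)


default_args = build_args_for_agent_type("default")
acp_args = build_args_for_agent_type("acp-claude")
```

Returning a copy of the constant is a defensive choice in this sketch, so that callers cannot accidentally change the shared dict.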

Expected impact (from analysis of 433-image build logs, #537)

Dependency skipped	Per-image saving	Cumulative (433 imgs)
npm ACP + nodejs	~32s install + ~4s export/push	4.5h
boto3/botocore	~3s export/push	0.4h
Total	~36s/image	~4.9h cumulative

Wall-clock improvement: ~1.4–2.5h (at 3.5× effective parallelism), plus non-linear savings from reduced disk pressure.
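The lower end of the wall-clock range follows from dividing the cumulative saving by the effective parallelism. A quick check, using only the figures quoted above (the upper end of the range is not derived in this description, so it is left out):

```python
CUMULATIVE_SAVING_HOURS = 4.9  # total from the table above
EFFECTIVE_PARALLELISM = 3.5    # as quoted for the wall-clock estimate

# Time saved on the wall clock if the saved work is spread evenly
# across the effective number of parallel builders.
wall_clock_saving_hours = CUMULATIVE_SAVING_HOURS / EFFECTIVE_PARALLELISM

print(f"~{wall_clock_saving_hours:.1f}h wall-clock saving")  # prints "~1.4h wall-clock saving"
```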

SDK dependencies (all merged to main)

feat(docker): make ACP npm package installation optional via build arg software-agent-sdk#2535 — INSTALL_ACP Dockerfile ARG
feat(docker): make boto3 installation optional via build arg software-agent-sdk#2536 — INSTALL_BOTO3 Dockerfile ARG
feat(docker): add extra_build_args to BuildOptions software-agent-sdk#2541 — extra_build_args on BuildOptions
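As a rough picture of how these pieces fit together: a Dockerfile `ARG` is set from the command line with `docker build --build-arg NAME=value`, so a dict of extra build args can be flattened into flags along these lines. The helper name `to_docker_build_flags` is hypothetical, and how the SDK actually consumes `extra_build_args` is not shown in this PR.

```python
def to_docker_build_flags(extra_build_args: dict[str, str]) -> list[str]:
    """Flatten a build-arg dict into `docker build` CLI flags (hypothetical helper)."""
    flags: list[str] = []
    # Sort keys so the generated command line is deterministic.
    for name, value in sorted(extra_build_args.items()):
        flags += ["--build-arg", f"{name}={value}"]
    return flags


flags = to_docker_build_flags({"INSTALL_BOTO3": "false", "INSTALL_ACP": "false"})
print(flags)
# ['--build-arg', 'INSTALL_ACP=false', '--build-arg', 'INSTALL_BOTO3=false']
```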

Relation to #548

This is a slimmed-down version of #548 that works with SDK main today. It excludes INSTALL_BROWSER (browser-use optional support hasn't been merged to the SDK yet). Once browser-use lands, a follow-up can add INSTALL_BROWSER: "false" to LIGHTWEIGHT_BUILD_ARGS.

Test plan

Run SWT-bench build with --agent-type default on 4–10 images to verify builds succeed
Verify image can run a basic SWT-bench evaluation
Run with --agent-type acp-claude to verify ACP images still build correctly
Full 433-image build to measure actual wall-clock improvement

Closes #537

🤖 Generated with Claude Code

Add --agent-type CLI flag to SWE-bench and SWT-bench image build scripts. Thread extra_build_args through the full call chain: build_all_images -> _build_with_logging -> build_image -> SDK BuildOptions.

Both workflows default agent-type to "default", which skips ACP and boto3 installation. Use "acp-claude" or "acp-codex" to keep ACP installed.

Expected per-image saving: ~36s (ACP ~32s + boto3 ~3s), translating to ~4.9h cumulative across 433 images.

Depends on: OpenHands/software-agent-sdk#2541
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

all-hands-bot

🟡 Taste Rating: Acceptable - Pragmatic solution to a real problem (4.9h build savings), but parameter defaulting needs cleanup.

Verdict: ✅ Core logic is sound. The config threading is straightforward. Fix the inconsistent defaults and this is good to merge.

benchmarks/utils/build_utils.py

Update vendor/software-agent-sdk to SDK main now that #2541 (extra_build_args) has been merged.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use a consistent default across build_image, _build_with_logging, and build_all_images instead of defaulting to None and checking later. Removes the ternary fallback in _build_with_logging.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The base parser (get_parser) already defines --agent-type. Adding it again in get_build_parser caused an argparse conflict error.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
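The argparse conflict is easy to reproduce in isolation. In the sketch below, the function bodies are assumptions (in particular that the build parser extends the base parser); only the names `get_parser`, `get_build_parser`, and `--agent-type` come from the commit message.

```python
import argparse


def get_parser() -> argparse.ArgumentParser:
    # Stand-in for the base parser, which already defines --agent-type.
    parser = argparse.ArgumentParser()
    parser.add_argument("--agent-type", default="default")
    return parser


def get_build_parser_with_duplicate() -> argparse.ArgumentParser:
    # Registering the same option string a second time is rejected by argparse.
    parser = get_parser()
    parser.add_argument("--agent-type", default="default")
    return parser


try:
    get_build_parser_with_duplicate()
    conflict_message = None
except argparse.ArgumentError as exc:
    conflict_message = str(exc)

print(conflict_message)
```

With argparse's default conflict handler, the second `add_argument` call raises an `ArgumentError` whose message mentions a "conflicting option string", which is why dropping the duplicate definition is enough to fix it.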

all-hands-bot

🟢 Good taste - Pragmatic solution to a real problem (4.9h build savings) with clean execution.

Previous review issues resolved:

Consistent defaulting throughout: All functions now default extra_build_args to LIGHTWEIGHT_BUILD_ARGS instead of None
Eliminated unnecessary conditionals
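The defaulting change can be pictured as the before/after pair below. This is a simplified sketch: the function bodies and the dict contents are assumptions, and only the parameter name and the switch from a `None` default to `LIGHTWEIGHT_BUILD_ARGS` come from the review.

```python
LIGHTWEIGHT_BUILD_ARGS = {"INSTALL_ACP": "false", "INSTALL_BOTO3": "false"}  # assumed contents


def build_image_before(image, extra_build_args=None):
    # Old style: default to None, then fall back with a ternary.
    args = extra_build_args if extra_build_args is not None else LIGHTWEIGHT_BUILD_ARGS
    return image, args


def build_image_after(image, extra_build_args=LIGHTWEIGHT_BUILD_ARGS):
    # New style: the default is the constant itself, so no fallback is needed.
    return image, extra_build_args


before = build_image_before("img-a")
after = build_image_after("img-a")
```

Both versions behave the same when the argument is omitted; the second simply states the default in the signature, where every function in the chain can share it.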

Implementation is solid:

Simple, straightforward parameter threading
Clean constants with clear documentation
build_args_for_agent_type() uses startswith() pattern that allows future extensions
No special cases, no unnecessary complexity

Minor observation (not blocking):
Mutable dict as default argument is technically a Python code smell, but it's pragmatic here since:

Functions don't mutate, just pass through to SDK
Constants use SCREAMING_SNAKE_CASE signaling immutability
Making them immutable would add complexity without solving a real problem
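For reference, one way the immutable alternative mentioned above could look is a read-only view via `types.MappingProxyType`. This is only a sketch of the trade-off, not something the PR does; the dict contents and the function body are assumptions.

```python
from types import MappingProxyType

# Read-only view: lookups work, item assignment raises TypeError.
LIGHTWEIGHT_BUILD_ARGS = MappingProxyType(
    {"INSTALL_ACP": "false", "INSTALL_BOTO3": "false"}  # assumed contents
)


def build_image(image, extra_build_args=LIGHTWEIGHT_BUILD_ARGS):
    # Copy into a plain dict before handing the args to code that expects one.
    return image, dict(extra_build_args)


try:
    LIGHTWEIGHT_BUILD_ARGS["INSTALL_ACP"] = "true"
    mutated = True
except TypeError:
    mutated = False
```

The extra copy at the call boundary is the kind of added complexity the review refers to, which is a reasonable argument for keeping plain dicts as long as nothing mutates them.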

✅ Verdict: Ship it

all-hands-bot reviewed Mar 20, 2026

Debug Agent and others added 3 commits March 20, 2026 17:06

chore: pin SDK submodule to main (62c2e7cf)

8167eab

fix: remove duplicate --agent-type arg from build parser

f5d5b1e

simonrosenberg requested a review from all-hands-bot March 20, 2026 20:16

simonrosenberg self-assigned this Mar 20, 2026

all-hands-bot approved these changes Mar 20, 2026

simonrosenberg merged commit 96d72a0 into main Mar 20, 2026
3 checks passed

simonrosenberg deleted the feat/lightweight-benchmark-images-v2 branch March 20, 2026 20:31

simonrosenberg merged 4 commits into main from feat/lightweight-benchmark-images-v2
