feat: build lightweight benchmark images by default #549
Merged
simonrosenberg merged 4 commits into main on Mar 20, 2026
Add --agent-type CLI flag to SWE-bench and SWT-bench image build scripts. Thread extra_build_args through the full call chain: build_all_images -> _build_with_logging -> build_image -> SDK BuildOptions. Both workflows default agent-type to "default", which skips ACP and boto3 installation. Use "acp-claude" or "acp-codex" to keep ACP installed.

Expected per-image saving: ~36s (ACP ~32s + boto3 ~3s), translating to ~4.9h cumulative across 433 images.

Depends on: OpenHands/software-agent-sdk#2541

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
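A minimal sketch of the parameter threading described in this commit, using the consistent `LIGHTWEIGHT_BUILD_ARGS` default that a later commit in this PR adopts. The signatures are simplified to one image per call, `BuildOptions` is a hypothetical stand-in dataclass (not the SDK's class), and the contents of `LIGHTWEIGHT_BUILD_ARGS` are assumed from the Dockerfile ARG names mentioned in this PR.

```python
from dataclasses import dataclass, field

# Assumed contents: the PR names this constant but this page does not show it.
LIGHTWEIGHT_BUILD_ARGS: dict[str, str] = {"INSTALL_ACP": "false", "INSTALL_BOTO3": "false"}


@dataclass
class BuildOptions:
    """Hypothetical stand-in for the SDK's BuildOptions; only two fields modeled."""
    image: str
    extra_build_args: dict[str, str] = field(default_factory=dict)


def build_image(image: str, extra_build_args: dict[str, str] = LIGHTWEIGHT_BUILD_ARGS) -> BuildOptions:
    # Innermost layer: copy the args so the shared default is never handed out
    # for mutation, then attach them to the (stand-in) SDK options object.
    return BuildOptions(image=image, extra_build_args=dict(extra_build_args))


def _build_with_logging(image: str, extra_build_args: dict[str, str] = LIGHTWEIGHT_BUILD_ARGS) -> BuildOptions:
    # Middle layer: pass through unchanged; no None check or ternary fallback.
    return build_image(image, extra_build_args=extra_build_args)


def build_all_images(images: list[str], extra_build_args: dict[str, str] = LIGHTWEIGHT_BUILD_ARGS) -> list[BuildOptions]:
    # Outer layer: same default, so every entry point builds lightweight images.
    return [_build_with_logging(img, extra_build_args=extra_build_args) for img in images]
```

Because every layer shares the same default, a caller that omits `extra_build_args` gets lightweight images, while a caller that needs ACP passes a different mapping once at the top.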
all-hands-bot (Collaborator) left a comment
🟡 Taste Rating: Acceptable - Pragmatic solution to a real problem (4.9h build savings), but parameter defaulting needs cleanup.
Verdict: ✅ Core logic is sound. The config threading is straightforward. Fix the inconsistent defaults and this is good to merge.
Update vendor/software-agent-sdk to SDK main now that #2541 (extra_build_args) has been merged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use a consistent default across build_image, _build_with_logging, and build_all_images instead of defaulting to None and checking later. Removes the ternary fallback in _build_with_logging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The base parser (get_parser) already defines --agent-type. Adding it again in get_build_parser caused an argparse conflict error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
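The conflict described in this commit is standard `argparse` behavior: with the default `conflict_handler="error"`, registering an option string that the parser already owns raises `argparse.ArgumentError`. A minimal sketch, assuming `get_build_parser` starts from the parser returned by `get_parser`; the parser bodies, the `--agent-type` default, and the `--max-workers` flag are hypothetical.

```python
import argparse


def get_parser() -> argparse.ArgumentParser:
    # Hypothetical base parser standing in for the shared get_parser(),
    # which already defines --agent-type (default value assumed).
    parser = argparse.ArgumentParser()
    parser.add_argument("--agent-type", default="default")
    return parser


def get_build_parser_buggy() -> argparse.ArgumentParser:
    parser = get_parser()
    # Re-adding an option string the base parser already registered:
    # argparse raises ArgumentError ("conflicting option string").
    parser.add_argument("--agent-type", default="default")
    return parser


def get_build_parser() -> argparse.ArgumentParser:
    # Fixed version: inherit --agent-type and add only build-specific options.
    parser = get_parser()
    parser.add_argument("--max-workers", type=int, default=1)  # hypothetical flag
    return parser
```

With the fix, `get_build_parser().parse_args(["--agent-type", "acp-claude"]).agent_type` evaluates to `"acp-claude"`.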
all-hands-bot approved these changes on Mar 20, 2026
all-hands-bot (Collaborator) left a comment
🟢 Good taste - Pragmatic solution to a real problem (4.9h build savings) with clean execution.
Previous review issues resolved:
- Consistent defaulting throughout: all functions now default `extra_build_args` to `LIGHTWEIGHT_BUILD_ARGS` instead of `None`
- Eliminated unnecessary conditionals
Implementation is solid:
- Simple, straightforward parameter threading
- Clean constants with clear documentation
- `build_args_for_agent_type()` uses a `startswith()` pattern that allows future extensions
- No special cases, no unnecessary complexity
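A minimal sketch of how a prefix-based `build_args_for_agent_type()` could look. The constant names come from this PR, but their contents are assumptions (here ACP images keep ACP and still skip boto3), and treating every non-`acp-` value as lightweight is likewise assumed.

```python
# Assumed contents: this page names the constants but does not show them.
LIGHTWEIGHT_BUILD_ARGS: dict[str, str] = {"INSTALL_ACP": "false", "INSTALL_BOTO3": "false"}
ACP_BUILD_ARGS: dict[str, str] = {"INSTALL_BOTO3": "false"}


def build_args_for_agent_type(agent_type: str) -> dict[str, str]:
    """Map an --agent-type value to Docker build args (sketch)."""
    # Prefix match: acp-claude, acp-codex, or any future acp-<name> variant
    # keeps ACP installed without editing this function.
    if agent_type.startswith("acp-"):
        return ACP_BUILD_ARGS
    # Everything else, including "default", gets the lightweight image.
    return LIGHTWEIGHT_BUILD_ARGS
```

The prefix check is what makes the extension cheap: adding a new ACP-backed agent only requires choosing a name that starts with `acp-`.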
Minor observation (not blocking):
Mutable dict as default argument is technically a Python code smell, but it's pragmatic here since:
- Functions don't mutate, just pass through to SDK
- Constants use SCREAMING_SNAKE_CASE, signaling immutability
- Making them immutable would add complexity without solving a real problem
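For comparison with the trade-off above, one stdlib way to make such a default read-only is `types.MappingProxyType`. This is a sketch of that alternative, not what the PR does; the constant contents and the `resolve_build_args` helper are hypothetical.

```python
from collections.abc import Mapping
from types import MappingProxyType

# Assumed contents, wrapped in a read-only view of the underlying dict.
LIGHTWEIGHT_BUILD_ARGS: Mapping[str, str] = MappingProxyType(
    {"INSTALL_ACP": "false", "INSTALL_BOTO3": "false"}
)


def resolve_build_args(extra_build_args: Mapping[str, str] = LIGHTWEIGHT_BUILD_ARGS) -> dict[str, str]:
    # Hypothetical helper: copy into a plain dict at the boundary, assuming the
    # downstream consumer expects an ordinary mutable mapping.
    return dict(extra_build_args)
```

Item assignment on the proxy raises `TypeError`, so accidental mutation of the shared default fails loudly; the cost is the extra wrapper and the copy at the boundary, which is the added complexity the review chose to avoid.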
✅ Verdict: Ship it
Summary
- Add `--agent-type` CLI flag to SWE-bench and SWT-bench image build scripts
- Thread `extra_build_args` through the full call chain: `build_all_images` → `_build_with_logging` → `build_image` → SDK `BuildOptions`
- Both workflows default `agent-type` to `"default"`, which skips ACP and boto3 installation
- Set the workflow input (`agent-type: acp-claude` or `acp-codex`) when ACP images are needed

Changes
- `benchmarks/utils/build_utils.py`: `LIGHTWEIGHT_BUILD_ARGS` / `ACP_BUILD_ARGS` constants, `--agent-type` CLI flag, `build_args_for_agent_type()` helper, `extra_build_args` parameter threaded through all build functions
- `benchmarks/swebench/build_images.py`: Pass lightweight build args to `build_all_images`
- `benchmarks/swtbench/build_images.py`: Same
- `.github/workflows/build-swebench-images.yml`: `agent-type` input (default `default`), passed as `--agent-type` flag
- `.github/workflows/build-swtbench-images.yml`: Same
- `vendor/software-agent-sdk`: Updated to SDK main (`62c2e7cf`), which includes `extra_build_args` support (#2541), optional ACP (#2535), and optional boto3 (#2536)

Expected impact (from analysis of 433-image build logs, #537)
Wall-clock improvement: ~1.4–2.5h (at 3.5× effective parallelism), plus non-linear savings from reduced disk pressure.
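The lower end of that range is consistent with dividing the ~4.9h cumulative saving by the 3.5× effective parallelism. A minimal sketch of that conversion, assuming the per-image saving spreads evenly across concurrent builds; `wall_clock_hours` is a hypothetical helper, not part of the PR.

```python
def wall_clock_hours(cumulative_hours: float, effective_parallelism: float) -> float:
    """Convert a summed per-image saving into a wall-clock estimate.

    Assumes the saving is spread evenly across concurrent builds, so wall-clock
    time shrinks by the effective parallelism factor. Ignores second-order
    effects such as reduced disk pressure.
    """
    return cumulative_hours / effective_parallelism


print(f"{wall_clock_hours(4.9, 3.5):.1f} h")  # prints "1.4 h"
```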
SDK dependencies (all merged to main)
- `INSTALL_ACP` Dockerfile ARG
- `INSTALL_BOTO3` Dockerfile ARG
- `extra_build_args` on `BuildOptions`

Relation to #548
This is a slimmed-down version of #548 that works with SDK main today. It excludes `INSTALL_BROWSER` (optional browser-use support hasn't been merged to the SDK yet). Once browser-use lands, a follow-up can add `INSTALL_BROWSER: "false"` to `LIGHTWEIGHT_BUILD_ARGS`.

Test plan
- Run with `--agent-type default` on 4–10 images to verify builds succeed
- Run with `--agent-type acp-claude` to verify ACP images still build correctly

Closes #537
🤖 Generated with Claude Code