Generalize build script#103

Open
dedeswim wants to merge 19 commits into main from generalize-build-script

@dedeswim dedeswim commented Jan 27, 2026

Summary

  • Refactor build_images.py from a SWE-bench-specific script into a generic image builder that works with any dataset implementing the new ImageBuildableDataset protocol
  • Add ImageBuildableDataset protocol to the dataset registry — datasets opt into image building by implementing a get_image_build_specs() classmethod
  • Introduce tuple entry point format (factory_fn, dataset_class) in registry_base.py so the registry can discover both the factory and the class at load time
  • Add DerivedImageSpec for images built on top of a base (e.g., SWE-bench pair images), ImageBuildSpec union type, and SeederFn support for pre-build setup on BuildImageSpec (needed for upcoming browser dataset)
  • Add get_image_build_specs() classmethod to SwebenchDataset and expose swebench_entry / agentdojo_entry tuples
  • Update ImageCache to handle DerivedImageSpec
  • The CLI now accepts --dataset <name> / --all-datasets instead of being hardcoded to SWE-bench
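
As a rough illustration of the opt-in mechanism described in the summary, the sketch below shows how a dataset class might expose build specs through a classmethod and how a builder could discover such classes. It is not the code in this PR: the spec fields, the `config` parameter, `ToyDataset`, `PlainDataset`, and `discover_buildable` are all assumptions made for the example; only the names ImageBuildableDataset, BuildImageSpec, DerivedImageSpec, ImageBuildSpec, and get_image_build_specs() come from the description.

```python
from dataclasses import dataclass
from typing import Any, List, Protocol, Union


@dataclass(frozen=True)
class BuildImageSpec:
    """Hypothetical stand-in: an image built from a Docker build context."""
    tag: str
    context_path: str


@dataclass(frozen=True)
class DerivedImageSpec:
    """Hypothetical stand-in: an image layered on top of an existing base tag."""
    tag: str
    base_image_tag: str


# Union of the spec kinds a dataset may return.
ImageBuildSpec = Union[BuildImageSpec, DerivedImageSpec]


class ImageBuildableDataset(Protocol):
    """Datasets opt in by providing this classmethod (signature assumed)."""

    @classmethod
    def get_image_build_specs(cls, config: Any) -> List[ImageBuildSpec]: ...


class ToyDataset:
    """Hypothetical dataset that opts into image building."""

    @classmethod
    def get_image_build_specs(cls, config: Any) -> List[ImageBuildSpec]:
        return [
            BuildImageSpec(tag="toy-base:latest", context_path="docker/toy"),
            DerivedImageSpec(tag="toy-pair:latest", base_image_tag="toy-base:latest"),
        ]


class PlainDataset:
    """Hypothetical dataset without get_image_build_specs(); discovery skips it."""


def discover_buildable(classes: List[type]) -> List[type]:
    # Structural check: a class qualifies if it has a callable get_image_build_specs.
    return [c for c in classes if callable(getattr(c, "get_image_build_specs", None))]


buildable = discover_buildable([ToyDataset, PlainDataset])
specs = ToyDataset.get_image_build_specs(config=None)
```

With this shape, a generic builder only needs the list of registered classes; it never imports a specific dataset module directly.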

Test plan

  • Verify uv run pytest -vx -m "not docker_integration" passes
  • Verify uv run ruff check and uv run ruff format --check pass
  • Verify prompt-siren-build-images --dataset swebench still builds SWE-bench images correctly
  • Verify prompt-siren-build-images --all-datasets discovers swebench as the only image-buildable dataset

The last two checks cannot be run until we decide which registry to push images to.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jan 27, 2026
@dedeswim dedeswim force-pushed the generalize-build-script branch from b5f3d05 to 9e776c4 Compare January 27, 2026 08:33
@dedeswim dedeswim requested a review from evtimovi January 27, 2026 13:14
@dedeswim dedeswim marked this pull request as ready for review January 27, 2026 13:14
@dedeswim dedeswim force-pushed the generalize-build-script branch from 772fab5 to 0a7ff07 Compare January 27, 2026 13:42
dedeswim and others added 15 commits January 29, 2026 20:37
Refactor the Docker image build system from being SWE-bench specific to
supporting any dataset that implements the ImageBuildableDataset protocol.

Key changes:
- Add ImageBuildableDataset protocol to dataset registry with
  get_image_build_specs() classmethod
- Add DerivedImageSpec and ImageBuildSpec union type to image_spec module
- Support tuple entry point format (factory_fn, dataset_class) in
  registry_base for protocol discovery
- Refactor build_images.py to use registry-based dataset discovery
  instead of importing SWE-bench directly
- Add get_image_build_specs() classmethod to SwebenchDataset
- Update dataset entry points to use tuple format
- Add SeederFn support in BuildImageSpec for pre-build setup
- Add DerivedImageSpec handling in ImageCache

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove dead config_overrides parameter from run_build()
- Remove dead except (KeyboardInterrupt, SystemExit, asyncio.CancelledError)
  re-raise blocks (these inherit from BaseException, not Exception)
- Fix double _should_build check in build_modified_image by calling
  _do_build directly
- Store per-entry-point errors in registry instead of silently swallowing
- Collapse _load_image_buildable_classes and
  _ensure_image_buildable_classes_loaded into one function
- Fix BaseRegistry docstring to use generic terminology
- Remove agentdojo_entry alias, register factory directly in pyproject.toml
- Remove unused has_image_spec_factory() from exports and tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
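
The reasoning behind removing the re-raise blocks can be checked with a small standalone snippet. The helper name `caught_by_except_exception` is made up for this illustration; the exception hierarchy it exercises is standard Python (asyncio.CancelledError has derived from BaseException since Python 3.8).

```python
import asyncio


def caught_by_except_exception(exc: BaseException) -> bool:
    """Return True if an `except Exception` clause would intercept `exc`."""
    try:
        raise exc
    except Exception:
        return True
    except BaseException:
        # Reached only for exceptions outside the Exception hierarchy.
        return False


results = {
    type(e).__name__: caught_by_except_exception(e)
    for e in (
        KeyboardInterrupt(),
        SystemExit(),
        asyncio.CancelledError(),
        ValueError("x"),
    )
}
```

Because the first three never enter an `except Exception` handler, a preceding clause that catches and re-raises them has no effect on control flow.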
Eliminate the parallel entry-point loading system in datasets/registry.py
by extending BaseRegistry to store component classes from tuple entries.
This removes 4 module-level globals and ~35 lines of duplicated loading
logic. Also replaces print() with proper logging, fixes a bare except
that swallowed errors, removes dead code, and updates stale docstrings.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…component_class)

All dataset entry points now use a consistent 3-tuple format instead of
a mix of plain callables and 2-tuples. This makes the config class
explicit in the tuple rather than requiring signature inspection for
tuple entries, and eliminates the inconsistency between agentdojo
(plain factory) and swebench (2-tuple).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
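
A minimal sketch of what a uniform 3-tuple entry might look like when given named fields. The field order (factory, config class, component class), the `EntryTriple` name, and the `normalize_entry` helper are assumptions for illustration only; the commit message states just that the tuple has three elements and makes the config class explicit.

```python
from typing import Any, Callable, NamedTuple


class EntryTriple(NamedTuple):
    """Hypothetical named form of the 3-tuple entry point."""
    factory_fn: Callable[..., Any]
    config_class: type
    component_class: type


def normalize_entry(entry: Any) -> EntryTriple:
    """Accept an EntryTriple or a plain 3-tuple; reject any other shape."""
    if isinstance(entry, EntryTriple):
        return entry
    if isinstance(entry, tuple) and len(entry) == 3:
        return EntryTriple(*entry)
    raise TypeError(f"expected a 3-tuple entry point, got {entry!r}")


class ToyConfig:
    """Hypothetical config class."""


class ToyDataset:
    """Hypothetical dataset class."""


def make_toy(config: ToyConfig) -> ToyDataset:
    return ToyDataset()


entry = normalize_entry((make_toy, ToyConfig, ToyDataset))
```

Rejecting anything that is not a 3-tuple up front means the loader no longer needs to inspect a factory's signature to find its config class.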
…Spec, fix error handling

- Add ComponentEntryPoint NamedTuple for structured entry point definitions
  and update dataset entry points to use it (plain tuples still supported)
- Fix docstring claiming 2-tuple when code uses 3-tuple
- Re-raise in outer except block instead of silently swallowing errors
- Add StringConstraints(min_length=1) and tag != base_image_tag validator
  to DerivedImageSpec
- Generalize DerivedImageSpec docstring to remove SWE-bench-specific language
- Split conflated error messages in get_image_build_specs
- Add exc_info=e to error logging in build_images.py
- Rename _should_build to _prepare_for_build with side-effect documentation
- Remove redundant comments in image_cache.py
- Add ImageBuildableDataset protocol docstring note about BaseModel parameter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
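
The two DerivedImageSpec checks mentioned above (non-empty strings, and tag differing from base_image_tag) can be sketched as follows. The commit message refers to StringConstraints and a validator, which suggests pydantic; to stay dependency-free this sketch swaps in a plain dataclass with `__post_init__`, so it mirrors the rules rather than the actual implementation. Any fields beyond `tag` and `base_image_tag` are unknown and omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedImageSpec:
    """Stdlib stand-in for the validated spec; not the PR's pydantic model."""
    tag: str
    base_image_tag: str

    def __post_init__(self) -> None:
        # Equivalent in spirit to a min_length=1 string constraint.
        if not self.tag or not self.base_image_tag:
            raise ValueError("tag and base_image_tag must be non-empty")
        # A derived image that reuses its base tag would overwrite the base.
        if self.tag == self.base_image_tag:
            raise ValueError("tag must differ from base_image_tag")


def is_valid(tag: str, base_image_tag: str) -> bool:
    """Hypothetical helper: True if the pair passes both checks."""
    try:
        DerivedImageSpec(tag=tag, base_image_tag=base_image_tag)
    except ValueError:
        return False
    return True
```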
…alidators and logging

Split build and push try blocks in build_all_specs so push failures no longer
add tags to failed_base_tags, which was causing derived images to be skipped
with misleading "base image failed to build" messages. Add model validator to
MultiStageBuildImageSpec enforcing final_tag matches last stage tag and stages
is non-empty. Expose failed_entry_points on BaseRegistry and log warnings in
get_datasets_with_image_specs for actionable --all-datasets output. Fix stale
_should_build reference in _do_build docstring.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
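
The build/push split can be illustrated with a simplified loop. This is a sketch under assumptions, not the body of build_all_specs: the `build` and `push` callables, the `push_failures` set, and the function name are hypothetical; only failed_base_tags is named in the commit message.

```python
from typing import Callable, Iterable, Set, Tuple


def build_and_push_all(
    tags: Iterable[str],
    build: Callable[[str], None],
    push: Callable[[str], None],
) -> Tuple[Set[str], Set[str]]:
    """Build then push each tag, tracking the two failure kinds separately."""
    failed_base_tags: Set[str] = set()
    push_failures: Set[str] = set()
    for tag in tags:
        try:
            build(tag)
        except Exception:
            # Only a failed build makes the tag unusable as a base.
            failed_base_tags.add(tag)
            continue
        try:
            push(tag)
        except Exception:
            # The image exists locally, so derived images can still build.
            push_failures.add(tag)
    return failed_base_tags, push_failures


def fake_build(tag: str) -> None:
    if tag == "b":
        raise RuntimeError("build failed")


def fake_push(tag: str) -> None:
    if tag == "a":
        raise RuntimeError("push failed")


failed, unpushed = build_and_push_all(["a", "b", "c"], fake_build, fake_push)
```

Here tag "a" builds but fails to push, so it stays out of the failed set and anything derived from it is not skipped.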
@dedeswim dedeswim force-pushed the generalize-build-script branch from a3d000d to a256ab5 Compare January 29, 2026 19:41