Add candidate-patch rollout hook to reward.py (#2a) #2

Open
flo7up wants to merge 1 commit into fix/verifier-trust-batch from feat/reward-candidate-patch
Conversation

@flo7up flo7up commented Jun 13, 2026

Owner

reward.py previously built the image from the static base snapshot with no way
to supply a candidate fix, so an unmodified taskpack scored 0.0 by construction
and there was no supported path to 1.0 — i.e. no way to actually grade an
agent's attempt. This adds that path.

- reward.py (generated template): `--patch <diff>` copies repo/ to a throwaway
  temp dir, applies the patch with `git apply` (works without .git), builds the
  image from the patched copy, runs the tests, and cleans up. The base snapshot
  is never mutated. A patch that fails to apply scores 0.0 with a clear error,
  reported before the docker step. The container engine is configurable via
  FORGE_DOCKER_BIN (e.g. podman).
- reward_runner.run_reward_script / `forge reward`: add `--patch` passthrough
  (resolved to an absolute path before invoking the script).
- tests/test_reward_rollout.py: run the generated script as a subprocess with a
  bogus FORGE_DOCKER_BIN to prove staging behavior without a container engine:
  a good patch applies and reaches the docker check (base left unmodified), a
  bad patch is rejected first, a missing patch is reported, the no-patch path
  is preserved, and the wrapper passes --patch through. Also adds a compile()
  check on the template.
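
For reviewers, the staging flow described in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the generated template itself: `stage_patched_copy`, `score`, the `build_and_test` callback, and the `"docker"` default for FORGE_DOCKER_BIN are hypothetical names and choices; only `git apply`, the temp-dir copy, the 0.0 result on a failed apply, and the FORGE_DOCKER_BIN variable come from this PR.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def stage_patched_copy(repo_dir, patch_path, apply_cmd=("git", "apply")):
    """Copy repo_dir into a throwaway temp dir and apply the patch there.

    Returns (staged_dir, None) on success or (None, error_message) on failure.
    The original repo_dir is only read, never written.
    """
    patch = Path(patch_path).resolve()  # absolute, so the cwd change below is safe
    if not patch.is_file():
        return None, f"patch not found: {patch}"
    tmp_root = Path(tempfile.mkdtemp(prefix="reward-stage-"))
    staged = tmp_root / "repo"
    shutil.copytree(repo_dir, staged)
    # `git apply` works on a plain directory; no .git is required.
    proc = subprocess.run(
        [*apply_cmd, str(patch)], cwd=staged, capture_output=True, text=True
    )
    if proc.returncode != 0:
        shutil.rmtree(tmp_root, ignore_errors=True)
        return None, f"patch does not apply: {proc.stderr.strip()}"
    return staged, None


def score(repo_dir, patch_path, build_and_test, apply_cmd=("git", "apply")):
    """Return the reward; build_and_test is a hypothetical callback."""
    staged, err = stage_patched_copy(repo_dir, patch_path, apply_cmd)
    if err is not None:
        print(err, file=sys.stderr)  # reported before any container step
        return 0.0
    try:
        # Assumption: fall back to "docker" when FORGE_DOCKER_BIN is unset.
        engine = os.environ.get("FORGE_DOCKER_BIN", "docker")
        if shutil.which(engine) is None:
            print(f"container engine not found: {engine}", file=sys.stderr)
            return 0.0
        return build_and_test(engine, staged)
    finally:
        shutil.rmtree(staged.parent, ignore_errors=True)  # always clean up
```

The `apply_cmd` parameter exists only so the sketch can be exercised without git installed, in the same spirit as the tests pointing FORGE_DOCKER_BIN at a bogus binary.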

84 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>