Commit 28688e7
[reward] fix: preserve input non_tensor_batch in AgentLoopManager when reward_loop_worker_handles is None (verl-project#5195)
### What does this PR do?
### Problem
After updating the codebase to
[2cd9283](verl-project@2cd9283)
(the migration to the new asynchronous reward manager), using **colocate
RM** with async rollout (`AgentLoopManager`) causes validation to fail
with `KeyError: 'data_source'`.
- **Where:** `verl/experimental/reward_loop/reward_manager/naive.py`,
line 42, in `run_single` — it accesses
`data_item.non_tensor_batch["data_source"]`.
- **Call path:** `_validate` →
`_compute_reward_colocate(test_output_gen_batch_padded)` →
`reward_loop_manager.compute_rm_score(batch)` →
`RewardLoopWorker.compute_score_batch` → `compute_score` → `run_single`.
- **Cause:** When `reward_loop_worker_handles is None` (e.g. colocate
RM), `AgentLoopManager.generate_sequences` returns a `DataProto` whose
`non_tensor_batch` is built only from agent outputs (`__num_turns__`,
`multi_modal_inputs`, `raw_prompt`). Input metadata such as
`data_source` is never forwarded, so the batch passed to the reward
manager is missing `data_source` and the naive reward manager raises
`KeyError: 'data_source'`.
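
The failure mode described above can be reproduced in isolation with plain dictionaries. This is a minimal sketch, not verl code: `run_single_stub` is a hypothetical stand-in for the hard `non_tensor_batch["data_source"]` lookup, and the dictionary values are made up for illustration.

```python
def run_single_stub(non_tensor_batch: dict) -> str:
    # Hypothetical stand-in for the lookup in the naive reward manager:
    # a direct index that assumes "data_source" is always present.
    return non_tensor_batch["data_source"]


# Output metadata built only from agent outputs, as in the colocate RM path.
# The input batch's "data_source" was never forwarded into this dict.
agent_only_batch = {
    "__num_turns__": 2,
    "multi_modal_inputs": {},
    "raw_prompt": "example prompt",
}

try:
    run_single_stub(agent_only_batch)
    error = None
except KeyError as exc:
    error = repr(exc)

print(error)
```

Because the lookup is an unconditional index, any batch that drops the input metadata fails at the first sample.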
<img width="1684" height="785"
alt="WeChatWorkScreenshot_0bc71e8a-a6a8-4334-b930-5a5d0bb149a2"
src="https://github.com/user-attachments/assets/ed492d44-a198-4508-b094-11426207fdf2"
/>
### Solution
- Pass the input batch’s `non_tensor_batch` into `_postprocess` as
`**kwargs`.
- When `reward_loop_worker_handles is None`, merge these `kwargs` into
the output `non_tensor_batch` so `data_source` and other input keys are
preserved.
- Colocate RM / validation then receives a batch that includes
`data_source`, and the `KeyError` is fixed.

1 parent d8561c2, commit 28688e7
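
The merge described in the Solution can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the actual patch: `postprocess_sketch` and its parameters are hypothetical names, and the choice to let agent outputs win on key collisions is an assumption that the PR description does not specify.

```python
from typing import Any, Optional


def postprocess_sketch(
    agent_outputs: dict,
    reward_loop_worker_handles: Optional[list],
    **kwargs: Any,
) -> dict:
    """Build the output non_tensor_batch (hypothetical helper, not verl API)."""
    non_tensor_batch = dict(agent_outputs)
    if reward_loop_worker_handles is None:
        # Forward input metadata such as "data_source".
        # Assumption: keys already produced by the agent are not overwritten.
        for key, value in kwargs.items():
            non_tensor_batch.setdefault(key, value)
    return non_tensor_batch


# Input metadata is passed as **kwargs, mirroring the first Solution bullet.
input_non_tensor_batch = {"data_source": "example_dataset", "raw_prompt": "from input"}
agent_outputs = {"__num_turns__": 2, "raw_prompt": "from agent"}

merged = postprocess_sketch(agent_outputs, None, **input_non_tensor_batch)
unmerged = postprocess_sketch(agent_outputs, ["worker"], **input_non_tensor_batch)
```

With `reward_loop_worker_handles` set to `None`, `merged` carries `data_source` alongside the agent outputs; with handles present, the sketch leaves the agent outputs unchanged.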
1 file changed: 8 additions, 2 deletions (hunks at new-file lines 472, 720-724, and 764-765).