Commit 52e0978

shamanez authored and PanAndy committed
fix: disable reward normalization for SWE configs with group_size=1
With group_size=1, mean normalization over traj_group_id produces all-zero advantages (R - R = 0). Use method: identity to skip normalization and pass raw rewards through directly. Closes #397
1 parent 6190d06 commit 52e0978
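
The arithmetic behind the commit message can be checked with a minimal sketch. The helper `normalize_rewards` below is hypothetical and is not the project's implementation; it only assumes that `method: mean` subtracts each group's mean reward and that `method: identity` returns rewards unchanged, as the message describes.

```python
from collections import defaultdict


def normalize_rewards(rewards, group_ids, method="mean"):
    """Hypothetical stand-in for per-group reward normalization.

    method="mean": subtract the mean reward of each trajectory's group.
    method="identity": return the raw rewards untouched.
    """
    if method == "identity":
        return list(rewards)
    if method != "mean":
        raise ValueError(f"unsupported method: {method}")

    # Accumulate per-group sums and counts to get each group's mean.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for reward, group in zip(rewards, group_ids):
        sums[group] += reward
        counts[group] += 1

    return [reward - sums[group] / counts[group]
            for reward, group in zip(rewards, group_ids)]


# group_size=1: every trajectory is alone in its group.
rewards = [1.0, 0.0, 1.0]
group_ids = ["g0", "g1", "g2"]

mean_adv = normalize_rewards(rewards, group_ids, method="mean")
identity_adv = normalize_rewards(rewards, group_ids, method="identity")

print(mean_adv)      # every entry is R - R = 0.0
print(identity_adv)  # raw rewards pass through
```

With one trajectory per group, each group mean equals that trajectory's own reward, so the mean-normalized signal vanishes, while the identity path keeps the raw rewards.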

2 files changed: 3 additions, 3 deletions

examples/agentic_demo/agent_val_rock_swe.yaml

Lines changed: 2 additions & 2 deletions
@@ -127,8 +127,8 @@ reference:
   infer_batch_size: 1

 reward_normalization:
-  grouping: traj_group_id # can be tags (env_type) / traj_group_id (group) / batch (rollout_batch) ...; the group_by key used to compute reward/adv
-  method: mean
+  grouping: traj_group_id
+  method: identity
   # norm_mean_type: batch
   # norm_std_type: group

examples/agentic_demo/agent_val_rock_swe_qwen35_2b.yaml

Lines changed: 1 addition & 1 deletion
@@ -138,7 +138,7 @@ reference:

 reward_normalization:
   grouping: traj_group_id
-  method: mean
+  method: identity

 train_env_manager:
   max_env_num_per_worker: 1
