
fix: use mean().mean() for console metrics to match wandb #2056

Merged: mikasenghaas merged 1 commit into main from fix/console-metric-consistency on Mar 20, 2026

@mikasenghaas (Member) commented on Mar 20, 2026

Summary

  • Console reward and val reward were computed with results_df.reward.mean() (a flat mean over all rollouts), while wandb logged by_example.reward.mean().mean() (the mean per example, then the mean across examples)
  • These diverge when rollouts_per_example varies across problems (e.g. due to filtering or errors), because the flat mean weights each problem by its number of rollouts
  • Now console and wandb both use the same mean().mean() aggregation
  • Also reuses the existing by_example groupby instead of recomputing it
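A small worked example shows how the two aggregations diverge when one example keeps fewer rollouts than another. It assumes a results DataFrame with example_id and reward columns as named in this PR; the data itself is made up for illustration.

```python
import pandas as pd

# Made-up rollout results: example 0 kept 4 rollouts, example 1 kept only 1
# (e.g. after filtering). Column names follow the PR description.
results_df = pd.DataFrame({
    "example_id": [0, 0, 0, 0, 1],
    "reward":     [1.0, 1.0, 1.0, 1.0, 0.0],
})

# Old console metric: flat mean, every rollout weighted equally
flat_mean = results_df.reward.mean()

# W&B metric: mean per example, then mean across examples,
# so every example is weighted equally regardless of rollout count
by_example = results_df.groupby("example_id")
per_example_mean = by_example.reward.mean().mean()

print(flat_mean)         # 0.8  (4 / 5)
print(per_example_mean)  # 0.5  ((1.0 + 0.0) / 2)
```

The flat mean lets the example with four rollouts dominate, while mean().mean() gives each example equal weight. When every example has the same number of rollouts, the two values coincide.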

🤖 Generated with Claude Code


Note

Low Risk
Low risk: only changes how step-level console metrics are aggregated and formatted, without affecting training, logging payloads, or model behavior.

Overview
Console step logging in orchestrator.py now computes train and validation reward as the mean per example, then the mean across examples (groupby("example_id").mean().mean()), matching the aggregation used for W&B/monitor logging so that the two agree even when rollout counts vary across examples.

It also reuses the existing by_example groupby for train reward/sequence length and builds the validation string separately to simplify the step message.
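The step-message change can be sketched roughly as follows. This is not the code in orchestrator.py. The function name format_step_message, the seq_len column, the message layout, and the choice to aggregate sequence length the same way as reward are all assumptions for illustration.

```python
from typing import Optional

import pandas as pd


def format_step_message(step: int, results_df: pd.DataFrame,
                        val_df: Optional[pd.DataFrame] = None) -> str:
    # Hypothetical helper; example_id / reward / seq_len column names are assumed.
    # Group once and reuse the groupby for every per-example train metric.
    by_example = results_df.groupby("example_id")
    reward = by_example.reward.mean().mean()    # mean per example, then across examples
    seq_len = by_example.seq_len.mean().mean()  # assumed: same aggregation as reward

    # Build the validation part separately so the main message stays simple.
    val_str = ""
    if val_df is not None:
        val_reward = val_df.groupby("example_id").reward.mean().mean()
        val_str = f" | Val. Reward: {val_reward:.4f}"

    return f"Step {step} | Reward: {reward:.4f} | Seq. Length: {seq_len:.1f}{val_str}"


# Usage with made-up data
train = pd.DataFrame({
    "example_id": [0, 0, 0, 0, 1],
    "reward": [1.0, 1.0, 1.0, 1.0, 0.0],
    "seq_len": [100, 100, 100, 100, 20],
})
val = pd.DataFrame({"example_id": [0, 1], "reward": [1.0, 0.0]})
print(format_step_message(3, train, val))
```

Keeping the validation string optional means steps without an eval pass produce the same train-only message without a branch in the main f-string.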

Written by Cursor Bugbot for commit 387afac.

Console reward was using results_df.reward.mean() (flat mean across all
rollouts) while wandb logged by_example.reward.mean().mean() (mean per
example, then across examples). These diverge when rollouts_per_example
varies across problems. Now both use the same aggregation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mikasenghaas requested a review from samsja on March 20, 2026 00:33
@mikasenghaas merged commit 3dcff28 into main on Mar 20, 2026
9 checks passed