fix: use mean().mean() for console metrics to match wandb by mikasenghaas · Pull Request #2056 · PrimeIntellect-ai/prime-rl

mikasenghaas · 2026-03-20T00:20:46Z

Summary

Console reward and val reward used results_df.reward.mean() (a flat mean across all rollouts), while wandb logged by_example.reward.mean().mean() (mean per example, then mean across examples).
The two diverge when rollouts_per_example varies across problems (e.g. due to filtering or errors).
Console and wandb now use the same mean().mean() aggregation.
The change also reuses the existing by_example groupby instead of recomputing it.
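
To illustrate why the two aggregations can disagree, here is a minimal pandas sketch. The data is made up for illustration, and the column names example_id and reward are assumed from the identifiers quoted above; this is not the code in orchestrator.py.

```python
import pandas as pd

# Hypothetical rollouts: example "a" kept 4 rollouts, example "b" kept only 1
# (e.g. the others were filtered out or errored).
results_df = pd.DataFrame(
    {
        "example_id": ["a", "a", "a", "a", "b"],
        "reward": [1.0, 1.0, 1.0, 1.0, 0.0],
    }
)

# Flat mean: every rollout has equal weight, so "a" dominates.
flat_mean = results_df.reward.mean()  # 4 / 5 = 0.8

# Mean per example, then mean across examples: every example has equal weight.
by_example = results_df.groupby("example_id")
per_example_mean = by_example.reward.mean().mean()  # (1.0 + 0.0) / 2 = 0.5

print(f"flat: {flat_mean:.2f}, per-example: {per_example_mean:.2f}")
```

When every example has the same number of rollouts, both expressions return the same value, which is why the mismatch only appears once rollout counts vary.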

🤖 Generated with Claude Code

Note

Low Risk
Low risk: only changes how step-level console metrics are aggregated and formatted, without affecting training, logging payloads, or model behavior.

Overview
Console step logging in orchestrator.py now computes train and validation reward as mean per example, then mean across examples (groupby("example_id").mean().mean()), matching the aggregation used for W&B/monitor logging when rollout counts vary.

It also reuses the existing by_example groupby for train reward/sequence length and builds the validation string separately to simplify the step message.
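
A rough sketch of what reusing one groupby and building the validation string separately could look like. The seq_len column, the message layout, and the choice to aggregate sequence length the same way as reward are all assumptions made for illustration, not taken from the actual diff.

```python
import pandas as pd

# Hypothetical train and validation results; column names are assumed.
results_df = pd.DataFrame(
    {
        "example_id": [0, 0, 1],
        "reward": [1.0, 0.0, 1.0],
        "seq_len": [100, 200, 300],
    }
)
val_results_df = pd.DataFrame(
    {"example_id": [0, 1, 1], "reward": [0.0, 1.0, 1.0]}
)  # would be None on steps without validation

# Group once, then reuse the same groupby for every train metric.
by_example = results_df.groupby("example_id")
reward = by_example.reward.mean().mean()    # (0.5 + 1.0) / 2 = 0.75
seq_len = by_example.seq_len.mean().mean()  # (150 + 300) / 2 = 225.0

# Build the validation part on its own so the main f-string stays simple.
val_str = ""
if val_results_df is not None:
    val_reward = val_results_df.groupby("example_id").reward.mean().mean()
    val_str = f" | Val. Reward: {val_reward:.4f}"

# Hypothetical message layout.
step_message = f"Step 0 | Reward: {reward:.4f}{val_str} | Seq. Length: {seq_len:.1f}"
print(step_message)
```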

Written by Cursor Bugbot for commit 387afac. This will update automatically on new commits.

Console reward was using results_df.reward.mean() (flat mean across all rollouts) while wandb logged by_example.reward.mean().mean() (mean per example, then across examples). These diverge when rollouts_per_example varies across problems. Now both use the same aggregation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mikasenghaas requested a review from samsja March 20, 2026 00:33

d42me approved these changes Mar 20, 2026

samsja approved these changes Mar 20, 2026

mikasenghaas merged commit 3dcff28 into main Mar 20, 2026
9 checks passed

mikasenghaas merged 1 commit into main from fix/console-metric-consistency