
Conversation

felipemello1 (Contributor) commented Nov 25, 2025

Reduced the number of logged metrics from 99 to 70.

Metrics now:

WandbBackend: Logged 70 metrics at step 2
=== [global_reduce] - METRICS STEP 2 ===
  buffer/add/count_episodes_added: 16.0
  buffer/evict/sum_episodes_evicted: 16.0
  buffer/sample/avg_sampled_policy_age: 0.5
  buffer/sample/demand_to_size_ratio: 1.9523809523809523
  buffer/sample/max_sampled_policy_age: 1.0
  dataset/sample/current_epoch: 0.0
  episode/avg_prompt_tokens: 144.5
  episode/avg_response_tokens: 947.15625
  episode/max_prompt_tokens: 170.0
  episode/max_response_tokens: 2048.0
  episode/min_prompt_tokens: 128.0
  episode/min_response_tokens: 219.0
  generator/generate/count_requests: 4.0
  generator/generate/count_sequences_completed: 32.0
  generator_perf/generate/total_duration_avg_s: 5.812659332275391
  generator_perf/generate/total_duration_max_s: 10.64300390625
  generator_perf/update_weights/avg_waiting_for_generation_duration_s: 9.653438535518944
  generator_perf/update_weights/sum_pending_gen_requests: 1.0
  grpo_loss/advantage_mean: 1.043081283569336e-07
  grpo_loss/advantage_std: 0.9657998085021973
  grpo_loss/kl_divergence_max: 6551.78759765625
  grpo_loss/kl_divergence_mean: 1.3870304822921753
  grpo_loss/policy_gradient_loss: 7.560361581226971e-08
  grpo_loss/total_loss: 1.3114268995195744e-06
  main/continuous_rollouts/count_rollout_iterations: 2.0
  main/continuous_rollouts/unfit_for_training_dropped_episodes: 2.0
  main_perf/continuous_rollouts/policy_generation/duration_avg_s: 3.307545113377273
  main_perf/continuous_rollouts/policy_generation/duration_max_s: 5.272513267584145
  main_perf/continuous_rollouts/reference_model_calculate_logprobs/duration_avg_s: 0.2661919156089425
  main_perf/continuous_rollouts/reference_model_calculate_logprobs/duration_max_s: 0.26626743003726006
  main_perf/continuous_rollouts/reward_evaluation/duration_avg_s: 0.043925000820308924
  main_perf/continuous_rollouts/reward_evaluation/duration_max_s: 0.04406541399657726
  main_perf/continuous_rollouts/total_duration_avg_s: 3.669450921472162
  main_perf/continuous_rollouts/total_duration_max_s: 5.634579579345882
  main_perf/continuous_training/drop_weights/duration_avg_s: 0.7262937715277076
  main_perf/continuous_training/drop_weights/duration_max_s: 0.7262937715277076
  main_perf/continuous_training/push_weights/duration_avg_s: 2.6497885528951883
  main_perf/continuous_training/push_weights/duration_max_s: 2.6497885528951883
  main_perf/continuous_training/total_duration_avg_s: 22.276771686039865
  main_perf/continuous_training/total_duration_max_s: 22.276771686039865
  main_perf/continuous_training/train_step/duration_avg_s: 6.086995053105056
  main_perf/continuous_training/train_step/duration_max_s: 6.086995053105056
  main_perf/continuous_training/update_weights/duration_avg_s: 10.764651249162853
  main_perf/continuous_training/update_weights/duration_max_s: 10.764651249162853
  main_perf/continuous_training/waiting_for_buffer/duration_avg_s: 2.049040215089917
  main_perf/continuous_training/waiting_for_buffer/duration_max_s: 2.049040215089917
  reference_perf/forward/count_forward_passes: 2.0
  reference_perf/forward/memory_delta_end_start_avg_gb: 6.95526123046875
  reference_perf/forward/memory_peak_max_gb: 33.95764207839966
  reference_perf/forward/total_duration_avg_s: 0.017874598503112793
  reference_perf/forward/total_duration_max_s: 0.019280921667814255
  reward/evaluate_response/avg_LanguageReward_reward: 0.375
  reward/evaluate_response/avg_MathReward_reward: 0.8031250000000003
  reward/evaluate_response/avg_ThinkingReward_reward: 0.75625
  reward/evaluate_response/avg_total_reward: 0.6447916666666668
  reward/evaluate_response/std_LanguageReward_reward: 0.7806247497997998
  reward/evaluate_response/std_MathReward_reward: 0.3720587781184579
  reward/evaluate_response/std_ThinkingReward_reward: 0.4234807404121231
  rl_trainer/learning_rate: 1e-05
  rl_trainer/loss: 1.3114268995195744e-06
  rl_trainer_perf/step/forward_backward/duration_avg_s: 5.688607947900891
  rl_trainer_perf/step/forward_backward/duration_max_s: 5.688607947900891
  rl_trainer_perf/step/memory_delta_end_start_avg_gb: 0.0007634162902832031
  rl_trainer_perf/step/memory_peak_max_gb: 73.74072074890137
  rl_trainer_perf/step/optimizer_step/duration_avg_s: 0.03665326815098524
  rl_trainer_perf/step/optimizer_step/duration_max_s: 0.03665326815098524
  rl_trainer_perf/step/save_checkpoint/duration_avg_s: 0.35752423107624054
  rl_trainer_perf/step/save_checkpoint/duration_max_s: 0.35752423107624054
  rl_trainer_perf/step/total_duration_avg_s: 6.0827940898016095
  rl_trainer_perf/step/total_duration_max_s: 6.0827940898016095
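
Aside on the naming: the avg/max/std variants above come from reduce ops applied when metrics are flushed each step. Below is a minimal self-contained sketch of the pattern; record_metric and Reduce are the names that appear in this PR's diff, while the buffering, flush logic, and exact suffix placement are assumptions, not the project's implementation.

import statistics
from collections import defaultdict
from enum import Enum

class Reduce(Enum):
    MEAN = "avg"
    MAX = "max"
    STD = "std"
    SUM = "sum"

# Raw values buffered per (metric key, reduce op) until flush.
_buffer: dict[tuple[str, Reduce], list[float]] = defaultdict(list)

def record_metric(key: str, value: float, reduce: Reduce) -> None:
    _buffer[(key, reduce)].append(value)

def flush_metrics() -> dict[str, float]:
    # Apply each reduce op and emit a "<key>_<reduce>"-style name.
    reducers = {Reduce.MEAN: statistics.fmean, Reduce.MAX: max,
                Reduce.STD: statistics.pstdev, Reduce.SUM: sum}
    out = {f"{key}_{red.value}": reducers[red](vals)
           for (key, red), vals in _buffer.items()}
    _buffer.clear()
    return out

# Usage: two generate calls yield both the avg and max duration metrics above.
for duration in (5.81, 10.64):
    record_metric("generator_perf/generate/total_duration_s", duration, Reduce.MEAN)
    record_metric("generator_perf/generate/total_duration_s", duration, Reduce.MAX)
print(flush_metrics())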

meta-cla bot added the CLA Signed label on Nov 25, 2025
felipemello1 changed the title from "[metric logging] clean up 1/n" to "[logging] clean up 1/n" on Nov 25, 2025
felipemello1 (Contributor, Author) commented:

Upon a second look, I will remove a few more.

felipemello1 (Contributor, Author) commented Nov 26, 2025

@allenwang28 @joecummings please take a look at the list. Would you remove anything else? (Note: some values are small because it's a 1.7B model with a small context length and batch size, but they would be more significant at scale.)

joecummings (Member) left a comment

A few comments to address, but overall good!

loss = -(mean_policy_loss - beta * mean_kl)

# Log metrics
# TODO: Better design - have loss function return all metrics as a dict,
joecummings (Member) commented:

I want to get away from TODOs in code as a marker. Can you turn this into a small GitHub issue?
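For context, the TODO proposes having the loss function return its metrics as a dict instead of recording them inline. A rough sketch of that design follows; the function name and signature are hypothetical, only the loss expression comes from the snippet above, and the dict keys mirror the grpo_loss/* metrics in the list.

import torch

def grpo_loss_with_metrics(mean_policy_loss: torch.Tensor,
                           mean_kl: torch.Tensor,
                           beta: float) -> tuple[torch.Tensor, dict[str, float]]:
    # Same loss as in the snippet; metrics are returned, not logged here.
    loss = -(mean_policy_loss - beta * mean_kl)
    metrics = {
        "grpo_loss/policy_gradient_loss": mean_policy_loss.item(),
        "grpo_loss/kl_divergence_mean": mean_kl.item(),
        "grpo_loss/total_loss": loss.item(),
    }
    return loss, metrics

# The caller then owns all logging, e.g.:
#   loss, metrics = grpo_loss_with_metrics(mean_policy_loss, mean_kl, beta)
#   for key, value in metrics.items():
#       record_metric(key, value, Reduce.MEAN)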

Reduce.STD,
)

# avg total reward
joecummings (Member) commented:

Remove. It's already called avg_total_reward.

drop = rewards_std < 1e-3 or max_response_len >= max_res_tokens
record_metric(
"main/continuous_rollouts/dropped_episodes",
"main/continuous_rollouts/unfit_for_training_dropped_episodes",
joecummings (Member) commented:

This naming is a little confusing. Can you explain?
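
For context, the drop condition in the diff suggests what the new name encodes: a group is unfit for training when its rewards have near-zero variance (all GRPO advantages collapse toward zero) or a response hit the token cap (truncated mid-generation). A sketch under those assumptions, reusing the record_metric/Reduce names from the sketch above; everything beyond the two quoted diff lines is hypothetical.

def is_unfit_for_training(episodes: list, rewards_std: float,
                          max_response_len: int, max_res_tokens: int) -> bool:
    # Near-zero reward std => no advantage signal for GRPO; a response at
    # the cap was truncated, so it does not reflect a complete answer.
    drop = rewards_std < 1e-3 or max_response_len >= max_res_tokens
    if drop:
        record_metric(
            "main/continuous_rollouts/unfit_for_training_dropped_episodes",
            len(episodes),
            Reduce.SUM,  # assumed reduce op for a running drop count
        )
    return drop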
