Skip to content

[GRPO app] Logging hygiene request #372

@allenwang28

Description

@allenwang28

🐛 Describe the bug

As I run GRPO app, I had some ideas on some logging hygiene that could help us as we continue to test.

For context, what we see right now:

>>> Aggregated Logs (2025-10-10 14:04:57) >>>
[3 similar log lines] [PolicyWorker-0/4] 2025-10-10 14:04:57 INFO [PolicyWorker::update] start updating weights to version 3
[4 similar log lines] [PolicyWorker-0/4] 2025-10-10 14:04:57 INFO Loading policy_ver_0000000003.dcp_whole_state_dict from DCP with handle policy_ver_0000000003.dcp_whole_state_dict
<<< Aggregated Logs (2025-10-10 14:05:00) <<<

[1] [ReferenceModel-1/4] 2025-10-10 14:05:01 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:01 INFO [GC] Performing periodic GC collection took 0.02 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:01 INFO [GC] Performing periodic GC collection took 0.02 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:01 INFO [GC] Performing periodic GC collection took 0.02 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:03 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:03 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:03 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:03 INFO [GC] Performing periodic GC collection took 0.01 seconds
>>> Aggregated Logs (2025-10-10 14:05:00) >>>
[3 similar log lines] [PolicyWorker-2/4] 2025-10-10 14:05:00 INFO [PolicyWorker::update] start updating weights to version 3
[4 similar log lines] [PolicyWorker-0/4] 2025-10-10 14:05:00 INFO Loading policy_ver_0000000003.dcp_whole_state_dict from DCP with handle policy_ver_0000000003.dcp_whole_state_dict
<<< Aggregated Logs (2025-10-10 14:05:03) <<<

[2] [ReferenceModel-2/4] 2025-10-10 14:05:04 INFO [GC] Performing periodic GC collection took 0.02 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:04 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:04 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:04 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:06 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:06 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:06 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:06 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:08 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:08 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:08 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:08 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:09 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:09 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:09 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:09 INFO [GC] Performing periodic GC collection took 0.02 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:11 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:11 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:11 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:11 INFO [GC] Performing periodic GC collection took 0.02 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:13 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:13 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:13 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:13 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:14 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:14 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:14 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:14 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:16 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:16 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:16 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:16 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:18 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:18 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:18 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:18 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:19 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:19 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:19 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:19 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:21 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:21 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:21 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:21 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:23 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:23 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:23 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:23 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:24 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:24 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:24 INFO [GC] Performing periodic GC collection took 0.02 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:24 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:26 INFO [GC] Performing periodic GC collection took 0.02 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:26 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:26 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:26 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:28 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:28 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:28 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:28 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:29 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:29 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:29 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:29 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:31 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:31 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:31 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:31 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:33 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:33 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:33 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:33 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:05:34 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:05:35 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:05:35 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:05:35 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] INFO 10-10 14:07:19 [block_pool.py:321] Successfully reset prefix cache
[0] [Policy-0/1] 2025-10-10 14:07:19 INFO Weight update completed (now v3)
[0] [ReferenceModel-0/4] 2025-10-10 14:07:42 INFO [GC] Performing periodic GC collection took 0.02 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:07:42 INFO [GC] Performing periodic GC collection took 0.02 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:07:42 INFO [GC] Performing periodic GC collection took 0.02 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:07:42 INFO [GC] Performing periodic GC collection took 0.02 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:07:44 INFO [GC] Performing periodic GC collection took 0.02 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:07:44 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:07:44 INFO [GC] Performing periodic GC collection took 0.02 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:07:44 INFO [GC] Performing periodic GC collection took 0.02 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:07:46 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:07:46 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:07:46 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:07:46 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:07:47 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:07:47 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:07:47 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:07:47 INFO [GC] Performing periodic GC collection took 0.01 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:07:49 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:07:49 INFO [GC] Performing periodic GC collection took 0.01 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:07:49 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:07:49 INFO [GC] Performing periodic GC collection took 0.01 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:07:54 INFO [GC] Performing periodic GC collection took 0.02 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:07:54 INFO [GC] Performing periodic GC collection took 0.01 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:07:54 INFO [GC] Performing periodic GC collection took 0.02 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:07:54 INFO [GC] Performing periodic GC collection took 0.02 seconds
[0] [ReferenceModel-0/4] 2025-10-10 14:07:59 INFO [GC] Performing periodic GC collection took 0.02 seconds
[3] [ReferenceModel-3/4] 2025-10-10 14:07:59 INFO [GC] Performing periodic GC collection took 0.02 seconds
[2] [ReferenceModel-2/4] 2025-10-10 14:07:59 INFO [GC] Performing periodic GC collection took 0.02 seconds
[1] [ReferenceModel-1/4] 2025-10-10 14:07:59 INFO [GC] Performing periodic GC collection took 0.02 seconds
[0] INFO 10-10 14:08:00 [block_pool.py:321] Successfully reset prefix cache
[0] [Policy-0/1] 2025-10-10 14:08:00 INFO Weight update completed (now v3)
Dropping weights @ version 2
Dropped weights @ version 2, took 5.83 seconds
WandbBackend: Logged 101 metrics at global_step 3
=== [global_logger_2aeC_r0] - METRICS STEP 3 ===
  buffer/add/count_episodes_added: 786.0
  buffer/evict/avg_policy_staleness: 1.0
  buffer/evict/max_policy_staleness: 1.0
  buffer/evict/sum_episodes_evicted: 784.0
  buffer/sample/avg_buffer_utilization: 592.0
  buffer/sample/avg_buffer_utilization_pct: 5.405405405405405
  buffer/sample/count_sample_requests: 1.0
  buffer_perf/sample/total_duration_avg_s: 0.0055667751003056765
  buffer_perf/sample/total_duration_max_s: 0.0055667751003056765
  dataset/sample/avg_sample_len: 515.0816326530612
  dataset/sample/count_samples_generated: 49.0
  main/continuous_rollouts/count_rollout_iterations: 49.0
  main_perf/continuous_rollouts/compute_logprobs/duration_avg_s: 5.996471974162423e-05
  main_perf/continuous_rollouts/compute_logprobs/duration_max_s: 0.00010304898023605347
  main_perf/continuous_rollouts/data_loading/duration_avg_s: 0.0013156006996500858
  main_perf/continuous_rollouts/data_loading/duration_max_s: 0.0015002100262790918
  main_perf/continuous_rollouts/policy_generation/duration_avg_s: 137.51652606647957
  main_perf/continuous_rollouts/policy_generation/duration_max_s: 199.73020377987996
  main_perf/continuous_rollouts/reference_model_calculate_logprobs/duration_avg_s: 16.182050521685078
  main_perf/continuous_rollouts/reference_model_calculate_logprobs/duration_max_s: 36.03378381417133
  main_perf/continuous_rollouts/reward_evaluation/duration_avg_s: 0.38161538480020757
  main_perf/continuous_rollouts/reward_evaluation/duration_max_s: 1.0126315460074693
  main_perf/continuous_rollouts/total_duration_avg_s: 154.18965516618587
  main_perf/continuous_rollouts/total_duration_max_s: 213.60761165595613
  main_perf/continuous_training/drop_weights/duration_avg_s: 5.830546634970233
  main_perf/continuous_training/drop_weights/duration_max_s: 5.830546634970233
  main_perf/continuous_training/push_weights/duration_avg_s: 14.282035800861195
  main_perf/continuous_training/push_weights/duration_max_s: 14.282035800861195
  main_perf/continuous_training/total_duration_avg_s: 224.97223744285293
  main_perf/continuous_training/total_duration_max_s: 224.97223744285293
  main_perf/continuous_training/train_step/duration_avg_s: 7.712055152049288
  main_perf/continuous_training/train_step/duration_max_s: 7.712055152049288
  main_perf/continuous_training/update_weights/duration_avg_s: 197.13718346506357
  main_perf/continuous_training/update_weights/duration_max_s: 197.13718346506357
  main_perf/continuous_training/waiting_for_buffer/duration_avg_s: 0.01041399291716516
  main_perf/continuous_training/waiting_for_buffer/duration_max_s: 0.01041399291716516
  policy/generate/avg_tokens_generated: 800.0061274509804
  policy/generate/count_requests: 42.0
  policy/generate/count_sequences_completed: 816.0
  policy/generate/sum_tokens_generated: 652805.0
  policy/update_weights/count_weight_updates: 2.0
  policy_perf/generate/end/duration_avg_s: 2.7680000057443975e-06
  policy_perf/generate/end/duration_max_s: 2.816000021994114e-06
  policy_perf/generate/generate/duration_avg_s: 74.18627746332908
  policy_perf/generate/generate/duration_max_s: 199.69165625
  policy_perf/generate/process_inputs/duration_avg_s: 0.0009526073742457029
  policy_perf/generate/process_inputs/duration_max_s: 0.002440608024597168
  policy_perf/generate/prompt_truncation/duration_avg_s: 1.6781176540854517e-05
  policy_perf/generate/prompt_truncation/duration_max_s: 6.841599941253662e-05
  policy_perf/generate/total_duration_avg_s: 72.07658744781732
  policy_perf/generate/total_duration_max_s: 199.6942355780003
  policy_perf/update_weights/avg_pending_requests: 7.0
  policy_perf/update_weights/max_pending_requests: 8.0
  policy_worker_perf/update/total_duration_avg_s: 161.0851328125
  policy_worker_perf/update/total_duration_max_s: 180.38240625
  reference_perf/forward/avg_sequence_length: 2048.0
  reference_perf/forward/compute_logprobs/duration_avg_s: 0.026915128278732294
  reference_perf/forward/compute_logprobs/duration_max_s: 0.02828870391845703
  reference_perf/forward/count_forward_passes: 200.0
  reference_perf/forward/forward/duration_avg_s: 1.6151581799316403
  reference_perf/forward/forward/duration_max_s: 1.768312744140625
  reference_perf/forward/garbage_collection/duration_avg_s: 0.014899226388931274
  reference_perf/forward/garbage_collection/duration_max_s: 0.017250463485717772
  reference_perf/forward/memory_delta_end_start_avg_gb: 9.273681640625
  reference_perf/forward/memory_peak_max_gb: 47.74769067764282
  reference_perf/forward/to_device/duration_avg_s: 0.00029641920007765293
  reference_perf/forward/to_device/duration_max_s: 0.0004175359904766083
  reference_perf/forward/total_duration_avg_s: 1.657271959719452
  reference_perf/forward/total_duration_max_s: 1.811123239449691
  reward/evaluate_response/avg_MathReward_reward: 0.6261786600496322
  reward/evaluate_response/avg_ThinkingReward_reward: 0.7697270471464038
  reward/evaluate_response/avg_total_reward: 0.6979528535980215
  reward/evaluate_response/count_MathReward_calls: 806.0
  reward/evaluate_response/count_ThinkingReward_calls: 806.0
  reward/evaluate_response/std_MathReward_reward: 0.45319055073623077
  reward/evaluate_response/std_ThinkingReward_reward: 0.3622053691857174
  reward/evaluate_response/sum_MathReward_reward: 504.70000000000357
  reward/evaluate_response/sum_ThinkingReward_reward: 620.4000000000015
  rl_trainer/avg_grpo_loss: 0.16886521875858307
  rl_trainer/count_training_steps: 8.0
  rl_trainer/learning_rate: 0.001
  rl_trainer_perf/push_weights/dcp_save/duration_avg_s: 14.125432128906251
  rl_trainer_perf/push_weights/dcp_save/duration_max_s: 14.2749453125
  rl_trainer_perf/push_weights/flatten_state_dict/duration_avg_s: 0.0010219000056385995
  rl_trainer_perf/push_weights/flatten_state_dict/duration_max_s: 0.001112671971321106
  rl_trainer_perf/push_weights/memory_delta_end_start_avg_gb: 0.0
  rl_trainer_perf/push_weights/memory_peak_max_gb: 22.953386306762695
  rl_trainer_perf/push_weights/to_hf/duration_avg_s: 0.001430628001689911
  rl_trainer_perf/push_weights/to_hf/duration_max_s: 0.001469920039176941
  rl_trainer_perf/push_weights/total_duration_avg_s: 14.127981416915546
  rl_trainer_perf/push_weights/total_duration_max_s: 14.27768739247322
  rl_trainer_perf/step/forward_backward/duration_avg_s: 7.619148498535156
  rl_trainer_perf/step/forward_backward/duration_max_s: 7.62258251953125
  rl_trainer_perf/step/memory_delta_end_start_avg_gb: 0.0008854866027832031
  rl_trainer_perf/step/memory_peak_max_gb: 68.14654541015625
  rl_trainer_perf/step/optimizer_step/duration_avg_s: 0.03404809999465942
  rl_trainer_perf/step/optimizer_step/duration_max_s: 0.034347774505615235
  rl_trainer_perf/step/save_checkpoint/duration_avg_s: 0.003174496024847031
  rl_trainer_perf/step/save_checkpoint/duration_max_s: 0.003422719955444336
  rl_trainer_perf/step/total_duration_avg_s: 7.656416046554223
  rl_trainer_perf/step/total_duration_max_s: 7.659922744065523
==============================

A few challenges:

  • We enable garbage collection per step in torchtitan (both trainer and reference model), but it ideally shouldn't be logged at all.
  • We have very comprehensive observability! but I think we can minimize the number of metrics that are being logged both in wandb and console. What is difficult is that I think all of the metrics will be useful at some point - but if I'm analyzing the training stability and system throughput, here is a starting proposal:
=== [global_logger_2aeC_r0] - METRICS STEP 3 ===
  buffer/add/count_episodes_added: 786.0
  buffer/evict/avg_policy_staleness: 1.0
  buffer/evict/max_policy_staleness: 1.0
  buffer/evict/sum_episodes_evicted: 784.0
  main/continuous_rollouts/count_rollout_iterations: 49.0
  main_perf/continuous_rollouts/policy_generation/duration_avg_s: 137.51652606647957
  main_perf/continuous_rollouts/policy_generation/duration_max_s: 199.73020377987996
  main_perf/continuous_rollouts/reference_model_calculate_logprobs/duration_avg_s: 16.182050521685078
  main_perf/continuous_rollouts/reference_model_calculate_logprobs/duration_max_s: 36.03378381417133
  main_perf/continuous_rollouts/total_duration_avg_s: 154.18965516618587
  main_perf/continuous_rollouts/total_duration_max_s: 213.60761165595613
  main_perf/continuous_training/drop_weights/duration_avg_s: 5.830546634970233
  main_perf/continuous_training/drop_weights/duration_max_s: 5.830546634970233
  main_perf/continuous_training/push_weights/duration_avg_s: 14.282035800861195
  main_perf/continuous_training/push_weights/duration_max_s: 14.282035800861195
  main_perf/continuous_training/total_duration_avg_s: 224.97223744285293
  main_perf/continuous_training/total_duration_max_s: 224.97223744285293
  main_perf/continuous_training/train_step/duration_avg_s: 7.712055152049288
  main_perf/continuous_training/train_step/duration_max_s: 7.712055152049288
  main_perf/continuous_training/update_weights/duration_avg_s: 197.13718346506357
  main_perf/continuous_training/update_weights/duration_max_s: 197.13718346506357
  main_perf/continuous_training/waiting_for_buffer/duration_avg_s: 0.01041399291716516
  main_perf/continuous_training/waiting_for_buffer/duration_max_s: 0.01041399291716516
  policy/generate/avg_tokens_generated: 800.0061274509804
  policy/generate/count_requests: 42.0
  policy/generate/count_sequences_completed: 816.0
  policy_perf/generate/generate/duration_avg_s: 74.18627746332908
  policy_perf/generate/generate/duration_max_s: 199.69165625
  policy_perf/generate/total_duration_avg_s: 72.07658744781732
  policy_perf/generate/total_duration_max_s: 199.6942355780003
  policy_worker_perf/update/total_duration_avg_s: 161.0851328125
  policy_worker_perf/update/total_duration_max_s: 180.38240625
  reference_perf/forward/memory_delta_end_start_avg_gb: 9.273681640625
  reference_perf/forward/memory_peak_max_gb: 47.74769067764282
  reference_perf/forward/total_duration_avg_s: 1.657271959719452
  reference_perf/forward/total_duration_max_s: 1.811123239449691
  reward/evaluate_response/avg_MathReward_reward: 0.6261786600496322
  reward/evaluate_response/avg_ThinkingReward_reward: 0.7697270471464038
  reward/evaluate_response/avg_total_reward: 0.6979528535980215
  reward/evaluate_response/std_MathReward_reward: 0.45319055073623077
  reward/evaluate_response/std_ThinkingReward_reward: 0.3622053691857174
  reward/evaluate_response/sum_MathReward_reward: 504.70000000000357
  reward/evaluate_response/sum_ThinkingReward_reward: 620.4000000000015
  rl_trainer/avg_grpo_loss: 0.16886521875858307
  rl_trainer/count_training_steps: 8.0
  rl_trainer/learning_rate: 0.001
  rl_trainer_perf/push_weights/dcp_save/duration_avg_s: 14.125432128906251
  rl_trainer_perf/push_weights/dcp_save/duration_max_s: 14.2749453125
  rl_trainer_perf/push_weights/to_hf/duration_avg_s: 0.001430628001689911
  rl_trainer_perf/push_weights/to_hf/duration_max_s: 0.001469920039176941
  rl_trainer_perf/push_weights/total_duration_avg_s: 14.127981416915546
  rl_trainer_perf/push_weights/total_duration_max_s: 14.27768739247322
  rl_trainer_perf/step/forward_backward/duration_avg_s: 7.619148498535156
  rl_trainer_perf/step/forward_backward/duration_max_s: 7.62258251953125
  rl_trainer_perf/step/memory_delta_end_start_avg_gb: 0.0008854866027832031
  rl_trainer_perf/step/memory_peak_max_gb: 68.14654541015625
  rl_trainer_perf/step/save_checkpoint/duration_avg_s: 0.003174496024847031
  rl_trainer_perf/step/save_checkpoint/duration_max_s: 0.003422719955444336
  rl_trainer_perf/step/total_duration_avg_s: 7.656416046554223
  rl_trainer_perf/step/total_duration_max_s: 7.659922744065523
==============================

idk if we can or should introduce the concept of "metric verbosity" or something like that, but overall it's a bit difficult to find the information I'm looking for

Versions

No response

Metadata

Metadata

Labels

Better EngineeringTasks which help improve eng productivity e.g. building tools, cleaning up code, writing docs

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions