Add logging for learning rates in MetricsProcessor (#1413)
This PR adds learning rate logging. There was a previous attempt to
implement this in an [earlier
PR](#937), but that one was
ultimately **closed**. This version ensures that LR logging works
properly; I verified it using the WSD scheduler that was recently added
in [another PR](#938).
<img width="1842" height="730" alt="image"
src="https://github.com/user-attachments/assets/8f23674a-d689-4cc2-9d9b-30bff4e63f3b"
/>
One design consideration here is that torchtitan supports multiple
optimizers and learning rate schedulers, each potentially having its own
LR. However, in practice, I believe that nearly all use cases will use
a single LR.
Given that, the logging works as follows:
- If there is only one learning rate, it gets logged directly under the
main charts as `lr`.
- If there are multiple learning rates, they are logged under a separate
section, each with its corresponding label.
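
The rule above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name `build_lr_metrics`, the `section` prefix, and the `lr/<label>` key format are all assumptions made for the example (many logging backends group metrics by the part of the key before `/`, which is what "separate section" is taken to mean here).

```python
from typing import Dict, Mapping


def build_lr_metrics(lrs: Mapping[str, float], section: str = "lr") -> Dict[str, float]:
    """Map labeled learning rates to metric keys (hypothetical helper).

    - Exactly one LR: logged under the plain key "lr".
    - Several LRs: each logged as "<section>/<label>" so they are grouped
      together, away from the main charts.
    """
    if len(lrs) == 1:
        # Single-LR case: drop the label and use the bare "lr" key.
        (value,) = lrs.values()
        return {"lr": value}
    # Multi-LR case (or no LRs at all, which yields an empty dict).
    return {f"{section}/{label}": value for label, value in lrs.items()}


# Labels such as "opt0" are illustrative only.
single = build_lr_metrics({"opt0": 3e-4})
multi = build_lr_metrics({"opt0": 3e-4, "opt1": 1e-3})
```

With a single entry the result has just the `lr` key; with two entries each value keeps its label under the shared prefix.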
Alternatively, we could have ignored the multi-LR case and always logged
a single LR, but I prefer this approach since it handles both scenarios
robustly with minimal extra code.
Happy to adjust if others have a strong preference for simplicity over
robustness.