Gradient accumulation + DeepSpeed LR scheduler #11686
gahdritz asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
How does gradient accumulation interact with DeepSpeed learning rate scheduling (e.g. the per-step warm-up scheduler)? Is the learning rate updated after every iteration, or only after the model weights are ultimately updated?
Answered by rohitgr7 on Feb 1, 2022
It considers the accumulation before doing lr_scheduler_step, i.e. the learning rate is only stepped when the optimizer actually updates the weights, not on every batch:
https://github.com/PyTorchLightning/pytorch-lightning/blob/86b177ebe5427725b35fde1a8808a7b59b8a277a/pytorch_lightning/loops/epoch/training_epoch_loop.py#L387-L390
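For reference, here is a minimal plain-PyTorch sketch of that behaviour (not the Lightning/DeepSpeed internals; `model`, `loss_fn`, the toy data, and the `LambdaLR` warm-up standing in for DeepSpeed's scheduler are all placeholders): with gradient accumulation, the per-step scheduler is stepped once per optimizer step, not once per forward/backward pass.

```python
import torch

accumulate_grad_batches = 4  # e.g. Trainer(accumulate_grad_batches=4)

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# A per-step warm-up style scheduler; LambdaLR is just a stand-in here.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / 100)
)
loss_fn = torch.nn.MSELoss()

# Placeholder data loader.
train_loader = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]

for batch_idx, (x, y) in enumerate(train_loader):
    # Scale the loss so the accumulated gradient matches a full batch.
    loss = loss_fn(model(x), y) / accumulate_grad_batches
    loss.backward()

    # Weights (and therefore the learning rate) are updated only every
    # `accumulate_grad_batches` iterations.
    if (batch_idx + 1) % accumulate_grad_batches == 0:
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()  # one scheduler step per optimizer step, not per batch
```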