Gradient accumulation + DeepSpeed LR scheduler #11686
gahdritz asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
How does gradient accumulation interact with DeepSpeed learning rate scheduling (e.g. the per-step warm-up scheduler)? Is the learning rate updated after every iteration, or only after the model weights are ultimately updated?
Answered by rohitgr7 on Feb 1, 2022
It considers the accumulation before doing lr_scheduler_step, i.e. the learning rate is only stepped when the optimizer actually updates the weights, not on every batch:
https://github.com/PyTorchLightning/pytorch-lightning/blob/86b177ebe5427725b35fde1a8808a7b59b8a277a/pytorch_lightning/loops/epoch/training_epoch_loop.py#L387-L390
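For reference, here is a minimal plain-PyTorch sketch of that behaviour (not the Lightning/DeepSpeed internals; `model`, `loss_fn`, the toy data, and the `LambdaLR` warm-up standing in for DeepSpeed's scheduler are all placeholders): with gradient accumulation, the per-step scheduler is stepped once per optimizer step, not once per forward/backward pass.

```python
import torch

accumulate_grad_batches = 4  # e.g. Trainer(accumulate_grad_batches=4)

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# A per-step warm-up style scheduler; LambdaLR is just a stand-in here.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / 100)
)
loss_fn = torch.nn.MSELoss()

# Placeholder data loader.
train_loader = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]

for batch_idx, (x, y) in enumerate(train_loader):
    # Scale the loss so the accumulated gradient matches a full batch.
    loss = loss_fn(model(x), y) / accumulate_grad_batches
    loss.backward()

    # Weights (and therefore the learning rate) are updated only every
    # `accumulate_grad_batches` iterations.
    if (batch_idx + 1) % accumulate_grad_batches == 0:
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()  # one scheduler step per optimizer step, not per batch
```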