Why can't accumulate_grad_batches be used with manual optimization? #10998
-
I've stumbled upon the problem of not being able to use `accumulate_grad_batches` with manual optimization. However, I think it would be possible to implement something that would "store" calls to the optimizer and only apply them every few batches.

My question: is there a reason for such incompatibility of `accumulate_grad_batches` with manual optimization? One reason might be the need to ...
Replies: 2 comments 3 replies
-
Gradient accumulation revolves around when you call `optimizer.step()` and `optimizer.zero_grad()`.
-
Hey @NathanGodey,
manual optimization was built to give the user full control over optimization while abstracting away distributed training and precision.
There is no way Lightning can properly automate `accumulate_grad_batches` for all possible use cases, and therefore it isn't supported there.
However, you can easily implement it yourself by only calling `zero_grad` and `step` every n batches.
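
For anyone landing here, a minimal sketch of what that can look like in a `LightningModule` (the toy model, loss, and accumulation factor of 4 are placeholders, not from this thread):

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class ManualAccumulationModule(pl.LightningModule):
    """Sketch: gradient accumulation with manual optimization."""

    def __init__(self, accumulate_grad_batches: int = 4):
        super().__init__()
        self.automatic_optimization = False  # switch to manual optimization
        self.accumulate_grad_batches = accumulate_grad_batches
        self.layer = torch.nn.Linear(32, 2)  # placeholder model

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch

        # Scale the loss so the accumulated gradient matches the average
        # over the effective (larger) batch.
        loss = F.cross_entropy(self.layer(x), y) / self.accumulate_grad_batches
        self.manual_backward(loss)

        # Only step and reset gradients every `accumulate_grad_batches` batches.
        if (batch_idx + 1) % self.accumulate_grad_batches == 0:
            opt.step()
            opt.zero_grad()

        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)
```

Dividing the loss by the accumulation factor keeps the gradient magnitude comparable to a single large batch; whether you want that depends on your use case.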