Why can't accumulate_grad_batches be used with manual optimization? #10998
-
I've stumbled upon the problem of not being able to use `accumulate_grad_batches` with manual optimization. However, I think it would be possible to implement something that would "store" calls to the optimizer and only apply them every few batches.

My question: is there a reason for such incompatibility of `accumulate_grad_batches` with manual optimization? One reason might be the need to ...
Replies: 2 comments 3 replies
-
Gradient accumulation revolves around when you call `optimizer.step()` and `optimizer.zero_grad()`.
-
Hey @NathanGodey,
manual optimization was built to give the user full control over optimization while abstracting away distributed training and precision.
There is no way Lightning can properly automate `accumulate_grad_batches` for all possible use cases, and therefore it isn't supported there.
However, you can easily implement it yourself by only calling `zero_grad` and `step` every n batches.
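
For anyone landing here, a minimal sketch of what that can look like in a `LightningModule` (the toy model, loss, and accumulation factor of 4 are placeholders, not from this thread):

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class ManualAccumulationModule(pl.LightningModule):
    """Sketch: gradient accumulation with manual optimization."""

    def __init__(self, accumulate_grad_batches: int = 4):
        super().__init__()
        self.automatic_optimization = False  # switch to manual optimization
        self.accumulate_grad_batches = accumulate_grad_batches
        self.layer = torch.nn.Linear(32, 2)  # placeholder model

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch

        # Scale the loss so the accumulated gradient matches the average
        # over the effective (larger) batch.
        loss = F.cross_entropy(self.layer(x), y) / self.accumulate_grad_batches
        self.manual_backward(loss)

        # Only step and reset gradients every `accumulate_grad_batches` batches.
        if (batch_idx + 1) % self.accumulate_grad_batches == 0:
            opt.step()
            opt.zero_grad()

        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)
```

Dividing the loss by the accumulation factor keeps the gradient magnitude comparable to a single large batch; whether you want that depends on your use case.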