In Manual Optimization mode, loss does not converge when the Train Batch Size is 1 #14526
-
This is very confusing: if the train batch size is set to 1 and automatic optimization is used with accumulate_grad_batches=32, the train loss converges normally. However, with manual optimization (`automatic_optimization=False`), the train loss fails to converge. Is there any other code I'm missing in PyTorch Lightning? The code used is as follows:

```python
def training_step(self, batch, batch_idx, optimizer_idx=None):
    opt = self.optimizers()
    loss = gen_helper()
    self.log('train/loss', loss, prog_bar=True, logger=True)
    self.manual_backward(loss)
    # step and clear gradients only once every accumulate_grad_batches iterations
    if (batch_idx + 1) % self.hparams.accumulate_grad_batches == 0:
        opt.step()
        opt.zero_grad()
```
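For comparison, here is a minimal sketch of the automatic-optimization baseline described above. `gen_helper` is the question's own loss helper, and `Model` is a hypothetical class name introduced here for illustration:

```python
import pytorch_lightning as pl

class Model(pl.LightningModule):
    # Automatic optimization is the default: Lightning calls backward(),
    # step(), and zero_grad() itself. With accumulate_grad_batches set,
    # it also scales the loss by 1/accumulate_grad_batches internally,
    # so the accumulated gradient matches one larger batch.
    def training_step(self, batch, batch_idx):
        loss = gen_helper()  # placeholder loss helper from the question
        self.log('train/loss', loss, prog_bar=True, logger=True)
        return loss  # Lightning handles the optimization step

trainer = pl.Trainer(accumulate_grad_batches=32)
```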
-
@OPilgrim Your `training_step` looks good to me. However, please note that your logged value `"train/loss"` is being logged on every iteration, regardless of whether the weights are updated in that iteration, which will make it look like a convergence issue.
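One way to make the logged curve track actual weight updates is to average the loss over each accumulation window and log it only when the optimizer steps. A minimal sketch under the question's setup; the `_loss_buffer` attribute is a hypothetical name introduced here:

```python
import torch

def training_step(self, batch, batch_idx, optimizer_idx=None):
    opt = self.optimizers()
    loss = gen_helper()  # placeholder loss helper from the question
    self.manual_backward(loss)

    # hypothetical buffer collecting per-iteration losses in the current window
    if not hasattr(self, '_loss_buffer'):
        self._loss_buffer = []
    self._loss_buffer.append(loss.detach())

    if (batch_idx + 1) % self.hparams.accumulate_grad_batches == 0:
        opt.step()
        opt.zero_grad()
        # log the window-averaged loss once per actual weight update
        self.log('train/loss', torch.stack(self._loss_buffer).mean(),
                 prog_bar=True, logger=True)
        self._loss_buffer = []
```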