In Manual Optimization mode, loss does not converge when the Train Batch Size is 1 #14526
-
This is very confusing: if the train batch size is set to 1 and automatic optimization is used with accumulate_grad_batches=32, the train loss converges normally. However, with manual optimization (`automatic_optimization=False`), the train loss fails to converge. Is there any other code I'm missing in PyTorch Lightning? The code used is as follows:

```python
def training_step(self, batch, batch_idx, optimizer_idx=None):
    opt = self.optimizers()
    loss = gen_helper()
    self.log('train/loss', loss, prog_bar=True, logger=True)
    self.manual_backward(loss)
    # step and clear gradients only once every accumulate_grad_batches iterations
    if (batch_idx + 1) % self.hparams.accumulate_grad_batches == 0:
        opt.step()
        opt.zero_grad()
```
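For comparison, here is a minimal sketch of the automatic-optimization baseline described above. `gen_helper` is the question's own loss helper, and `Model` is a hypothetical class name introduced here for illustration:

```python
import pytorch_lightning as pl

class Model(pl.LightningModule):
    # Automatic optimization is the default: Lightning calls backward(),
    # step(), and zero_grad() itself. With accumulate_grad_batches set,
    # it also scales the loss by 1/accumulate_grad_batches internally,
    # so the accumulated gradient matches one larger batch.
    def training_step(self, batch, batch_idx):
        loss = gen_helper()  # placeholder loss helper from the question
        self.log('train/loss', loss, prog_bar=True, logger=True)
        return loss  # Lightning handles the optimization step

trainer = pl.Trainer(accumulate_grad_batches=32)
```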
-
@OPilgrim Your `training_step` looks good to me. However, please note that your logged value `"train/loss"` is being logged on every iteration, regardless of whether the weights are updated in that iteration, which will make it look like a convergence issue.
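One way to make the logged curve track actual weight updates is to average the loss over each accumulation window and log it only when the optimizer steps. A minimal sketch under the question's setup; the `_loss_buffer` attribute is a hypothetical name introduced here:

```python
import torch

def training_step(self, batch, batch_idx, optimizer_idx=None):
    opt = self.optimizers()
    loss = gen_helper()  # placeholder loss helper from the question
    self.manual_backward(loss)

    # hypothetical buffer collecting per-iteration losses in the current window
    if not hasattr(self, '_loss_buffer'):
        self._loss_buffer = []
    self._loss_buffer.append(loss.detach())

    if (batch_idx + 1) % self.hparams.accumulate_grad_batches == 0:
        opt.step()
        opt.zero_grad()
        # log the window-averaged loss once per actual weight update
        self.log('train/loss', torch.stack(self._loss_buffer).mean(),
                 prog_bar=True, logger=True)
        self._loss_buffer = []
```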