unexpected behavior in automatic optimization #8516
nik-sm asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Hi - I'm seeing unexpected behavior with automatic optimization. I have two `nn.Module`s in my model, and in order to understand the effect of the second one, I'm running some experiments with and without it. The setup is something like this:
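(A minimal sketch of the module - the layers, losses, and the `alpha` coefficient below are placeholders for my real code, but the structure is the same.)

```python
import torch
from torch import nn
import pytorch_lightning as pl


class TwoModelSystem(pl.LightningModule):
    def __init__(self, alpha: float = 1.0, use_secondary: bool = True):
        super().__init__()
        self.alpha = alpha
        self.use_secondary = use_secondary
        self.main_model = nn.Linear(32, 10)       # stand-in for the main model
        self.secondary_model = nn.Linear(32, 10)  # stand-in for the smaller secondary model
        self.criterion = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx, optimizer_idx=0):
        x, y = batch
        if optimizer_idx == 0:
            # main loss, plus a term from the secondary model weighted by alpha
            loss = self.criterion(self.main_model(x), y)
            if self.use_secondary:
                loss = loss + self.alpha * self.criterion(self.secondary_model(x), y)
            return loss
        # optimizer_idx == 1: update only the secondary model
        return self.criterion(self.secondary_model(x), y)

    def configure_optimizers(self):
        opt_main = torch.optim.Adam(self.main_model.parameters(), lr=1e-3)
        if not self.use_secondary:
            return opt_main
        opt_secondary = torch.optim.Adam(self.secondary_model.parameters(), lr=1e-3)
        return [opt_main, opt_secondary]
```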
Observation 1: the code runs significantly faster with the second model included, which surprises me. The secondary model is smaller, so if using two optimizers meant that half of the gradient steps only used this smaller model, I could understand it. However, the algorithm described in the docs at https://pytorch-lightning.readthedocs.io/en/latest/common/optimizers.html#automatic-optimization should result in the computation being equal or greater (and never less).
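Paraphrasing the loop from that section (from memory, so not exact), each optimizer gets its own `training_step` call per batch, which only adds work:

```python
# pseudocode paraphrased from the automatic-optimization docs
for epoch in epochs:
    for batch in data:
        for opt in optimizers:
            loss = training_step(batch, batch_idx, optimizer_idx)
            opt.zero_grad()
            loss.backward()
            opt.step()
```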
Observation 2: if I set that coefficient `alpha=0`, the main model's performance is NOT identical to the run without the secondary model. Based on the pseudocode from the docs above, the main model should see all the same batches and perform exactly the same number of gradient steps.
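Concretely, after two seeded runs (one with the secondary model attached but `alpha=0`, one with it removed entirely), I would have expected a check like this hypothetical one to pass; the names below are placeholders:

```python
import torch

# model_with_secondary: trained with the secondary model attached but alpha=0
# model_without: trained with the secondary model removed entirely
for p_a, p_b in zip(model_with_secondary.main_model.parameters(),
                    model_without.main_model.parameters()):
    assert torch.allclose(p_a, p_b), "main model differs between the two runs"
```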
I am using `deterministic=True` in the Trainer and setting the seed at the beginning of my script with `seed_everything(1, workers=True)`. There is no randomness in the secondary model, so the random seed should not be affected by that inner `for opt in optimizers` loop.
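For reference, the run setup is essentially this (the toy dataset and trainer arguments are placeholders for my real ones):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import Trainer, seed_everything

seed_everything(1, workers=True)  # called once, before the model and data are built

# placeholder data; the real dataset is much larger
loader = DataLoader(TensorDataset(torch.randn(64, 32), torch.randint(0, 10, (64,))), batch_size=8)

model = TwoModelSystem(alpha=0.0)  # the module sketched above, with the secondary term zeroed out
trainer = Trainer(deterministic=True, max_epochs=1)
trainer.fit(model, loader)
```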
Am I misunderstanding how automatic optimization works, or is there something else that might be affecting me here?
Thanks in advance!