unexpected behavior in automatic optimization #8516
nik-sm asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Hi - I'm seeing unexpected behavior with automatic optimization. I have two `nn.Module`s in my model, and in order to understand the effect of the second one, I'm running some experiments with and without it. The setup is something like this:
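(A minimal sketch of the module - the layers, losses, and the `alpha` coefficient below are placeholders for my real code, but the structure is the same.)

```python
import torch
from torch import nn
import pytorch_lightning as pl


class TwoModelSystem(pl.LightningModule):
    def __init__(self, alpha: float = 1.0, use_secondary: bool = True):
        super().__init__()
        self.alpha = alpha
        self.use_secondary = use_secondary
        self.main_model = nn.Linear(32, 10)       # stand-in for the main model
        self.secondary_model = nn.Linear(32, 10)  # stand-in for the smaller secondary model
        self.criterion = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx, optimizer_idx=0):
        x, y = batch
        if optimizer_idx == 0:
            # main loss, plus a term from the secondary model weighted by alpha
            loss = self.criterion(self.main_model(x), y)
            if self.use_secondary:
                loss = loss + self.alpha * self.criterion(self.secondary_model(x), y)
            return loss
        # optimizer_idx == 1: update only the secondary model
        return self.criterion(self.secondary_model(x), y)

    def configure_optimizers(self):
        opt_main = torch.optim.Adam(self.main_model.parameters(), lr=1e-3)
        if not self.use_secondary:
            return opt_main
        opt_secondary = torch.optim.Adam(self.secondary_model.parameters(), lr=1e-3)
        return [opt_main, opt_secondary]
```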
Observation 1: the code runs significantly faster with the second model included, which surprises me. The secondary model is smaller, so if using two optimizers meant that half of the gradient steps only used this smaller model, I could understand it. However, the algorithm described in the docs at https://pytorch-lightning.readthedocs.io/en/latest/common/optimizers.html#automatic-optimization should result in the computation being equal or greater (and never less).
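Paraphrasing the loop from that section (from memory, so not exact), each optimizer gets its own `training_step` call per batch, which only adds work:

```python
# pseudocode paraphrased from the automatic-optimization docs
for epoch in epochs:
    for batch in data:
        for opt in optimizers:
            loss = training_step(batch, batch_idx, optimizer_idx)
            opt.zero_grad()
            loss.backward()
            opt.step()
```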
Observation 2: if I set that coefficient `alpha=0`, the main model's performance is NOT identical to the run without the secondary model. Based on the pseudocode from the docs above, the main model should see all the same batches and perform exactly the same number of gradient steps.
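Concretely, after two seeded runs (one with the secondary model attached but `alpha=0`, one with it removed entirely), I would have expected a check like this hypothetical one to pass; the names below are placeholders:

```python
import torch

# model_with_secondary: trained with the secondary model attached but alpha=0
# model_without: trained with the secondary model removed entirely
for p_a, p_b in zip(model_with_secondary.main_model.parameters(),
                    model_without.main_model.parameters()):
    assert torch.allclose(p_a, p_b), "main model differs between the two runs"
```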
I am using `deterministic=True` in the Trainer and setting the seed at the beginning of my script with `seed_everything(1, workers=True)`. There is no randomness in the secondary model, so the random seed should not be affected by that inner `for opt in optimizers` loop.
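For reference, the run setup is essentially this (the toy dataset and trainer arguments are placeholders for my real ones):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import Trainer, seed_everything

seed_everything(1, workers=True)  # called once, before the model and data are built

# placeholder data; the real dataset is much larger
loader = DataLoader(TensorDataset(torch.randn(64, 32), torch.randint(0, 10, (64,))), batch_size=8)

model = TwoModelSystem(alpha=0.0)  # the module sketched above, with the secondary term zeroed out
trainer = Trainer(deterministic=True, max_epochs=1)
trainer.fit(model, loader)
```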
Am I misunderstanding how automatic optimization works, or is there something else that might be affecting me here?
Thanks in advance!