Error with loading model checkpoint #12399
Hi everyone. I was recently running a Lightning model and saved a checkpoint to store the intermediate results. When I try to open the checkpoint, I get an error that the positional arguments used to initialize the LightningModule are not present. This wouldn't be a big deal, but one of the positional arguments is the encoder (used for BarlowTwins training). I was worried that if I loaded the model checkpoint with an encoder initialized with starting weights, this would overwrite the weight parameters stored in the checkpoint. See the error log and a block of code below. Any suggestions on how I can appropriately load this stored model to resume training?
original model loaded with:
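Roughly (the module, argument names, and layer sizes below are simplified placeholders rather than the exact code):

```python
import torch
from torch import nn
import pytorch_lightning as pl


class BarlowTwins(pl.LightningModule):
    def __init__(self, encoder: nn.Module, encoder_out_dim: int, lr: float = 1e-3):
        super().__init__()
        self.encoder = encoder  # nn.Module passed as a positional argument
        self.projector = nn.Linear(encoder_out_dim, 128)
        self.lr = lr

    def forward(self, x):
        return self.projector(self.encoder(x))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)


# freshly built encoder with starting weights
encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 512))
model = BarlowTwins(encoder, encoder_out_dim=512)
```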
Replies: 1 comment 3 replies
hey @dmandair !

did you call `self.save_hyperparameters()` inside your `LM.__init__`? Otherwise the hyperparameters won't be saved inside the checkpoint, and you might need to provide them again using `LMModel.load_from_checkpoint(..., encoder=encoder, encoder_out_dim=encoder_out_dim, ...)`.
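For example, something like this (a sketch using the placeholder names from the snippet above; the checkpoint path is a placeholder too):

```python
from torch import nn

# rebuild an encoder with the same architecture; its fresh starting weights don't
# matter, because load_from_checkpoint constructs the module first and then loads
# the checkpoint's state_dict, which already contains the encoder's parameters
# (the encoder is a registered submodule), so the stored weights overwrite them
encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 512))

model = BarlowTwins.load_from_checkpoint(
    "path/to/checkpoint.ckpt",  # placeholder path
    encoder=encoder,
    encoder_out_dim=512,
)
```

so passing a freshly initialized encoder here doesn't overwrite the stored weights; it's the other way around, the checkpoint weights replace the fresh ones.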
also note that, if you are passing an `nn.Module` inside your LM and calling `self.save_hyperparameters()`, it will save that too inside your hparams, which is not a good thing considering that `nn.Module`s are saved inside the checkpoint `state_dict` and might create issues for you. Ideally, you should ignore them using `self.save_hyperparameters(ignore=['encoder'])`.

Check out this PR: #12068
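The pattern would look roughly like this (again with the illustrative names from above, not your exact module):

```python
import pytorch_lightning as pl
from torch import nn


class BarlowTwins(pl.LightningModule):
    def __init__(self, encoder: nn.Module, encoder_out_dim: int, lr: float = 1e-3):
        super().__init__()
        # keep the encoder out of hparams: its weights already live in the
        # checkpoint's state_dict because it is a registered submodule, while
        # encoder_out_dim and lr are saved as hparams and restored automatically
        self.save_hyperparameters(ignore=['encoder'])
        self.encoder = encoder
        self.projector = nn.Linear(encoder_out_dim, 128)
```

with that, only `encoder` needs to be passed back to `load_from_checkpoint(...)`; the other hyperparameters are read from the checkpoint.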