Loading fine-tuned model built from pretrained subnetworks #10152
Replies: 9 comments 5 replies
-
https://pytorch-lightning.readthedocs.io/en/latest/common/weights_loading.html
-
https://forums.pytorchlightning.ai/t/save-load-model-for-inference/542
-
https://forums.pytorchlightning.ai/t/how-to-load-and-use-model-checkpoint-ckpt/677
-
https://forums.pytorchlightning.ai/t/saving-loading-lightningmodule-with-injected-network/394
-
https://forums.pytorchlightning.ai/t/saving-loading-the-model-for-inference-later/589
-
Hi @Programmer-RD-AI @awaelchli, thanks for the answers. I had already read the Lightning documentation on the basic usage of the loading functions for pre-trained model checkpoints. It doesn't answer my current problem, and among the links listed, the closest discussion to mine is https://forums.pytorchlightning.ai/t/saving-loading-lightningmodule-with-injected-network/394. However, that discussion was never answered, and I am still looking for best practices on this matter, i.e. handling model injection in the LightningModule class and restoring such models. I would highly appreciate it if you could comment on the situation I posted, or ask me for any additional details if I didn't explain the problem clearly enough. Thanks!
-
@adrienchaton If you build the submodule with Model1.load_from_checkpoint(ckpt_1), the hyper-parameters are restored from the checkpoint itself, so you don't need the yaml file as well. One less path to worry about :)
I don't have a good answer for how to get the path of the right yaml file, but you could use the checkpoint path from the trainer: trainer.checkpoint_callback.best_model_path or trainer.checkpoint_callback.last_model_path
So together:

```python
# train model 1
...
ckpt1 = trainer.checkpoint_callback.best_model_path

# train model 2
...
ckpt2 = trainer.checkpoint_callback.best_model_path

# train the combined model built from the two pretrained checkpoints
model = CombinedModel(ckpt1, ckpt2)
...
ckpt3 = trainer.checkpoint_callback.best_model_path
```

Let me know if that's useful for you.
-
Hello everyone,
I would like to ask for confirmation that I am getting the expected behaviour, and whether there are best practices for handling the following situation.
I have two LightningModules, e.g. model_1 and model_2, which I pretrain separately. After saving them I get the pairs (ckpt_1, yaml_1) and (ckpt_2, yaml_2), which hold their trained parameters and hyper-parameters.
Now I put them together into a model, e.g. combined_model, and fine-tune them on the combined_model task.
At the beginning of the fine-tuning I build the model as:
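Schematically it looks like this (a simplified sketch: Model1 / Model2 stand for my two pretrained LightningModule classes, and the constructor arguments are just how I pass the paths around):

```python
import pytorch_lightning as pl


class CombinedModel(pl.LightningModule):
    def __init__(self, ckpt_1, yaml_1, ckpt_2, yaml_2):
        super().__init__()
        # store the paths so they end up in the hyper-parameters of the
        # combined model (and therefore in ckpt_3 / yaml_3)
        self.save_hyperparameters()
        # build the subnetworks from their pretrained checkpoints;
        # hparams_file is only needed if the hyper-parameters are not
        # already stored inside the checkpoints themselves
        self.model_1 = Model1.load_from_checkpoint(ckpt_1, hparams_file=yaml_1)
        self.model_2 = Model2.load_from_checkpoint(ckpt_2, hparams_file=yaml_2)

    # forward / training_step / configure_optimizers for the combined task
    ...


combined_model = CombinedModel(ckpt_1, yaml_1, ckpt_2, yaml_2)
```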
→ combined_model optimizes the trainable parameters of model_1 and model_2, starting from the pretrained checkpoints, right?
After the fine-tuning is done, I have ckpt_3 and yaml_3, which give the fine-tuned parameters and the paths of the pretrained checkpoints used to build combined_model.
Usually I could just restore the fine-tuned model as:
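i.e. something along these lines (sketch):

```python
# restore the fine-tuned combined model; the hyper-parameters saved in
# ckpt_3 / yaml_3 still contain the original ckpt_1 / yaml_1 and
# ckpt_2 / yaml_2 paths, which are used to rebuild the subnetworks
combined_model = CombinedModel.load_from_checkpoint(ckpt_3, hparams_file=yaml_3)
combined_model.eval()
```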
The problem is that I work with remote servers and the paths change between the fine-tuning run and a later test run, so in the end yaml_3 points to the wrong paths for (ckpt_1, yaml_1) and (ckpt_2, yaml_2) when I want to restore the fine-tuned combined_model.
What I do then is manually specify the new paths (ckpt_1bis, yaml_1bis) and (ckpt_2bis, yaml_2bis) when loading:
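i.e. something like this (sketch; it relies on the fact that keyword arguments passed to load_from_checkpoint override the hyper-parameter values saved with the checkpoint):

```python
# rebuild the subnetworks from the paths valid on the new server, then load
# the fine-tuned weights: load_from_checkpoint first instantiates
# CombinedModel with the overridden paths and afterwards loads the
# state_dict of ckpt_3, which overwrites the pretrained weights
combined_model = CombinedModel.load_from_checkpoint(
    ckpt_3,
    hparams_file=yaml_3,
    ckpt_1=ckpt_1bis,
    yaml_1=yaml_1bis,
    ckpt_2=ckpt_2bis,
    yaml_2=yaml_2bis,
)
```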
→ in this case, am I sure to be loading the fine-tuned weights of ckpt_3, and not the pretrained weights of ckpt_1bis and ckpt_2bis?
I think so, but I would like to be sure. Also, are there recommended ways to better handle this situation?
Thanks!