Fine tune a pre-trained model twice #12560
Unanswered
zorikg
asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 0 comments
Hello,
I am using a pre-trained model from Hugging Face and would like to fine-tune it on dataset A, save the checkpoint, load it, and continue fine-tuning on dataset B. I can easily do step A (dataset A) but cannot figure out the best way to do step B.
My code follows the pattern sketched below. In step A I pass model_name_or_path='allenai/longformer-base-4096' and it works great. Then in step B I pass my checkpoint path as model_name_or_path
and I get a message (screenshot not reproduced here). It also seems that load_from_checkpoint calls the constructor of QaLongformer with model_name_or_path='allenai/longformer-base-4096' and loads the pre-trained model. When I observe the training, it seems to be training from scratch.
I could use your help in figuring out what I am doing wrong and what the best practice is for two-step fine-tuning of a pre-trained model.
Thanks!
Zorik.