How to implement Linear Probing for first N epochs and then switch to fine-tuning? #12488

konradkalita · 2022-03-28T10:43:49Z

Hello, I'm wondering how to implement a training technique from the paper "Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution". Essentially, the authors propose freezing all model weights except the final softmax (classification) layer at the beginning of training, and then switching to full fine-tuning. I'm working with BERT-like models from transformers. Should I create two separate optimizers and switch between them after N epochs? Also, how could I make the switch to fine-tuning gradual (say, unfreezing one more top transformer layer every epoch)?
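One possible approach to both questions, sketched under assumptions rather than taken from the paper: build a single optimizer over all parameters and control which ones train in each epoch by toggling `requires_grad`. The standard PyTorch optimizers (e.g. `AdamW`) skip parameters whose `.grad` is `None`, so frozen parameters are left untouched and a second optimizer is not needed. `ToyEncoderClassifier` is a hypothetical stand-in for a BERT-like model: `layers` plays the role of the transformer blocks and `head` the classification layer (for a Hugging Face `BertForSequenceClassification` these would roughly be `model.bert.encoder.layer` and `model.classifier`). The helpers `set_trainable` and `train`, and the schedule of unfreezing one extra top layer per epoch after `probe_epochs` epochs of linear probing, are illustrative choices.

```python
import torch
from torch import nn


class ToyEncoderClassifier(nn.Module):
    """Hypothetical stand-in for a BERT-like model: a stack of blocks plus a linear head."""

    def __init__(self, hidden: int = 16, num_layers: int = 4, num_classes: int = 2) -> None:
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()) for _ in range(num_layers)]
        )
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return self.head(x)


def set_trainable(model: ToyEncoderClassifier, epoch: int, probe_epochs: int) -> int:
    """Linear probing for the first `probe_epochs` epochs (only the head trains),
    then unfreeze one more top layer per epoch. Returns how many layers are unfrozen."""
    num_unfrozen = max(0, min(len(model.layers), epoch - probe_epochs + 1))
    # Freeze everything, then re-enable the head.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.head.parameters():
        p.requires_grad = True
    # Unfreeze the top `num_unfrozen` blocks (the ones closest to the head).
    for layer in list(model.layers)[len(model.layers) - num_unfrozen:]:
        for p in layer.parameters():
            p.requires_grad = True
    return num_unfrozen


def train(model, loader, epochs: int, probe_epochs: int, lr: float = 1e-3) -> list:
    # One optimizer over all parameters; frozen ones never get a .grad, so step() skips them.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    history = []
    for epoch in range(epochs):
        history.append(set_trainable(model, epoch, probe_epochs))
        for x, y in loader:
            # set_to_none=True keeps grads of frozen parameters as None (not zero tensors).
            optimizer.zero_grad(set_to_none=True)
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return history
```

Because optimizers such as `AdamW` create per-parameter state lazily on a parameter's first update, unfreezing a layer mid-training does not require rebuilding the optimizer. If the head and the backbone should use different learning rates, parameter groups within the same optimizer are an alternative to two separate optimizers. For the non-gradual variant described in the paper, `set_trainable` could simply unfreeze all layers once `epoch >= probe_epochs`.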
