Learning Rate for whisper-large v3 fine-tuning #2012
I am fine-tuning whisper-large v3, and I wonder what learning rate was used for pre-training this version of the model, to help me choose the right LR for the fine-tuning process. For context, I have about 3,300 hours of training data, and here are two experiments:

- Train/loss curve using LR = 5e-6 (with a linear scheduler) and batch size 128, for the first 0.6 epoch
- Train/loss curve using LR = 4e-6 (with a linear scheduler) and batch size 256, for the first 1.0 epoch

Which setup do you think is better?
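For readers following along, the two experiments map onto a training configuration roughly like the sketch below. This assumes the Hugging Face `Seq2SeqTrainer` stack, which the post does not actually name; the output directory, the per-device/accumulation split of the batch size, and the warmup length are illustrative placeholders.

```python
from transformers import Seq2SeqTrainingArguments

# Rough sketch of experiment 1 (LR = 5e-6, effective batch size 128, linear
# scheduler). Assumes the Hugging Face Seq2SeqTrainer; the path and the
# per-device/accumulation split are placeholders, not from the post.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-ft",  # hypothetical output path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,       # 16 x 8 = effective batch size 128
    learning_rate=5e-6,                  # experiment 1; use 4e-6 for experiment 2
    lr_scheduler_type="linear",          # the linear scheduler from the post
    warmup_steps=500,                    # assumption: warmup length is not stated
    num_train_epochs=1,
    fp16=True,                           # assumption: mixed precision is typical
)
```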
Replies: 2 comments

- No one is helping?
- Hello, I hope you are doing well. Since you have plenty of data to train on, I would suggest a smaller learning rate, such as 1e-6 or even 1e-7, combined with more epochs of training. Also, the training loss curve will almost always keep decreasing; what you actually want to watch are the validation curves, which indicate how well your model will perform on unseen data.