Whisper fine-tuning: Validation loss increases but WER is decreasing. #2575
-
Yes, I've seen the same thing. This behavior is actually not unusual when working with Whisper or other sequence-to-sequence models.

Situation Recap
At first glance it might seem contradictory: how can the model be getting worse according to the loss, but better according to WER?

Short Answer
Yes, this can be perfectly fine, especially if WER is the main metric you care about.

Why This Happens
1. The loss function is different from your evaluation metric: the validation loss is a per-token cross-entropy, while WER measures word-level errors in the final transcript.
2. Training (and the validation loss) uses teacher forcing, whereas WER is computed on autoregressively decoded output, so the two can move in opposite directions.
3. WER is what matters for real-world performance.

Should You Keep Such a Model?
Yes. If your downstream task prioritizes WER (or CER), it's reasonable to select the model checkpoint with the lowest WER, not necessarily the lowest validation loss. In most ASR use cases, sequence-level accuracy matters more than per-token log-likelihood. Optimize for the real world, not the math world.
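For concreteness, here is a minimal sketch of how checkpoint selection by WER can be wired up with the Hugging Face Seq2SeqTrainer, roughly following the setup in the fine-tune-whisper blog post. The checkpoint name, output directory, and evaluation intervals are placeholder assumptions, not the exact configuration from the question.

```python
import evaluate
from transformers import WhisperProcessor, Seq2SeqTrainingArguments

# Processor for decoding predicted token IDs back to text
# (placeholder checkpoint; use whichever Whisper size you are fine-tuning).
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")

# Word error rate from the `evaluate` library, computed on decoded text,
# not on the teacher-forced logits that the validation loss is based on.
wer_metric = evaluate.load("wer")

def compute_metrics(pred):
    pred_ids = pred.predictions
    label_ids = pred.label_ids
    # The loss ignores -100 labels; swap them for the pad token before decoding.
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = processor.batch_decode(label_ids, skip_special_tokens=True)

    return {"wer": 100 * wer_metric.compute(predictions=pred_str, references=label_str)}

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-finetuned",  # placeholder path
    predict_with_generate=True,   # evaluate with autoregressive decoding
    evaluation_strategy="steps",
    eval_steps=1000,              # placeholder interval
    save_steps=1000,
    load_best_model_at_end=True,  # restore the best checkpoint after training
    metric_for_best_model="wer",  # select checkpoints by WER ...
    greater_is_better=False,      # ... where lower is better
)
```

With `predict_with_generate=True`, the reported WER comes from free-running autoregressive decoding, so it reflects the sequence-level behavior you actually care about and can keep improving even while the teacher-forced validation loss rises.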
-
Yes, with reference to this, the dataset size can also affect that.
-
Hello,
I followed https://huggingface.co/blog/fine-tune-whisper to fine-tune the Whisper large model for English. One of my observations was that the training loss is decreasing while the validation loss is increasing, which is a classic sign of overfitting.
But the WER on the validation set is decreasing.
The same behavior can also be seen in the blog post linked above. I'm a bit confused about this.
Is it okay to use such models? What could be the reasons behind this?
@sanchit-gandhi, please reply if possible.