Medium and Large models hallucination #678
-
Hi, I am trying to train the Whisper Medium model on my custom dataset (using this guide). I trained the Small model first and it achieves very good results, with a WER of 10 on my custom dataset. By hallucination I mean an increasing WER caused by repeating words. For example, after training on 500 hours of audio, my Medium model gives me output like `word word word word ...`. I checked the audio itself and it is fine: there are some background noises, but the voice is very recognizable, and in fact my fine-tuned Small model transcribes it correctly. The Medium model behaves better with a smaller `generation_max_length` setting, but then it cannot predict long audios. The dataset itself contains short audios within a 10-second range. I tried several things, still no good. How do I solve this issue?
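For reference, the `generation_max_length` setting mentioned above lives in the `Seq2SeqTrainingArguments` used by the fine-tune-whisper blog's `Seq2SeqTrainer` setup. A minimal sketch, assuming that setup; all values here are illustrative, not taken from the thread:

```python
# Hypothetical training-arguments fragment following the fine-tune-whisper
# blog's Seq2SeqTrainer recipe; hyperparameter values are illustrative only.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-custom",   # hypothetical output path
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    fp16=True,
    predict_with_generate=True,
    # Capping generation length limits how far a repetition loop can run,
    # at the cost of truncating genuinely long transcripts.
    generation_max_length=225,
)
```

This is a configuration fragment only; the trade-off the OP describes (shorter `generation_max_length` suppresses repeats but cuts off long audios) is exactly what this cap does.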
Replies: 3 comments 11 replies
-
I just posted about this a minute ago; I had the same problem.
-
For anyone facing this issue in transformers:
In my case the dataset was fine, and I managed to overcome the problem by simply updating the transformers version to 4.25 and rerunning fine-tuning with the same parameters. Previously I used 4.24, which was causing this; after updating, training went well. I suppose the issue was in the tokenizer, but I am not sure why exactly.
P.S. Don't forget to clear your dataset cache and rerun feature extraction and tokenization. In my case, using the newest version with the old cache did not work either.
-
Based on this guide, https://huggingface.co/blog/fine-tune-whisper, I tried to fine-tune the "small" and "large-v3" models.
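For context, loading either checkpoint follows the same pattern as in that guide. A minimal loading sketch, assuming the standard Hugging Face Hub model IDs; the language pin is one commonly suggested knob against hallucinated output, not something stated in this thread:

```python
# Illustrative model-loading fragment; "openai/whisper-large-v3" is the
# standard Hub ID, swap in "openai/whisper-small" for the smaller model.
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "openai/whisper-large-v3"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Pinning the language avoids the model guessing it per-segment, which can
# contribute to degenerate/repetitive output on noisy audio.
model.generation_config.language = "en"
```

This is a configuration/loading fragment only; the actual training loop is described in the linked blog post.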