Replies: 1 comment
I'm facing the same issue with my custom dataset. I tried fine-tuning for more epochs on it, but the 'ghost transcript' problem still hasn't improved.
Hi guys, I'm having a strange hallucination problem while using Whisper-large-v3 for English speech recognition. Names of people appear at the beginning of the transcript even though they are not spoken at the beginning of the audio. The names are only mentioned later in the audio.

The audio causing the issue is:
audio.zip
Its label is: "And thank you very much good evening everybody and a warm welcome to our next presentation. My name is Katharina Morlang and together with my colleagues hiker hoods and patrick young, please give me your hands."
When I invoke the model as follows, the output starts with "Katharina Moorlach". If you look at the timestamps, the hallucinated name appears between 0 and 0.04 seconds. But if you listen carefully, the audio doesn't begin with that name. And when you look at the whole transcript, the hallucinated name is one that is mentioned later in the audio.
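In case it helps anyone reproduce or flag this automatically, here is a minimal sketch of the timestamp check described above. It assumes word-level timestamps are already available as `(text, start, end)` tuples; that shape, the helper name `implausibly_fast_prefix`, the 10 ms-per-character threshold, and the sample timings are all my own illustrative assumptions, not something taken from Whisper or its output.

```python
from typing import List, Tuple

# Hypothetical input shape: (text, start_seconds, end_seconds) for each word.
Word = Tuple[str, float, float]

# Assumed threshold: faster than 10 ms per character is treated as
# implausible for real speech. Tune this for your own data.
MIN_SECONDS_PER_CHAR = 0.01


def implausibly_fast_prefix(
    words: List[Word],
    min_seconds_per_char: float = MIN_SECONDS_PER_CHAR,
) -> List[Word]:
    """Return the leading words whose duration is too short for their length."""
    flagged: List[Word] = []
    for text, start, end in words:
        stripped = text.strip()
        if not stripped:
            continue  # ignore empty tokens
        if (end - start) / len(stripped) >= min_seconds_per_char:
            break  # first plausibly timed word: stop scanning
        flagged.append((text, start, end))
    return flagged


# Illustrative timings only, chosen to match the 0 to 0.04 s range above.
sample = [
    ("Katharina", 0.00, 0.02),
    ("Moorlach", 0.02, 0.04),
    ("And", 0.04, 0.40),
    ("thank", 0.40, 0.70),
]
print([w[0] for w in implausibly_fast_prefix(sample)])  # ['Katharina', 'Moorlach']
```

This only detects the symptom (a name squeezed into a few hundredths of a second); it does not explain or fix it.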
I also tried to follow the Hugging Face tutorial, using
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
to invoke the model. But the output still has the hallucinated words "Katharina Morlan and Hülse Tuchel-" at the beginning of the transcript. When I added "suppress_tokens": "", the output changed but still contained the hallucination "Katharina Morlan:".
I also used the large-v3 model for speech recognition on other English audio, but so far I have only found the problem with this audio. I also tried the large-v2 model, and it didn't have this issue on this audio.
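To compare the different runs (large-v3 with and without the extra setting, large-v2) against the label, something like the following rough sketch could be used. It relies only on Python's standard `difflib`. The helper name `inserted_prefix` and the example strings are my own illustrative assumptions, and only the leading "Katharina Moorlach" is taken from the actual output.

```python
import re
from difflib import SequenceMatcher
from typing import List


def _words(text: str) -> List[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[\w']+", text.lower())


def inserted_prefix(reference: str, hypothesis: str) -> List[str]:
    """Return words the hypothesis adds before the reference's first word.

    Returns an empty list if the hypothesis does not start with an insertion.
    """
    ref, hyp = _words(reference), _words(hypothesis)
    opcodes = SequenceMatcher(a=ref, b=hyp, autojunk=False).get_opcodes()
    if not opcodes:
        return []
    tag, _i1, _i2, j1, j2 = opcodes[0]
    return hyp[j1:j2] if tag == "insert" else []


# Shortened label and illustrative outputs (run names are hypothetical).
label = "And thank you very much good evening everybody"
outputs = {
    "run A": "Katharina Moorlach And thank you very much, good evening everybody",
    "run B": "And thank you very much, good evening everybody",
}
for name, text in outputs.items():
    print(name, inserted_prefix(label, text))
```

A non-empty result means the run added words before the first word of the label, which is the pattern I see here.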
I have carefully checked the GitHub Discussions for other threads that mention Whisper-large-v3 hallucination issues. But in those cases the hallucinations seem to occur mostly in the non-speech parts of the audio. In my test audio, the beginning is clear: you can clearly hear the word "and", and there is no non-speech part that only has background sound.
There are several interesting phenomena in this issue that I would like to ask about:
"suppress_tokens": ""?
The issue seems to be a very specific and rare hallucination problem. Can anyone share some thoughts on it, and on how to solve it? Thank you so much for your help!!!