Incorrect transcription with recent releases #1407
-
It's called hallucination, see #679; it has persisted ever since. An automatic solution is to remove silence/non-speech with a VAD (silero-vad, pyannote-audio, the NeMo toolkit, etc.).
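A minimal sketch of that approach using silero-vad from torch.hub, which cuts non-speech before the audio ever reaches Whisper; the file names here are placeholders:

```python
import torch

# Load the Silero VAD model plus its helper functions from torch.hub
model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks = utils

SAMPLING_RATE = 16000
wav = read_audio("input.wav", sampling_rate=SAMPLING_RATE)  # placeholder path

# Find the speech regions and keep only those chunks
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)
speech_only = collect_chunks(speech_timestamps, wav)

# Write the speech-only audio, then feed it to Whisper as usual
save_audio("speech_only.wav", speech_only, sampling_rate=SAMPLING_RATE)
```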
-
I think you are observing a behavior of the smaller models, not a behavior of different code. Repeated runs with smaller models can produce different results.
-
It happened to me too. I don't like the hallucinations, especially on longer audio files.
-
We have been working with Whisper for some time now and it's working pretty well for us.
whisper version in use: openai-whisper==v20230314
However, we noticed an issue when transcribing one of our internal videos. We used the base model, but it gave us a very bad transcription: basically the word "you" repeated some 200 times and nothing else. Running with the small model instead of base gave an acceptable transcription.
One observation: the video has some background noise and no proper audio conversation for the first five minutes. If we strip those 5 minutes out and run the transcription on the remaining video with the base model, then it produces an acceptable transcription.
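For reference, a sketch of that stripping step done programmatically; it assumes ffmpeg is on PATH, and the input/output file names are placeholders:

```python
import subprocess
import whisper

# Skip the first five minutes (300 s) of noisy, speech-free audio and
# downmix to 16 kHz mono, which is the format Whisper resamples to anyway.
subprocess.run([
    "ffmpeg", "-y",
    "-ss", "300",                 # start 300 seconds into the input
    "-i", "internal_video.mp4",   # placeholder input file
    "-ac", "1", "-ar", "16000",
    "trimmed.wav",                # placeholder output file
], check=True)

model = whisper.load_model("base")
result = model.transcribe("trimmed.wav")
print(result["text"])
```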
Running against the latest whisper code also gave the same result.
We had a Docker image of a very old Whisper release. We tried it, and it gave an acceptable transcription with the base model. pip freeze on the Docker image gave the following:
The base.pt model seems to be the same across all versions, so is there some recent code change in Whisper that could explain this behavior? Do we have any options to make the transcriptions more accurate?
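As a hedged sketch, these are decoding options that are commonly suggested for repetition loops on silent or noisy audio; they are standard arguments to `model.transcribe`, whether they help depends on the recording, and the file path is a placeholder:

```python
import whisper

model = whisper.load_model("base")

result = model.transcribe(
    "internal_video.mp4",              # placeholder path
    condition_on_previous_text=False,  # don't feed possibly-hallucinated text back into the decoder
    no_speech_threshold=0.6,           # no-speech probability above which a segment may be treated as silence
    logprob_threshold=-1.0,            # retry segments whose average log-probability falls below this
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # temperature fallback schedule for failed decodes
)
print(result["text"])
```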