Incorrect transcription with recent releases #1407
-
It's called hallucination, see #679; it has persisted ever since. An automatic solution is to remove silence/non-speech with a VAD (silero-vad, pyannote-audio, the NeMo toolkit, etc.).
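A minimal sketch of that approach using silero-vad from torch.hub, which cuts non-speech before the audio ever reaches Whisper; the file names here are placeholders:

```python
import torch

# Load the Silero VAD model plus its helper functions from torch.hub
model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks = utils

SAMPLING_RATE = 16000
wav = read_audio("input.wav", sampling_rate=SAMPLING_RATE)  # placeholder path

# Find the speech regions and keep only those chunks
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)
speech_only = collect_chunks(speech_timestamps, wav)

# Write the speech-only audio, then feed it to Whisper as usual
save_audio("speech_only.wav", speech_only, sampling_rate=SAMPLING_RATE)
```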
-
I think you are observing a behavior of the smaller models, not a behavior of different code. Repeated runs with smaller models can produce different results.
-
It happened to me too. I don't like the hallucinations, especially on longer audio files.
-
We have been working with Whisper for some time now and it's working pretty well for us.
whisper version in use: openai-whisper==v20230314
However, we noticed an issue when transcribing one of our internal videos. We used the base model, but it gave us a very bad transcription: basically the word "you" repeated some 200 times and nothing else. Running with the small model instead of base gave an acceptable transcription.
One observation: the video has some background noise and no proper audio conversation for the first five minutes. If we strip those 5 minutes out and run the transcription on the remaining video with the base model, then it produces an acceptable transcription.
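For reference, a sketch of that stripping step done programmatically; it assumes ffmpeg is on PATH, and the input/output file names are placeholders:

```python
import subprocess
import whisper

# Skip the first five minutes (300 s) of noisy, speech-free audio and
# downmix to 16 kHz mono, which is the format Whisper resamples to anyway.
subprocess.run([
    "ffmpeg", "-y",
    "-ss", "300",                 # start 300 seconds into the input
    "-i", "internal_video.mp4",   # placeholder input file
    "-ac", "1", "-ar", "16000",
    "trimmed.wav",                # placeholder output file
], check=True)

model = whisper.load_model("base")
result = model.transcribe("trimmed.wav")
print(result["text"])
```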
Running against the latest whisper code also gave the same result.
We had a Docker image of a very old Whisper release. We tried it, and it gave an acceptable transcription with the base model. pip freeze on the Docker image gave the following:
The base.pt model seems to be the same across all versions, so is there some recent code change in Whisper that could explain this behavior? Do we have any options to make the transcriptions more accurate?
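As a hedged sketch, these are decoding options that are commonly suggested for repetition loops on silent or noisy audio; they are standard arguments to `model.transcribe`, whether they help depends on the recording, and the file path is a placeholder:

```python
import whisper

model = whisper.load_model("base")

result = model.transcribe(
    "internal_video.mp4",              # placeholder path
    condition_on_previous_text=False,  # don't feed possibly-hallucinated text back into the decoder
    no_speech_threshold=0.6,           # no-speech probability above which a segment may be treated as silence
    logprob_threshold=-1.0,            # retry segments whose average log-probability falls below this
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # temperature fallback schedule for failed decodes
)
print(result["text"])
```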