Repeat the same text at the end of long audio? #977

miku1958 · 2023-02-18T05:06:42Z

Hi, I tried to use "whisper --model large-v2 --output_format json --language Chinese" to process an audio file more than 4 hours long. The first part works fine, but after 3 hours and 40 minutes it keeps repeating the word "小蜘蛛" ("little spider" in Chinese) until the end. I have checked the audio, and there is normal speech in that later part.

Does anyone know how to fix it? I am using Windows 10 with CUDA 11.7.

Thanks!

miku1958 · 2023-02-18T05:34:28Z

Author

It happened again while I was processing a 2-hour audio file.

miku1958 · 2023-02-18T05:41:23Z

Author

I set --condition_on_previous_text to False and that seems to fix the issue, but --initial_prompt no longer seems to work: the output switches back and forth between Simplified Chinese and Traditional Chinese.
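
For anyone reproducing this, here is a minimal sketch that builds the two command lines being compared, using only the flags already mentioned in this thread. The file name and the prompt text are placeholders, not values from the original post. One possible explanation for the script switching (an assumption, not verified here against every release) is that with conditioning disabled the initial prompt only reaches the first audio window.

```python
import shlex


def build_whisper_command(audio_path, initial_prompt=None,
                          condition_on_previous_text=True):
    """Build an argument list for the `whisper` CLI.

    Only flags that appear in this thread are used. `audio_path` and
    `initial_prompt` are caller-supplied placeholders.
    """
    cmd = [
        "whisper", audio_path,
        "--model", "large-v2",
        "--language", "Chinese",
        "--output_format", "json",
        # The CLI expects the literal strings "True" or "False" here.
        "--condition_on_previous_text", str(condition_on_previous_text),
    ]
    if initial_prompt is not None:
        cmd += ["--initial_prompt", initial_prompt]
    return cmd


# Hypothetical file name and prompt, for illustration only.
default_cmd = build_whisper_command("long_audio.mp3",
                                    initial_prompt="以下是普通话的句子。")
no_condition_cmd = build_whisper_command("long_audio.mp3",
                                         initial_prompt="以下是普通话的句子。",
                                         condition_on_previous_text=False)

print(shlex.join(default_cmd))
print(shlex.join(no_condition_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)` to run the transcription, which makes it easy to compare both outputs on the same file.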

miku1958 · 2023-02-19T12:28:42Z

Author

Does anyone know how to fix it without setting --condition_on_previous_text to False? Accuracy drops too much after disabling it.

ACBelkina · 2023-02-19T20:04:31Z

I have observed this before, and it happens rather randomly. I have only seen it with the large models, but I don't use the other ones as much. I am mostly doing Chinese speech-to-text, with or without translation.
Similar behavior is reported here: https://blog.gdeltproject.org/experiments-with-whisper-asr-model-parameters-non-determinism-temperature_increment_on_fallback/
It seems random and happens more often with longer videos (of course). I ended up slicing the videos and joining the pieces back together after translating them. If I get this glitch, I just rerun the segment. Not a real solution, of course...
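
The slice, check, and rejoin workflow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the commenter's actual script: the helper names, the 10-minute chunk length, and the repeat threshold are all hypothetical. It only assumes that each chunk's result is a list of segments with "start", "end", and "text" keys, as in Whisper's JSON output.

```python
def plan_chunks(total_seconds, chunk_seconds=600.0):
    """Return (start, end) pairs that cover the whole recording."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks


def looks_stuck(segments, max_repeats=5):
    """True if the same text occurs in more than max_repeats consecutive segments."""
    run = 0
    previous = None
    for seg in segments:
        text = seg["text"].strip()
        run = run + 1 if (text and text == previous) else 1
        previous = text
        if run > max_repeats:
            return True
    return False


def merge_chunk_segments(chunk_results):
    """Join per-chunk segments, shifting timestamps by each chunk's start.

    chunk_results is a list of (chunk_start_seconds, segments) pairs.
    """
    merged = []
    for chunk_start, segments in chunk_results:
        for seg in segments:
            merged.append({
                "start": seg["start"] + chunk_start,
                "end": seg["end"] + chunk_start,
                "text": seg["text"],
            })
    return merged


# Example with made-up data: the second chunk shows the repetition glitch.
good = [{"start": 0.0, "end": 2.0, "text": "hello"}]
bad = [{"start": float(i), "end": float(i + 1), "text": "小蜘蛛"} for i in range(8)]
print(plan_chunks(1500, 600))
print(looks_stuck(good), looks_stuck(bad))
```

A chunk for which `looks_stuck` returns True would be the one to rerun before calling `merge_chunk_segments`.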

heimoshuiyu · 2023-04-18T08:33:37Z

@miku1958 Hello, I encountered the same issue as you did. I made some modifications to the prompt (#1253), and they solved my issue. You can try the modified branch by running the following commands:

pip uninstall openai-whisper
pip install git+https://github.com/heimoshuiyu/whisper.git@prompt --upgrade

dgoryeo · 2023-04-18T09:16:49Z

Hi @heimoshuiyu, this is quite an interesting approach. If you don't mind, would you share your insight on why you decided not to introduce a VAD for your use case? Would VAD affect the quality of transcription negatively?

1 reply

heimoshuiyu Apr 18, 2023

There is no particular reason...it's just that introducing VAD requires additional code to process Whisper's input and output, which I find to be a bit cumbersome 😶. I prefer to use Whisper to get everything done.

"Would VAD affect the quality of transcription negatively?"

I think it depends on the VAD you are using and the type of audio. Generally, a VAD can improve transcription quality, but sometimes it may cut the audio too early or too late, or even fail to cut between sentences, which may decrease the transcription quality of Whisper.
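
To illustrate the "too early or too late" trade-off, here is a deliberately naive energy-threshold detector. It is a toy sketch and not how any particular VAD library works; the function name, the threshold, and the padding values are all assumptions. The `pad_ms` parameter widens each detected region so that quiet word onsets and endings are less likely to be clipped, at the cost of letting more non-speech through.

```python
def detect_speech(samples, sample_rate, frame_ms=30, threshold=0.02, pad_ms=200):
    """Return (start_s, end_s) regions whose per-frame RMS meets the threshold.

    samples: sequence of floats in [-1.0, 1.0].
    pad_ms: padding added on both sides of every voiced frame.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    pad_frames = pad_ms // frame_ms

    # Mark each frame as voiced or not, based on its RMS energy.
    flags = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        flags.append(rms >= threshold)

    # Expand every voiced frame by the padding and merge overlapping regions.
    regions = []
    for idx, voiced in enumerate(flags):
        if not voiced:
            continue
        start = max(0, idx - pad_frames)
        end = min(len(flags), idx + 1 + pad_frames)
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])

    # Convert frame indices to seconds, clamping to the real signal length.
    return [
        (s * frame_len / sample_rate,
         min(e * frame_len, len(samples)) / sample_rate)
        for s, e in regions
    ]


# Synthetic signal: 1 s silence, 1 s "speech", 1 s silence at 1 kHz.
signal = [0.0] * 1000 + [0.5] * 1000 + [0.0] * 1000
tight = detect_speech(signal, 1000, frame_ms=100, pad_ms=0)
padded = detect_speech(signal, 1000, frame_ms=100, pad_ms=200)
print(tight, padded)
```

With no padding the region hugs the loud part exactly, which is where clipped words would come from on real audio. With padding the region grows on both sides.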

For my use case, extra generated text is not a problem because I can easily remove it. But if text is missing, I have to add it manually, which can be a bit cumbersome.
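
Removing the extra text can be as simple as collapsing runs of identical consecutive segments. The sketch below assumes segments shaped like Whisper's JSON output (dicts with a "text" key); the function name and the `keep` parameter are hypothetical. Note that it would also collapse speech that genuinely repeats, so the result still deserves a quick review.

```python
def collapse_repeats(segments, keep=1):
    """Keep at most `keep` copies of each run of identical consecutive texts."""
    cleaned = []
    run = 0
    previous = None
    for seg in segments:
        text = seg["text"].strip()
        run = run + 1 if text == previous else 1
        previous = text
        if run <= keep:
            cleaned.append(seg)
    return cleaned


# Made-up example: the repeated word is reduced to a single segment.
segments = [{"text": t} for t in ["你好", "小蜘蛛", "小蜘蛛", "小蜘蛛", "再见"]]
print([s["text"] for s in collapse_repeats(segments)])
```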
