Whisper can't do large gaps between spoken sections #1278

fingertrouble · 2023-04-24T17:22:17Z

Whisper is brilliant, and getting better I think (or learning more about how I say things), but one thing is a real faff: it does weird things when there are long gaps between spoken words. I'm a music podcaster, so there are gaps between the spoken bits.

Interestingly, it can also pick up lyrics and singing, but when there are long instrumental sections it quite often loses track and skips whole spoken-word sections, repeating a word instead, like this:

01:19:15.600 --> 01:19:17.600
Jingle

01:19:37.600 --> 01:19:47.600
Jingle

01:19:51.600 --> 01:19:53.600
Jingle

01:20:07.600 --> 01:20:17.600
Jingle

01:49:42.600 --> 01:49:45.600
Ooh, ooh, ooh

01:49:45.600 --> 01:49:48.600
Ooh, ooh, ooh

01:49:48.600 --> 01:49:52.600
Ooh, ooh, ooh

01:49:53.600 --> 01:49:56.600
Ooh, ooh, ooh

01:49:59.600 --> 01:50:02.600
Ooh, ooh, ooh

It does this whether I work on the original podcast (with music sections/beds) or export just the speech with no music behind it.
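
One way to find where a long transcript has lost track is to scan the .vtt output for the same cue text repeated back to back, which is how the "Jingle" and "Ooh, ooh, ooh" loops above show up. Below is a minimal shell/awk sketch that only flags suspect regions; it does not fix them. `find_repeated_cues` is a hypothetical helper name, and it assumes one text line per cue (as in the excerpt above) and a default minimum run of 3 cues.

```shell
# Sketch: report runs of identical consecutive cue texts in a WebVTT file.
# Assumes one text line per cue; helper name and default run length are illustrative.
find_repeated_cues() {
  local vtt="$1" min_run="${2:-3}"
  awk -v min_run="$min_run" '
    / --> /              { start = $1; next }   # timing line: remember cue start
    NF == 0 || /^WEBVTT/ { next }               # skip blank lines and the header
    {
      if ($0 == prev) { run++; next }           # same text as previous cue
      if (run >= min_run) printf "%s x%d from %s\n", prev, run, run_start
      prev = $0; run = 1; run_start = start     # start a new run
    }
    END { if (run >= min_run) printf "%s x%d from %s\n", prev, run, run_start }
  ' "$vtt"
}
```

Running `find_repeated_cues episode.vtt` prints each suspicious run with the timestamp where it begins, which gives a time range to check by ear or re-transcribe on its own.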

What I've ended up doing is exporting all the spoken bits separately, with small gaps between chunks, but that means the timings no longer match the original episode, and it's a faff to edit and export a second podcast just for transcription.

With the gaps kept short, it doesn't seem to lose whole parts. I'd love to offer a transcript in the player for accessibility, as some players allow, but that's not possible given how Whisper handles long gaps.
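
If each spoken chunk is exported and transcribed as a separate file, and you note where each chunk starts in the full episode, its transcript can be moved back onto the episode's timeline by adding that start time to every timestamp. This is a sketch under those assumptions, not a fix for the gaps themselves, and it does not help when all chunks go into one condensed file (each gap then shifts later timings by a different amount). `shift_vtt` is a hypothetical helper; it accepts both `MM:SS.mmm` and `HH:MM:SS.mmm` timings (WebVTT allows the hours field to be omitted) and always writes hours.

```shell
# Sketch: shift every timestamp in a WebVTT file by a non-negative offset in seconds.
# Helper name is illustrative; accepts MM:SS.mmm or HH:MM:SS.mmm timings.
shift_vtt() {
  local vtt="$1" offset="$2"
  awk -v off="$offset" '
    BEGIN { off_ms = int(off * 1000 + 0.5) }    # offset in whole milliseconds
    function to_ms(t,    p, n) {
      n = split(t, p, "[:.]")
      if (n == 3) return (p[1] * 60 + p[2]) * 1000 + p[3]        # MM:SS.mmm
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]      # HH:MM:SS.mmm
    }
    function fmt(ms) {
      return sprintf("%02d:%02d:%02d.%03d", int(ms / 3600000),
                     int(ms / 60000) % 60, int(ms / 1000) % 60, ms % 1000)
    }
    # Timing line: rewrite start ($1) and end ($3); "-->" stays as $2.
    / --> / { $1 = fmt(to_ms($1) + off_ms); $3 = fmt(to_ms($3) + off_ms) }
    { print }
  ' "$vtt"
}
```

For a chunk that starts at 1:19:00 in the full episode, `shift_vtt chunk03.vtt 4740 > chunk03_shifted.vtt` (file names hypothetical). Concatenating the shifted chunks, keeping only the first `WEBVTT` header, gives one transcript aligned with the full episode.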

Is there a setting I've missed that stops it losing track, or is this a bug?

Using Whisper via Homebrew (formerly via pip install, with the same issue; I recently upgraded to the latest Homebrew build, which didn't fix it) on an M1 Pro MacBook, Ventura 13.3.1, Python 3.11.3. That Python is also installed via Homebrew, cos Macs still use 2.7 for some odd reason.

mayeaux · 2023-04-24T18:15:23Z

This is the classic Whisper hallucination issue. You can try running it with `--condition_on_previous_text False`, but the only real solution at the moment is using a VAD (voice activity detection). People have made various attempts to fix it, but as far as I can tell nothing has stuck yet.
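
A sketch of the first suggestion, assuming the `whisper` command-line tool from the openai-whisper package is on PATH (the pip and Homebrew installs are both expected to provide it). Whisper's own help text describes `--condition_on_previous_text` as feeding the previous window's output to the model as a prompt, and notes that turning it off makes the model less prone to getting stuck in a failure loop, at some cost to consistency between windows. The helper name, the `medium` model and the VTT output format are illustrative choices; the `DRY_RUN=1` switch (also an assumption of this sketch) prints the command instead of running it.

```shell
# Sketch: run the openai-whisper CLI with conditioning on previous text turned off.
# Helper name, model size and output format are illustrative, not recommendations.
transcribe_no_context() {
  local input="$1" outdir="${2:-.}"
  # Build the argument list in the positional parameters (POSIX-friendly).
  set -- whisper "$input" \
    --model medium \
    --condition_on_previous_text False \
    --output_format vtt \
    --output_dir "$outdir"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # print the command instead of running it
  else
    "$@"
  fi
}
```

`transcribe_no_context episode.mp3 transcripts` writes a VTT transcript into `transcripts/`. This only reduces the chance of a loop; it does not remove the long non-speech stretches that trigger it.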

1 reply

EtienneAb3d Apr 25, 2023

In my own experiments, this solves 99.9% of the hallucination problems (and improves lyrics recognition by extracting/separating the vocals from the music tracks):
https://github.com/EtienneAb3d/WhisperHallu

To get timestamps, you have to combine it with:
https://github.com/EtienneAb3d/WhisperTimeSync

See the full discussion here:
#679

themanyone · 2023-04-25T07:50:01Z

I got around this in my Whisper continuous dictation & remote control tool by using sox to cut silence.
#1282

The gist of it is here. It listens to the mic, but sox can also process audio files; refer to the man page for that, as it can get quite involved.

# Listen to the mic. The trailing `&` lets it run in the background.
# The `1 0.2 3%` part of the sox rec command trims silence from the beginning: recording
# only starts once 0.2 seconds of sound louder than 3% of full volume is detected.
# The final `1 2.0 1%` part tells it to trim 1 segment of silence from the end: it stops
# recording after 2.0 seconds of silence below 1%. Raise these thresholds to 5% or more
# with poor recording equipment and noisy environments.
# $tmp holds the output file path.

rec -c 1 -r 22050 -t mp3 "$tmp" silence 1 0.2 3% 1 2.0 1% &
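
For an existing file rather than the mic, here is a sketch of the same idea. In the sox `silence` effect, a negative second count keeps the check running through the whole file, and `-l` leaves that duration of each long silence in place, so every gap longer than 2 seconds is cut down to 2 seconds instead of removed. This is an energy threshold, not a true VAD: music beds louder than the 1% threshold are not treated as silence, so it suits a speech-only export, and like manual chunking it shifts timings relative to the original episode. `shorten_gaps`, its thresholds and the `DRY_RUN` switch are illustrative assumptions.

```shell
# Sketch: shorten every silence longer than max_gap seconds down to max_gap.
# Helper name, thresholds and DRY_RUN switch are illustrative.
shorten_gaps() {
  local input="$1" output="$2" max_gap="${3:-2.0}"
  # 1 0.1 1%       : trim leading silence until 0.1 s of sound above 1% is found
  # -1 max_gap 1%  : negative count repeats through the file; with -l, each
  #                  silence longer than max_gap keeps only max_gap seconds
  set -- sox "$input" "$output" silence -l 1 0.1 1% -1 "$max_gap" 1%
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # print the command instead of running it
  else
    "$@"
  fi
}
```

For example, `shorten_gaps speech.wav speech_short.wav 1.5` caps every pause at 1.5 seconds before the file goes to Whisper.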
