Missing the first 21 seconds in small.en and large-v2 #1937
-
Hi! I did some testing and can reproduce this problem with whisper.cpp and whisper-timestamped as well (tested with small.en).
-
Problematic part is:
I encountered the same behavior with different content. It looks like some models simply refuse to output anything for such ads.
-
Can you please share a link to the hosted MP3 file?
-
DonQuixote_GitHub_15Minutes.zip
-
Here's yet another workaround idea that seems to work: slow down the audio. I tested at 0.6 of the original speed. Do note that I only used the first 30 seconds for this test.
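The exact command used for the slowdown isn't shown above, so here is a minimal sketch of one way to do it, assuming ffmpeg is installed and that "0.6 of the original" means a tempo factor of 0.6. The file names and the helper names `slow_down_cmd` and `to_original_time` are made up for illustration. Since the slowed audio is longer, timestamps from its transcript need to be scaled back by the same factor.

```python
def slow_down_cmd(src, dst, tempo=0.6):
    """Build an ffmpeg command that changes playback tempo without shifting pitch.

    Assumption: ffmpeg's `atempo` audio filter is used; it does not accept
    factors below 0.5, so smaller values are rejected here.
    """
    if tempo < 0.5:
        raise ValueError("atempo does not accept factors below 0.5")
    return ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={tempo}", dst]


def to_original_time(t_slowed, tempo=0.6):
    """Map a timestamp (seconds) in the slowed audio back to the original timeline.

    Audio played at 0.6x lasts 1/0.6 times as long, so multiplying a slowed
    timestamp by the tempo factor recovers the original position.
    """
    return t_slowed * tempo


# Hypothetical usage (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(slow_down_cmd("input.mp3", "input_slow.wav"), check=True)
```

For example, a segment reported at 35 s in the slowed file would sit at about 21 s in the original. I have not checked whether other tempo factors work as well as 0.6.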
-
I have a LibriVox MP3 recording of Don Quixote and I've transcribed the file using all models, from tiny.en to large-v3. What I've encountered is that with small.en and large-v2, the first 21 seconds of speech are not transcribed. The transcription simply starts at the 21st second. All other models transcribe the file from the beginning. The only thing that jumps out at me is the coincidence that the word at the beginning and the word at the 21st second are the same: "dedication".
My concern is that there are other files where the beginning is being skipped and I don't know the trigger, or a workaround.
Below is the SRT output from base.en and large-v2.
This is transcribed using base.en
This is the beginning
1
00:00:00,000 --> 00:00:09,640
Dedication, Preface, Dermatic Personae, and Act I of Don Quixote in England by Henry Fielding.
2
00:00:10,440 --> 00:00:11,660
This is a LibraVox recording.
3
00:00:12,640 --> 00:00:14,800
All LibraVox recordings are in the public domain.
4
00:00:15,800 --> 00:00:20,720
For more information or to volunteer, please visit LibraVox.org.
5
00:00:21,480 --> 00:00:27,720
Dedication to the right honorable Philip, Earl of Chesterfield, Knight of the Most Noble
This is transcribed using large-v2
This is not the beginning of the file; the transcription actually starts at 00:00:21,000
1
00:00:00,000 --> 00:00:22,420
DEDICATION
2
00:00:22,740 --> 00:00:28,980
To the Right Honorable Philip, Earl of Chesterfield, Knight of the Most Noble Order of the Garter,
3
00:00:29,660 --> 00:00:30,240
My Lord.
4
00:00:31,000 --> 00:00:37,260
However unworthy these scenes may be of your Lordship's protection, the design with which
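Since the trigger is unknown, one rough way to spot other affected files is to look for the pattern in the large-v2 output above: a first cue that spans a long time but contains very little text. The sketch below is only a heuristic under that assumption. The function names and both thresholds are made up for illustration and are not part of Whisper.

```python
import re

# Matches an SRT timing line such as "00:00:00,000 --> 00:00:22,420".
CUE_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def first_cue(srt_text):
    """Return (start, end, text) of the first cue, or None if no cue is found."""
    lines = srt_text.splitlines()
    for i, line in enumerate(lines):
        match = CUE_RE.search(line)
        if not match:
            continue
        groups = match.groups()
        start, end = _seconds(*groups[:4]), _seconds(*groups[4:])
        text = []
        for nxt in lines[i + 1:]:
            stripped = nxt.strip()
            # Stop at a blank line or at the next cue's index number.
            # (A caption consisting only of digits would be cut short here.)
            if not stripped or stripped.isdigit():
                break
            text.append(stripped)
        return start, end, " ".join(text)
    return None


def looks_skipped(srt_text, min_chars_per_sec=3.0, min_duration=10.0):
    """Flag a transcript whose first cue is long but nearly empty.

    Both thresholds are guesses and would need tuning on real data.
    """
    cue = first_cue(srt_text)
    if cue is None:
        return False
    start, end, text = cue
    duration = end - start
    if duration < min_duration:
        return False
    return len(text) / duration < min_chars_per_sec
```

On the two outputs above, the large-v2 first cue (about 22 seconds for the single word "DEDICATION") would be flagged, while the base.en first cue would not. This won't catch every case, for example when the first cue starts late instead of being stretched.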