Issues with clip_timestamps: Slow Transcription and Nonsensical Results #2551
Unanswered
esphoenixc
asked this question in
Q&A
Replies: 0 comments
When the following value is provided for the clip_timestamps parameter, transcribing a one-minute audio file takes far too long: approximately 60 to 120 seconds, which is absurd. The intention behind using clip_timestamps is to prevent Whisper from hallucinating by transcribing only the speech regions detected via silero-vad. However, when the audio contains multiple languages, even reducing the clip_timestamps value by a factor of ten still results in a lengthy transcription, and the output becomes gibberish: nonsensical, repeating words and strange symbols. Has anyone else experienced this issue? I am using the Whisper large V3 Turbo model.
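For reference, here is a minimal sketch of how the silero-vad output could be turned into a clip_timestamps value. It is not my exact code. The helper name vad_to_clip_timestamps and the max_gap and pad parameters are hypothetical, and merging nearby segments is only an assumption about what might reduce the number of clips Whisper has to process, not a confirmed fix. It assumes the VAD segments are dicts with "start" and "end" in seconds (as silero-vad returns with return_seconds=True) and that clip_timestamps accepts a flat list of start,end,start,end,... values in seconds.

```python
from typing import Dict, List, Optional


def vad_to_clip_timestamps(
    segments: List[Dict[str, float]],
    max_gap: float = 0.5,                    # hypothetical: merge segments closer than this (seconds)
    pad: float = 0.1,                        # hypothetical: padding added around each segment (seconds)
    audio_duration: Optional[float] = None,  # if given, clip ends are capped at this value
) -> List[float]:
    """Flatten VAD speech segments into [start, end, start, end, ...] in seconds."""
    merged: List[List[float]] = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        start = max(0.0, float(seg["start"]) - pad)
        end = float(seg["end"]) + pad
        if audio_duration is not None:
            end = min(end, audio_duration)
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous clip: extend it instead of adding a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [round(t, 3) for pair in merged for t in pair]


# Intended use (assumes openai-whisper and silero-vad are installed; not run here):
#   speech = get_speech_timestamps(wav, vad_model, return_seconds=True)
#   clips = vad_to_clip_timestamps(speech, audio_duration=60.0)
#   result = whisper_model.transcribe("audio.wav", clip_timestamps=clips)
```

Fewer, longer clips mean fewer separate seek ranges for the decoder, but whether that changes the timing or the gibberish output on multilingual audio is exactly what I am unsure about.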