Doesn't seem to be working WITH SONGS? Keeps duplicating transcripts on several timestamps #1833
Replies: 3 comments 7 replies
-
Whisper tends to get confused when there is no voice, such as during stretches of music without any vocals. You can use Demucs to filter out the backing music first so that you're left with just the vocals, and then use VAD (e.g. Silero VAD) to filter out the silent bits so that Whisper only analyses the parts that contain actual speech. stable-ts is one tool that can do both of these for you, but there are many other tools that at least do VAD.
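In case it helps, here is a rough sketch of the first half of that idea: run Demucs to isolate the vocals, then hand the vocal stem to Whisper. It is not a definitive recipe. The helper names (`build_demucs_command`, `expected_vocals_path`, `transcribe_vocals`) are made up for illustration, the output layout `<out_dir>/<model_name>/<track>/vocals.wav` is an assumption that may differ between Demucs versions, and `condition_on_previous_text=False` is an optional Whisper setting that nobody in this thread suggested; it is included only because it can make repetition loops less likely. The VAD step is left out here.

```python
import subprocess
from pathlib import Path


def build_demucs_command(audio_path, out_dir="separated", model_name="htdemucs"):
    """Build a Demucs CLI call that splits a track into vocals and accompaniment."""
    return [
        "demucs",
        "--two-stems", "vocals",  # produce only vocals / no_vocals stems
        "-n", model_name,         # assumption: default Demucs v4 model name
        "-o", str(out_dir),
        str(audio_path),
    ]


def expected_vocals_path(audio_path, out_dir="separated", model_name="htdemucs"):
    """Guess where Demucs writes the vocal stem (assumed layout, check your version)."""
    return Path(out_dir) / model_name / Path(audio_path).stem / "vocals.wav"


def transcribe_vocals(audio_path, language=None, whisper_model="large-v2",
                      out_dir="separated", model_name="htdemucs"):
    """Separate the vocals with Demucs, then transcribe them with Whisper."""
    # Imported lazily so the helpers above can be used without Whisper installed.
    import whisper

    subprocess.run(
        build_demucs_command(audio_path, out_dir, model_name), check=True
    )
    model = whisper.load_model(whisper_model)
    result = model.transcribe(
        str(expected_vocals_path(audio_path, out_dir, model_name)),
        language=language,  # None lets Whisper detect the language itself
        # Not from this thread: trades cross-window consistency for
        # fewer repetition loops.
        condition_on_previous_text=False,
    )
    return result["segments"]  # list of dicts with "start", "end", "text"
```

Something like `transcribe_vocals("song.mp3", language="ro")` would then return the timestamped segments, assuming both `demucs` and `openai-whisper` are installed. If the vocal stem ends up somewhere else on your machine, adjust `expected_vocals_path` accordingly.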
-
Did you mean this VAD tool (https://github.com/snakers4/silero-vad)?
-
Try large-v2! I'm also currently preparing a new version of WhisperHallu with a bunch of new features aimed specifically at music/song processing.
-
Hello,

I tried Whisper on a song and got this:
I used large-v3 with the "ro" language setting.
The line "Kimenikin ajtoumeleibe" only comes later in the song, and several words before it were not transcribed.
Words and sentences are sometimes repeated in this song, but not like this.
Is there any way to improve this?