Replies: 4 comments
-
Worth experimenting to see for yourself. I would think the effect would vary with speaker clarity, audio quality/noise, the amount of speedup, and language. Note that the original Whisper announcement (2022) included a demo transcript of very fast speech; see "Speed talking" under the examples here.
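For anyone running that experiment, the usual way to compare transcripts at different speeds is word error rate (WER) against a reference transcript. Below is a minimal sketch using a word-level edit distance; the function name `wer` and the plain whitespace tokenization are my own assumptions, and real evaluations normally normalize case and punctuation first.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, only to show the call.
    print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

Transcribe the same clip at 1.0x, 1.1x, 1.2x and so on, compute the WER of each output against a hand-checked reference, and you can see where accuracy starts to fall off for your audio.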
-
I did some experiments with this, using several stretching tools (see WhisperHallu code), and didn't get any improvements.
-
Yeah, I had the same idea here a while back. Also, at some point whisper.cpp had a switch that sped up the audio, but I guess it's been dropped since.
-
In most Kaldi recipes, speed perturbation was a default augmentation method; it usually performs well and gives you a "free" 1%-2% reduction in WER. It might help if you want to fine-tune on a very specific domain but only have a small training set.
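A rough sketch of what this kind of speed augmentation does to a waveform, in plain Python. This is a naive linear-interpolation resampler, not Kaldi's actual implementation; the function name `speed_perturb` and the 0.9/1.0/1.1 factors are assumptions for illustration, and this style of resampling shifts pitch along with tempo.

```python
import math


def speed_perturb(samples, factor):
    """Return `samples` resampled to play `factor` times faster.

    Naive linear interpolation; pitch changes together with tempo.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(samples)
    n_out = int(round(n_in / factor))
    if n_in < 2 or n_out < 2:
        return list(samples[:n_out])
    # Distance in input samples between consecutive output samples.
    step = (n_in - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), n_in - 2)  # clamp so lo + 1 stays in range
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out


if __name__ == "__main__":
    sample_rate = 16000
    # One second of a 440 Hz test tone standing in for real speech.
    tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    for factor in (0.9, 1.0, 1.1):
        perturbed = speed_perturb(tone, factor)
        print(factor, len(perturbed) / sample_rate, "seconds")
```

For augmentation, each training utterance would be kept at its original speed and also added in perturbed copies, which multiplies the amount of training audio without new recordings.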
-
I imagine that past a certain point, speeding the audio up or slowing it down will hurt accuracy, and that different voices will tolerate different speeds. But, just thinking out loud: speeding the audio up by even 10% would be a big boost to transcription speed, and would reduce costs overall.
Thoughts?
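To put a number on the 10% idea, here is a back-of-the-envelope sketch. It assumes transcription time or cost scales linearly with audio duration, which is my assumption rather than something measured; the function names are made up for illustration.

```python
def sped_up_duration(minutes: float, speed: float) -> float:
    """Length of the audio after playing it `speed` times faster."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return minutes / speed


def savings_fraction(speed: float) -> float:
    """Fraction of audio duration removed by speeding up by `speed`."""
    return 1 - 1 / speed


if __name__ == "__main__":
    hours_of_audio = 100  # hypothetical workload
    minutes = hours_of_audio * 60
    for speed in (1.1, 1.25, 1.5):
        print(
            f"{speed}x: {sped_up_duration(minutes, speed):.0f} min to process, "
            f"{savings_fraction(speed):.1%} less audio"
        )
```

Note that a 10% speedup removes about 9% of the audio, not 10%, because the duration is divided by 1.1 rather than reduced by 0.1.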