-
Hello, I'm developing an app for people who stutter, to help them see where they stutter and to create exercises. However, Whisper does a great job and removes the stuttered parts from the transcription. Do you think it is possible to get a transcription like "T-T-Th-The cat is flying" instead of the corrected "The cat is flying"? Thank you
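If a model does produce verbatim output in this format, locating the stutters is a small post-processing step. Here is a minimal sketch. It assumes part-word repetitions are written as hyphen-joined fragments that each prefix the target word, as in "T-T-Th-The". `find_part_word_repetitions` and `Stutter` are hypothetical names, not part of Whisper or any library:

```python
from dataclasses import dataclass


@dataclass
class Stutter:
    index: int             # 0-based position of the word in the transcript
    word: str              # the fluent target word, e.g. "The"
    attempts: list[str]    # the repeated fragments, e.g. ["T", "T", "Th"]


def find_part_word_repetitions(transcript: str) -> list[Stutter]:
    """Flag tokens like "T-T-Th-The" in a verbatim transcript.

    Assumption (hypothetical format): a part-word repetition is written as
    hyphen-joined fragments followed by the full word.
    """
    results = []
    for i, token in enumerate(transcript.split()):
        # Strip surrounding punctuation so "T-T-The," still matches.
        core = token.strip(".,!?;:\"'")
        parts = core.split("-")
        if len(parts) < 2:
            continue
        target, fragments = parts[-1], parts[:-1]
        # Only treat it as a stutter if every fragment is a non-empty,
        # case-insensitive prefix of the final word. This keeps ordinary
        # compounds such as "well-known" from being flagged.
        if target and all(f and target.lower().startswith(f.lower()) for f in fragments):
            results.append(Stutter(i, target, fragments))
    return results
```

For example, `find_part_word_repetitions("T-T-Th-The cat is flying")` returns one `Stutter` at index 0 with target `"The"` and fragments `["T", "T", "Th"]`. The prefix check is only a heuristic: a prefixed word such as "re-read" would still be counted as a stutter.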
-
Whisper cannot do this. FYI, in case this other effort is helpful:
-
This might help with your problem a bit. It is probably not perfect for stuttered speech as is, but it could be a good starting point if you have datasets of stuttered speech with appropriate labels:
-
Hey @snettah, that app is a really cool idea. Have you already developed it, or are you still working on it?
-
Hey, I just stumbled upon this (no pun intended). I was looking for the same use case, and even though CrisperWhisper is good at detecting disfluencies, it was not good enough for Dutch. So I decided to fine-tune Whisper on a large Dutch dataset (Corpus Gesproken Nederlands), which contains literal transcriptions of 600 hours of audio. The results are really good: it catches disfluencies such as "eh", "uh" and "mm-hu" very well, and it is also able to detect stuttering. Even though it is Dutch-only, the same approach should work for other languages, or could be made more generic (if you have enough time/money). I fine-tuned whisper-large-v3 on an NVIDIA 5090. The model: https://huggingface.co/pevers/whisperd-nl. The training code, with details about the dataset: https://github.com/pevers/whisperd-nl
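Once a verbatim model produces fillers and repetitions in its output, an app can summarise them per transcript. Here is a rough sketch. It assumes fillers appear as plain tokens like "eh", "uh" and "mm-hu". The `FILLERS` set and `disfluency_summary` are hypothetical and should be adjusted to whatever tokens the model actually emits:

```python
from collections import Counter

# Hypothetical filler inventory; adjust to the tokens your model outputs.
FILLERS = {"eh", "uh", "uhm", "um", "mm", "mm-hu"}


def disfluency_summary(transcript: str) -> dict:
    """Count fillers and whole-word repetitions in a verbatim transcript."""
    # Normalise: strip surrounding punctuation and lowercase each token.
    words = [w.strip(".,!?;:\"'").lower() for w in transcript.split()]
    words = [w for w in words if w]
    # How often each filler occurs.
    fillers = Counter(w for w in words if w in FILLERS)
    # Indices where a word directly repeats the previous one ("ik ik").
    # Fillers are excluded so "uh uh" is not reported as a word repetition.
    repeats = [
        i for i in range(1, len(words))
        if words[i] == words[i - 1] and words[i] not in FILLERS
    ]
    return {
        "fillers": dict(fillers),
        "word_repetitions": repeats,
        "total_words": len(words),
    }
```

For instance, `disfluency_summary("Ik eh ik ik wil uh naar huis")` reports one "eh", one "uh" and a repetition at index 3. Note that "ik eh ik" is not counted as a repetition, because the filler sits between the two words.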