Keep transcription with stuttering ? #1517

snettah · 2023-07-11T16:01:28Z

snettah
Jul 11, 2023

Hello,

I'm developing an app for people who stutter, to help them see where they stutter and to create some exercises, but Whisper does such a good job that it removes the stuttered parts from the transcription.

Do you think it is possible to get a transcription like "T-T-Th-The cat is flying" rather than the corrected transcription "The cat is flying"?
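To make the target format concrete, here is a minimal sketch of how an app could locate stuttered words once a verbatim transcript in that style is available. The hyphen-joined fragment convention and the `find_stutters` helper are assumptions for illustration only; this is not something Whisper outputs or provides.

```python
import re

# Assumed label format: repeated fragments joined by hyphens before the word,
# e.g. "T-T-Th-The". This is an illustrative convention, not a Whisper format.
STUTTER = re.compile(r"\b((?:[A-Za-z]{1,3}-)+)([A-Za-z]+)")

def find_stutters(transcript):
    """Return (intended_word, repeated_fragments, char_offset) per stuttered word."""
    hits = []
    for m in STUTTER.finditer(transcript):
        fragments = m.group(1).rstrip("-").split("-")
        word = m.group(2)
        # Keep only matches where every fragment is a prefix of the word,
        # so ordinary hyphenated words like "up-to-date" are skipped.
        if all(word.lower().startswith(f.lower()) for f in fragments):
            hits.append((word, fragments, m.start()))
    return hits

print(find_stutters("T-T-Th-The cat is flying"))
```

The prefix check is a deliberately simple heuristic; it only covers sound and syllable repetitions written in this notation, not blocks or prolongations.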

Thank you

Answered by glangford

Jul 11, 2023

glangford · 2023-07-11T17:41:07Z

glangford
Jul 11, 2023

Whisper cannot do this. FYI, in case this other effort is helpful:

Project Euphonia is a Google Research initiative focused on helping people with non-standard speech be better understood. The approach is centered on analyzing speech recordings to better train speech recognition models.
https://sites.research.google/euphonia/about/

1 reply

loneicewolf Jun 29, 2025

Thanks a lot for this answer! Even though it's from 2 years ago, it was immensely helpful. I didn't know it wasn't possible; thanks again!

LaurinmyReha · 2024-09-09T07:44:34Z

LaurinmyReha
Sep 9, 2024

This might help with your problem a bit, although it is probably not perfect for stuttered speech as is. It may be a good starting point if you have datasets that include stuttered speech with appropriate labels:

https://github.com/nyrahealth/CrisperWhisper

2 replies

snettah Sep 10, 2024
Author

Clearly a good starting point, thank you very much

LaurinmyReha Sep 10, 2024

Having fine-tuned and played with Whisper quite a bit, I am actually certain this is possible. Finding and labeling enough data will be the main challenge. If you find or otherwise manage to create an appropriate dataset and settle on a good labeling format, I have no doubt that you will be successful. For CrisperWhisper I also played with synthesized datasets, using ElevenLabs to "simulate" stuttered speech to improve verbatim transcripts, and found that this improved performance quite a bit. So I would assume the dataset of actual stuttered speech would not need to be too large for things like this to work.
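As a rough illustration of the synthetic-data idea, the sketch below turns clean text into a stuttered label string that could serve both as the text sent to a speech synthesizer and as the verbatim training target. The `add_stutter` function, its parameters, and the hyphenated label format are hypothetical; this is not the actual CrisperWhisper pipeline, and the synthesis step itself is not shown.

```python
import random

def add_stutter(text, rate=0.3, max_repeats=3, seed=None):
    """Randomly prefix words with repeated first letters, e.g. "cat" -> "c-c-cat".

    rate:        probability that a given word is stuttered (assumed knob)
    max_repeats: upper bound on repeated fragments per word (assumed knob)
    seed:        optional seed so a generated dataset is reproducible
    """
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if word[0].isalpha() and rng.random() < rate:
            repeats = rng.randint(1, max_repeats)
            # Join the repeated first letter and the full word with hyphens.
            out.append("-".join([word[0]] * repeats + [word]))
        else:
            out.append(word)
    return " ".join(out)

print(add_stutter("The cat is flying", rate=0.5, seed=0))
```

Real stuttering is more varied than first-letter repetition, so text generated this way is probably best treated as a supplement to labeled recordings of actual stuttered speech.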

Nanobot234 · 2025-04-03T14:34:17Z

Nanobot234
Apr 3, 2025

hey @snettah that app is a really cool idea. So have you already developed this or are you still working on it now?

1 reply

loneicewolf Jun 29, 2025

I also want to know :D as a stutterer myself I really got curious

pevers · 2025-07-01T20:33:19Z

pevers
Jul 1, 2025

Hey, just stumbled upon this (no pun intended).

I was looking for the same use-case, and even though CrisperWhisper is good at detecting disfluencies it was not good enough for Dutch. I decided to fine-tune Whisper on a large Dutch dataset (Corpus Gesproken Nederlands). It contains literal transcriptions of 600 hours of audio.

The results are really good. It catches disfluencies such as eh, uh, mm-hu etc. really well, and it was also able to detect stuttering.

Even though it is only Dutch it should be possible to do for other languages or make it more generic (if you have enough time/money).

I fine-tuned whisper-large-v3 on an NVIDIA 5090:

https://huggingface.co/pevers/whisperd-nl

And the training code with details about the dataset: https://github.com/pevers/whisperd-nl

0 replies
