Replies: 3 comments 1 reply
-
The model isn't intelligent enough to correct misspelled words; it just gives you the closest sound-alike.
-
From my testing in English? Whisper handles accents + completely made-up terms fantastically. Like:
How Are Spoken Accents Handled?

This is helped by diverse training sets. For example, Mozilla's "Common Voice" is one of the datasets used inside of Whisper's training. They have a giant list of:
which volunteers then:
After users verify the spoken audio matches the text, it gets added to the dataset. In their FAQ, they have a whole section on:
Audio Variability (Helping Accuracy)

Different accents in your audio are just one piece of the puzzle. It's also helpful to get balanced:
and even great to get audio submitted using different:
For example, if your language's audio skewed towards:
Then, if you were an older female trying to use Speech-to-Text, the detection would be awful! For example, see this famous article:
Instead, if your language's audio tended towards:
anything you train against it would become more robust against all sorts of spoken differences.
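A rough way to spot that kind of skew is to count how the clips are spread across each metadata field. This is only a minimal sketch: the `clips` records, the field names (`age`, `gender`, `accent`), and the 60% threshold are all hypothetical, chosen for illustration, and are not the actual Common Voice schema or anything Whisper does.

```python
from collections import Counter

# Hypothetical clip metadata. The field names and values are illustrative
# assumptions, not the real Common Voice schema.
clips = [
    {"age": "twenties", "gender": "male", "accent": "US"},
    {"age": "twenties", "gender": "male", "accent": "US"},
    {"age": "thirties", "gender": "male", "accent": "India"},
    {"age": "forties", "gender": "male", "accent": "Scotland"},
    {"age": "sixties", "gender": "female", "accent": "Australia"},
]


def share_by(clips, field):
    """Return each value's share of the clips for one metadata field."""
    counts = Counter(clip[field] for clip in clips)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def skewed_fields(clips, fields, threshold=0.6):
    """List (field, value, share) entries where one value dominates.

    The threshold is an arbitrary assumption; pick whatever counts as
    "too lopsided" for your own dataset.
    """
    flagged = []
    for field in fields:
        for value, share in share_by(clips, field).items():
            if share > threshold:
                flagged.append((field, value, share))
    return flagged


if __name__ == "__main__":
    for field, value, share in skewed_fields(clips, ["age", "gender", "accent"]):
        print(f"{field} is skewed towards {value!r}: {share:.0%} of clips")
```

With this toy data only `gender` gets flagged, since four of the five clips come from male speakers, while ages and accents are spread out enough to stay under the threshold.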
Well, that's where more and better data has to be gathered! Common Voice currently has 112 languages supported (with more getting added all the time). The more hours of validated audio data you can gather for a given language, the better it'll become for all speakers of that language! :)

Side Note: Which non-English language do you speak?
-
*pronunciation. Also, everything is pronounced with an accent.
-
How does it deal with words that are not pronounced correctly or are pronounced with an accent?
This might be challenging with non-English languages. What about English itself?