Pronounciation and Accent #1414

NomiJ · 2023-06-01T16:20:43Z

How does it deal with words that are mispronounced or pronounced with an accent?
This might be challenging with non-English languages. What about English itself?

phineas-pta · 2023-06-02T08:32:13Z

The model isn't intelligent enough to correct mispronounced words; it just gives you the closest sound-alike.

Tex2002ans · 2023-06-02T20:11:47Z

How does it deal with words that are mispronounced or pronounced with an accent?

From my testing in English? Whisper handles accents + completely made-up terms fantastically.

Like:

Last Names
- "My name is Mr. John Smitherson."
Company Names
- "I run a company called Whisperion Inc."
Book/Article Titles
URLs / Websites
- "Visit examplewebsite.com"
Usernames
- (social media, emails, etc.)
Technical/Programming Terminology
- "First, open the WebToolkitProxy and WebToolkitDownloader and then do X, Y, and Z."

How Are Spoken Accents Handled?

This is helped by diverse training sets.

For example, Mozilla's "Common Voice" is one of the datasets used inside of Whisper's training.

They have a giant list of:

Sentences (Text)

which volunteers then:

Speak (Audio)

After users verify the spoken audio matches the text, it gets added to the dataset.
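The verify-then-add step could be sketched roughly like this. It's a toy model, not Common Voice's actual code: the `Clip` class, the `clip_status` helper, and the two-vote threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: how many agreeing votes settle a clip's status.
VOTES_NEEDED = 2


@dataclass
class Clip:
    sentence: str        # the text the volunteer was asked to read aloud
    up_votes: int = 0    # reviewers who said the audio matches the text
    down_votes: int = 0  # reviewers who said it does not


def clip_status(clip: Clip, votes_needed: int = VOTES_NEEDED) -> str:
    """Return 'validated', 'rejected', or 'pending' for a clip."""
    if clip.up_votes >= votes_needed and clip.up_votes > clip.down_votes:
        return "validated"
    if clip.down_votes >= votes_needed and clip.down_votes > clip.up_votes:
        return "rejected"
    return "pending"


# Made-up review results for three recordings.
clips = [
    Clip("My name is Mr. John Smitherson.", up_votes=2, down_votes=0),
    Clip("Visit examplewebsite.com", up_votes=1, down_votes=0),
    Clip("I run a company called Whisperion Inc.", up_votes=0, down_votes=2),
]

# Only validated clips make it into the dataset.
dataset = [c for c in clips if clip_status(c) == "validated"]
print([c.sentence for c in dataset])
```

The point is just that a clip with an unusual accent still gets in as long as reviewers agree the words match the text.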

In their FAQ, they have a whole section on:

Varying Pronunciations

We welcome different accents! Be very cautious before rejecting a clip on the ground that you think the reader has mispronounced a word, has put the stress in the wrong place, or has ignored punctuation. There are a wide variety of pronunciations in use around the world, some of which you may not have heard in your local community. Please provide a generous margin of appreciation for those who may speak differently from you.

Audio Variability (Helping Accuracy)

Different accents in your audio are just one piece of the puzzle.

It's also helpful to get balanced:

Ages
- Young -> Old
Sexes
- Male / Female

and even great to get audio submitted using different:

Equipment
- High quality microphones -> crappy webcam/phone
Noise
- From quiet rooms -> loud coffee shops
Accents
- Native speakers -> ESL (English as a Second Language) speakers.

For example, if your language's audio skewed towards:

90% males / 10% females
90% young users / 10% old users

Then, if you were an older woman trying to use Speech-to-Text, recognition would be awful! For example, see this famous article:

"Siri doesn’t understand singer who gave iPhone her voice" (Metro.co.uk)
- Siri's own voice actress gets poor Speech-to-Text accuracy... and her husband's voice is recognized even better than hers!

Instead, if your language's audio tended towards:

50% male / 50% female
33% young / 33% middle-aged / 33% old
50% native speakers / 50% second-language speakers
50% quiet / 50% noisier rooms
25% American / 25% British / 25% African / 25% Indian
- Different accents.

anything you train on it would become more robust to all sorts of spoken differences.
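If you wanted to check a dataset for this kind of skew, a minimal sketch might look like the following. The metadata fields (`sex`, `age`), the helper names, and the 25% cutoff are assumptions chosen for illustration, and the clip list is made up to mirror the 90% / 10% example.

```python
from collections import Counter


def group_shares(clips, field):
    """Return each group's share of the clips for one metadata field."""
    counts = Counter(clip[field] for clip in clips)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(clips, field, min_share):
    """List groups whose share falls below min_share.

    Groups with no clips at all never show up here, so a real check
    would also compare against a list of expected groups.
    """
    shares = group_shares(clips, field)
    return sorted(g for g, share in shares.items() if share < min_share)


# Made-up metadata: 90% male / 10% female, 90% young / 10% old.
clips = (
    [{"sex": "male", "age": "young"}] * 81
    + [{"sex": "male", "age": "old"}] * 9
    + [{"sex": "female", "age": "young"}] * 9
    + [{"sex": "female", "age": "old"}] * 1
)

print(group_shares(clips, "sex"))
print(underrepresented(clips, "sex", min_share=0.25))
print(underrepresented(clips, "age", min_share=0.25))
```

Both `female` and `old` fall under the cutoff here, which is exactly the situation where an older woman would get worse results.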

This might be challenging with non-English languages. What about English itself?

Well, that's where more+better data has to be gathered!

Common Voice currently has 112 languages supported (with more getting added all the time).

The more hours of validated audio data you can gather for a given language, the better it'll become for all speakers of that language! :)

Side Note: Which non-English language do you speak?

1 reply

phineas-pta Jun 2, 2023

OpenAI didn't use Common Voice for training, but for evaluating and comparing Whisper to other models.
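For anyone curious what evaluating per accent could look like, here is a rough sketch using word error rate (WER), the usual speech-to-text metric. It is not OpenAI's evaluation code: the helper names are assumptions, and the transcripts are hypothetical, not real Whisper output.

```python
from collections import defaultdict


def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Return WER per group: total word edits / total reference words."""
    edits = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        edits[group] += edit_distance(ref_words, hyp_words)
        words[group] += len(ref_words)
    return {group: edits[group] / words[group] for group in edits}


# Hypothetical (accent, reference text, model transcript) triples.
samples = [
    ("american", "visit examplewebsite.com today", "visit examplewebsite.com today"),
    ("indian", "i run a company called whisperion", "i run a company called whisper ion"),
]

for group, wer in wer_by_group(samples).items():
    print(f"{group}: WER = {wer:.2f}")
```

A large gap in WER between groups would be a hint that some accents are handled worse than others.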

tutacat · 2024-12-05T14:23:12Z

*pronunciation.

Also, everything is pronounced with an accent.
