With hundreds of voices available by default across browsers and operating systems, it can be tricky for developers to provide sensible defaults and a curated list of voices.
One of the goals of this project is to document higher quality voices available on various platforms and to provide an easy way to implement these recommendations using JSON configuration files. Typical use cases include:
- Providing the best possible default voice per language
- Displaying an ordered list of voices, based on quality
- Displaying user-friendly voice names
- Filtering recommended voices per gender and age (adult vs children)
- Filtering out novelty and low quality voices
- Previewing a voice with a test utterance
The goal of this project is to support all 43 languages available on Windows and macOS.
In its current state, it covers 43 languages:
- Arabic (Algeria, Bahrain, Egypt, Iraq, Jordan, Kuwait, Lebanon, Libya, Morocco, Oman, Qatar, Saudi Arabia, Syria, Tunisia, United Arab Emirates, Yemen)
- Basque
- Bengali (India and Bangladesh)
- Bhojpuri
- Bulgarian
- Catalan
- Chinese:
  - Mandarin Chinese (Mainland China, Taiwan)
  - Wu Chinese (aka "Shanghainese")
  - Yue Chinese (aka "Cantonese")
- Croatian
- Czech
- Danish
- Dutch (Netherlands and Belgium)
- English (United States, United Kingdom, Australia, Canada, Hong Kong, India, Ireland, Kenya, New Zealand, Nigeria, Scotland, Singapore, South Africa and Tanzania)
- Finnish
- French (France, Canada, Belgium and Switzerland)
- Galician
- German (Germany, Austria and Switzerland)
- Greek
- Hebrew
- Hindi
- Hungarian
- Indonesian
- Italian
- Japanese
- Kannada
- Kazakh
- Korean
- Malay
- Marathi
- Norwegian
- Persian
- Polish
- Portuguese (Portugal and Brazil)
- Romanian
- Russian
- Slovak
- Slovenian
- Spanish (Spain, Argentina, Bolivia, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, Equatorial Guinea, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Puerto Rico, United States, Uruguay and Venezuela)
- Swedish
- Tamil (India, Sri Lanka, Malaysia and Singapore)
- Telugu
- Thai
- Turkish
- Ukrainian
- Vietnamese
At the other end of the spectrum, this project also identifies a number of voices that should be filtered out of a voice selector component.
Some of them are harmful to the overall reading experience, while others are of very low quality on platforms where better preloaded options are available:
- Novelty voices (Apple devices)
- Very low quality voices (Apple devices and Chrome OS)
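A minimal way to apply such a filter, assuming the names to exclude have already been loaded from the project's JSON files into a plain array of strings, is to drop every available voice whose name appears in that array. In a browser, the available voices would typically come from `speechSynthesis.getVoices()`. `VoiceLike` and `filterOutVoices` are hypothetical names; `VoiceLike` only mirrors the `name` and `lang` fields of the Web Speech API's `SpeechSynthesisVoice`, so the sketch also runs outside a browser.

```typescript
// Minimal stand-in for SpeechSynthesisVoice: only the fields used here.
interface VoiceLike {
  name: string;
  lang: string;
}

// Keeps voices whose name is not in the exclusion list (exact match only).
function filterOutVoices(available: VoiceLike[], excludedNames: string[]): VoiceLike[] {
  return available.filter((voice) => excludedNames.indexOf(voice.name) === -1);
}

// "Bad News" is one of Apple's novelty voices.
const voices: VoiceLike[] = [
  { name: "Samantha", lang: "en-US" },
  { name: "Bad News", lang: "en-US" },
];
const kept = filterOutVoices(voices, ["Bad News"]);
console.log(kept.map((v) => v.name)); // [ 'Samantha' ]
```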
- Each voice list is ordered and meant to provide an optimal listening experience on all browsers/OS/languages covered by this project.
- Each list also includes default options, to make sure that there is always something reliable to fall back on.
- With these two goals in mind, higher quality voices are listed at the top, while lower quality or specialized voices are listed at the bottom.
- The number of voices can look overwhelming (110+ voices in English alone), but in practice, only a few of them will be available to users on each of their devices.
- The voice names returned by the Web Speech API are hardly user-friendly, which is why this list provides alternate ones that usually include a first name (or a gender) along with the region associated with the voice.
- Whenever possible, I try to include a good mix of high quality and default options for both genders.
- Since the list has to be prioritized somehow, female voices are currently listed above their male counterparts. Because the gender associated with each voice is documented, implementers can re-prioritize or filter the list based on this criterion.
- Regional variants are intentionally grouped together in a single list rather than split into separate files. On some devices, only two or three voices might be available, and separating regional variants wouldn't make much sense.
- Regional variants still have to be prioritized somehow in the list. For now, the regions with the best selection of voices are listed first, but implementers are strongly encouraged to take the user's regional preferences into account.
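Taken together, these notes suggest a simple selection strategy, sketched below under a few assumptions: each entry is reduced to `name` and `language`, voices are matched on `name` only (ignoring alternate names), and the installed names would come from something like `speechSynthesis.getVoices()`. The list in the example is illustrative, not copied from the project's files; `Recommendation`, `preferRegion` and `pickDefaultVoice` are hypothetical names.

```typescript
// Reduced recommendation entry: only the fields this sketch needs.
interface Recommendation {
  name: string;
  language: string; // BCP 47 tag such as "en-GB"
}

// Stable split: entries for the preferred tag move to the front, and both
// groups keep their original (quality-based) relative order.
function preferRegion(list: Recommendation[], preferredTag: string): Recommendation[] {
  const preferred = list.filter((r) => r.language === preferredTag);
  const others = list.filter((r) => r.language !== preferredTag);
  return preferred.concat(others);
}

// Returns the first recommendation that is actually installed, if any.
function pickDefaultVoice(list: Recommendation[], installedNames: string[]): Recommendation | undefined {
  for (const r of list) {
    if (installedNames.indexOf(r.name) !== -1) return r;
  }
  return undefined;
}

// Illustrative excerpt, highest quality first.
const english: Recommendation[] = [
  { name: "Microsoft Ava Online (Natural) - English (United States)", language: "en-US" },
  { name: "Microsoft Sonia Online (Natural) - English (United Kingdom)", language: "en-GB" },
  { name: "Google US English", language: "en-US" },
  { name: "Google UK English Female", language: "en-GB" },
];
// Pretend only the two Google voices are installed.
const installed = ["Google US English", "Google UK English Female"];

console.log(pickDefaultVoice(english, installed)?.name); // Google US English
console.log(pickDefaultVoice(preferRegion(english, "en-GB"), installed)?.name); // Google UK English Female
```

Moving the preferred region to the front with a stable split, rather than re-sorting, keeps the quality ordering within each group intact.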
A JSON Schema is available for validation, and for potential contributors interested in opening a PR to add new languages or voices.
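For TypeScript consumers, the properties documented in this section can be mirrored by an interface. The sketch below is hand-written from this README rather than generated from the JSON Schema, so the schema remains the authoritative source; `RecommendedVoice` and `hasRequiredFields` are hypothetical names, and the runtime check only covers the three required properties.

```typescript
type Quality = "veryHigh" | "high" | "normal" | "low" | "veryLow";

interface RecommendedVoice {
  label: string;    // required: localized, human-friendly name
  name: string;     // required: main identifier
  language: string; // required: BCP 47 tag
  altNames?: string[];
  localizedName?: string;
  otherLanguages?: string[];
  multiLingual?: boolean;
  gender?: "female" | "male" | "neutral";
  children?: boolean;
  quality?: Quality[];
  os?: string[];
  browser?: string[];
  preloaded?: boolean;
  pitchControl?: boolean;
  rate?: number;
  pitch?: number;
  note?: string;
}

// Checks only the required string fields; not a substitute for the schema.
function hasRequiredFields(value: unknown): value is RecommendedVoice {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.label === "string" && typeof v.name === "string" && typeof v.language === "string";
}

const parsed: unknown = JSON.parse(
  '{"label": "Isabella (Italia)", "name": "Microsoft Isabella Online (Natural) - Italian (Italy)", "language": "it-IT"}',
);
console.log(hasRequiredFields(parsed)); // true
```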
`label` is required for each recommended voice and provides a human-friendly label for it.
This string is localized for the target language and usually contains the following information:
- First name (if available)
- Gender (when the first name is missing)
- Country/region
Example 1: Microsoft Natural voices
While the names Microsoft uses for its natural voices are easy to understand, they tend to be very long and are always in English, whatever the language of the voice.
```json
{
  "label": "Isabella (Italia)",
  "name": "Microsoft Isabella Online (Natural) - Italian (Italy)",
  "language": "it-IT"
}
```

Example 2: Chrome OS voices
Chrome OS provides a number of high quality voices through its Android subsystem, but they come with some of the worst possible names for an end user.
```json
{
  "label": "Female voice 1 (US)",
  "name": "Android Speech Recognition and Synthesis from Google en-us-x-tpc-network",
  "language": "en-US"
}
```

`name` is required for each recommended voice and is used as the main identifier for voices in this project.
Names are mostly stable across browsers, which means that for most voices, a single string is sufficient.
But there are unfortunately some outliers: Android, iOS, iPadOS and macOS voices.
For those voices, at least part of the string is often localized, naming can be inconsistent across browsers, and names can change depending on the number of variants installed.
Because of this, each voice can also contain the following properties:
- `altNames`, an array of alternate strings for a given voice
- `localizedName`, which identifies the string pattern used for localizing these voices
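With alternate names, resolving the label for an installed voice means checking both the main name and its alternates. The sketch below assumes recommendations are searched in list order, so higher-ranked entries win, and it ignores `localizedName` because the localization patterns aren't described here; `findRecommendation` and `displayLabel` are hypothetical helpers.

```typescript
interface VoiceLike {
  name: string;
  lang: string;
}

interface Recommendation {
  label: string;
  name: string;
  altNames?: string[];
  language: string;
}

// Searches in list order, so a higher-ranked entry wins if names overlap.
function findRecommendation(voiceName: string, list: Recommendation[]): Recommendation | undefined {
  for (const rec of list) {
    if (rec.name === voiceName) return rec;
    if (rec.altNames !== undefined && rec.altNames.indexOf(voiceName) !== -1) return rec;
  }
  return undefined;
}

// Uses the curated label when available, otherwise the raw Web Speech name.
function displayLabel(voice: VoiceLike, list: Recommendation[]): string {
  const rec = findRecommendation(voice.name, list);
  return rec !== undefined ? rec.label : voice.name;
}

// Entry taken from Example 3 below.
const recommendations: Recommendation[] = [
  {
    label: "Samantha (US)",
    name: "Samantha",
    altNames: ["Samantha (Enhanced)", "Samantha (English (United States))"],
    language: "en-US",
  },
];

console.log(displayLabel({ name: "Samantha (Enhanced)", lang: "en-US" }, recommendations)); // Samantha (US)
console.log(displayLabel({ name: "Some Other Voice", lang: "en-US" }, recommendations)); // Some Other Voice
```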
Example 3: Alternate version of an Apple preloaded voice
```json
{
  "label": "Samantha (US)",
  "name": "Samantha",
  "localizedName": "apple",
  "altNames": [
    "Samantha (Enhanced)",
    "Samantha (English (United States))"
  ],
  "language": "en-US"
}
```

`language` is required for each recommended voice.
It contains a BCP 47 language tag where a lowercase two-letter language code is followed by an uppercase two-letter country code.
The language and country codes are separated by a hyphen (-).
Some voices are also capable of handling another language: for example, a Spanish voice for the United States might also be able to handle English.
For this reason, an `otherLanguages` property is also available, although it is rarely used for now.
It contains a list of languages using only two-letter codes, without a region subtag.
Some brand new voices from Microsoft are also capable of multilingual output. The language cannot switch in the middle of a sentence, but the output seems to auto-detect the language of each sentence and adapt accordingly.
To support this, the output might automatically switch to a different voice in the process.
These voices are identified using the `multiLingual` boolean.
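A language check can compare primary subtags, so that a request for `fr-FR` also matches an `fr-CA` voice, and then fall back on the secondary languages. This is a sketch under assumptions: the secondary languages are read from the `otherLanguages` property shown in Example 5, cross-region matching is a design choice rather than a rule from this project, and `multiLingual` voices are not treated as covering every language since the languages they handle aren't listed. `primarySubtag` and `canSpeak` are hypothetical names.

```typescript
interface LanguageInfo {
  language: string;          // BCP 47 tag, e.g. "fr-CA"
  otherLanguages?: string[]; // codes without a region subtag, e.g. ["en"]
}

// Primary language subtag, lowercased: "fr-CA" -> "fr".
function primarySubtag(tag: string): string {
  return tag.split("-")[0].toLowerCase();
}

// True if the voice's main language or one of its secondary languages
// shares the primary subtag of the requested tag.
function canSpeak(voice: LanguageInfo, requestedTag: string): boolean {
  const wanted = primarySubtag(requestedTag);
  if (primarySubtag(voice.language) === wanted) return true;
  const others = voice.otherLanguages ?? [];
  return others.some((code) => primarySubtag(code) === wanted);
}

// Same data as Example 5 below.
const sylvie: LanguageInfo = { language: "fr-CA", otherLanguages: ["en"] };
console.log(canSpeak(sylvie, "fr-FR")); // true
console.log(canSpeak(sylvie, "en-US")); // true
console.log(canSpeak(sylvie, "de-DE")); // false
```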
Example 4: Voice with a multilingual output
```json
{
  "label": "Emma (US)",
  "name": "Microsoft EmmaMultilingual Online (Natural) - English (United States)",
  "language": "en-US",
  "multiLingual": true
}
```

Example 5: Voice capable of handling a secondary language
```json
{
  "label": "Sylvie (Canada)",
  "name": "Microsoft Sylvie Online (Natural) - French (Canada)",
  "language": "fr-CA",
  "otherLanguages": [
    "en"
  ]
}
```

`gender` is an optional property that documents the gender associated with each voice.
The following values are supported: `female`, `male` or `neutral`.
`children` is also optional and identifies children's voices using a boolean.
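Given these two properties, filtering a list by gender and excluding children's voices reduces to a couple of checks. In this sketch, `filterByVoiceProfile` and its `options` object are hypothetical, excluding children's voices unless asked is an assumed default, and entries without a `gender` are dropped whenever a gender is requested.

```typescript
type Gender = "female" | "male" | "neutral";

interface GenderInfo {
  name: string;
  gender?: Gender;
  children?: boolean;
}

// Keeps voices matching the requested gender (if any) and, unless
// includeChildren is true, drops voices flagged as children's voices.
function filterByVoiceProfile(
  list: GenderInfo[],
  options: { gender?: Gender; includeChildren?: boolean } = {},
): GenderInfo[] {
  return list.filter((v) => {
    if (options.gender !== undefined && v.gender !== options.gender) return false;
    if (!options.includeChildren && v.children === true) return false;
    return true;
  });
}

const sample: GenderInfo[] = [
  { name: "Microsoft Ana Online (Natural) - English (United States)", gender: "female", children: true },
  { name: "Google UK English Female", gender: "female" },
  { name: "Google UK English Male", gender: "male" },
];
console.log(filterByVoiceProfile(sample, { gender: "female" }).map((v) => v.name)); // [ 'Google UK English Female' ]
```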
Example 6: Female child voice
```json
{
  "label": "Ana (US)",
  "name": "Microsoft Ana Online (Natural) - English (United States)",
  "language": "en-US",
  "gender": "female",
  "children": true
}
```

`quality` is an optional property that documents the quality of each variant of a voice.
The following values are supported:
- `veryHigh`: very high, almost human-indistinguishable quality of speech synthesis
- `high`: high, human-like quality of speech synthesis
- `normal`: normal quality of speech synthesis
- `low`: low, not human-like quality of speech synthesis
- `veryLow`: very low, but still intelligible quality of speech synthesis
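Because these values form an ordered scale, they can be mapped to numbers, for example to show the best variant a voice can offer. `QUALITY_RANK` and `bestQuality` are hypothetical names. Note that the array describes every known variant of a voice, so on its own it doesn't tell which variant is installed on a given device.

```typescript
type Quality = "veryHigh" | "high" | "normal" | "low" | "veryLow";

// Higher number = better quality, following the order of the list above.
const QUALITY_RANK: Record<Quality, number> = {
  veryLow: 0,
  low: 1,
  normal: 2,
  high: 3,
  veryHigh: 4,
};

// Best quality among a voice's documented variants, or undefined if none.
function bestQuality(qualities: Quality[] | undefined): Quality | undefined {
  if (qualities === undefined || qualities.length === 0) return undefined;
  return qualities.reduce((best, q) => (QUALITY_RANK[q] > QUALITY_RANK[best] ? q : best));
}

console.log(bestQuality(["low", "normal", "high"])); // high
console.log(bestQuality(undefined)); // undefined
```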
Example 7: An Apple voice available in three quality variants
```json
{
  "label": "Ava (US)",
  "name": "Ava",
  "note": "This voice can be installed on all Apple devices and offers three variants. Like all voices that can be installed on Apple devices, it suffers from inconsistent naming due to localization.",
  "altNames": [
    "Ava (Premium)",
    "Ava (Enhanced)",
    "Ava (English (United States))"
  ],
  "language": "en-US",
  "gender": "female",
  "quality": [
    "low",
    "normal",
    "high"
  ],
  "rate": 1,
  "pitch": 1,
  "os": [
    "macOS",
    "iOS",
    "iPadOS"
  ]
}
```

Both `os` and `browser` are optional properties. They indicate in which operating systems and browsers a voice is available.
These two properties are meant to be interpreted separately and not as a combination.
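One possible reading of this rule, sketched below, treats a voice as potentially available when either list mentions the current environment, and keeps entries that have neither list. This is an assumption about how to consume the data, not logic provided by the project; the actual set of voices should still come from `speechSynthesis.getVoices()`, and `mayBeAvailable` is a hypothetical helper.

```typescript
interface Availability {
  os?: string[];      // e.g. ["Windows"]
  browser?: string[]; // e.g. ["Edge"]
}

// os and browser are checked independently: a match in either list is enough.
// Entries that list neither carry no restriction and are kept.
function mayBeAvailable(voice: Availability, currentOs: string, currentBrowser: string): boolean {
  if (voice.os === undefined && voice.browser === undefined) return true;
  const osMatch = voice.os !== undefined && voice.os.indexOf(currentOs) !== -1;
  const browserMatch = voice.browser !== undefined && voice.browser.indexOf(currentBrowser) !== -1;
  return osMatch || browserMatch;
}

// Same os/browser values as Example 8 below.
const denise: Availability = { os: ["Windows"], browser: ["Edge"] };
console.log(mayBeAvailable(denise, "macOS", "Edge"));      // true
console.log(mayBeAvailable(denise, "Windows", "Firefox")); // true
console.log(mayBeAvailable(denise, "macOS", "Firefox"));   // false
```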
Example 8: A Microsoft voice available in both Edge and Windows
```json
{
  "label": "Denise (France)",
  "name": "Microsoft Denise Online (Natural) - French (France)",
  "note": "This voice is preloaded in Edge on desktop. In other browsers, it requires the user to run Windows 11 and install the voice pack.",
  "language": "fr-FR",
  "gender": "female",
  "os": [
    "Windows"
  ],
  "browser": [
    "Edge"
  ]
}
```

In addition, `preloaded` indicates whether the voice is preloaded in all of the operating systems and browsers that have been identified.
With the current approach, it is not possible to indicate, for example, that a voice is preloaded in Chrome but requires a download on Windows.
Example 9: A Google voice preloaded in Chrome Desktop
```json
{
  "label": "Google female voice (UK)",
  "name": "Google UK English Female",
  "language": "en-GB",
  "gender": "female",
  "browser": [
    "ChromeDesktop"
  ],
  "preloaded": true
}
```

When using the Web Speech API, `SpeechSynthesisUtterance` supports optional values for `pitch` and `rate`.
Each voice documented in this repo supports the following optional properties:
- `pitchControl` is a boolean that defaults to `true` and indicates whether the pitch of a voice can be controlled
- `rate` is a number between 0.1 and 10 that defaults to 1 and provides a recommended default speech rate for each voice
- `pitch` is a number between 0 and 2 that defaults to 1 and provides a recommended default pitch for each voice
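These hints can be turned into values for `SpeechSynthesisUtterance.rate` and `SpeechSynthesisUtterance.pitch`. The sketch assumes a user-facing speed or pitch setting expressed as a multiplier of the recommended default (a design choice, not part of this project), applies the documented defaults and ranges, and ignores pitch changes when `pitchControl` is `false`. `utteranceSettings` is a hypothetical helper; the lines that create an actual utterance are commented out because they only run in a browser.

```typescript
// Optional per-voice hints, as documented above.
interface ProsodyHints {
  pitchControl?: boolean; // defaults to true
  rate?: number;          // 0.1 to 10, defaults to 1
  pitch?: number;         // 0 to 2, defaults to 1
}

interface UtteranceSettings {
  rate: number;
  pitch: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// userRate and userPitch are hypothetical multipliers picked by the user
// (1 = keep the recommended default). Results are clamped to the ranges above.
function utteranceSettings(hints: ProsodyHints, userRate = 1, userPitch = 1): UtteranceSettings {
  const rate = (hints.rate ?? 1) * userRate;
  const basePitch = hints.pitch ?? 1;
  // When the pitch can't be controlled, user pitch adjustments are ignored.
  const pitch = hints.pitchControl === false ? basePitch : basePitch * userPitch;
  return { rate: clamp(rate, 0.1, 10), pitch: clamp(pitch, 0, 2) };
}

const googleFrench = utteranceSettings({ rate: 1, pitch: 0.8 });
console.log(googleFrench); // { rate: 1, pitch: 0.8 }

// In a browser, the result would be applied to an utterance:
// const utterance = new SpeechSynthesisUtterance("Bonjour");
// utterance.rate = googleFrench.rate;
// utterance.pitch = googleFrench.pitch;
// speechSynthesis.speak(utterance);
```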
Example 10: Microsoft voice where the pitch cannot be adjusted
```json
{
  "label": "Ana (US)",
  "name": "Microsoft Ana Online (Natural) - English (United States)",
  "language": "en-US",
  "gender": "female",
  "pitchControl": false
}
```

Example 11: Google voice with recommended pitch and rate
```json
{
  "label": "Voix Google féminine (France)",
  "name": "Google français",
  "language": "fr-FR",
  "gender": "female",
  "rate": 1,
  "pitch": 0.8
}
```