Commit b445c35

Merge pull request #245824 from eric-urban/eur/tts-release-notes
multilingual voices
2 parents 042a380 + 686cef8 commit b445c35

6 files changed

+54
-14
lines changed

articles/ai-services/speech-service/includes/language-support/tts-cnv.md

Lines changed: 3 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -24,7 +24,7 @@ ms.author: eur
2424
| `en-GB` | English (United Kingdom) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice |
2525
| `en-IE` | English (Ireland) | Custom Neural Voice Pro |
2626
| `en-IN` | English (India) | Custom Neural Voice Pro |
27-
| `en-US` | English (United States) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice|
27+
| `en-US` | English (United States) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice |
2828
| `es-ES` | Spanish (Spain) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice |
2929
| `es-MX` | Spanish (Mexico) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice |
3030
| `fi-FI` | Finnish (Finland) | Custom Neural Voice Pro |
@@ -38,7 +38,7 @@ ms.author: eur
3838
| `hu-HU` | Hungarian (Hungary) | Custom Neural Voice Pro |
3939
| `id-ID` | Indonesian (Indonesia) | Custom Neural Voice Pro<br/><br/>Cross-lingual voice |
4040
| `it-IT` | Italian (Italy) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice |
41-
| `ja-JP` | Japanese (Japan) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice<sup>1</sup> |
41+
| `ja-JP` | Japanese (Japan) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice (Preview) |
4242
| `ko-KR` | Korean (Korea) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice |
4343
| `ms-MY` | Malay (Malaysia) | Custom Neural Voice Pro |
4444
| `nb-NO` | Norwegian Bokmål (Norway) | Custom Neural Voice Pro |
@@ -57,8 +57,6 @@ ms.author: eur
5757
| `th-TH` | Thai (Thailand) | Custom Neural Voice Pro |
5858
| `tr-TR` | Turkish (Turkey) | Custom Neural Voice Pro |
5959
| `vi-VN` | Vietnamese (Vietnam) | Custom Neural Voice Pro |
60-
| `zh-CN` | Chinese (Mandarin, Simplified) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice<sup>1</sup> |
60+
| `zh-CN` | Chinese (Mandarin, Simplified) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice (Preview) |
6161
| `zh-HK` | Chinese (Cantonese, Traditional) | Custom Neural Voice Pro |
6262
| `zh-TW` | Chinese (Taiwanese Mandarin, Traditional) | Custom Neural Voice Pro |
63-
64-
<sup>1</sup> `ja-JP` and `zh-CN` are currently in public preview for multi-style voice.

articles/ai-services/speech-service/includes/language-support/tts.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ ms.author: eur
5252
| `en-PH` | English (Philippines) | `en-PH-RosaNeural` (Female)<br/>`en-PH-JamesNeural` (Male) |
5353
| `en-SG` | English (Singapore) | `en-SG-LunaNeural` (Female)<br/>`en-SG-WayneNeural` (Male) |
5454
| `en-TZ` | English (Tanzania) | `en-TZ-ImaniNeural` (Female)<br/>`en-TZ-ElimuNeural` (Male) |
55-
| `en-US` | English (United States) | `en-US-JennyMultilingualNeural`<sup>3</sup> (Female)<br/>`en-US-JennyNeural` (Female)<br/>`en-US-GuyNeural` (Male)<br/>`en-US-AriaNeural` (Female)<br/>`en-US-DavisNeural` (Male)<br/>`en-US-AmberNeural` (Female)<br/>`en-US-AnaNeural` (Female, Child)<br/>`en-US-AshleyNeural` (Female)<br/>`en-US-BrandonNeural` (Male)<br/>`en-US-ChristopherNeural` (Male)<br/>`en-US-CoraNeural` (Female)<br/>`en-US-ElizabethNeural` (Female)<br/>`en-US-EricNeural` (Male)<br/>`en-US-JacobNeural` (Male)<br/>`en-US-JaneNeural` (Female)<br/>`en-US-JasonNeural` (Male)<br/>`en-US-JennyMultilingualV2Neural` (Female)<br/>`en-US-MichelleNeural` (Female)<br/>`en-US-MonicaNeural` (Female)<br/>`en-US-NancyNeural` (Female)<br/>`en-US-RogerNeural` (Male)<br/>`en-US-RyanMultilingualNeural` (Male)<br/>`en-US-SaraNeural` (Female)<br/>`en-US-SteffanNeural` (Male)<br/>`en-US-TonyNeural` (Male)<br/>`en-US-AIGenerate1Neural`<sup>1</sup> (Male)<br/>`en-US-AIGenerate2Neural`<sup>1</sup> (Female)<br/>`en-US-BlueNeural`<sup>1</sup> (Neutral) |
55+
| `en-US` | English (United States) | `en-US-JennyMultilingualNeural`<sup>3</sup> (Female)<br/>`en-US-JennyNeural` (Female)<br/>`en-US-GuyNeural` (Male)<br/>`en-US-AriaNeural` (Female)<br/>`en-US-DavisNeural` (Male)<br/>`en-US-AmberNeural` (Female)<br/>`en-US-AnaNeural` (Female, Child)<br/>`en-US-AshleyNeural` (Female)<br/>`en-US-BrandonNeural` (Male)<br/>`en-US-ChristopherNeural` (Male)<br/>`en-US-CoraNeural` (Female)<br/>`en-US-ElizabethNeural` (Female)<br/>`en-US-EricNeural` (Male)<br/>`en-US-JacobNeural` (Male)<br/>`en-US-JaneNeural` (Female)<br/>`en-US-JasonNeural` (Male)<br/>`en-US-MichelleNeural` (Female)<br/>`en-US-MonicaNeural` (Female)<br/>`en-US-NancyNeural` (Female)<br/>`en-US-RogerNeural` (Male)<br/>`en-US-SaraNeural` (Female)<br/>`en-US-SteffanNeural` (Male)<br/>`en-US-TonyNeural` (Male)<br/>`en-US-AIGenerate1Neural`<sup>1</sup> (Male)<br/>`en-US-AIGenerate2Neural`<sup>1</sup> (Female)<br/>`en-US-BlueNeural`<sup>1</sup> (Neutral)<br/>`en-US-JennyMultilingualV2Neural`<sup>1,3</sup> (Female)<br/>`en-US-RyanMultilingualNeural`<sup>1,3</sup> (Male) |
5656
| `en-ZA` | English (South Africa) | `en-ZA-LeahNeural` (Female)<br/>`en-ZA-LukeNeural` (Male) |
5757
| `es-AR` | Spanish (Argentina) | `es-AR-ElenaNeural` (Female)<br/>`es-AR-TomasNeural` (Male) |
5858
| `es-BO` | Spanish (Bolivia) | `es-BO-SofiaNeural` (Female)<br/>`es-BO-MarceloNeural` (Male) |

articles/ai-services/speech-service/includes/language-support/voice-styles-and-roles.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ ms.author: eur
3838
|zh-CN-XiaozhenNeural|`angry`, `cheerful`, `disgruntled`, `fearful`, `sad`, `serious`|Not supported|
3939
|zh-CN-YunfengNeural|`angry`, `cheerful`, `depressed`, `disgruntled`, `fearful`, `sad`, `serious`|Not supported|
4040
|zh-CN-YunhaoNeural<sup>2</sup>|`advertisement-upbeat`|Not supported|
41-
|zh-CN-YunjianNeural<sup>3,4</sup>|`narration-relaxed`, `sports-commentary`, `sports-commentary-excited`|Not supported|
41+
|zh-CN-YunjianNeural<sup>3,4</sup>|`angry`, `cheerful`, `depressed`, `disgruntled`, `documentary`, `narration-relaxed`, `sad`, `serious`, `sports-commentary`, `sports-commentary-excited`|Not supported|
4242
|zh-CN-YunxiaNeural|`angry`, `calm`, `cheerful`, `fearful`, `sad`|Not supported|
4343
|zh-CN-YunxiNeural|`angry`, `assistant`, `chat`, `cheerful`, `depressed`, `disgruntled`, `embarrassed`, `fearful`, `narration-relaxed`, `newscast`, `sad`, `serious`|`Boy`, `Narrator`, `YoungAdultMale`|
4444
|zh-CN-YunyangNeural|`customerservice`, `narration-professional`, `newscast-casual`|Not supported|
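
A minimal sketch of how one of the styles added for `zh-CN-YunjianNeural` might be requested. It assumes the style is selected with the `mstts:express-as` element, as for other styled voices; the sample sentence is made up for illustration, and the footnotes on this voice may carry preview or region conditions that aren't visible in this diff.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">
    <voice name="zh-CN-YunjianNeural">
        <!-- "documentary" is one of the styles added to this row -->
        <mstts:express-as style="documentary">
            这条河流发源于高原，最终汇入大海。
        </mstts:express-as>
    </voice>
</speak>
```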

articles/ai-services/speech-service/includes/release-notes/release-notes-tts.md

Lines changed: 31 additions & 0 deletions
@@ -6,6 +6,37 @@ ms.date: 02/28/2023
66
ms.author: eur
77
---
88

9+
### July 2023 release
10+
11+
#### Prebuilt Neural TTS Voices
12+
13+
Introducing a new `en-US` gender-neutral voice in public preview:
14+
15+
| Locale (BCP-47) | Language | Text to speech voices |
16+
| ----- | ----- | ----- |
17+
| `en-US` | English (United States) | `en-US-BlueNeural` (Neutral) |
18+
19+
Introducing new multilingual voices for public preview:
20+
21+
| Locale (BCP-47) | Language | Text to speech voices |
22+
| ----- | ----- | ----- |
23+
| `en-US` | English (United States) | `en-US-JennyMultilingualV2Neural` (Female) |
24+
| `en-US` | English (United States) | `en-US-RyanMultilingualNeural` (Male) |
25+
26+
These new multilingual voices can speak in 41 languages and accents: `Arabic (Egypt)`, `Arabic (Saudi Arabia)`, `Catalan (Spain)`, `Czech (Czechia)`, `Danish (Denmark)`, `German (Austria)`, `German (Switzerland)`, `German (Germany)`, `English (Australia)`, `English (Canada)`, `English (United Kingdom)`, `English (Hong Kong SAR)`, `English (Ireland)`, `English (India)`, `English (United States)`, `Spanish (Spain)`, `Spanish (Mexico)`, `Finnish (Finland)`, `French (Belgium)`, `French (Canada)`, `French (Switzerland)`, `French (France)`, `Hindi (India)`, `Hungarian (Hungary)`, `Indonesian (Indonesia)`, `Italian (Italy)`, `Japanese (Japan)`, `Korean (Korea)`, `Norwegian Bokmål (Norway)`, `Dutch (Belgium)`, `Dutch (Netherlands)`, `Polish (Poland)`, `Portuguese (Brazil)`, `Portuguese (Portugal)`, `Russian (Russia)`, `Swedish (Sweden)`, `Thai (Thailand)`, `Turkish (Turkey)`, `Chinese (Mandarin, Simplified)`, `Chinese (Cantonese, Traditional)`, `Chinese (Taiwanese Mandarin, Traditional)`.
27+
28+
These multilingual voices don't fully support certain SSML elements, such as break, emphasis, silence, and sub.
29+
30+
> [!IMPORTANT]
31+
> The `en-US-JennyMultilingualV2Neural` voice is provided temporarily in public preview solely for evaluation purposes. It will be removed in the future.
32+
>
33+
> In order to speak in a language other than English, the current implementation of the `en-US-JennyMultilingualNeural` voice requires that you set the `<lang xml:lang>` element. We anticipate that during Q4 calendar year 2023, the `en-US-JennyMultilingualNeural` voice will be updated to speak in the language of the input text without the `<lang xml:lang>` element. This will be in parity with the `en-US-JennyMultilingualV2Neural` voice.
34+
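
The difference described in the note above can be sketched as follows. This is an illustrative SSML document, not taken from the release notes: the Spanish sample text is made up, and the behavior of the V2 voice without `<lang xml:lang>` is assumed from the note's description.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <!-- Current en-US-JennyMultilingualNeural: wrap non-English text in <lang xml:lang> -->
    <voice name="en-US-JennyMultilingualNeural">
        <lang xml:lang="es-ES">
            Estoy emocionada de probar la conversión de texto a voz.
        </lang>
    </voice>
    <!-- en-US-JennyMultilingualV2Neural: per the note, follows the language of the input text -->
    <voice name="en-US-JennyMultilingualV2Neural">
        Estoy emocionada de probar la conversión de texto a voz.
    </voice>
</speak>
```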
35+
Introducing new features in public preview for the following voices:
36+
- Added Latin input for Serbian (Serbia) `sr-RS` voices: `sr-latn-RS-SophieNeural` and `sr-latn-RS-NicholasNeural`.
37+
- Added English pronunciation support for Albanian (Albania) `sq-AL` voices: `sq-AL-AnilaNeural` and `sq-AL-IlirNeural`.
38+
39+
940
### May 2023 release
1041

1142
#### Audio Content Creation

articles/ai-services/speech-service/speech-synthesis-markup-structure.md

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ Attribute values must be enclosed by double or single quotation marks. For examp
9595

9696
## Speak root element
9797

98-
The `speak` element is the root element that's required for all SSML documents. The `speak` element contains information such as version, language, and the markup vocabulary definition.
98+
The `speak` element contains information such as version, language, and the markup vocabulary definition. The `speak` element is the root element that's required for all SSML documents. You must specify the default language within the `speak` element, whether or not the language is adjusted elsewhere, such as within the [`lang`](speech-synthesis-markup-voice.md#adjust-speaking-languages) element.
9999

100100
Here's the syntax for the `speak` element:
101101

articles/ai-services/speech-service/speech-synthesis-markup-voice.md

Lines changed: 17 additions & 6 deletions
@@ -226,7 +226,7 @@ This example uses a custom voice named "my-custom-voice". The custom voice speak
226226

227227
By default, all neural voices are fluent in their own language and English without using the `<lang xml:lang>` element. For example, if the input text in English is "I'm excited to try text to speech" and you use the `es-ES-ElviraNeural` voice, the text is spoken in English with a Spanish accent. With most neural voices, setting a specific speaking language with the `<lang xml:lang>` element at the sentence or word level is currently not supported.
228228

229-
You can adjust the speaking language for the `en-US-JennyMultilingualNeural` neural voice at the sentence level and word level by using the `<lang xml:lang>` element. The `en-US-JennyMultilingualNeural` neural voice is multilingual in 14 languages (For example: English, Spanish, and Chinese). The supported languages are provided in a table following the `<lang>` syntax and attribute definitions.
229+
The `<lang xml:lang>` element is primarily intended for multilingual neural voices. You can adjust the speaking language for the multilingual neural voice at the sentence level and word level. The supported languages for multilingual voices are [provided in a table](#multilingual-voices-with-the-lang-element) following the `<lang>` syntax and attribute definitions.
230230

231231
Usage of the `lang` element's attributes is described in the following table.
232232

@@ -237,19 +237,30 @@ Usage of the `lang` element's attributes are described in the following table.
237237
> [!NOTE]
238238
> The `<lang xml:lang>` element is incompatible with the `prosody` and `break` elements. You can't adjust pause and prosody like pitch, contour, rate, or volume in this element.
239239
240+
### Multilingual voices with the lang element
241+
240242
Use this table to determine which speaking languages are supported for each neural voice. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.
241243

242-
| Voice | Primary and default locale | Secondary locales |
243-
| ---------- | ---------- | ---------- |
244-
| `en-US-JennyMultilingualNeural` | `en-US` | `de-DE`, `en-AU`, `en-CA`, `en-GB`, `es-ES`, `es-MX`, `fr-CA`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, `zh-CN` |
244+
| Voice | Supported locales |
245+
| ---------- | ---------- |
246+
| `en-US-JennyMultilingualNeural`<sup>1</sup> | `de-DE`, `en-AU`, `en-CA`, `en-GB`, `es-ES`, `es-MX`, `fr-CA`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, `zh-CN` |
247+
| `en-US-JennyMultilingualV2Neural`<sup>2</sup> | `ar-EG`, `ar-SA`, `ca-ES`, `cs-CZ`, `da-DK`, `de-AT`, `de-CH`, `de-DE`, `en-AU`, `en-CA`, `en-GB`, `en-HK`, `en-IE`, `en-IN`, `en-US`, `es-ES`, `es-MX`, `fi-FI`, `fr-BE`, `fr-CA`, `fr-CH`, `fr-FR`, `hi-IN`, `hu-HU`, `id-ID`, `it-IT`, `ja-JP`, `ko-KR`, `nb-NO`, `nl-BE`, `nl-NL`, `pl-PL`, `pt-BR`, `pt-PT`, `ru-RU`, `sv-SE`, `th-TH`, `tr-TR`, `zh-CN`, `zh-HK`, `zh-TW` |
248+
| `en-US-RyanMultilingualNeural` | `ar-EG`, `ar-SA`, `ca-ES`, `cs-CZ`, `da-DK`, `de-AT`, `de-CH`, `de-DE`, `en-AU`, `en-CA`, `en-GB`, `en-HK`, `en-IE`, `en-IN`, `en-US`, `es-ES`, `es-MX`, `fi-FI`, `fr-BE`, `fr-CA`, `fr-CH`, `fr-FR`, `hi-IN`, `hu-HU`, `id-ID`, `it-IT`, `ja-JP`, `ko-KR`, `nb-NO`, `nl-BE`, `nl-NL`, `pl-PL`, `pt-BR`, `pt-PT`, `ru-RU`, `sv-SE`, `th-TH`, `tr-TR`, `zh-CN`, `zh-HK`, `zh-TW` |
249+
250+
<sup>1</sup> In order to speak in a language other than English, the current implementation of the `en-US-JennyMultilingualNeural` voice requires that you set the `<lang xml:lang>` element. We anticipate that during Q4 calendar year 2023, the `en-US-JennyMultilingualNeural` voice will be updated to speak in the language of the input text without the `<lang xml:lang>` element. This will be in parity with the `en-US-JennyMultilingualV2Neural` voice.
251+
252+
<sup>2</sup> The `en-US-JennyMultilingualV2Neural` voice is provided temporarily in public preview solely for evaluation purposes. It will be removed in the future.
253+
254+
> [!NOTE]
255+
> Multilingual voices don't fully support certain SSML elements, such as break, emphasis, silence, and sub.
245256
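
As a rough illustration of the table above, the following sketch switches `en-US-RyanMultilingualNeural` to `fr-FR` at the sentence level and at the word level. The sample text is hypothetical; the voice name and locales come from the table, and `en-US` is kept as the default language on the `speak` element.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-RyanMultilingualNeural">
        <!-- Sentence-level switch to a supported locale -->
        <lang xml:lang="fr-FR">
            Bienvenue dans notre service de synthèse vocale.
        </lang>
        <!-- Word-level switch inside an English sentence -->
        The French word for welcome is <lang xml:lang="fr-FR">bienvenue</lang>.
    </voice>
</speak>
```

The example avoids `break`, `emphasis`, `silence`, and `sub`, since the note above says multilingual voices don't fully support them.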
246257
### Lang examples
247258

248259
The supported values for attributes of the `lang` element were [described previously](#adjust-speaking-languages).
249260

250-
The primary language for `en-US-JennyMultilingualNeural` is `en-US`. You must specify `en-US` as the default language within the `speak` element, whether or not the language is adjusted elsewhere.
261+
You must specify `en-US` as the default language within the `speak` element, whether or not the language is adjusted elsewhere. In this example, the primary language for `en-US-JennyMultilingualNeural` is `en-US`.
251262

252-
This SSML snippet shows how to use the `lang` element (and `xml:lang` attribute) to speak `de-DE` with the `en-US-JennyMultilingualNeural` neural voice.
263+
This SSML snippet shows how to use `<lang xml:lang>` to speak `de-DE` with the `en-US-JennyMultilingualNeural` neural voice.
253264

254265
```xml
255266
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
