multilingual voices

eric-urban · eric-urban · commit 686cef8652f8 · 2023-07-21T08:58:27.000-07:00
diff --git a/articles/ai-services/speech-service/includes/language-support/tts-cnv.md b/articles/ai-services/speech-service/includes/language-support/tts-cnv.md
@@ -24,7 +24,7 @@ ms.author: eur
 | `en-GB` | English (United Kingdom) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice |
 | `en-IE` | English (Ireland) | Custom Neural Voice Pro |
 | `en-IN` | English (India) | Custom Neural Voice Pro |
-| `en-US` | English (United States) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice|
+| `en-US` | English (United States) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice |
 | `es-ES` | Spanish (Spain) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice |
 | `es-MX` | Spanish (Mexico) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice |
 | `fi-FI` | Finnish (Finland) | Custom Neural Voice Pro |
@@ -38,7 +38,7 @@ ms.author: eur
 | `hu-HU` | Hungarian (Hungary) | Custom Neural Voice Pro |
 | `id-ID` | Indonesian (Indonesia) | Custom Neural Voice Pro<br/><br/>Cross-lingual voice |
 | `it-IT` | Italian (Italy) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice |
-| `ja-JP` | Japanese (Japan) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice<sup>1</sup> |
+| `ja-JP` | Japanese (Japan) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice (Preview) |
 | `ko-KR` | Korean (Korea) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice |
 | `ms-MY` | Malay (Malaysia) | Custom Neural Voice Pro |
 | `nb-NO` | Norwegian Bokmål (Norway) | Custom Neural Voice Pro |
@@ -57,8 +57,6 @@ ms.author: eur
 | `th-TH` | Thai (Thailand) | Custom Neural Voice Pro |
 | `tr-TR` | Turkish (Turkey) | Custom Neural Voice Pro |
 | `vi-VN` | Vietnamese (Vietnam) | Custom Neural Voice Pro |
-| `zh-CN` | Chinese (Mandarin, Simplified) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice<sup>1</sup> |
+| `zh-CN` | Chinese (Mandarin, Simplified) | Custom Neural Voice Pro<br/><br/>Custom Neural Voice Lite (Preview)<br/><br/>Cross-lingual voice<br/><br/>Multi-style voice (Preview) |
 | `zh-HK` | Chinese (Cantonese, Traditional) | Custom Neural Voice Pro |
 | `zh-TW` | Chinese (Taiwanese Mandarin, Traditional) | Custom Neural Voice Pro |
-
-<sup>1</sup> `ja-JP` and `zh-CN`are currently in public preview for multi-style voice.
diff --git a/articles/ai-services/speech-service/includes/language-support/tts.md b/articles/ai-services/speech-service/includes/language-support/tts.md
@@ -52,7 +52,7 @@ ms.author: eur
 | `en-PH` | English (Philippines) | `en-PH-RosaNeural` (Female)<br/>`en-PH-JamesNeural` (Male) |
 | `en-SG` | English (Singapore) | `en-SG-LunaNeural` (Female)<br/>`en-SG-WayneNeural` (Male) |
 | `en-TZ` | English (Tanzania) | `en-TZ-ImaniNeural` (Female)<br/>`en-TZ-ElimuNeural` (Male) |
-| `en-US` | English (United States) | `en-US-JennyMultilingualNeural`<sup>3</sup> (Female)<br/>`en-US-JennyNeural` (Female)<br/>`en-US-GuyNeural` (Male)<br/>`en-US-AriaNeural` (Female)<br/>`en-US-DavisNeural` (Male)<br/>`en-US-AmberNeural` (Female)<br/>`en-US-AnaNeural` (Female, Child)<br/>`en-US-AshleyNeural` (Female)<br/>`en-US-BrandonNeural` (Male)<br/>`en-US-ChristopherNeural` (Male)<br/>`en-US-CoraNeural` (Female)<br/>`en-US-ElizabethNeural` (Female)<br/>`en-US-EricNeural` (Male)<br/>`en-US-JacobNeural` (Male)<br/>`en-US-JaneNeural` (Female)<br/>`en-US-JasonNeural` (Male)<br/>`en-US-JennyMultilingualV2Neural` (Female)<br/>`en-US-MichelleNeural` (Female)<br/>`en-US-MonicaNeural` (Female)<br/>`en-US-NancyNeural` (Female)<br/>`en-US-RogerNeural` (Male)<br/>`en-US-RyanMultilingualNeural` (Male)<br/>`en-US-SaraNeural` (Female)<br/>`en-US-SteffanNeural` (Male)<br/>`en-US-TonyNeural` (Male)<br/>`en-US-AIGenerate1Neural`<sup>1</sup> (Male)<br/>`en-US-AIGenerate2Neural`<sup>1</sup> (Female)<br/>`en-US-BlueNeural`<sup>1</sup> (Neutral) |
+| `en-US` | English (United States) | `en-US-JennyMultilingualNeural`<sup>3</sup> (Female)<br/>`en-US-JennyNeural` (Female)<br/>`en-US-GuyNeural` (Male)<br/>`en-US-AriaNeural` (Female)<br/>`en-US-DavisNeural` (Male)<br/>`en-US-AmberNeural` (Female)<br/>`en-US-AnaNeural` (Female, Child)<br/>`en-US-AshleyNeural` (Female)<br/>`en-US-BrandonNeural` (Male)<br/>`en-US-ChristopherNeural` (Male)<br/>`en-US-CoraNeural` (Female)<br/>`en-US-ElizabethNeural` (Female)<br/>`en-US-EricNeural` (Male)<br/>`en-US-JacobNeural` (Male)<br/>`en-US-JaneNeural` (Female)<br/>`en-US-JasonNeural` (Male)<br/>`en-US-MichelleNeural` (Female)<br/>`en-US-MonicaNeural` (Female)<br/>`en-US-NancyNeural` (Female)<br/>`en-US-RogerNeural` (Male)<br/>`en-US-SaraNeural` (Female)<br/>`en-US-SteffanNeural` (Male)<br/>`en-US-TonyNeural` (Male)<br/>`en-US-AIGenerate1Neural`<sup>1</sup> (Male)<br/>`en-US-AIGenerate2Neural`<sup>1</sup> (Female)<br/>`en-US-BlueNeural`<sup>1</sup> (Neutral)<br/>`en-US-JennyMultilingualV2Neural`<sup>1,3</sup> (Female)<br/>`en-US-RyanMultilingualNeural`<sup>1,3</sup> (Male) |
 | `en-ZA` | English (South Africa) | `en-ZA-LeahNeural` (Female)<br/>`en-ZA-LukeNeural` (Male) |
 | `es-AR` | Spanish (Argentina) | `es-AR-ElenaNeural` (Female)<br/>`es-AR-TomasNeural` (Male) |
 | `es-BO` | Spanish (Bolivia) | `es-BO-SofiaNeural` (Female)<br/>`es-BO-MarceloNeural` (Male) |
diff --git a/articles/ai-services/speech-service/includes/language-support/voice-styles-and-roles.md b/articles/ai-services/speech-service/includes/language-support/voice-styles-and-roles.md
@@ -38,7 +38,7 @@ ms.author: eur
 |zh-CN-XiaozhenNeural|`angry`, `cheerful`, `disgruntled`, `fearful`, `sad`, `serious`|Not supported|
 |zh-CN-YunfengNeural|`angry`, `cheerful`, `depressed`, `disgruntled`, `fearful`, `sad`, `serious`|Not supported|
 |zh-CN-YunhaoNeural<sup>2</sup>|`advertisement-upbeat`|Not supported|
-|zh-CN-YunjianNeural<sup>3,4</sup>|`narration-relaxed`, `sports-commentary`, `sports-commentary-excited`|Not supported|
+|zh-CN-YunjianNeural<sup>3,4</sup>|`angry`, `cheerful`, `depressed`, `disgruntled`, `documentary`, `narration-relaxed`, `sad`, `serious`, `sports-commentary`, `sports-commentary-excited`|Not supported|
 |zh-CN-YunxiaNeural|`angry`, `calm`, `cheerful`, `fearful`, `sad`|Not supported|
 |zh-CN-YunxiNeural|`angry`, `assistant`, `chat`, `cheerful`, `depressed`, `disgruntled`, `embarrassed`, `fearful`, `narration-relaxed`, `newscast`, `sad`, `serious`|`Boy`, `Narrator`, `YoungAdultMale`|
 |zh-CN-YunyangNeural|`customerservice`, `narration-professional`, `newscast-casual`|Not supported|
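
The expanded style list for `zh-CN-YunjianNeural` in the hunk above could be used from SSML roughly as follows. This is a minimal sketch, not the service's own tooling: the helper name `yunjian_ssml` is hypothetical, and the `mstts:express-as` element and its `https://www.w3.org/2001/mstts` namespace are assumed from the Speech service SSML documentation rather than taken from this diff.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Styles copied from the updated zh-CN-YunjianNeural row in this diff.
YUNJIAN_STYLES = {
    "angry", "cheerful", "depressed", "disgruntled", "documentary",
    "narration-relaxed", "sad", "serious", "sports-commentary",
    "sports-commentary-excited",
}


def yunjian_ssml(text: str, style: str) -> str:
    """Hypothetical helper: wrap text in an express-as element for Yunjian."""
    if style not in YUNJIAN_STYLES:
        raise ValueError(f"style {style!r} is not listed for zh-CN-YunjianNeural")
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">'
        '<voice name="zh-CN-YunjianNeural">'
        f'<mstts:express-as style={quoteattr(style)}>{escape(text)}</mstts:express-as>'
        '</voice></speak>'
    )


ssml = yunjian_ssml("比赛马上开始。", "documentary")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
```

Rejecting unknown styles locally is a design choice for the sketch; how the service itself treats an unsupported style is not stated in this diff.
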
diff --git a/articles/ai-services/speech-service/includes/release-notes/release-notes-tts.md b/articles/ai-services/speech-service/includes/release-notes/release-notes-tts.md
@@ -6,6 +6,37 @@ ms.date: 02/28/2023
 ms.author: eur
 ---
 
+### July 2023 release
+
+#### Prebuilt Neural TTS Voices
+
+Introducing a new `en-US` gender-neutral voice in public preview:
+
+| Locale (BCP-47) | Language | Text to speech voices |
+| ----- | ----- | ----- |
+| `en-US` | English (United States) | `en-US-BlueNeural` (Neutral) |
+
+Introducing new multilingual voices for public preview:
+
+| Locale (BCP-47) | Language | Text to speech voices |
+| ----- | ----- | ----- |
+| `en-US` | English (United States) | `en-US-JennyMultilingualV2Neural` (Female) |
+| `en-US` | English (United States) | `en-US-RyanMultilingualNeural` (Male) |
+
+These new multilingual voices can speak in 41 languages and accents: `Arabic (Egypt)`, `Arabic (Saudi Arabia)`, `Catalan (Spain)`, `Czech (Czechia)`, `Danish (Denmark)`, `German (Austria)`, `German (Switzerland)`, `German (Germany)`, `English (Australia)`, `English (Canada)`, `English (United Kingdom)`, `English (Hong Kong SAR)`, `English (Ireland)`, `English (India)`, `English (United States)`, `Spanish (Spain)`, `Spanish (Mexico)`, `Finnish (Finland)`, `French (Belgium)`, `French (Canada)`, `French (Switzerland)`, `French (France)`, `Hindi (India)`, `Hungarian (Hungary)`, `Indonesian (Indonesia)`, `Italian (Italy)`, `Japanese (Japan)`, `Korean (Korea)`, `Norwegian Bokmål (Norway)`, `Dutch (Belgium)`, `Dutch (Netherlands)`, `Polish (Poland)`, `Portuguese (Brazil)`, `Portuguese (Portugal)`, `Russian (Russia)`, `Swedish (Sweden)`, `Thai (Thailand)`, `Turkish (Turkey)`, `Chinese (Mandarin, Simplified)`, `Chinese (Cantonese, Traditional)`, `Chinese (Taiwanese Mandarin, Traditional)`.
+
+These multilingual voices don't fully support certain SSML elements, such as `break`, `emphasis`, `silence`, and `sub`.
+
+> [!IMPORTANT]
+> The `en-US-JennyMultilingualV2Neural` voice is provided temporarily in public preview solely for evaluation purposes. It will be removed in the future.
+>
+> To speak in a language other than English, the current implementation of the `en-US-JennyMultilingualNeural` voice requires that you set the `<lang xml:lang>` element. We anticipate that during Q4 calendar year 2023, the `en-US-JennyMultilingualNeural` voice will be updated to speak in the language of the input text without the `<lang xml:lang>` element. This update will bring it to parity with the `en-US-JennyMultilingualV2Neural` voice.
+
+Introducing new features in public preview for the following voices:
+- Added Latin input for Serbian (Serbia) `sr-RS` voices: `sr-latn-RS-SophieNeural` and `sr-latn-RS-NicholasNeural`.
+- Added English pronunciation support for Albanian (Albania) `sq-AL` voices: `sq-AL-AnilaNeural` and `sq-AL-IlirNeural`.
+
+
 ### May 2023 release
 
 #### Audio Content Creation
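
The July 2023 note says that `en-US-JennyMultilingualNeural` currently needs the `<lang xml:lang>` element to speak a language other than English. A minimal sketch of building such a request body follows. The helper name `jenny_lang_ssml` is hypothetical, and the locale set is an assumption assembled from the `en-US-JennyMultilingualNeural` table row in the `speech-synthesis-markup-voice.md` hunk of this diff plus the voice's `en-US` primary locale.

```python
from xml.sax.saxutils import escape

# Locales listed for en-US-JennyMultilingualNeural in this diff,
# plus en-US, which the old table named as the primary locale.
JENNY_LOCALES = {
    "en-US", "de-DE", "en-AU", "en-CA", "en-GB", "es-ES", "es-MX",
    "fr-CA", "fr-FR", "it-IT", "ja-JP", "ko-KR", "pt-BR", "zh-CN",
}


def jenny_lang_ssml(text: str, locale: str) -> str:
    """Hypothetical helper: speak `text` in `locale` with the Jenny multilingual voice."""
    if locale not in JENNY_LOCALES:
        # The docs say the service outputs no audio for unsupported languages,
        # so this sketch fails early instead of sending the request.
        raise ValueError(f"{locale} is not listed for en-US-JennyMultilingualNeural")
    return (
        # The default language stays en-US on the speak root element.
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        '<voice name="en-US-JennyMultilingualNeural">'
        f'<lang xml:lang="{locale}">{escape(text)}</lang>'
        '</voice></speak>'
    )


print(jenny_lang_ssml("Wir freuen uns auf die Zusammenarbeit.", "de-DE"))
```
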
diff --git a/articles/ai-services/speech-service/speech-synthesis-markup-structure.md b/articles/ai-services/speech-service/speech-synthesis-markup-structure.md
@@ -95,7 +95,7 @@ Attribute values must be enclosed by double or single quotation marks. For examp
 
 ## Speak root element
 
-The `speak` element is the root element that's required for all SSML documents. The `speak` element contains information such as version, language, and the markup vocabulary definition. 
+The `speak` element is the root element that's required for all SSML documents. It contains information such as version, language, and the markup vocabulary definition. You must specify the default language within the `speak` element, whether or not the language is adjusted elsewhere, such as within the [`lang`](speech-synthesis-markup-voice.md#adjust-speaking-languages) element.
 
 Here's the syntax for the `speak` element:
 
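
The requirement added in the hunk above (a default language on the `speak` root element) could be checked locally before a request is sent. This is a minimal sketch under stated assumptions: the function name `speak_root_problems` is hypothetical, and the checks cover only the three items the text mentions (version, language, and the markup vocabulary namespace).

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def speak_root_problems(ssml: str) -> list[str]:
    """Hypothetical check: list what is missing from the speak root element."""
    root = ET.fromstring(ssml)
    problems = []
    if root.tag != f"{{{SSML_NS}}}speak":
        problems.append("root element is not speak in the SSML namespace")
    if root.get("version") is None:
        problems.append("missing version attribute")
    if not root.get(XML_LANG):
        problems.append("missing xml:lang default language")
    return problems
```

An empty list means the root element carries all three pieces of information; it says nothing about whether the service accepts the rest of the document.
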
diff --git a/articles/ai-services/speech-service/speech-synthesis-markup-voice.md b/articles/ai-services/speech-service/speech-synthesis-markup-voice.md
@@ -226,7 +226,7 @@ This example uses a custom voice named "my-custom-voice". The custom voice speak
 
 By default, all neural voices are fluent in their own language and English without using the `<lang xml:lang>` element. For example, if the input text in English is "I'm excited to try text to speech" and you use the `es-ES-ElviraNeural` voice, the text is spoken in English with a Spanish accent. With most neural voices, setting a specific speaking language with `<lang xml:lang>` element at the sentence or word level is currently not supported.
 
-You can adjust the speaking language for the `en-US-JennyMultilingualNeural` neural voice at the sentence level and word level by using the `<lang xml:lang>` element. The `en-US-JennyMultilingualNeural` neural voice is multilingual in 14 languages (For example: English, Spanish, and Chinese). The supported languages are provided in a table following the `<lang>` syntax and attribute definitions.
+The `<lang xml:lang>` element is primarily intended for multilingual neural voices. You can adjust the speaking language for a multilingual neural voice at the sentence level and word level. The supported languages for multilingual voices are [provided in a table](#multilingual-voices-with-the-lang-element) following the `<lang>` syntax and attribute definitions.
 
 Usage of the `lang` element's attributes are described in the following table.
 
@@ -237,19 +237,30 @@ Usage of the `lang` element's attributes are described in the following table.
 > [!NOTE]
 > The `<lang xml:lang>` element is incompatible with the `prosody` and `break` elements. You can't adjust pause and prosody like pitch, contour, rate, or volume in this element.
 
+### Multilingual voices with the lang element
+
 Use this table to determine which speaking languages are supported for each neural voice. If the voice doesn't speak the language of the input text, the Speech service won't output synthesized audio.
 
-| Voice | Primary and default locale | Secondary locales |
-| ---------- | ---------- | ---------- |
-| `en-US-JennyMultilingualNeural` | `en-US` | `de-DE`, `en-AU`, `en-CA`, `en-GB`, `es-ES`, `es-MX`, `fr-CA`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, `zh-CN` |
+| Voice | Supported locales |
+| ---------- | ---------- |
+| `en-US-JennyMultilingualNeural`<sup>1</sup> | `de-DE`, `en-AU`, `en-CA`, `en-GB`, `es-ES`, `es-MX`, `fr-CA`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, `zh-CN` |
+| `en-US-JennyMultilingualV2Neural`<sup>2</sup> | `ar-EG`, `ar-SA`, `ca-ES`, `cs-CZ`, `da-DK`, `de-AT`, `de-CH`, `de-DE`, `en-AU`, `en-CA`, `en-GB`, `en-HK`, `en-IE`, `en-IN`, `en-US`, `es-ES`, `es-MX`, `fi-FI`, `fr-BE`, `fr-CA`, `fr-CH`, `fr-FR`, `hi-IN`, `hu-HU`, `id-ID`, `it-IT`, `ja-JP`, `ko-KR`, `nb-NO`, `nl-BE`, `nl-NL`, `pl-PL`, `pt-BR`, `pt-PT`, `ru-RU`, `sv-SE`, `th-TH`, `tr-TR`, `zh-CN`, `zh-HK`, `zh-TW` |
+| `en-US-RyanMultilingualNeural` | `ar-EG`, `ar-SA`, `ca-ES`, `cs-CZ`, `da-DK`, `de-AT`, `de-CH`, `de-DE`, `en-AU`, `en-CA`, `en-GB`, `en-HK`, `en-IE`, `en-IN`, `en-US`, `es-ES`, `es-MX`, `fi-FI`, `fr-BE`, `fr-CA`, `fr-CH`, `fr-FR`, `hi-IN`, `hu-HU`, `id-ID`, `it-IT`, `ja-JP`, `ko-KR`, `nb-NO`, `nl-BE`, `nl-NL`, `pl-PL`, `pt-BR`, `pt-PT`, `ru-RU`, `sv-SE`, `th-TH`, `tr-TR`, `zh-CN`, `zh-HK`, `zh-TW` |
+
+<sup>1</sup> To speak in a language other than English, the current implementation of the `en-US-JennyMultilingualNeural` voice requires that you set the `<lang xml:lang>` element. We anticipate that during Q4 calendar year 2023, the `en-US-JennyMultilingualNeural` voice will be updated to speak in the language of the input text without the `<lang xml:lang>` element. This update will bring it to parity with the `en-US-JennyMultilingualV2Neural` voice.
+
+<sup>2</sup> The `en-US-JennyMultilingualV2Neural` voice is provided temporarily in public preview solely for evaluation purposes. It will be removed in the future.
+
+> [!NOTE]
+> Multilingual voices don't fully support certain SSML elements, such as `break`, `emphasis`, `silence`, and `sub`.
 
 ### Lang examples
 
 The supported values for attributes of the `lang` element were [described previously](#adjust-speaking-languages). 
 
-The primary language for `en-US-JennyMultilingualNeural` is `en-US`. You must specify `en-US` as the default language within the `speak` element, whether or not the language is adjusted elsewhere. 
+You must specify `en-US` as the default language within the `speak` element, whether or not the language is adjusted elsewhere. In this example, the primary language for `en-US-JennyMultilingualNeural` is `en-US`. 
 
-This SSML snippet shows how to use the `lang` element (and `xml:lang` attribute) to speak `de-DE` with the `en-US-JennyMultilingualNeural` neural voice.
+This SSML snippet shows how to use `<lang xml:lang>` to speak `de-DE` with the `en-US-JennyMultilingualNeural` neural voice.
 
 ```xml
 <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"