articles/ai-services/speech-service/how-to-speech-synthesis-viseme.md (1 addition, 1 deletion)
@@ -94,7 +94,7 @@ The blend shapes JSON string is represented as a 2-dimensional matrix. Each row
 To get viseme with your synthesized speech, subscribe to the `VisemeReceived` event in the Speech SDK.

 > [!NOTE]
-> To request SVG or blend shapes output, you should use the `mstts:viseme` element in SSML. For details, see [how to use viseme element in SSML](speech-synthesis-markup-structure.md#viseme-element).
+> To request SVG or blend shapes output, you should use the `mstts:viseme` element in SSML. For details, see [how to use viseme element in SSML](speech-synthesis-markup-voice.md#viseme-element).

 The following snippet shows how to subscribe to the viseme event:
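For context, a minimal sketch of subscribing to this event with the Speech SDK for Python (where the event is exposed as `viseme_received`); the subscription key, region, and input text are placeholders, not values from this PR:

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials: replace with your own key and region.
speech_config = speechsdk.SpeechConfig(subscription="YourSpeechKey", region="YourRegion")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

def on_viseme(evt):
    # audio_offset is in ticks (100 ns); divide by 10,000 for milliseconds.
    print(f"Viseme id: {evt.viseme_id}, audio offset: {evt.audio_offset / 10000} ms")

synthesizer.viseme_received.connect(on_viseme)
result = synthesizer.speak_text_async("Rainbow has seven colors.").get()
```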
articles/ai-services/speech-service/includes/language-support/tts.md (1 addition, 1 deletion)
@@ -168,7 +168,7 @@ ms.custom: references_regions
 <sup>2</sup> The neural voice is available in public preview in these service [regions](../../regions.md): Central India, East Asia, East US, Southeast Asia, and West US.

-<sup>3</sup> [Phonemes](../../speech-synthesis-markup-pronunciation.md#phoneme-element), [custom lexicon](../../speech-synthesis-markup-pronunciation.md#custom-lexicon), and [visemes](../../speech-synthesis-markup-structure.md#viseme-element) aren't supported. For details about supported visemes, see [viseme locales](../../language-support.md?tabs=tts#viseme).
+<sup>3</sup> [Phonemes](../../speech-synthesis-markup-pronunciation.md#phoneme-element), [custom lexicon](../../speech-synthesis-markup-pronunciation.md#custom-lexicon), and [visemes](../../speech-synthesis-markup-voice.md#viseme-element) aren't supported. For details about supported visemes, see [viseme locales](../../language-support.md?tabs=tts#viseme).

 <sup>4</sup> The neural voice is a multilingual voice in Azure AI Speech. The Turbo version of Azure OpenAI voices has a similar voice persona to Azure OpenAI voices but supports extra features: Turbo voices support the full set of SSML elements and more features, such as word boundary, just like other Azure AI Speech voices.
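As a side note on the word boundary feature mentioned in footnote 4, here's a minimal sketch of consuming word boundary events with the Speech SDK for Python (key, region, and text are placeholders):

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YourSpeechKey", region="YourRegion")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

def on_word_boundary(evt):
    # Fires as each word is reached in the synthesized audio stream.
    print(f"Word: {evt.text!r}, audio offset: {evt.audio_offset / 10000} ms")

synthesizer.synthesis_word_boundary.connect(on_word_boundary)
synthesizer.speak_text_async("Turbo voices support word boundary events.").get()
```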
articles/ai-services/speech-service/includes/release-notes/release-notes-tts.md (1 addition, 1 deletion)
@@ -856,7 +856,7 @@ For more information, see the [language and voice list](../../language-support.m
 #### Get facial position with viseme

 * Added support for blend shapes to drive the facial movements of a 3D character that you designed. Learn more at [how to get facial position with viseme](../../how-to-speech-synthesis-viseme.md).
-* SSML updated to support viseme element. See [speech synthesis markup](../../speech-synthesis-markup-structure.md#viseme-element).
+* SSML updated to support viseme element. See [speech synthesis markup](../../speech-synthesis-markup-voice.md#viseme-element).
articles/ai-services/speech-service/language-support.md (8 additions, 1 deletion)
@@ -94,7 +94,7 @@ Use the following table to determine supported styles and roles for each voice.
 ### Viseme

-This table lists all the locales supported for [Viseme](speech-synthesis-markup-structure.md#viseme-element). For more information about Viseme, see [Get facial position with viseme](how-to-speech-synthesis-viseme.md) and [Viseme element](speech-synthesis-markup-structure.md#viseme-element).
+This table lists all the locales supported for [Viseme](speech-synthesis-markup-voice.md#viseme-element). For more information about Viseme, see [Get facial position with viseme](how-to-speech-synthesis-viseme.md) and [Viseme element](speech-synthesis-markup-voice.md#viseme-element).

 [!INCLUDE [Language support include](includes/language-support/viseme.md)]
@@ -125,6 +125,13 @@ With the cross-lingual feature, you can transfer your custom voice model to spea
 [!INCLUDE [Language support include](includes/language-support/personal-voice.md)]
+
+### Voice conversion
+
+[Voice conversion](voice-conversion.md) is a feature that lets you transform the voice characteristics of a given audio input to match a target speaker's voice. The following table summarizes the locales supported for voice conversion.
+
+[!INCLUDE [Language support include](includes/language-support/voice-conversion.md)]

 The table in this section summarizes the 33 locales supported for pronunciation assessment, and each language is available in all [speech to text regions](regions.md#regions). The latest update extends support from English to 32 more languages and brings quality enhancements to existing features, including accuracy, fluency, and miscue assessment. You should specify the language that you're learning or practicing to improve your pronunciation. The default language is `en-US`. If you know your target learning language, [set the locale](how-to-pronunciation-assessment.md#get-pronunciation-assessment-results) accordingly. For example, if you're learning British English, specify the language as `en-GB`. If you're teaching a broader language, such as Spanish, and you're uncertain which locale to select, you can run various accent models (`es-ES`, `es-MX`) to determine which one achieves the highest score for your scenario. If you're interested in a language that isn't listed in the following table, fill out this [intake form](https://aka.ms/speechpa/intake) for further assistance.
@@ -69,6 +70,7 @@ Some examples of contents that are allowed in each element are described in the
 - `math`: This element can only contain text and MathML elements.
 - `mstts:audioduration`: This element can't contain text or any other elements.
 - `mstts:backgroundaudio`: This element can't contain text or any other elements.
+- `mstts:voiceconversion`: This element can't contain text or any other elements. It specifies the source audio URL for the voice conversion (see the sketch after this list).
 - `mstts:embedding`: This element can contain text and the following elements: `audio`, `break`, `emphasis`, `lang`, `phoneme`, `prosody`, `say-as`, and `sub`.
 - `mstts:express-as`: This element can contain text and the following elements: `audio`, `break`, `emphasis`, `lang`, `phoneme`, `prosody`, `say-as`, and `sub`.
 - `mstts:silence`: This element can't contain text or any other elements.
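A minimal sketch of how `mstts:voiceconversion` might be used. The `url` attribute name, voice name, and audio URL below are assumptions for illustration; they aren't confirmed by this diff, so check the voice conversion article for the exact syntax:

```python
import azure.cognitiveservices.speech as speechsdk

# Hypothetical SSML: the `url` attribute and the source audio URL are
# illustrative assumptions, not the documented syntax.
ssml = """<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
       xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
  <voice name='en-US-JennyNeural'>
    <mstts:voiceconversion url='https://example.com/source-audio.wav'/>
  </voice>
</speak>"""

speech_config = speechsdk.SpeechConfig(subscription="YourSpeechKey", region="YourRegion")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_ssml_async(ssml).get()
```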
@@ -259,36 +261,6 @@ As an example, you might want to know the time offset of each flower word in the
 </speak>

-## Viseme element
-
-A viseme is the visual description of a phoneme in spoken language. It defines the position of the face and mouth while a person is speaking. You can use the `mstts:viseme` element in SSML to request viseme output. For more information, see [Get facial position with viseme](how-to-speech-synthesis-viseme.md).
-
-The viseme setting is applied to all input text within its enclosing `voice` element. To reset or change the viseme setting again, you must use a new `voice` element with either the same voice or a different voice.
-
-Usage of the `viseme` element's attributes is described in the following table.
-
-| Attribute | Description | Required or optional |
-| ---------- | ---------- | ---------- |
-| `type` | The type of viseme output.<ul><li>`redlips_front` – lip-sync with viseme ID and audio offset output</li><li>`FacialExpression` – blend shapes output</li></ul> | Required |
-
-> [!NOTE]
-> Currently, `redlips_front` only supports neural voices in the `en-US` locale, and `FacialExpression` supports neural voices in the `en-US` and `zh-CN` locales.
-
-### Viseme examples
-
-The supported values for attributes of the `viseme` element were [described previously](#viseme-element).
-
-This SSML snippet illustrates how to request blend shapes with your synthesized speech.
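The snippet itself is truncated in this diff. Based on the attribute table above, a minimal sketch of requesting blend shapes looks like the following; the voice name and input text are placeholders, not the article's original example:

```python
import azure.cognitiveservices.speech as speechsdk

# Illustrative SSML built from the attribute table above: the mstts:viseme
# element with type='FacialExpression' requests blend shapes output.
ssml = """<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
       xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
  <voice name='en-US-JennyNeural'>
    <mstts:viseme type='FacialExpression'/>
    Rainbow has seven colors.
  </voice>
</speak>"""

speech_config = speechsdk.SpeechConfig(subscription="YourSpeechKey", region="YourRegion")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

def on_viseme(evt):
    if evt.animation:  # blend shapes arrive as JSON in the animation field
        print("Blend shapes frame batch:", evt.animation[:80], "...")

synthesizer.viseme_received.connect(on_viseme)
result = synthesizer.speak_ssml_async(ssml).get()
```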
0 commit comments