Commit fcf4d69

Merge pull request #219324 from eric-urban/eur/ssml-refresh

SSML doc refresh

2 parents 039ce54 + 21176c3

23 files changed: +1115 −1018 lines

articles/cognitive-services/Speech-Service/audio-processing-overview.md

Lines changed: 1 addition & 1 deletion

@@ -44,7 +44,7 @@ The Speech SDK integrates Microsoft Audio Stack (MAS), allowing any application

 ## Minimum requirements to use Microsoft Audio Stack

 Microsoft Audio Stack can be used by any product or application that can meet the following requirements:
-* **Raw audio** - Microsoft Audio Stack requires raw (i.e., unprocessed) audio as input to yield the best results. Providing audio that is already processed limits the audio stack’s ability to perform enhancements at high quality.
+* **Raw audio** - Microsoft Audio Stack requires raw (unprocessed) audio as input to yield the best results. Providing audio that is already processed limits the audio stack’s ability to perform enhancements at high quality.
 * **Microphone geometries** - Geometry information about each microphone on the device is required to correctly perform all enhancements offered by the Microsoft Audio Stack. Information includes the number of microphones, their physical arrangement, and coordinates. Up to 16 input microphone channels are supported.
 * **Loopback or reference audio** - An audio channel that represents the audio being played out of the device is required to perform acoustic echo cancellation.
 * **Input format** - Microsoft Audio Stack supports down sampling for sample rates that are integral multiples of 16 kHz. A minimum sampling rate of 16 kHz is required. Additionally, the following formats are supported: 32-bit IEEE little endian float, 32-bit little endian signed int, 24-bit little endian signed int, 16-bit little endian signed int, and 8-bit signed int.
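The sampling-rate constraint in the **Input format** bullet can be expressed as a quick check. This is a sketch only; the `is_supported_sample_rate` helper is hypothetical and not part of the Speech SDK:

```python
def is_supported_sample_rate(rate_hz: int) -> bool:
    """Per the requirements above: at least 16 kHz, and downsampling
    is supported only for integral multiples of 16 kHz."""
    return rate_hz >= 16000 and rate_hz % 16000 == 0


print(is_supported_sample_rate(48000))  # 48 kHz = 3 x 16 kHz, so True
print(is_supported_sample_rate(44100))  # not a multiple of 16 kHz, so False
```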

articles/cognitive-services/Speech-Service/batch-synthesis.md

Lines changed: 3 additions & 3 deletions

@@ -359,11 +359,11 @@ Batch synthesis properties are described in the following table.

 |`properties.wordBoundaryEnabled`|Determines whether to generate word boundary data. This optional `bool` value ("true" or "false") is "false" by default.<br/><br/>If word boundary data is requested, then a corresponding `[nnnn].word.json` file will be included in the results data ZIP file.|
 |`status`|The batch synthesis processing status.<br/><br/>The status should progress from "NotStarted" to "Running", and finally to either "Succeeded" or "Failed".<br/><br/>This property is read-only.|
 |`synthesisConfig`|The configuration settings to use for batch synthesis of plain text.<br/><br/>This property is only applicable when `textType` is set to `"PlainText"`.|
-|`synthesisConfig.pitch`|The pitch of the audio output.<br/><br/>For information about the accepted values, see the [adjust prosody](speech-synthesis-markup.md#adjust-prosody) table in the Speech Synthesis Markup Language (SSML) documentation. Invalid values are ignored.<br/><br/>This optional property is only applicable when `textType` is set to `"PlainText"`.|
-|`synthesisConfig.rate`|The rate of the audio output.<br/><br/>For information about the accepted values, see the [adjust prosody](speech-synthesis-markup.md#adjust-prosody) table in the Speech Synthesis Markup Language (SSML) documentation. Invalid values are ignored.<br/><br/>This optional property is only applicable when `textType` is set to `"PlainText"`.|
+|`synthesisConfig.pitch`|The pitch of the audio output.<br/><br/>For information about the accepted values, see the [adjust prosody](speech-synthesis-markup-voice.md#adjust-prosody) table in the Speech Synthesis Markup Language (SSML) documentation. Invalid values are ignored.<br/><br/>This optional property is only applicable when `textType` is set to `"PlainText"`.|
+|`synthesisConfig.rate`|The rate of the audio output.<br/><br/>For information about the accepted values, see the [adjust prosody](speech-synthesis-markup-voice.md#adjust-prosody) table in the Speech Synthesis Markup Language (SSML) documentation. Invalid values are ignored.<br/><br/>This optional property is only applicable when `textType` is set to `"PlainText"`.|
 |`synthesisConfig.style`|For some voices, you can adjust the speaking style to express different emotions like cheerfulness, empathy, and calm. You can optimize the voice for different scenarios like customer service, newscast, and voice assistant.<br/><br/>For information about the available styles per voice, see [voice styles and roles](language-support.md?tabs=stt-tts#voice-styles-and-roles).<br/><br/>This optional property is only applicable when `textType` is set to `"PlainText"`.|
 |`synthesisConfig.voice`|The voice that speaks the audio output.<br/><br/>For information about the available prebuilt neural voices, see [language and voice support](language-support.md?tabs=stt-tts). To use a custom voice, you must specify a valid custom voice and deployment ID mapping in the `customVoices` property.<br/><br/>This property is required when `textType` is set to `"PlainText"`.|
-|`synthesisConfig.volume`|The volume of the audio output.<br/><br/>For information about the accepted values, see the [adjust prosody](speech-synthesis-markup.md#adjust-prosody) table in the Speech Synthesis Markup Language (SSML) documentation. Invalid values are ignored.<br/><br/>This optional property is only applicable when `textType` is set to `"PlainText"`.|
+|`synthesisConfig.volume`|The volume of the audio output.<br/><br/>For information about the accepted values, see the [adjust prosody](speech-synthesis-markup-voice.md#adjust-prosody) table in the Speech Synthesis Markup Language (SSML) documentation. Invalid values are ignored.<br/><br/>This optional property is only applicable when `textType` is set to `"PlainText"`.|
 |`textType`|Indicates whether the `inputs` text property should be plain text or SSML. The possible case-insensitive values are "PlainText" and "SSML". When the `textType` is set to `"PlainText"`, you must also set the `synthesisConfig` voice property.<br/><br/>This property is required.|

 ## HTTP status codes
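To make the property table concrete, here is a minimal sketch of a batch synthesis request body for plain-text input. The property names follow the table above; the voice name and all values are illustrative, not a definitive request:

```python
import json

# Sketch of a batch synthesis request body when textType is "PlainText".
# synthesisConfig.voice is required in this mode; pitch/rate/volume are
# optional prosody settings whose accepted values come from the SSML docs.
request_body = {
    "textType": "PlainText",
    "inputs": [{"text": "The weather today is sunny with light wind."}],
    "synthesisConfig": {
        "voice": "en-US-JennyNeural",  # required for PlainText
        "pitch": "+10%",               # optional; invalid values are ignored
        "rate": "medium",
        "volume": "+20%",
    },
    "properties": {"wordBoundaryEnabled": True},
}

print(json.dumps(request_body, indent=2))
```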

articles/cognitive-services/Speech-Service/how-to-audio-content-creation.md

Lines changed: 1 addition & 1 deletion

@@ -131,7 +131,7 @@ You can get your content into the Audio Content Creation tool in either of two ways:

 ```xml
 <speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" version="1.0" xml:lang="en-US">
-    <voice name="Microsoft Server Speech Text to Speech Voice (en-US, JennyNeural)">
+    <voice name="en-US-JennyNeural">
         Welcome to use Audio Content Creation <break time="10ms" />to customize audio output for your products.
     </voice>
 </speak>

articles/cognitive-services/Speech-Service/how-to-custom-commands-deploy-cicd.md

Lines changed: 2 additions & 2 deletions

@@ -84,7 +84,7 @@ The scripts are hosted at [Cognitive Services Voice Assistant - Custom Commands]

 | SourceAppId | ID of the DEV application |
 | TargetAppId | ID of the PROD application |
 | SubscriptionKey | The key used for both applications |
-| Culture | Culture of the applications (i.e. en-us) |
+| Culture | Culture of the applications (en-us) |

 > [!div class="mx-imgBorder"]
 > ![Send Activity payload](media/custom-commands/cicd-edit-pipeline-variables.png)

@@ -159,7 +159,7 @@ The scripts are hosted at [Cognitive Services Voice Assistant - Custom Commands]

 | ------- | --------------- | ----------- |
 | TargetAppId | ID of the PROD application |
 | SubscriptionKey | The key used for both applications |
-| Culture | Culture of the applications (i.e. en-us) |
+| Culture | Culture of the applications (en-us) |

 1. Click "Run" and then click in the "Job" running.
    You should see a list of tasks running that contains: "Import app" & "Train and Publish app"

articles/cognitive-services/Speech-Service/how-to-custom-voice-create-voice.md

Lines changed: 1 addition & 1 deletion

@@ -85,7 +85,7 @@ To create a custom neural voice in Speech Studio, follow these steps for one of

 1. Select the data that you want to use for training. Duplicate audio names will be removed from the training. Make sure the data you select don't contain the same audio names across multiple .zip files. Only successfully processed datasets can be selected for training. Check your data processing status if you do not see your training set in the list.
 1. Select **Next**.
 1. Optionally, you can add up to 10 custom speaking styles:
-   1. Select **Add a custom style** and thoughtfully enter a custom style name of your choice. This name will be used by your application within the `style` element of [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md#adjust-speaking-styles). You can also use the custom style name as SSML via the [Audio Content Creation](how-to-audio-content-creation.md) tool in [Speech Studio](https://speech.microsoft.com/portal/audiocontentcreation).
+   1. Select **Add a custom style** and thoughtfully enter a custom style name of your choice. This name will be used by your application within the `style` element of [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup-voice.md#speaking-styles-and-roles). You can also use the custom style name as SSML via the [Audio Content Creation](how-to-audio-content-creation.md) tool in [Speech Studio](https://speech.microsoft.com/portal/audiocontentcreation).
    1. Select style samples as training data. It's recommended that the style samples are all from the same voice talent profile.
 1. Select **Next**.
 1. Select a speaker file with the voice talent statement that corresponds to the speaker in your training data.
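As a hedged sketch of how a custom style name ends up in SSML's `style` attribute: the helper below simply assembles the markup as a string; the function name, voice name, and style name are all illustrative, and the `mstts` namespace URI is an assumption based on common SSML usage:

```python
def ssml_with_style(voice: str, style: str, text: str) -> str:
    """Build a minimal SSML document that applies a speaking style
    (for example, a custom style name) via mstts:express-as."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
        f'<voice name="{voice}">'
        f'<mstts:express-as style="{style}">{text}</mstts:express-as>'
        "</voice></speak>"
    )


# "my-custom-style" stands in for the name entered in Speech Studio.
print(ssml_with_style("YourCustomVoiceName", "my-custom-style", "Hello there."))
```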

articles/cognitive-services/Speech-Service/how-to-deploy-and-use-endpoint.md

Lines changed: 1 addition & 1 deletion

@@ -93,7 +93,7 @@ speech_config.speech_synthesis_voice_name = "YourCustomVoiceName"
 ```

 ::: zone-end

-To use a custom neural voice via [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md#choose-a-voice-for-text-to-speech), specify the model name as the voice name. This example uses the `YourCustomVoiceName` voice.
+To use a custom neural voice via [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup-voice.md#voice-element), specify the model name as the voice name. This example uses the `YourCustomVoiceName` voice.

 ```xml
 <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
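A small sketch of assembling such an SSML request body programmatically, using the `YourCustomVoiceName` placeholder from the snippet above (the `build_ssml` helper is hypothetical, not part of the Speech SDK):

```python
def build_ssml(voice_name: str, text: str) -> str:
    """Wrap text in a minimal SSML document that selects a voice by name,
    matching the <speak>/<voice> shape shown in the diff above."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f'<voice name="{voice_name}">{text}</voice>'
        "</speak>"
    )


print(build_ssml("YourCustomVoiceName", "Hello from my custom voice."))
```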

articles/cognitive-services/Speech-Service/how-to-migrate-to-prebuilt-neural-voice.md

Lines changed: 1 addition & 1 deletion

@@ -33,7 +33,7 @@ The prebuilt neural voice provides more natural sounding speech output, and thus

 > Even without an Azure account, you can listen to voice samples at this [Azure website](https://azure.microsoft.com/services/cognitive-services/text-to-speech/#overview) and determine the right voice for your business needs.

 1. Review the [price](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) structure and listen to the neural voice [samples](https://azure.microsoft.com/services/cognitive-services/text-to-speech/#overview) at the bottom of that page to determine the right voice for your business needs.
-2. To make the change, [follow the sample code](speech-synthesis-markup.md#choose-a-voice-for-text-to-speech) to update the voice name in your speech synthesis request to the supported neural voice names in chosen languages. Please use neural voices for your speech synthesis request, on cloud or on prem. For on-prem container, please use the [neural voice containers](../containers/container-image-tags.md) and follow the [instructions](speech-container-howto.md).
+2. To make the change, [follow the sample code](speech-synthesis-markup-voice.md#voice-element) to update the voice name in your speech synthesis request to the supported neural voice names in chosen languages. Use neural voices for your speech synthesis request, on cloud or on prem. For on-premises container, use the [neural voice containers](../containers/container-image-tags.md) and follow the [instructions](speech-container-howto.md).

 ## Standard voice details (deprecated)

articles/cognitive-services/Speech-Service/how-to-speech-synthesis-viseme.md

Lines changed: 1 addition & 1 deletion

@@ -95,7 +95,7 @@ The blend shapes JSON string is represented as a 2-dimensional matrix. Each row

 To get viseme with your synthesized speech, subscribe to the `VisemeReceived` event in the Speech SDK.

 > [!NOTE]
-> To request SVG or blend shapes output, you should use the `mstts:viseme` element in SSML. For details, see [how to use viseme element in SSML](speech-synthesis-markup.md#viseme-element).
+> To request SVG or blend shapes output, you should use the `mstts:viseme` element in SSML. For details, see [how to use viseme element in SSML](speech-synthesis-markup-structure.md#viseme-element).

 The following snippet shows how to subscribe to the viseme event:

articles/cognitive-services/Speech-Service/includes/how-to/speech-synthesis/events.md

Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@ ms.author: eur

 | Event | Description | Use case |
 | --- | --- | --- |
-|`BookmarkReached`|Signals that a bookmark was reached. To trigger a bookmark reached event, a `bookmark` element is required in the [SSML](../../../speech-synthesis-markup.md#bookmark-element). This event reports the output audio's elapsed time between the beginning of synthesis and the `bookmark` element. The event's `Text` property is the string value that you set in the bookmark's `mark` attribute. The `bookmark` elements won't be spoken.|You can use the `bookmark` element to insert custom markers in SSML to get the offset of each marker in the audio stream. The `bookmark` element can be used to reference a specific location in the text or tag sequence.|
+|`BookmarkReached`|Signals that a bookmark was reached. To trigger a bookmark reached event, a `bookmark` element is required in the [SSML](../../../speech-synthesis-markup-structure.md#bookmark-element). This event reports the output audio's elapsed time between the beginning of synthesis and the `bookmark` element. The event's `Text` property is the string value that you set in the bookmark's `mark` attribute. The `bookmark` elements won't be spoken.|You can use the `bookmark` element to insert custom markers in SSML to get the offset of each marker in the audio stream. The `bookmark` element can be used to reference a specific location in the text or tag sequence.|
 |`SynthesisCanceled`|Signals that the speech synthesis was canceled.|You can confirm when synthesis has been canceled.|
 |`SynthesisCompleted`|Signals that speech synthesis has completed.|You can confirm when synthesis has completed.|
 |`SynthesisStarted`|Signals that speech synthesis has started.|You can confirm when synthesis has started.|
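To illustrate the `bookmark` element that drives `BookmarkReached`, here is a sketch that only assembles the SSML string; the helper, voice name, and mark names are all illustrative:

```python
def ssml_with_bookmarks(voice: str, parts: list[tuple[str, str]]) -> str:
    """Interleave (mark, text) pairs so each text segment is preceded by a
    <bookmark mark="..."/> marker. Bookmark elements are not spoken; they
    only produce BookmarkReached events with the mark as the Text property."""
    body = "".join(f'<bookmark mark="{mark}"/>{text}' for mark, text in parts)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="en-US"><voice name="{voice}">{body}</voice></speak>'
    )


print(
    ssml_with_bookmarks(
        "en-US-JennyNeural",
        [("flower_1", "We are selling roses."), ("flower_2", "And daisies.")],
    )
)
```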

articles/cognitive-services/Speech-Service/includes/language-support/stt-tts.md

Lines changed: 6 additions & 6 deletions

@@ -163,16 +163,16 @@ ms.author: eur

 <sup>1</sup> The neural voice is available in public preview. Voices and styles in public preview are only available in three service [regions](../../regions.md): East US, West Europe, and Southeast Asia.

-<sup>2</sup> The neural voice supports speaking styles to express emotions such as cheerfulness, empathy, and calm. You can optimize the voice for different scenarios like customer service, newscast, and voice assistant. For a list of styles that are supported per neural voice, see the [Voice styles and roles](../../language-support.md#voice-styles-and-roles) table. To learn how you can configure and adjust neural voice styles, see [Speech Synthesis Markup Language](../../speech-synthesis-markup.md#adjust-speaking-styles).
+<sup>2</sup> The neural voice supports speaking styles to express emotions such as cheerfulness, empathy, and calm. You can optimize the voice for different scenarios like customer service, newscast, and voice assistant. For a list of styles that are supported per neural voice, see the [Voice styles and roles](../../language-support.md#voice-styles-and-roles) table. To learn how you can configure and adjust neural voice styles, see [Speech Synthesis Markup Language](../../speech-synthesis-markup-voice.md#speaking-styles-and-roles).

-<sup>3</sup> The neural voice supports role play. With roles, the same voice can act as a different age and gender. For a list of roles that are supported per neural voice, see the [Voice styles and roles](../../language-support.md#voice-styles-and-roles) table. To learn how you can configure and adjust neural voice roles, see [Speech Synthesis Markup Language](../../speech-synthesis-markup.md#adjust-speaking-styles).
+<sup>3</sup> The neural voice supports role play. With roles, the same voice can act as a different age and gender. For a list of roles that are supported per neural voice, see the [Voice styles and roles](../../language-support.md#voice-styles-and-roles) table. To learn how you can configure and adjust neural voice roles, see [Speech Synthesis Markup Language](../../speech-synthesis-markup-voice.md#speaking-styles-and-roles).

-<sup>4</sup> Visemes are supported for the locale of the neural voice. However, SVG is only supported for neural voices in the `en-US` locale, and blend shapes is only supported for neural voices in the `en-US` and `zh-CN` locales. For more information, see [Get facial position with viseme](../../how-to-speech-synthesis-viseme.md) and [Viseme element](../../speech-synthesis-markup.md#viseme-element).
+<sup>4</sup> Visemes are supported for the locale of the neural voice. However, SVG is only supported for neural voices in the `en-US` locale, and blend shapes is only supported for neural voices in the `en-US` and `zh-CN` locales. For more information, see [Get facial position with viseme](../../how-to-speech-synthesis-viseme.md) and [Viseme element](../../speech-synthesis-markup-structure.md#viseme-element).

-<sup>5</sup> Phonemes are supported for the locale of the neural voice. For more information, see [SSML phonetic alphabets](../../speech-ssml-phonetic-sets.md) and [Use phonemes to improve pronunciation](../../speech-synthesis-markup.md#use-phonemes-to-improve-pronunciation).
+<sup>5</sup> Phonemes are supported for the locale of the neural voice. For more information, see [SSML phonetic alphabets](../../speech-ssml-phonetic-sets.md) and [Use phonemes to improve pronunciation](../../speech-synthesis-markup-pronunciation.md#phoneme-element).

-<sup>6</sup> Custom lexicon is supported for the locale of the neural voice. For more information, see [Use custom lexicon to improve pronunciation](../../speech-synthesis-markup.md#use-custom-lexicon-to-improve-pronunciation).
+<sup>6</sup> Custom lexicon is supported for the locale of the neural voice. For more information, see [Use custom lexicon to improve pronunciation](../../speech-synthesis-markup-pronunciation.md#custom-lexicon).

-<sup>7</sup> For the multilingual voice the primary default locale is `en-US`. Additional locales are supported [using SSML](../../speech-synthesis-markup.md#adjust-speaking-languages).
+<sup>7</sup> For the multilingual voice the primary default locale is `en-US`. Additional locales are supported [using SSML](../../speech-synthesis-markup-voice.md#adjust-speaking-languages).

 <sup>8</sup> The voice is a child's voice.
