Commit 1e554da

Merge pull request #217676 from eric-urban/eur/tts-cnv-updates
promote learn more about voice styles
2 parents fa11722 + 9552cc6 commit 1e554da

11 files changed: +36 -17 lines changed

articles/cognitive-services/Speech-Service/how-to-custom-voice-create-voice.md

Lines changed: 5 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -8,7 +8,7 @@ manager: nitinme
88
ms.service: cognitive-services
99
ms.subservice: speech-service
1010
ms.topic: how-to
11-
ms.date: 10/27/2022
11+
ms.date: 11/10/2022
1212
ms.author: eur
1313
ms.custom: references_regions
1414
---
@@ -49,7 +49,7 @@ To create a custom neural voice in Speech Studio, follow these steps for one of
4949
1. Select the data that you want to use for training. Duplicate audio names will be removed from the training. Make sure the data you select don't contain the same audio names across multiple .zip files. Only successfully processed datasets can be selected for training. Check your data processing status if you do not see your training set in the list.
5050
1. Select a speaker file with the voice talent statement that corresponds to the speaker in your training data.
5151
1. Select **Next**.
52-
1. Optionally, you can check the box next to **Add my own test script** and select test scripts to upload. Each training generates 100 sample audio files automatically, to help you test the model with a default script. You can also provide your own test script with up to 100 utterances. The generated audio files are a combination of the automatic test scripts and custom test scripts. For more information, see [test script requirements](#test-script-requirements).
52+
1. Optionally, you can check the box next to **Add my own test script** and select test scripts to upload. Each training generates 100 sample audio files automatically, to help you test the model with a default script. You can also provide your own test script with up to 100 utterances for the default style. The generated audio files are a combination of the automatic test scripts and custom test scripts. For more information, see [test script requirements](#test-script-requirements).
5353
1. Enter a **Name** and **Description** to help you identify the model. Choose a name carefully. The model name will be used as the voice name in your [speech synthesis request](how-to-deploy-and-use-endpoint.md#use-your-custom-voice) via the SDK and SSML input. Only letters, numbers, and a few punctuation characters are allowed. Use different names for different neural voice models.
5454
1. Optionally, enter the **Description** to help you identify the model. A common use of the description is to record the names of the data that you used to create the model.
5555
1. Select **Next**.
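
The duplicate-name warning in the data selection step can be checked locally before upload. The following is a minimal sketch, not part of Speech Studio or the committed article: the helper name `find_duplicate_audio_names` is hypothetical, and it assumes that "same audio names" means the same base file name regardless of the folder inside each .zip file.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_audio_names(zip_paths):
    """Map each file name found in more than one .zip to the archives that contain it.

    Hypothetical helper: compares base file names only (an assumption about
    what counts as a duplicate audio name).
    """
    seen = defaultdict(set)
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if member.endswith("/"):
                    continue  # skip directory entries
                # Zip member paths always use forward slashes.
                seen[PurePosixPath(member).name].add(str(zip_path))
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```

Calling the function with the paths of the training .zip files returns an empty dictionary when no name is shared between archives; any entry in the result names a file to rename or remove before training.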
@@ -82,7 +82,9 @@ To create a custom neural voice in Speech Studio, follow these steps for one of
8282
1. Select one or more preset speaking styles to train.
8383
1. Select the data that you want to use for training. Duplicate audio names will be removed from the training. Make sure the data you select don't contain the same audio names across multiple .zip files. Only successfully processed datasets can be selected for training. Check your data processing status if you do not see your training set in the list.
8484
1. Select **Next**.
85-
1. Optionally, you can add up to 10 custom speaking styles. Select **Add a custom style** and enter a custom style name of your choice. Select style samples as training data.
85+
1. Optionally, you can add up to 10 custom speaking styles:
86+
1. Select **Add a custom style** and thoughtfully enter a custom style name of your choice. This name will be used by your application within the `style` element of [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md#adjust-speaking-styles). You can also use the custom style name as SSML via the [Audio Content Creation](how-to-audio-content-creation.md) tool in [Speech Studio](https://speech.microsoft.com/portal/audiocontentcreation).
87+
1. Select style samples as training data.
8688
1. Select **Next**.
8789
1. Select a speaker file with the voice talent statement that corresponds to the speaker in your training data.
8890
1. Select **Next**.

articles/cognitive-services/Speech-Service/includes/quickstarts/text-to-speech-basics/cli.md

Lines changed: 14 additions & 2 deletions
@@ -27,7 +27,7 @@ ms.author: eur
2727
Run the following command for speech synthesis to the default speaker output. You can modify the text to be synthesized and the voice.
2828

2929
```console
30-
spx synthesize --text "I'm excited to try text to speech" --voice "en-US-JennyNeural"
30+
spx synthesize --text "I'm excited to try text to speech" --voice "en-US-JennyNeural"
3131
```
3232

3333
> [!div class="nextstepaction"]
@@ -37,7 +37,19 @@ Run the following command for speech synthesis to the default speaker output. Yo
3737
> There is a known issue on Windows 11 that might affect some types of Secure Sockets Layer (SSL) and Transport Layer Security (TLS) connections. For more information, see the [troubleshooting guide](/azure/cognitive-services/speech-service/troubleshooting#connection-closed-or-timeout).
3838
3939
If you don't set a voice name, the default voice for `en-US` will speak. All neural voices are multilingual and fluent in their own language and English. For example, if the input text in English is "I'm excited to try text to speech" and you set `--voice "es-ES-ElviraNeural"`, the text is spoken in English with a Spanish accent. If the voice does not speak the language of the input text, the Speech service won't output synthesized audio.
40-
40+
41+
## Remarks
42+
43+
Now that you've completed the quickstart, here are some additional considerations:
44+
45+
You can have finer control over voice styles, prosody, and other settings by using [Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
46+
47+
In the following example, the voice and style ('excited') are provided in the SSML block.
48+
49+
```console
50+
spx synthesize --ssml "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'><voice name='en-US-JennyNeural'><mstts:express-as style='excited'>I'm excited to try text to speech</mstts:express-as></voice></speak>"
51+
```
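
As a minimal sketch outside the quickstart itself, the SSML string passed to `--ssml` could be assembled in Python so that the text is XML-escaped and the result is checked for well-formedness before it reaches `spx`. The helper name `build_ssml` is hypothetical; the namespaces, voice name, and style are copied from the command above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace URIs copied from the SSML example above.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_ssml(text, voice="en-US-JennyNeural", style="excited", lang="en-US"):
    """Return an SSML document string (hypothetical helper for illustration)."""
    return (
        f"<speak version='1.0' xmlns='{SSML_NS}' "
        f"xmlns:mstts='{MSTTS_NS}' xml:lang='{lang}'>"
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"
        "</mstts:express-as></voice></speak>"
    )


ssml = build_ssml("I'm excited to try text to speech")
ET.fromstring(ssml)  # raises ParseError if the SSML is not well-formed XML

# Argument list for the CLI call shown above; nothing is executed here.
spx_args = ["spx", "synthesize", "--ssml", ssml]
print(spx_args)
```

Passing `spx_args` to `subprocess.run` as a list avoids shell quoting problems with the apostrophe and the mixed quote characters in the SSML.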
52+
4153
Run this command for information about additional speech synthesis options such as file input and output:
4254
```console
4355
spx help synthesize

articles/cognitive-services/Speech-Service/includes/quickstarts/text-to-speech-basics/cpp.md

Lines changed: 1 addition & 1 deletion
@@ -138,7 +138,7 @@ I'm excited to try text to speech
138138
Now that you've completed the quickstart, here are some additional considerations:
139139

140140
This quickstart uses the `SpeakTextAsync` operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
141-
- For information about speech synthesis from a file, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
141+
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
142142
- For information about batch synthesis, see [Synthesize long-form text to speech](~/articles/cognitive-services/speech-service/long-audio-api.md).
143143

144144
## Clean up resources

articles/cognitive-services/Speech-Service/includes/quickstarts/text-to-speech-basics/csharp.md

Lines changed: 1 addition & 1 deletion
@@ -128,7 +128,7 @@ I'm excited to try text to speech
128128
Now that you've completed the quickstart, here are some additional considerations:
129129

130130
This quickstart uses the `SpeakTextAsync` operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
131-
- For information about speech synthesis from a file, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
131+
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
132132
- For information about batch synthesis, see [Synthesize long-form text to speech](~/articles/cognitive-services/speech-service/long-audio-api.md).
133133

134134
## Clean up resources

articles/cognitive-services/Speech-Service/includes/quickstarts/text-to-speech-basics/java.md

Lines changed: 1 addition & 1 deletion
@@ -154,7 +154,7 @@ I'm excited to try text to speech
154154
Now that you've completed the quickstart, here are some additional considerations:
155155

156156
This quickstart uses the `SpeakTextAsync` operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
157-
- For information about speech synthesis from a file, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
157+
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
158158
- For information about batch synthesis, see [Synthesize long-form text to speech](~/articles/cognitive-services/speech-service/long-audio-api.md).
159159

160160
## Clean up resources

articles/cognitive-services/Speech-Service/includes/quickstarts/text-to-speech-basics/javascript.md

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ synthesis finished.
119119
Now that you've completed the quickstart, here are some additional considerations:
120120

121121
This quickstart uses the `SpeakTextAsync` operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
122-
- For information about speech synthesis from a file, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
122+
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
123123
- For information about batch synthesis, see [Synthesize long-form text to speech](~/articles/cognitive-services/speech-service/long-audio-api.md).
124124

125125
## Clean up resources

articles/cognitive-services/Speech-Service/includes/quickstarts/text-to-speech-basics/objectivec.md

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ After you input some text and select the button in the app, you should hear the
9393
Now that you've completed the quickstart, here are some additional considerations:
9494
9595
This quickstart uses the `SpeakText` operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
96-
- For information about speech synthesis from a file, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
96+
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
9797
- For information about batch synthesis, see [Synthesize long-form text to speech](~/articles/cognitive-services/speech-service/long-audio-api.md).
9898
9999
## Clean up resources

articles/cognitive-services/Speech-Service/includes/quickstarts/text-to-speech-basics/python.md

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ I'm excited to try text to speech
101101
Now that you've completed the quickstart, here are some additional considerations:
102102

103103
This quickstart uses the `speak_text_async` operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
104-
- For information about speech synthesis from a file, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
104+
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
105105
- For information about batch synthesis, see [Synthesize long-form text to speech](~/articles/cognitive-services/speech-service/long-audio-api.md).
106106

107107
## Clean up resources

articles/cognitive-services/Speech-Service/includes/quickstarts/text-to-speech-basics/swift.md

Lines changed: 1 addition & 1 deletion
@@ -141,7 +141,7 @@ After you input some text and select the button in the app, you should hear the
141141
Now that you've completed the quickstart, here are some additional considerations:
142142

143143
This quickstart uses the `SpeakText` operation to synthesize a short block of text that you enter. You can also get text from files as described in these guides:
144-
- For information about speech synthesis from a file, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
144+
- For information about speech synthesis from a file and finer control over voice styles, prosody, and other settings, see [How to synthesize speech](~/articles/cognitive-services/speech-service/how-to-speech-synthesis.md) and [Improve synthesis with Speech Synthesis Markup Language (SSML)](~/articles/cognitive-services/speech-service/speech-synthesis-markup.md).
145145
- For information about batch synthesis, see [Synthesize long-form text to speech](~/articles/cognitive-services/speech-service/long-audio-api.md).
146146

147147
## Clean up resources

articles/cognitive-services/Speech-Service/speech-synthesis-markup.md

Lines changed: 5 additions & 3 deletions
@@ -125,15 +125,17 @@ Styles, style degree, and roles are supported for a subset of neural voices. If
125125

126126
| Attribute | Description | Required or optional |
127127
| ---------- | ---------- | -------------------- |
128-
| `style` | Specifies the speaking style. Speaking styles are voice specific. | Required if adjusting the speaking style for a neural voice. If you're using `mstts:express-as`, the style must be provided. If an invalid value is provided, this element is ignored. |
128+
| `style` | Specifies the [prebuilt](language-support.md?tabs=stt-tts#voice-styles-and-roles) or [custom](how-to-custom-voice-create-voice.md?tabs=multistyle#train-your-custom-neural-voice-model) speaking style. Speaking styles are voice specific. | Required if adjusting the speaking style for a neural voice. If you're using `mstts:express-as`, the style must be provided. If an invalid value is provided, this element is ignored.|
129129
| `styledegree` | Specifies the intensity of the speaking style. **Accepted values**: 0.01 to 2 inclusive. The default value is 1, which means the predefined style intensity. The minimum unit is 0.01, which results in a slight tendency for the target style. A value of 2 results in a doubling of the default style intensity. | Optional. If you don't set the `style` attribute, the `styledegree` attribute is ignored. Speaking style degree adjustments are supported for Chinese (Mandarin, Simplified) neural voices.|
130130
| `role`| Specifies the speaking role-play. The voice acts as a different age and gender, but the voice name isn't changed. | Optional. Role adjustments are supported for these Chinese (Mandarin, Simplified) neural voices: `zh-CN-XiaomoNeural`, `zh-CN-XiaoxuanNeural`, `zh-CN-YunxiNeural`, and `zh-CN-YunyeNeural`. |
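
The attribute rules in the table can be sketched as a small validating builder. This is an illustration under stated assumptions, not the Speech SDK: the helper name `express_as` is hypothetical, and rounding to two decimal places is only one reading of "the minimum unit is 0.01".

```python
from xml.sax.saxutils import escape, quoteattr


def express_as(text, style, styledegree=None):
    """Build an mstts:express-as element string (hypothetical helper)."""
    if not style:
        # The table says style must be provided when using mstts:express-as.
        raise ValueError("style is required for mstts:express-as")
    attrs = f"style={quoteattr(style)}"
    if styledegree is not None:
        # Accepted values per the table: 0.01 to 2 inclusive.
        if not 0.01 <= styledegree <= 2:
            raise ValueError("styledegree must be between 0.01 and 2 inclusive")
        # Round to the stated minimum unit of 0.01 (an assumption).
        attrs += f' styledegree="{round(styledegree, 2):g}"'
    return f"<mstts:express-as {attrs}>{escape(text)}</mstts:express-as>"


print(express_as("I'm excited to try text to speech", "excited", styledegree=2))
```

The same function accepts a prebuilt style name or a custom style name entered in Speech Studio, since both are passed through the `style` attribute.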
131131

132132
### Style
133133

134134
You use the `mstts:express-as` element to express emotions like cheerfulness, empathy, and calm. You can also optimize the voice for different scenarios like customer service, newscast, and voice assistant.
135135

136-
For a list of supported styles per neural voice, see [supported voice styles and roles](language-support.md?tabs=stt-tts#voice-styles-and-roles).
136+
For a list of supported styles for prebuilt neural voices, see [supported voice styles and roles](language-support.md?tabs=stt-tts#voice-styles-and-roles).
137+
138+
To use your [custom style](how-to-custom-voice-create-voice.md?tabs=multistyle#train-your-custom-neural-voice-model), specify the style name that you entered in Speech Studio.
137139

138140
**Syntax**
139141

@@ -928,7 +930,7 @@ All elements from the [MathML 2.0](https://www.w3.org/TR/MathML2/) and [MathML 3
928930
> [!NOTE]
929931
> If an element is not recognized, it will be ignored, and the child elements within it will still be processed.
930932
931-
The MathML entities are not supported by XML syntax, so you must use the their corresponding [unicode characters](https://www.w3.org/2003/entities/2007/htmlmathml.json) to represent the entities, for example, the entity `&copy;` should be represented by its unicode characters `&#x00A9;`, otherwise an error will occur.
933+
The MathML entities are not supported by XML syntax, so you must use the corresponding [unicode characters](https://www.w3.org/2003/entities/2007/htmlmathml.json) to represent the entities, for example, the entity `&copy;` should be represented by its unicode characters `&#x00A9;`, otherwise an error will occur.
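
The entity replacement described here can be sketched in Python. This is a minimal illustration, assuming that the standard library's `html.entities.html5` table is an acceptable stand-in for the linked W3C mapping; the helper name `named_to_numeric` is hypothetical.

```python
import re
from html.entities import html5

# Entities predefined by XML itself; these are left unchanged.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(markup):
    """Replace named entities such as &copy; with numeric references such as &#x00A9;."""

    def replace(match):
        name = match.group(1)
        chars = html5.get(name + ";")
        if name in XML_PREDEFINED or chars is None:
            return match.group(0)  # keep XML entities and unknown names as they are
        # Some names map to more than one code point, so emit one reference each.
        return "".join(f"&#x{ord(ch):04X};" for ch in chars)

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)


print(named_to_numeric("<mi>&copy;</mi>"))  # prints <mi>&#x00A9;</mi>
```

Running the conversion over the MathML before embedding it in SSML leaves existing numeric references and the five XML-predefined entities untouched.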
932934

933935
## Viseme element
934936
