articles/cognitive-services/Speech-Service/custom-neural-voice.md
38 additions & 6 deletions
@@ -8,17 +8,17 @@ manager: nitinme
8
8
ms.service: cognitive-services
9
9
ms.subservice: speech-service
10
10
ms.topic: conceptual
11
-
ms.date: 02/18/2022
11
+
ms.date: 08/01/2022
12
12
ms.author: eur
13
13
---
14
14
15
15
# What is Custom Neural Voice?
16
16
17
-
Custom Neural Voice is a text-to-speech feature that lets you create a one-of-a-kind, customized, synthetic voice for your applications. With Custom Neural Voice, you can build a highly natural-sounding voice by providing your audio samples as training data.
17
+
Custom Neural Voice is a text-to-speech feature that lets you create a one-of-a-kind, customized, synthetic voice for your applications. With Custom Neural Voice, you can build a highly natural-sounding voice by providing your audio samples as training data. If you're looking for ready-to-use options, check out our [text-to-speech](text-to-speech.md) service.
18
18
19
19
Based on the neural text-to-speech technology and the multilingual, multi-speaker, universal model, Custom Neural Voice lets you create synthetic voices that are rich in speaking styles, or adaptable across languages. The realistic and natural-sounding voice of Custom Neural Voice can represent brands, personify machines, and allow users to interact with applications conversationally. See the [supported languages](language-support.md#custom-neural-voice) for Custom Neural Voice.
20
20
21
-
> [!NOTE]
21
+
> [!IMPORTANT]
22
22
> Custom Neural Voice access is limited based on eligibility and usage criteria. Request access on the [intake form](https://aka.ms/customneural).
23
23
24
24
## The basics of Custom Neural Voice
@@ -37,9 +37,9 @@ You can adapt the neural text-to-speech engine to fit your needs. To create a cu
37
37
38
38
## Custom Neural Voice project types
39
39
40
-
Speech Studio provides two Custom Neural Voice (CNV) project types: CNV Pro and CNV Lite.
40
+
Speech Studio provides two Custom Neural Voice (CNV) project types: CNV Lite and CNV Pro.
41
41
42
-
The following table summarizes key differences between the CNV Pro and CNV Lite project types.
42
+
The following table summarizes key differences between the CNV Lite and CNV Pro project types.
43
43
44
44
|**Items**|**Lite (Preview)**|**Pro**|
45
45
|---------------|---------------|---------------|
@@ -83,6 +83,38 @@ Review these CNV Pro articles to learn more and get started.
83
83
| Persona | A persona describes who you want this voice to be. A good persona design will inform all voice creation. This might include choosing an available voice model already created, or starting from scratch by casting and recording a new voice talent.|
84
84
| Script | A script is a text file that contains the utterances to be spoken by your voice talent. (The term *utterances* encompasses both full sentences and shorter phrases.)|
85
85
86
+
## The process for creating a professional custom neural voice
87
+
88
+
Creating a great custom neural voice requires careful quality control in each step, from voice design and data preparation, to the deployment of the voice model to your system. The following sections discuss some key steps you'll take when you're creating a custom neural voice for your organization.
89
+
90
+
### Persona design
91
+
92
+
First, [design a persona](/record-custom-voice-samples.md#choose-your-voice-talent) of the voice that represents your brand by using a persona brief document. This document defines elements such as the features of the voice, and the character behind the voice. The persona brief helps guide the process of creating a custom neural voice model, including defining the scripts, selecting your voice talent, training, and voice tuning.
93
+
94
+
### Script selection
95
+
96
+
Carefully [select the recording script](/record-custom-voice-samples.md#script-selection-criteria) to represent the user scenarios for your voice. For example, you can use the phrases from bot conversations as your recording script if you're creating a customer service bot. Include different sentence types in your scripts, including statements, questions, and exclamations.
97
+
98
+
### Preparing training data
99
+
100
+
It's a good idea to capture the audio recordings in a professional-quality recording studio to achieve a high signal-to-noise ratio. The quality of the voice model depends heavily on your training data, so the recordings need consistent volume, speaking rate, pitch, and expressive mannerisms of speech.
101
+
102
+
After the recordings are ready, [prepare the training data](how-to-custom-voice-prepare-data.md) in the right format.
103
+
104
+
### Training
105
+
106
+
After you've prepared the training data, go to [Speech Studio](https://aka.ms/speechstudio/customvoice) to create your custom neural voice. Select at least 300 utterances to create a custom neural voice. A series of data quality checks is performed automatically when you upload the data. To build high-quality voice models, fix any errors that the checks report and submit again.
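Before uploading, it can help to confirm locally that a recording script clears the 300-utterance minimum. The following is a minimal sketch, not part of Speech Studio: the function name `check_script` is hypothetical, and the assumed script layout (one `<utterance id><TAB><text>` pair per line) is an assumption of this example rather than something stated in this article.

```python
from collections import Counter

MIN_UTTERANCES = 300  # minimum number of utterances stated in the text above


def check_script(lines, min_utterances=MIN_UTTERANCES):
    """Return a list of human-readable problems found in a recording script.

    Assumes each non-empty line is '<utterance id><TAB><text>'. That layout
    is an assumption of this sketch, not a documented requirement.
    """
    problems = []
    ids = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        utterance_id, sep, text = line.partition("\t")
        if not sep or not utterance_id.strip() or not text.strip():
            problems.append(f"line {number}: expected '<id><TAB><text>'")
            continue
        ids.append(utterance_id.strip())

    # Report every utterance ID that appears more than once.
    for utterance_id, count in Counter(ids).items():
        if count > 1:
            problems.append(f"duplicate utterance ID {utterance_id!r} ({count} times)")

    if len(ids) < min_utterances:
        problems.append(
            f"only {len(ids)} valid utterances; need at least {min_utterances}"
        )
    return problems
```

An empty list means the local checks passed. The automatic checks that run on upload still apply and may cover more than this sketch does.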
107
+
108
+
### Testing
109
+
110
+
Prepare test scripts for your voice model that cover the different use cases for your apps. It’s a good idea to use scripts within and outside the training dataset, so you can test the quality more broadly for different content.
111
+
112
+
### Tuning and adjustment
113
+
114
+
The style and the characteristics of the trained voice model depend on the style and the quality of the recordings from the voice talent used for training. However, you can make several adjustments by using [SSML (Speech Synthesis Markup Language)](./speech-synthesis-markup.md?tabs=csharp) when you make the API calls to your voice model to generate synthetic speech.
115
+
116
+
SSML is the markup language used to communicate with the text-to-speech service to convert text into audio. The adjustments you can make include change of pitch, rate, intonation, and pronunciation correction. If the voice model is built with multiple styles, you can also use SSML to switch the styles.
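As an illustration of these adjustments, the sketch below assembles an SSML document that sets rate and pitch with `prosody` and selects a style with `mstts:express-as`. The helper `build_ssml` and the voice name `MyBrandVoice` are hypothetical; the style value only works if your voice model was trained with that style, and the exact attribute values your model accepts should be confirmed against the SSML reference.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice_name, lang="en-US", rate=None, pitch=None, style=None):
    """Build an SSML string for one utterance (illustrative helper)."""
    body = escape(text)  # escape &, <, > so the text can't break the markup

    # Wrap the text in <prosody> only when a rate or pitch change is requested.
    prosody_attrs = ""
    if rate is not None:
        prosody_attrs += f" rate={quoteattr(rate)}"
    if pitch is not None:
        prosody_attrs += f" pitch={quoteattr(pitch)}"
    if prosody_attrs:
        body = f"<prosody{prosody_attrs}>{body}</prosody>"

    # Style switching applies only to voice models built with multiple styles.
    if style is not None:
        body = f"<mstts:express-as style={quoteattr(style)}>{body}</mstts:express-as>"

    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice_name)}>{body}</voice></speak>"
    )


# "MyBrandVoice" is a placeholder for the name of your deployed voice model.
ssml = build_ssml("Thanks for calling!", "MyBrandVoice", rate="+10%", pitch="+5%", style="cheerful")
```

The resulting string is what you would send as the request body or pass to an SDK call that accepts SSML. Building it with escaping helpers rather than plain string concatenation keeps user-supplied text from producing malformed markup.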
117
+
86
118
## Responsible use of AI
87
119
88
120
To learn how to use Custom Neural Voice responsibly, check the following articles.
@@ -100,4 +132,4 @@ To learn how to use Custom Neural Voice responsibly, check the following article
100
132
## Next steps
101
133
102
134
> [!div class="nextstepaction"]
103
-
> [Get started with Custom Neural Voice](how-to-custom-voice.md)
articles/cognitive-services/Speech-Service/how-to-custom-voice-create-voice.md
7 additions & 24 deletions
@@ -8,7 +8,7 @@ manager: nitinme
8
8
ms.service: cognitive-services
9
9
ms.subservice: speech-service
10
10
ms.topic: conceptual
11
-
ms.date: 02/18/2022
11
+
ms.date: 08/01/2022
12
12
ms.author: eur
13
13
ms.custom: references_regions
14
14
---
@@ -20,11 +20,6 @@ In [Prepare training data](how-to-custom-voice-prepare-data.md), you learned abo
20
20
> [!NOTE]
21
21
> See [Custom Neural Voice project types](custom-neural-voice.md#custom-neural-voice-project-types) for information about capabilities, requirements, and differences between Custom Neural Voice Pro and Custom Neural Voice Lite projects. This article focuses on the creation of a professional Custom Neural Voice using the Pro project.
22
22
23
-
## Prerequisites
24
-
25
-
*[Create a custom voice project](how-to-custom-voice.md)
26
-
*[Prepare training data](how-to-custom-voice-prepare-data.md)
27
-
28
23
## Set up voice talent
29
24
30
25
A *voice talent* is an individual or target speaker whose voice is recorded and used to create neural voice models. Before you create a voice, define your voice persona and select the right voice talent. For details on recording voice samples, see [the tutorial](record-custom-voice-samples.md).
@@ -34,25 +29,16 @@ To train a neural voice, you must create a voice talent profile with an audio fi
34
29
Upload this audio file to the Speech Studio as shown in the following screenshot. You create a voice talent profile, which is used to verify against your training data when you create a voice model. For more information, see [voice talent verification](/legal/cognitive-services/speech-service/custom-neural-voice/data-privacy-security-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext).
35
30
36
31
:::image type="content" source="media/custom-voice/upload-verbal-statement.png" alt-text="Screenshot that shows the upload voice talent statement.":::
37
-
38
-
> [!NOTE]
39
-
> Custom Neural Voice is available with limited access. Make sure you understand the [responsible AI requirements](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext), and then [apply for access](https://aka.ms/customneural).
40
32
41
33
The following steps assume that you've prepared the voice talent verbal consent files. Go to [Speech Studio](https://aka.ms/custom-voice-portal) to select a Custom Neural Voice project, and then follow these steps to create a voice talent profile.
42
34
43
35
1. Go to **Text-to-Speech** > **Custom Voice** > **select a project**, and select **Set up voice talent**.
44
36
45
37
1. Select **Add voice talent**.
46
38
47
-
1. Next, to define voice characteristics, select **Target scenario**. Then describe your **Voice characteristics**.
48
-
49
-
>[!NOTE]
50
-
>The scenarios you provide must be consistent with what you've applied for in the application form.
51
-
52
-
1. Then, go to **Upload voice talent statement**, and follow the instruction to upload the voice talent statement you've prepared beforehand.
39
+
1. Next, to define voice characteristics, select **Target scenario**. Then describe your **Voice characteristics**. The scenarios you provide must be consistent with what you've applied for in the application form.
53
40
54
-
>[!NOTE]
55
-
>Make sure the verbal statement is recorded in the same settings as your training data, including the recording environment and speaking style.
41
+
1. Go to **Upload voice talent statement**, and follow the instructions to upload the voice talent statement you've prepared beforehand. Make sure the verbal statement is recorded in the same settings as your training data, including the recording environment and speaking style.
56
42
57
43
1. Go to **Review and create**, review the settings, and select **Submit**.
58
44
@@ -72,13 +58,12 @@ You can do the following to create and review your training data:
72
58
73
59
> [!NOTE]
74
60
>- Duplicate audio names are removed from the training. Make sure the data you select don't contain the same audio names within the .zip file or across multiple .zip files. If utterance IDs (either in audio or script files) are duplicates, they're rejected.
75
-
>- If you've created data files in the previous version of Speech Studio, you must specify a training set for your data in advance to use them. If you haven't, an exclamation mark is appended to the data name, and the data can't be used.
76
61
77
62
All data you upload must meet the requirements for the data type that you choose. It's important to correctly format your data before it's uploaded, which ensures the data will be accurately processed by the Speech service. Go to [Prepare training data](how-to-custom-voice-prepare-data.md), and confirm that your data is correctly formatted.
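Because duplicate audio names are removed from training, it can be worth scanning your archives for collisions before you upload them. The following is a minimal sketch using only the Python standard library. The function name `find_duplicate_audio_names` is hypothetical, and treating `.wav` as the audio extension and ignoring folder paths inside each archive are assumptions of this example.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_audio_names(zip_paths, audio_suffixes=(".wav",)):
    """Map each audio file name that occurs more than once to the archives holding it.

    Names are compared without their folder path inside the archive, which is
    an assumption of this sketch about how duplicates are detected.
    """
    seen = defaultdict(list)
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if member.endswith("/"):
                    continue  # skip directory entries
                name = PurePosixPath(member).name
                if name.lower().endswith(audio_suffixes):
                    seen[name].append(str(zip_path))
    # Keep only names that appear more than once, within or across archives.
    return {name: sources for name, sources in seen.items() if len(sources) > 1}
```

Calling the function with the list of .zip files you plan to upload returns an empty dictionary when no audio name is repeated; otherwise, rename or remove the listed files before you submit.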
78
63
79
64
> [!NOTE]
80
65
> - Standard subscription (S0) users can upload five data files simultaneously. If you reach the limit, wait until at least one of your data files finishes importing. Then try again.
81
-
> - The maximum number of data files allowed to be imported per subscription is 500 .zip files for standard subscription (S0) users.
66
+
> - The maximum number of data files allowed to be imported per subscription is 500 .zip files for standard subscription (S0) users. For more details, see [Speech service quotas and limits](speech-services-quotas-and-limits.md#custom-neural-voice).
82
67
83
68
Data files are automatically validated when you select **Submit**. Data validation includes a series of checks on the audio files to verify their file format, size, and sampling rate. If there are any errors, fix them and submit again.
84
69
@@ -191,7 +176,7 @@ After you validate your data files, you can use them to build your Custom Neural
191
176
192
177
>[!NOTE]
193
178
>- To create a custom neural voice, select at least 300 utterances.
194
-
>- To train a neural voice, you must specify a voice talent profile. This profile must provide the audio consent file of the voice talent, acknowledging to use his or her speech data to train a custom neural voice model. Custom Neural Voice is available with limited access. Make sure you understand the [responsible AI requirements](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext) and [apply the access](https://aka.ms/customneural).
179
+
>- To train a neural voice, you must specify a voice talent profile. This profile must provide the audio consent file of the voice talent, acknowledging the use of their speech data to train a custom neural voice model.
195
180
196
181
1. Choose your test script. Each training generates 100 sample audio files automatically, to help you test the model with a default script. You can also provide your own test script, including up to 100 utterances. The test script must exclude the filenames (the ID of each utterance). Otherwise, these IDs are spoken. Here's an example of how the utterances are organized in one .txt file:
197
182
@@ -213,8 +198,7 @@ After you validate your data files, you can use them to build your Custom Neural
213
198
214
199
1. Review the settings, then select **Submit** to start training the model.
215
200
216
-
> [!NOTE]
217
-
> Duplicate audio names will be removed from the training. Make sure the data you select don't contain the same audio names across multiple .zip files.
201
+
Duplicate audio names will be removed from the training. Make sure the data you select don't contain the same audio names across multiple .zip files.
218
202
219
203
The **Train model** table displays a new entry that corresponds to this newly created model.
220
204
@@ -311,5 +295,4 @@ For more information, [learn more about the capabilities and limits of this feat
311
295
-[Deploy and use your voice model](how-to-deploy-and-use-endpoint.md)
312
296
-[How to record voice samples](record-custom-voice-samples.md)
313
297
-[Text-to-Speech API reference](rest-text-to-speech.md)