Commit b1608c1

Merge pull request #203283 from jboback/CNVrefresh
CNV Refresh
2 parents 12b8bfc + 62d1bc7 commit b1608c1

7 files changed

+119
-139
lines changed

articles/cognitive-services/Speech-Service/custom-neural-voice.md

Lines changed: 38 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -8,17 +8,17 @@ manager: nitinme
88
ms.service: cognitive-services
99
ms.subservice: speech-service
1010
ms.topic: conceptual
11-
ms.date: 02/18/2022
11+
ms.date: 08/01/2022
1212
ms.author: eur
1313
---
1414

1515
# What is Custom Neural Voice?
1616

17-
Custom Neural Voice is a text-to-speech feature that lets you create a one-of-a-kind, customized, synthetic voice for your applications. With Custom Neural Voice, you can build a highly natural-sounding voice by providing your audio samples as training data.
17+
Custom Neural Voice is a text-to-speech feature that lets you create a one-of-a-kind, customized, synthetic voice for your applications. With Custom Neural Voice, you can build a highly natural-sounding voice by providing your audio samples as training data. If you're looking for ready-to-use options, check out our [text-to-speech](text-to-speech.md) service.
1818

1919
Based on the neural text-to-speech technology and the multilingual, multi-speaker, universal model, Custom Neural Voice lets you create synthetic voices that are rich in speaking styles or adaptable across languages. The realistic, natural-sounding voice of Custom Neural Voice can represent brands, personify machines, and allow users to interact with applications conversationally. See the [supported languages](language-support.md#custom-neural-voice) for Custom Neural Voice.
2020

21-
> [!NOTE]
21+
> [!IMPORTANT]
2222
> Custom Neural Voice access is limited based on eligibility and usage criteria. Request access on the [intake form](https://aka.ms/customneural).
2323
2424
## The basics of Custom Neural Voice
@@ -37,9 +37,9 @@ You can adapt the neural text-to-speech engine to fit your needs. To create a cu
3737

3838
## Custom Neural Voice project types
3939

40-
Speech Studio provides two Custom Neural Voice (CNV) project types: CNV Pro and CNV Lite.
40+
Speech Studio provides two Custom Neural Voice (CNV) project types: CNV Lite and CNV Pro.
4141

42-
The following table summarizes key differences between the CNV Pro and CNV Lite project types.
42+
The following table summarizes key differences between the CNV Lite and CNV Pro project types.
4343

4444
|**Items**|**Lite (Preview)**| **Pro**|
4545
|---------------|---------------|---------------|
@@ -83,6 +83,38 @@ Review these CNV Pro articles to learn more and get started.
8383
| Persona | A persona describes who you want this voice to be. A good persona design will inform all voice creation. This might include choosing an available voice model already created, or starting from scratch by casting and recording a new voice talent.|
8484
| Script | A script is a text file that contains the utterances to be spoken by your voice talent. (The term *utterances* encompasses both full sentences and shorter phrases.)|
8585

86+
## The process for creating a professional custom neural voice
87+
88+
Creating a great custom neural voice requires careful quality control in each step, from voice design and data preparation, to the deployment of the voice model to your system. The following sections discuss some key steps you'll take when you're creating a custom neural voice for your organization.
89+
90+
### Persona design
91+
92+
First, [design a persona](/record-custom-voice-samples.md#choose-your-voice-talent) of the voice that represents your brand by using a persona brief document. This document defines elements such as the features of the voice, and the character behind the voice. This helps to guide the process of creating a custom neural voice model, including defining the scripts, selecting your voice talent, training, and voice tuning.
93+
94+
### Script selection
95+
96+
Carefully [select the recording script](/record-custom-voice-samples.md#script-selection-criteria) to represent the user scenarios for your voice. For example, you can use the phrases from bot conversations as your recording script if you're creating a customer service bot. Include different sentence types in your scripts, including statements, questions, and exclamations.
97+
98+
### Preparing training data
99+
100+
It's a good idea to capture the audio recordings in a professional quality recording studio to achieve a high signal-to-noise ratio. The quality of the voice model depends heavily on your training data. Volume, speaking rate, pitch, and expressive mannerisms of speech must all be consistent across the recordings.
101+
102+
After the recordings are ready, [prepare the training data](how-to-custom-voice-prepare-data.md) in the right format.
103+
104+
### Training
105+
106+
After you've prepared the training data, go to [Speech Studio](https://aka.ms/speechstudio/customvoice) to create your custom neural voice. Select at least 300 utterances to create a custom neural voice. A series of data quality checks is performed automatically when you upload the data. To build high-quality voice models, fix any errors that the checks report and submit the data again.
107+
108+
### Testing
109+
110+
Prepare test scripts for your voice model that cover the different use cases for your apps. It’s a good idea to use scripts within and outside the training dataset, so you can test the quality more broadly for different content.
111+
112+
### Tuning and adjustment
113+
114+
The style and the characteristics of the trained voice model depend on the style and the quality of the recordings from the voice talent used for training. However, you can make several adjustments by using [SSML (Speech Synthesis Markup Language)](./speech-synthesis-markup.md?tabs=csharp) when you make the API calls to your voice model to generate synthetic speech.
115+
116+
SSML is the markup language used to communicate with the text-to-speech service to convert text into audio. The adjustments you can make include change of pitch, rate, intonation, and pronunciation correction. If the voice model is built with multiple styles, you can also use SSML to switch the styles.
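
The adjustments described above can be sketched as a small Python helper that builds an SSML document with a `prosody` element. This is a minimal sketch under stated assumptions: `build_ssml` is an illustrative function rather than part of any Speech SDK, the voice name `MyHypotheticalVoice` is a placeholder for your own deployed voice, and the rate and pitch values are sample inputs only.

```python
import xml.etree.ElementTree as ET

# W3C SSML namespace and the standard XML namespace (for xml:lang).
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name, rate="0%", pitch="0%", lang="en-US"):
    """Return an SSML string that speaks `text` with the given rate and pitch.

    `voice_name` is whatever name your deployed voice uses; nothing here
    checks that the voice exists.
    """
    # Serialize SSML elements with a default namespace instead of ns0: prefixes.
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
    prosody = ET.SubElement(voice, f"{{{SSML_NS}}}prosody", {"rate": rate, "pitch": pitch})
    # ElementTree escapes characters such as & and < in the text.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical voice name and sample adjustment values.
    print(build_ssml("Welcome back!", "MyHypotheticalVoice", rate="+10%", pitch="-5%"))
```

Building the markup with an XML library rather than string formatting keeps the document well formed when the input text contains reserved characters. The resulting string is what you would pass as the SSML body of a synthesis request.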
117+
86118
## Responsible use of AI
87119

88120
To learn how to use Custom Neural Voice responsibly, check the following articles.
@@ -100,4 +132,4 @@ To learn how to use Custom Neural Voice responsibly, check the following article
100132
## Next steps
101133

102134
> [!div class="nextstepaction"]
103-
> [Get started with Custom Neural Voice](how-to-custom-voice.md)
135+
> [Create a Project](how-to-custom-voice.md)

articles/cognitive-services/Speech-Service/how-to-custom-voice-create-voice.md

Lines changed: 7 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ manager: nitinme
88
ms.service: cognitive-services
99
ms.subservice: speech-service
1010
ms.topic: conceptual
11-
ms.date: 02/18/2022
11+
ms.date: 08/01/2022
1212
ms.author: eur
1313
ms.custom: references_regions
1414
---
@@ -20,11 +20,6 @@ In [Prepare training data](how-to-custom-voice-prepare-data.md), you learned abo
2020
> [!NOTE]
2121
> See [Custom Neural Voice project types](custom-neural-voice.md#custom-neural-voice-project-types) for information about capabilities, requirements, and differences between Custom Neural Voice Pro and Custom Neural Voice Lite projects. This article focuses on the creation of a professional Custom Neural Voice using the Pro project.
2222
23-
## Prerequisites
24-
25-
* [Create a custom voice project](how-to-custom-voice.md)
26-
* [Prepare training data](how-to-custom-voice-prepare-data.md)
27-
2823
## Set up voice talent
2924

3025
A *voice talent* is an individual or target speaker whose voice is recorded and used to create neural voice models. Before you create a voice, define your voice persona and select the right voice talent. For details on recording voice samples, see [the tutorial](record-custom-voice-samples.md).
@@ -34,25 +29,16 @@ To train a neural voice, you must create a voice talent profile with an audio fi
3429
Upload this audio file to the Speech Studio as shown in the following screenshot. You create a voice talent profile, which is used to verify against your training data when you create a voice model. For more information, see [voice talent verification](/legal/cognitive-services/speech-service/custom-neural-voice/data-privacy-security-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext).
3530

3631
:::image type="content" source="media/custom-voice/upload-verbal-statement.png" alt-text="Screenshot that shows the upload voice talent statement.":::
37-
38-
> [!NOTE]
39-
> Custom Neural Voice is available with limited access. Make sure you understand the [responsible AI requirements](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext), and then [apply for access](https://aka.ms/customneural).
4032

4133
The following steps assume that you've prepared the voice talent verbal consent files. Go to [Speech Studio](https://aka.ms/custom-voice-portal) to select a Custom Neural Voice project, and then follow these steps to create a voice talent profile.
4234

4335
1. Go to **Text-to-Speech** > **Custom Voice** > **select a project**, and select **Set up voice talent**.
4436

4537
1. Select **Add voice talent**.
4638

47-
1. Next, to define voice characteristics, select **Target scenario**. Then describe your **Voice characteristics**.
48-
49-
>[!NOTE]
50-
>The scenarios you provide must be consistent with what you've applied for in the application form.
51-
52-
1. Then, go to **Upload voice talent statement**, and follow the instruction to upload the voice talent statement you've prepared beforehand.
39+
1. Next, to define voice characteristics, select **Target scenario**. Then describe your **Voice characteristics**. The scenarios you provide must be consistent with what you've applied for in the application form.
5340

54-
>[!NOTE]
55-
>Make sure the verbal statement is recorded in the same settings as your training data, including the recording environment and speaking style.
41+
1. Go to **Upload voice talent statement**, and follow the instructions to upload the voice talent statement you've prepared beforehand. Make sure the verbal statement is recorded in the same settings as your training data, including the recording environment and speaking style.
5642

5743
1. Go to **Review and create**, review the settings, and select **Submit**.
5844

@@ -72,13 +58,12 @@ You can do the following to create and review your training data:
7258

7359
> [!NOTE]
7460
>- Duplicate audio names are removed from the training. Make sure the data you select doesn't contain the same audio names within the .zip file or across multiple .zip files. If utterance IDs (either in audio or script files) are duplicates, they're rejected.
75-
>- If you've created data files in the previous version of Speech Studio, you must specify a training set for your data in advance to use them. If you haven't, an exclamation mark is appended to the data name, and the data can't be used.
7661
7762
All data you upload must meet the requirements for the data type that you choose. It's important to correctly format your data before it's uploaded, which ensures the data will be accurately processed by the Speech service. Go to [Prepare training data](how-to-custom-voice-prepare-data.md), and confirm that your data is correctly formatted.
7863

7964
> [!NOTE]
8065
> - Standard subscription (S0) users can upload five data files simultaneously. If you reach the limit, wait until at least one of your data files finishes importing. Then try again.
81-
> - The maximum number of data files allowed to be imported per subscription is 500 .zip files for standard subscription (S0) users.
66+
> - The maximum number of data files allowed to be imported per subscription is 500 .zip files for standard subscription (S0) users. See [Speech service quotas and limits](speech-services-quotas-and-limits.md#custom-neural-voice) for more details.
8267
8368
Data files are automatically validated when you select **Submit**. Data validation includes a series of checks on the audio files to verify their file format, size, and sampling rate. If there are any errors, fix them and submit again.
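
Before uploading, a local pre-check can catch some of these problems early. The following is a rough sketch, not the validation that Speech Studio runs: `check_wav` is a hypothetical helper, and the expected sampling rate and size limit are parameters you would fill in from the data requirements in [Prepare training data](how-to-custom-voice-prepare-data.md). It uses only Python's standard `wave` module, so it reads uncompressed PCM .wav files only.

```python
import wave
from pathlib import Path


def check_wav(path, expected_rate_hz, max_bytes):
    """Return a list of problems found in one .wav file (empty if none found).

    `expected_rate_hz` and `max_bytes` are caller-supplied assumptions;
    this sketch does not know the service's actual limits.
    """
    problems = []

    # Size check on the file as stored on disk.
    size = Path(path).stat().st_size
    if size > max_bytes:
        problems.append(f"size {size} bytes exceeds {max_bytes}")

    # Format check: the wave module raises if the file isn't a readable PCM .wav.
    try:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        problems.append(f"not a readable PCM .wav file: {exc}")
        return problems

    # Sampling rate check.
    if rate != expected_rate_hz:
        problems.append(f"sampling rate {rate} Hz, expected {expected_rate_hz} Hz")
    return problems
```

Calling `check_wav` on every file in a training set and printing the non-empty results gives a quick list of recordings to fix before you select **Submit**.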
8469

@@ -191,7 +176,7 @@ After you validate your data files, you can use them to build your Custom Neural
191176

192177
>[!NOTE]
193178
>- To create a custom neural voice, select at least 300 utterances.
194-
>- To train a neural voice, you must specify a voice talent profile. This profile must provide the audio consent file of the voice talent, acknowledging to use his or her speech data to train a custom neural voice model. Custom Neural Voice is available with limited access. Make sure you understand the [responsible AI requirements](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext) and [apply the access](https://aka.ms/customneural).
179+
>- To train a neural voice, you must specify a voice talent profile. This profile must provide the audio consent file of the voice talent, acknowledging the use of their speech data to train a custom neural voice model.
195180
196181
1. Choose your test script. Each training generates 100 sample audio files automatically, to help you test the model with a default script. You can also provide your own test script, including up to 100 utterances. The test script must exclude the filenames (the ID of each utterance). Otherwise, these IDs are spoken. Here's an example of how the utterances are organized in one .txt file:
197182

@@ -213,8 +198,7 @@ After you validate your data files, you can use them to build your Custom Neural
213198

214199
1. Review the settings, then select **Submit** to start training the model.
215200

216-
> [!NOTE]
217-
> Duplicate audio names will be removed from the training. Make sure the data you select don't contain the same audio names across multiple .zip files.
201+
Duplicate audio names will be removed from the training. Make sure the data you select doesn't contain the same audio names across multiple .zip files.
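
One way to spot duplicate audio names before training is to scan the .zip files locally. The sketch below is illustrative only: `find_duplicate_audio_names` is a hypothetical helper, and it assumes that two entries count as duplicates when their base file names match, regardless of the folder they sit in inside the archive.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_audio_names(zip_paths):
    """Map each file name that occurs more than once to the archives holding it.

    Assumption: names are compared by base name only, so `sub/0002.wav`
    and `0002.wav` are treated as the same audio name.
    """
    seen = defaultdict(list)
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue  # skip folder entries
                # Zip entries always use forward slashes as separators.
                name = PurePosixPath(info.filename).name
                seen[name].append(str(zip_path))
    return {name: places for name, places in seen.items() if len(places) > 1}
```

An empty result means no repeated names were found under that assumption. Any key in the returned dictionary points to the archives you would need to rename or deduplicate.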
218202

219203
The **Train model** table displays a new entry that corresponds to this newly created model.
220204

@@ -311,5 +295,4 @@ For more information, [learn more about the capabilities and limits of this feat
311295
- [Deploy and use your voice model](how-to-deploy-and-use-endpoint.md)
312296
- [How to record voice samples](record-custom-voice-samples.md)
313297
- [Text-to-Speech API reference](rest-text-to-speech.md)
314-
- [Long Audio API](long-audio-api.md)
315-
298+
- [Long Audio API](long-audio-api.md)
