articles/cognitive-services/Speech-Service/custom-neural-voice.md
Lines changed: 48 additions & 10 deletions
Original file line number
Diff line number
Diff line change
@@ -7,8 +7,8 @@ author: eric-urban
7
7
manager: nitinme
8
8
ms.service: cognitive-services
9
9
ms.subservice: speech-service
10
-
ms.topic: overview
11
-
ms.date: 01/23/2022
10
+
ms.topic: conceptual
11
+
ms.date: 02/18/2022
12
12
ms.author: eur
13
13
---
14
14
@@ -19,7 +19,7 @@ Custom Neural Voice is a text-to-speech feature that lets you create a one-of-a-
19
19
Based on the neural text-to-speech technology and the multilingual, multi-speaker, universal model, Custom Neural Voice lets you create synthetic voices that are rich in speaking styles, or adaptable across languages. The realistic and natural sounding voice of Custom Neural Voice can represent brands, personify machines, and allow users to interact with applications conversationally. See the [supported languages](language-support.md#custom-neural-voice) for Custom Neural Voice.
20
20
21
21
> [!NOTE]
22
-
> Custom Neural Voice requires registration, and access to it is limited based on eligibility and use criteria. To use this feature, register your use cases by using the [intake form](https://aka.ms/customneural).
22
+
> Custom Neural Voice access is limited based on eligibility and usage criteria. Request access on the [intake form](https://aka.ms/customneural).
23
23
24
24
## The basics of Custom Neural Voice
25
25
@@ -35,19 +35,47 @@ the recording samples of human voices. For more information, see [this Microsoft
35
35
36
36
You can adapt the neural text-to-speech engine to fit your needs. To create a custom neural voice, use [Speech Studio](https://speech.microsoft.com/customvoice) to upload the recorded audio and corresponding scripts, train the model, and deploy the voice to a custom endpoint. Custom Neural Voice can convert text provided by the user into speech in real time, or generate audio content offline from text input. You can do this by using the [REST API](./rest-text-to-speech.md), the [Speech SDK](./get-started-text-to-speech.md), or the [web portal](https://speech.microsoft.com/audiocontentcreation).
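As a rough illustration of the text input mentioned above, the following minimal sketch builds an SSML request body that names a custom voice. It assumes the request is sent as SSML, which this article doesn't state; the helper `build_ssml` and the voice name `MyBrandVoice` are hypothetical, and the sketch doesn't call the service.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice_name, lang="en-US"):
    """Return an SSML document that asks for `text` in the given voice.

    `voice_name` is a placeholder for the name of a deployed custom voice.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


# Hypothetical voice name, used only for illustration.
ssml = build_ssml("Welcome to Contoso & friends.", "MyBrandVoice")
```

Escaping the text keeps characters such as `&` from producing invalid XML in the request body.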
37
37
38
-
## Get started
38
+
## Custom Neural Voice project types
39
39
40
-
The following articles help you start using this feature:
40
+
Speech Studio provides two Custom Neural Voice (CNV) project types: CNV Pro and CNV Lite.
41
+
42
+
The following table summarizes key differences between the CNV Pro and CNV Lite project types.
43
+
44
+
|**Items**|**Lite (Preview)**|**Pro**|
45
+
|---------------|---------------|---------------|
46
+
|Target scenarios |Demonstration or evaluation |Professional scenarios like brand and character voices for chat bots, or audio content reading.|
47
+
|Training data |Record online using Speech Studio |Bring your own data. Recording in a professional studio is recommended. |
48
+
|Scripts for recording |Provided in Speech Studio |Use your own scripts that match the use case scenario. Microsoft provides [example scripts](https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/CustomVoice/script) for reference. |
49
+
|Required data size |20-50 utterances |300-2000 utterances|
50
+
|Training time |Less than 1 compute hour| Approximately 20-40 compute hours |
51
+
|Voice quality |Moderate quality|High quality |
52
+
|Availability |Anyone can record samples online and train a model for demo and evaluation purposes. Full access to Custom Neural Voice is required if you want to deploy the CNV Lite model for business use. |Data upload is not restricted, but you can only train and deploy a CNV Pro model after access is approved. CNV Pro access is limited based on eligibility and usage criteria. Request access on the [intake form](https://aka.ms/customneural).|
53
+
|Pricing |Per unit prices apply equally for both the CNV Lite and CNV Pro projects. Check the [pricing details here](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/). |Per unit prices apply equally for both the CNV Lite and CNV Pro projects. Check the [pricing details here](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/). |
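The required data sizes in the table can be checked before you start a project. The following sketch is illustrative only: the ranges come from the table above, while the function name `check_utterance_count` and the labels it returns are assumptions, not part of Speech Studio.

```python
# Utterance ranges taken from the "Required data size" row of the table.
UTTERANCE_RANGES = {
    "lite": (20, 50),
    "pro": (300, 2000),
}


def check_utterance_count(project_type, count):
    """Compare an utterance count with the range listed for the project type."""
    low, high = UTTERANCE_RANGES[project_type]
    if count < low:
        return "below range"
    if count > high:
        return "above range"
    return "in range"


# Example: 120 utterances fall short of the range listed for CNV Pro.
status = check_utterance_count("pro", 120)
```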
54
+
55
+
### Custom Neural Voice Lite (preview)
56
+
57
+
Custom Neural Voice (CNV) Lite is a new project type in public preview. You can demo and evaluate Custom Neural Voice before investing in professional recordings to create a higher-quality voice.
58
+
59
+
With a CNV Lite project, you record your voice online by reading 20-50 pre-defined scripts provided by Microsoft. After you've recorded at least 20 samples, you can start to train a model. Once the model is trained successfully, you can review the model and check out 20 output samples produced with another set of pre-defined scripts.
60
+
61
+
Full access to Custom Neural Voice is required if you want to deploy a CNV Lite model and use it beyond reading the pre-defined scripts. A verbal statement recorded by the voice talent is also required before you can deploy the model for your business use.
62
+
63
+
### Custom Neural Voice Pro
64
+
65
+
Custom Neural Voice (CNV) Pro allows you to upload your training data collected through professional recording studios and create a higher-quality voice that is nearly indistinguishable from its human samples. Training a voice in a CNV Pro project is restricted to those who are approved.
66
+
67
+
Review these CNV Pro articles to learn more and get started.
41
68
42
-
* To get started with Custom Neural Voice and create a project, see [Get started with Custom Neural Voice](how-to-custom-voice.md).
43
69
* To prepare and upload your audio data, see [Prepare training data](how-to-custom-voice-prepare-data.md).
44
-
* To train and deploy your models, see [Train your voice model](how-to-custom-voice-create-voice.md) and [Deploy and use your voice model](how-to-deploy-and-use-endpoint.md).
70
+
* To train your model, see [Train your voice model](how-to-custom-voice-create-voice.md).
71
+
* To deploy your model and use it in your apps, see [Deploy and use your voice model](how-to-deploy-and-use-endpoint.md).
72
+
* To learn how to prepare the script and record your voice samples, see [How to record voice samples](record-custom-voice-samples.md).
| Voice model | A text-to-speech model that can mimic the unique vocal characteristics of a target speaker. A *voice model* is also known as a *voice font* or *synthetic voice*. A voice model is a set of parameters in binary format that is not human readable and does not contain audio recordings. It can't be reverse engineered to derive or construct the audio of a human voice. |
78
+
| Voice model | A text-to-speech model that can mimic the unique vocal characteristics of a target speaker. A *voice model* is also known as a *voice font* or *synthetic voice*. A voice model is a set of parameters in binary format that isn't human readable and doesn't contain audio recordings. It can't be reverse engineered to derive or construct the audio of a human voice. |
51
79
| Voice talent | Individuals or target speakers whose voices are recorded and used to create voice models. These voice models are intended to sound like the voice talent’s voice.|
52
80
| Standard text-to-speech | The standard, or "traditional," method of text-to-speech. This method breaks down spoken language into phonetic snippets so that they can be remixed and matched by using classical programming or statistical methods.|
53
81
| Neural text-to-speech | This method synthesizes speech by using deep neural networks. These networks have "learned" the way phonetics are combined in natural human speech, rather than using procedural programming or statistical methods. In addition to the recordings of a target voice talent, neural text-to-speech uses a source library or base model that is built with voice recordings from many different speakers. |
@@ -57,7 +85,17 @@ The following articles help you start using this feature:
57
85
58
86
## Responsible use of AI
59
87
60
-
To learn how to use Custom Neural Voice responsibly, see the [transparency note](/legal/cognitive-services/speech-service/custom-neural-voice/transparency-note-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context). Transparency notes are intended to help you understand how the AI technology from Microsoft works, and the choices system owners can make that influence system performance and behavior. Transparency notes also discuss the importance of thinking about the whole system, including the technology, the people, and the environment.
88
+
To learn how to use Custom Neural Voice responsibly, check the following articles.
89
+
90
+
* [Transparency note and use cases for Custom Neural Voice](/legal/cognitive-services/speech-service/custom-neural-voice/transparency-note-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context)
91
+
* [Characteristics and limitations for using Custom Neural Voice](/legal/cognitive-services/speech-service/custom-neural-voice/characteristics-and-limitations-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context)
92
+
* [Limited access to Custom Neural Voice](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context)
93
+
* [Guidelines for responsible deployment of synthetic voice technology](/legal/cognitive-services/speech-service/custom-neural-voice/concepts-guidelines-responsible-deployment-synthetic?context=/azure/cognitive-services/speech-service/context/context)
94
+
* [Disclosure for voice talent](/legal/cognitive-services/speech-service/disclosure-voice-talent?context=/azure/cognitive-services/speech-service/context/context)
* [Code of Conduct for Text-to-Speech integrations](/legal/cognitive-services/speech-service/tts-code-of-conduct?context=/azure/cognitive-services/speech-service/context/context)
98
+
* [Data, privacy, and security for Custom Neural Voice](/legal/cognitive-services/speech-service/custom-neural-voice/data-privacy-security-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context)
articles/cognitive-services/Speech-Service/how-to-custom-voice-create-voice.md
Lines changed: 17 additions & 4 deletions
Original file line number
Diff line number
Diff line change
@@ -7,8 +7,8 @@ author: eric-urban
7
7
manager: nitinme
8
8
ms.service: cognitive-services
9
9
ms.subservice: speech-service
10
-
ms.topic: how-to
11
-
ms.date: 01/23/2022
10
+
ms.topic: conceptual
11
+
ms.date: 02/18/2022
12
12
ms.author: eur
13
13
ms.custom: references_regions
14
14
---
@@ -17,6 +17,9 @@ ms.custom: references_regions
17
17
18
18
In [Prepare training data](how-to-custom-voice-prepare-data.md), you learned about the different data types you can use to train a custom neural voice, and the different format requirements. After you've prepared your data and the voice talent verbal statement, you can start to upload them to [Speech Studio](https://aka.ms/custom-voice-portal). In this article, you learn how to train a custom neural voice through the Speech Studio portal.
19
19
20
+
> [!NOTE]
21
+
> See [Custom Neural Voice project types](custom-neural-voice.md#custom-neural-voice-project-types) for information about capabilities, requirements, and differences between Custom Neural Voice Pro and Custom Neural Voice Lite projects. This article focuses on the creation of a professional Custom Neural Voice using the Pro project.
22
+
20
23
## Prerequisites
21
24
22
25
* [Create a custom voice project](how-to-custom-voice.md)
@@ -75,7 +78,7 @@ All data you upload must meet the requirements for the data type that you choose
75
78
76
79
> [!NOTE]
77
80
> - Standard subscription (S0) users can upload five data files simultaneously. If you reach the limit, wait until at least one of your data files finishes importing. Then try again.
78
-
> - The maximum number of data files allowed to be imported per subscription is 10 .zip files for free subscription (F0) users, and 500 for standard subscription (S0) users.
81
+
> - The maximum number of data files allowed to be imported per subscription is 500 .zip files for standard subscription (S0) users.
79
82
80
83
Data files are automatically validated when you select **Submit**. Data validation includes a series of checks on the audio files to verify their file format, size, and sampling rate. If there are any errors, fix them and submit again.
81
84
@@ -85,9 +88,15 @@ A higher signal-to-noise ratio (SNR) indicates lower noise in your audio. You ca
85
88
86
89
Consider re-recording any utterances with low pronunciation scores or poor signal-to-noise ratios. If you can't re-record, consider excluding those utterances from your data.
87
90
91
+
### Typical data issues
92
+
88
93
On **Data details**, you can check the data details of the training set. If there are any typical issues with the data, follow the instructions in the message that appears, to fix them before training.
89
94
90
-
The issues are divided into three types. Refer to the following tables to check the respective types of errors. Data with these errors will be excluded during training.
95
+
The issues are divided into three types. Refer to the following tables to check the respective types of errors.
96
+
97
+
**Auto-rejected**
98
+
99
+
Data with these errors will be excluded during training.
@@ -103,13 +112,17 @@ The issues are divided into three types. Refer to the following tables to check
103
112
| Audio | Too long audio| Audio duration is longer than 30 seconds. Split the long audio into multiple files. It's a good idea to make utterances shorter than 15 seconds.|
104
113
| Audio | No valid audio| No valid audio is found in this dataset. Check your audio data and upload again.|
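The duration rule in the table can be checked locally before upload. The sketch below is a minimal illustration that assumes uncompressed WAV input readable by Python's standard `wave` module. The 30-second and 15-second thresholds come from the table; the function names and the returned labels are hypothetical, and Speech Studio's own validation may differ.

```python
import wave

MAX_SECONDS = 30.0          # longer audio is auto-rejected, per the table
RECOMMENDED_SECONDS = 15.0  # the table suggests keeping utterances shorter


def wav_duration_seconds(source):
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_duration(seconds):
    """Label a duration against the thresholds above."""
    if seconds > MAX_SECONDS:
        return "rejected"  # split the audio into multiple files
    if seconds >= RECOMMENDED_SECONDS:
        return "long"      # accepted, but shorter utterances are advised
    return "ok"
```

For example, `classify_duration(wav_duration_seconds("utterance_001.wav"))` labels a single file, where the file name is a placeholder.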
105
114
115
+
**Auto-fixed**
116
+
106
117
The following errors are fixed automatically, but you should confirm that the fixes have been made.
| Mismatch |Silence auto fixed |The start silence is detected to be shorter than 100 ms, and has been extended to 100 ms automatically. Download the normalized dataset and review it. |
111
122
| Mismatch |Silence auto fixed | The end silence is detected to be shorter than 100 ms, and has been extended to 100 ms automatically. Download the normalized dataset and review it.|
112
123
124
+
**Manual check required**
125
+
113
126
Unresolved errors listed in the next table affect the quality of training, but data with these errors won't be excluded during training. For higher-quality training, it's a good idea to fix these errors manually.