Commit 043a85f

Merge pull request #188974 from sally-baolian/Add_CNVLite_220217
Add Custom Neural Voice Lite Contents
2 parents 38aedd7 + 2a5ad0c commit 043a85f

6 files changed

+134
-64
lines changed

articles/cognitive-services/Speech-Service/custom-neural-voice.md

Lines changed: 48 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -7,8 +7,8 @@ author: eric-urban
77
manager: nitinme
88
ms.service: cognitive-services
99
ms.subservice: speech-service
10-
ms.topic: overview
11-
ms.date: 01/23/2022
10+
ms.topic: conceptual
11+
ms.date: 02/18/2022
1212
ms.author: eur
1313
---
1414

@@ -19,7 +19,7 @@ Custom Neural Voice is a text-to-speech feature that lets you create a one-of-a-
1919
Based on the neural text-to-speech technology and the multilingual, multi-speaker, universal model, Custom Neural Voice lets you create synthetic voices that are rich in speaking styles, or adaptable across languages. The realistic and natural sounding voice of Custom Neural Voice can represent brands, personify machines, and allow users to interact with applications conversationally. See the [supported languages](language-support.md#custom-neural-voice) for Custom Neural Voice.
2020

2121
> [!NOTE]
22-
> Custom Neural Voice requires registration, and access to it is limited based on eligibility and use criteria. To use this feature, register your use cases by using the [intake form](https://aka.ms/customneural).
22+
> Custom Neural Voice access is limited based on eligibility and usage criteria. Request access on the [intake form](https://aka.ms/customneural).
2323
2424
## The basics of Custom Neural Voice
2525

@@ -35,19 +35,47 @@ the recording samples of human voices. For more information, see [this Microsoft
3535

3636
You can adapt the neural text-to-speech engine to fit your needs. To create a custom neural voice, use [Speech Studio](https://speech.microsoft.com/customvoice) to upload the recorded audio and corresponding scripts, train the model, and deploy the voice to a custom endpoint. Custom Neural Voice can use text provided by the user to convert text into speech in real time, or generate audio content offline with text input. You can do this by using the [REST API](./rest-text-to-speech.md), the [Speech SDK](./get-started-text-to-speech.md), or the [web portal](https://speech.microsoft.com/audiocontentcreation).
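As a rough illustration of the "text provided by the user" step, the sketch below builds an SSML document that names a custom voice. It is only a sketch under stated assumptions: the voice name `MyBrandVoiceNeural` is hypothetical, and how the SSML is sent (REST API or Speech SDK) and which custom endpoint is used depend on your own deployment and are not shown here.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice_name: str, lang: str = "en-US") -> str:
    """Return an SSML string that asks for `voice_name` to read `text`.

    `voice_name` is whatever name you gave your deployed voice model;
    the value used in the example below is a placeholder, not a real voice.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # quoteattr adds the surrounding quotes and escapes the value
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # Hypothetical voice name, used for illustration only.
    print(build_ssml("Welcome to Contoso.", "MyBrandVoiceNeural"))
```

Escaping the user text keeps characters such as `&` and `<` from breaking the XML, which matters when the text comes straight from an application.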
3737

38-
## Get started
38+
## Custom Neural Voice project types
3939

40-
The following articles help you start using this feature:
40+
Speech Studio provides two Custom Neural Voice (CNV) project types: CNV Pro and CNV Lite.
41+
42+
The following table summarizes key differences between the CNV Pro and CNV Lite project types.
43+
44+
|**Items**|**Lite (Preview)**| **Pro**|
45+
|---------------|---------------|---------------|
46+
|Target scenarios |Demonstration or evaluation |Professional scenarios like brand and character voices for chat bots, or audio content reading.|
47+
|Training data |Record online using Speech Studio |Bring your own data. Recording in a professional studio is recommended. |
48+
|Scripts for recording |Provided in Speech Studio |Use your own scripts that match the use case scenario. Microsoft provides [example scripts](https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/CustomVoice/script) for reference. |
49+
|Required data size |20-50 utterances |300-2000 utterances|
50+
|Training time |Less than 1 compute hour| Approximately 20-40 compute hours |
51+
|Voice quality |Moderate quality|High quality |
52+
|Availability |Anyone can record samples online and train a model for demo and evaluation purposes. Full access to Custom Neural Voice is required if you want to deploy the CNV Lite model for business use. |Data upload is not restricted, but you can only train and deploy a CNV Pro model after access is approved. CNV Pro access is limited based on eligibility and usage criteria. Request access on the [intake form](https://aka.ms/customneural).|
53+
|Pricing |Per unit prices apply equally for both the CNV Lite and CNV Pro projects. Check the [pricing details here](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/). |Per unit prices apply equally for both the CNV Lite and CNV Pro projects. Check the [pricing details here](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/). |
54+
55+
### Custom Neural Voice Lite (preview)
56+
57+
Custom Neural Voice (CNV) Lite is a new project type in public preview. You can demo and evaluate Custom Neural Voice before investing in professional recordings to create a higher-quality voice.
58+
59+
With a CNV Lite project, you record your voice online by reading 20-50 pre-defined scripts provided by Microsoft. After you've recorded at least 20 samples, you can start to train a model. Once the model is trained successfully, you can review the model and check out 20 output samples produced with another set of pre-defined scripts.
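The data-size figures from the project type table can be captured in a small lookup, as in the sketch below. The names `DataSizeRange`, `REQUIRED_DATA_SIZE`, and `meets_minimum` are made up for this example, and treating the lower bound of the Pro range (300) as a hard minimum is an assumption; the text only states the 20-sample minimum explicitly for CNV Lite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSizeRange:
    min_utterances: int
    max_utterances: int


# Figures copied from the "Required data size" row of the table.
REQUIRED_DATA_SIZE = {
    "lite": DataSizeRange(20, 50),
    "pro": DataSizeRange(300, 2000),
}


def meets_minimum(project_type: str, utterance_count: int) -> bool:
    """True if the count reaches the lower bound for the project type.

    Assumption: the lower bound of each range is treated as the minimum
    needed before training can start.
    """
    return utterance_count >= REQUIRED_DATA_SIZE[project_type].min_utterances


if __name__ == "__main__":
    print(meets_minimum("lite", 20))  # 20 recorded samples in a Lite project
    print(meets_minimum("pro", 120))  # 120 utterances in a Pro project
```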
60+
61+
Full access to Custom Neural Voice is required if you want to deploy a CNV Lite model and use it beyond reading the pre-defined scripts. A verbal statement recorded by the voice talent is also required before you can deploy the model for your business use.
62+
63+
### Custom Neural Voice Pro
64+
65+
Custom Neural Voice (CNV) Pro allows you to upload your training data collected through professional recording studios and create a higher-quality voice that is nearly indistinguishable from its human samples. Training a voice in a CNV Pro project is restricted to those who are approved.
66+
67+
Review these CNV Pro articles to learn more and get started.
4168

42-
* To get started with Custom Neural Voice and create a project, see [Get started with Custom Neural Voice](how-to-custom-voice.md).
4369
* To prepare and upload your audio data, see [Prepare training data](how-to-custom-voice-prepare-data.md).
44-
* To train and deploy your models, see [Train your voice model](how-to-custom-voice-create-voice.md) and [Deploy and use your voice model](how-to-deploy-and-use-endpoint.md).
70+
* To train your model, see [Train your voice model](how-to-custom-voice-create-voice.md).
71+
* To deploy your model and use it in your apps, see [Deploy and use your voice model](how-to-deploy-and-use-endpoint.md).
72+
* To learn how to prepare the script and record your voice samples, see [How to record voice samples](record-custom-voice-samples.md).
4573

4674
## Terms and definitions
4775

48-
| **Term** | **Definition** |
76+
| **Term** | **Definition** |
4977
|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
50-
| Voice model | A text-to-speech model that can mimic the unique vocal characteristics of a target speaker. A *voice model* is also known as a *voice font* or *synthetic voice*. A voice model is a set of parameters in binary format that is not human readable and does not contain audio recordings. It can't be reverse engineered to derive or construct the audio of a human voice. |
78+
| Voice model | A text-to-speech model that can mimic the unique vocal characteristics of a target speaker. A *voice model* is also known as a *voice font* or *synthetic voice*. A voice model is a set of parameters in binary format that isn't human readable and doesn't contain audio recordings. It can't be reverse engineered to derive or construct the audio of a human voice. |
5179
| Voice talent | Individuals or target speakers whose voices are recorded and used to create voice models. These voice models are intended to sound like the voice talent’s voice.|
5280
| Standard text-to-speech | The standard, or "traditional," method of text-to-speech. This method breaks down spoken language into phonetic snippets so that they can be remixed and matched by using classical programming or statistical methods.|
5381
| Neural text-to-speech | This method synthesizes speech by using deep neural networks. These networks have "learned" the way phonetics are combined in natural human speech, rather than using procedural programming or statistical methods. In addition to the recordings of a target voice talent, neural text-to-speech uses a source library or base model that is built with voice recordings from many different speakers. |
@@ -57,7 +85,17 @@ The following articles help you start using this feature:
5785

5886
## Responsible use of AI
5987

60-
To learn how to use Custom Neural Voice responsibly, see the [transparency note](/legal/cognitive-services/speech-service/custom-neural-voice/transparency-note-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context). Transparency notes are intended to help you understand how the AI technology from Microsoft works, and the choices system owners can make that influence system performance and behavior. Transparency notes also discuss the importance of thinking about the whole system, including the technology, the people, and the environment.
88+
To learn how to use Custom Neural Voice responsibly, check the following articles.
89+
90+
* [Transparency note and use cases for Custom Neural Voice](/legal/cognitive-services/speech-service/custom-neural-voice/transparency-note-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context)
91+
* [Characteristics and limitations for using Custom Neural Voice](/legal/cognitive-services/speech-service/custom-neural-voice/characteristics-and-limitations-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context)
92+
* [Limited access to Custom Neural Voice](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context)
93+
* [Guidelines for responsible deployment of synthetic voice technology](/legal/cognitive-services/speech-service/custom-neural-voice/concepts-guidelines-responsible-deployment-synthetic?context=/azure/cognitive-services/speech-service/context/context)
94+
* [Disclosure for voice talent](/legal/cognitive-services/speech-service/disclosure-voice-talent?context=/azure/cognitive-services/speech-service/context/context)
95+
* [Disclosure design guidelines](/legal/cognitive-services/speech-service/custom-neural-voice/concepts-disclosure-guidelines?context=/azure/cognitive-services/speech-service/context/context)
96+
* [Disclosure design patterns](/legal/cognitive-services/speech-service/custom-neural-voice/concepts-disclosure-patterns?context=/azure/cognitive-services/speech-service/context/context)
97+
* [Code of Conduct for Text-to-Speech integrations](/legal/cognitive-services/speech-service/tts-code-of-conduct?context=/azure/cognitive-services/speech-service/context/context)
98+
* [Data, privacy, and security for Custom Neural Voice](/legal/cognitive-services/speech-service/custom-neural-voice/data-privacy-security-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context)
6199

62100
## Next steps
63101

articles/cognitive-services/Speech-Service/how-to-custom-voice-create-voice.md

Lines changed: 17 additions & 4 deletions
@@ -7,8 +7,8 @@ author: eric-urban
77
manager: nitinme
88
ms.service: cognitive-services
99
ms.subservice: speech-service
10-
ms.topic: how-to
11-
ms.date: 01/23/2022
10+
ms.topic: conceptual
11+
ms.date: 02/18/2022
1212
ms.author: eur
1313
ms.custom: references_regions
1414
---
@@ -17,6 +17,9 @@ ms.custom: references_regions
1717

1818
In [Prepare training data](how-to-custom-voice-prepare-data.md), you learned about the different data types you can use to train a custom neural voice, and the different format requirements. After you've prepared your data and the voice talent verbal statement, you can start to upload them to [Speech Studio](https://aka.ms/custom-voice-portal). In this article, you learn how to train a custom neural voice through the Speech Studio portal.
1919

20+
> [!NOTE]
21+
> See [Custom Neural Voice project types](custom-neural-voice.md#custom-neural-voice-project-types) for information about capabilities, requirements, and differences between Custom Neural Voice Pro and Custom Neural Voice Lite projects. This article focuses on the creation of a professional Custom Neural Voice using the Pro project.
22+
2023
## Prerequisites
2124

2225
* [Create a custom voice project](how-to-custom-voice.md)
@@ -75,7 +78,7 @@ All data you upload must meet the requirements for the data type that you choose
7578

7679
> [!NOTE]
7780
> - Standard subscription (S0) users can upload five data files simultaneously. If you reach the limit, wait until at least one of your data files finishes importing. Then try again.
78-
> - The maximum number of data files allowed to be imported per subscription is 10 .zip files for free subscription (F0) users, and 500 for standard subscription (S0) users.
81+
> - The maximum number of data files allowed to be imported per subscription is 500 .zip files for standard subscription (S0) users.
7982
8083
Data files are automatically validated when you select **Submit**. Data validation includes a series of checks on the audio files to verify their file format, size, and sampling rate. If there are any errors, fix them and submit again.
8184

@@ -85,9 +88,15 @@ A higher signal-to-noise ratio (SNR) indicates lower noise in your audio. You ca
8588

8689
Consider re-recording any utterances with low pronunciation scores or poor signal-to-noise ratios. If you can't re-record, consider excluding those utterances from your data.
8790

91+
### Typical data issues
92+
8893
On **Data details**, you can check the data details of the training set. If there are any typical issues with the data, follow the instructions in the message that appears, to fix them before training.
8994

90-
The issues are divided into three types. Refer to the following tables to check the respective types of errors. Data with these errors will be excluded during training.
95+
The issues are divided into three types. Refer to the following tables to check the respective types of errors.
96+
97+
**Auto-rejected**
98+
99+
Data with these errors will be excluded during training.
91100

92101
| Category | Name | Description |
93102
| --------- | ----------- | --------------------------- |
@@ -103,13 +112,17 @@ The issues are divided into three types. Refer to the following tables to check
103112
| Audio | Too long audio| Audio duration is longer than 30 seconds. Split the long audio into multiple files. It's a good idea to make utterances shorter than 15 seconds.|
104113
| Audio | No valid audio| No valid audio is found in this dataset. Check your audio data and upload again.|
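
Before uploading, the duration rule from the "Too long audio" row can be checked locally. The sketch below is not the validation that Speech Studio runs; it only reads the duration of a PCM `.wav` file with Python's standard `wave` module and compares it with the 30-second limit and the 15-second recommendation quoted in the table. The function names and the labels it returns are invented for this example.

```python
import wave

MAX_SECONDS = 30.0          # from the table: longer audio is rejected
RECOMMENDED_SECONDS = 15.0  # from the table: keep utterances shorter than this


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_duration(seconds: float) -> str:
    """Label a duration against the limits quoted in the table."""
    if seconds > MAX_SECONDS:
        return "too long"   # split the audio into multiple files
    if seconds >= RECOMMENDED_SECONDS:
        return "long"       # accepted, but shorter utterances are suggested
    return "ok"


if __name__ == "__main__":
    import sys

    # Usage: python check_duration.py utterance1.wav utterance2.wav ...
    for wav_path in sys.argv[1:]:
        duration = wav_duration_seconds(wav_path)
        print(f"{wav_path}: {duration:.2f}s -> {classify_duration(duration)}")
```

Running a check like this over a folder of recordings before packaging the .zip file can surface overly long utterances early, so they can be split or re-recorded.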
105114

115+
**Auto-fixed**
116+
106117
The following errors are fixed automatically, but you should confirm that the fixes have been made.
107118

108119
| Category | Name | Description |
109120
| --------- | ----------- | --------------------------- |
110121
| Mismatch |Silence auto fixed |The start silence is detected to be shorter than 100 ms, and has been extended to 100 ms automatically. Download the normalized dataset and review it. |
111122
| Mismatch |Silence auto fixed | The end silence is detected to be shorter than 100 ms, and has been extended to 100 ms automatically. Download the normalized dataset and review it.|
112123

124+
**Manual check required**
125+
113126
Unresolved errors listed in the next table affect the quality of training, but data with these errors won't be excluded during training. For higher-quality training, it's a good idea to fix these errors manually.
114127

115128
| Category | Name | Description |
