articles/cognitive-services/Speech-Service/custom-neural-voice.md
+4 -4 (4 additions & 4 deletions)
@@ -14,12 +14,12 @@ ms.author: eur

 # What is Custom Neural Voice?

-Custom Neural Voice (CNV) is a text-to-speech feature that lets you create a one-of-a-kind, customized, synthetic voice for your applications. With Custom Neural Voice, you can build a highly natural-sounding voice by providing your audio samples as training data.
+Custom Neural Voice (CNV) is a text-to-speech feature that lets you create a one-of-a-kind, customized, synthetic voice for your applications. With Custom Neural Voice, you can build a highly natural-sounding voice for your brand or characters by providing human speech samples as training data.

 > [!IMPORTANT]
 > Custom Neural Voice access is [limited](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext) based on eligibility and usage criteria. Request access on the [intake form](https://aka.ms/customneural).

-Out of the box, [text-to-speech](text-to-speech.md) can be used with prebuilt neural voices for each [supported language](language-support.md?tabs=stt-tts). The prebuilt neural voices work very well in most text-to-speech scenarios.
+Out of the box, [text-to-speech](text-to-speech.md) can be used with prebuilt neural voices for each [supported language](language-support.md?tabs=stt-tts). The prebuilt neural voices work very well in most text-to-speech scenarios if a unique voice isn't required.

 Custom Neural Voice is based on the neural text-to-speech technology and the multilingual, multi-speaker, universal model. You can create synthetic voices that are rich in speaking styles, or adaptable cross languages. The realistic and natural sounding voice of Custom Neural Voice can represent brands, personify machines, and allow users to interact with applications conversationally. See the [supported languages](language-support.md?tabs=stt-tts) for Custom Neural Voice.

@@ -37,9 +37,9 @@ Before you get started in Speech Studio, here are some considerations:
 - [Design a persona](record-custom-voice-samples.md#choose-your-voice-talent) of the voice that represents your brand by using a persona brief document. This document defines elements such as the features of the voice, and the character behind the voice. This helps to guide the process of creating a custom neural voice model, including defining the scripts, selecting your voice talent, training, and voice tuning.
 - [Select the recording script](record-custom-voice-samples.md#script-selection-criteria) to represent the user scenarios for your voice. For example, you can use the phrases from bot conversations as your recording script if you're creating a customer service bot. Include different sentence types in your scripts, including statements, questions, and exclamations.

-Here's an overview of the steps to create a Custom Neural Voice in Speech Studio:
+Here's an overview of the steps to create a custom neural voice in Speech Studio:

-1. [Create a project](how-to-custom-voice.md) to contain your data, voice models, tests, and endpoints. Each project is specific to a country and language.
+1. [Create a project](how-to-custom-voice.md) to contain your data, voice models, tests, and endpoints. Each project is specific to a country and language. If you are going to create multiple voices, it's recommended that you create a project for each voice.
 1. [Set up voice talent](how-to-custom-voice.md). Before you can train a neural voice, you must submit a recording of the voice talent's consent statement. The voice talent statement is a recording of the voice talent reading a statement that they consent to the usage of their speech data to train a custom voice model.
 1. [Prepare training data](how-to-custom-voice-prepare-data.md) in the right [format](how-to-custom-voice-training-data.md). It's a good idea to capture the audio recordings in a professional quality recording studio to achieve a high signal-to-noise ratio. The quality of the voice model depends heavily on your training data. Consistent volume, speaking rate, pitch, and consistency in expressive mannerisms of speech are required.
 1. [Train your voice model](how-to-custom-voice-create-voice.md). Select at least 300 utterances to create a custom neural voice. A series of data quality checks are automatically performed when you upload them. To build high-quality voice models, you should fix any errors and submit again.
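
As a rough illustration of the 300-utterance minimum mentioned in the training step above, a local pre-upload check might look like the sketch below. It assumes a hypothetical tab-separated transcript with one `utterance ID<TAB>text` pair per line. The function names and the transcript layout are illustrative assumptions and are not part of Speech Studio, which runs its own data quality checks when you upload.

```python
MIN_UTTERANCES = 300  # minimum stated in the training step above


def count_utterances(transcript_text: str) -> int:
    """Count transcript lines that carry both an utterance ID and text.

    Assumes a hypothetical layout of `<utterance id><TAB><text>` per line;
    blank lines and lines without a tab are ignored.
    """
    count = 0
    for line in transcript_text.splitlines():
        utt_id, sep, text = line.partition("\t")
        if sep and utt_id.strip() and text.strip():
            count += 1
    return count


def has_enough_utterances(transcript_text: str, minimum: int = MIN_UTTERANCES) -> bool:
    """Return True when the transcript meets the minimum utterance count."""
    return count_utterances(transcript_text) >= minimum
```

A check like this only catches an undersized data set early. It says nothing about audio quality, which the text above describes as the main factor in voice model quality.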
articles/cognitive-services/Speech-Service/how-to-custom-voice-create-voice.md
+5 -3 (5 additions & 3 deletions)
@@ -31,9 +31,11 @@ After you validate your data files, you can use them to build your Custom Neural

 - [Neural](?tabs=neural#train-your-custom-neural-voice-model): Create a voice in the same language of your training data, select **Neural** method.

-- [Neural - cross lingual](?tabs=crosslingual#train-your-custom-neural-voice-model) (Preview): Create a secondary language for your voice model to speak a different language from your training data. For example, with the `zh-CN` training data, you can create a voice that speaks `en-US`. The language of the training data and the target language must both be one of the [languages that are supported](language-support.md?tabs=stt-tts) for cross lingual training. You don't need to prepare training data in the target language, but your test script must be in the target language.
+- [Neural - cross lingual](?tabs=crosslingual#train-your-custom-neural-voice-model) (Preview): Create a secondary language for your voice model to speak a different language from your training data. For example, with the `zh-CN` training data, you can create a voice that speaks `en-US`. The language of the training data and the target language must both be one of the [languages that are supported](language-support.md?tabs=stt-tts) for cross lingual voice training. You don't need to prepare training data in the target language, but your test script must be in the target language.

-- [Neural - multi style](?tabs=multistyle#train-your-custom-neural-voice-model) (Preview): Create a custom neural voice that speaks in multiple styles/emotions, without adding new training data. Multi-style voices are particularly useful for video game characters, conversational chatbots, audiobook and content readers, and more. To create a multi-style voice, you just need to prepare a set of general training data (at least 300 utterances), and select one or more of the preset target speaking styles. You can also create up to 10 custom styles by providing style samples as additional training data for the same voice.
+- [Neural - multi style](?tabs=multistyle#train-your-custom-neural-voice-model) (Preview): Create a custom neural voice that speaks in multiple styles and emotions, without adding new training data. Multi-style voices are particularly useful for video game characters, conversational chatbots, audiobooks, content readers, and more. To create a multi-style voice, you just need to prepare a set of general training data (at least 300 utterances), and select one or more of the preset target speaking styles. You can also create up to 10 custom styles by providing style samples as additional training data for the same voice.
+
+The language of the training data must be one of the [languages that are supported](language-support.md?tabs=stt-tts) for custom neural voice neural, cross-lingual, or multi-style training.

 ## Train your Custom Neural Voice model

@@ -84,7 +86,7 @@ To create a custom neural voice in Speech Studio, follow these steps for one of
 1. Select **Next**.
 1. Optionally, you can add up to 10 custom speaking styles:
 1. Select **Add a custom style** and thoughtfully enter a custom style name of your choice. This name will be used by your application within the `style` element of [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md#adjust-speaking-styles). You can also use the custom style name as SSML via the [Audio Content Creation](how-to-audio-content-creation.md) tool in [Speech Studio](https://speech.microsoft.com/portal/audiocontentcreation).
-1. Select style samples as training data.
+1. Select style samples as training data. It's recommended that the style samples are all from the same voice talent profile.
 1. Select **Next**.
 1. Select a speaker file with the voice talent statement that corresponds to the speaker in your training data.
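
The custom style step above says the style name is later referenced from SSML. A minimal sketch of how an application might build such a request body is shown below, assuming the `mstts:express-as` element with a `style` attribute as covered in the linked SSML article. The helper name `build_styled_ssml`, the voice name `MyCustomVoiceNeural`, and the style name `my-custom-style` are hypothetical placeholders, not values from this commit.

```python
from xml.sax.saxutils import escape, quoteattr


def build_styled_ssml(voice_name: str, style_name: str, text: str, lang: str = "en-US") -> str:
    """Wrap `text` in SSML that requests a speaking style by name.

    Attribute values are quoted with quoteattr and the spoken text is
    XML-escaped, so characters such as & or < do not break the markup.
    """
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice_name)}>"
        f"<mstts:express-as style={quoteattr(style_name)}>"
        f"{escape(text)}"
        "</mstts:express-as></voice></speak>"
    )


# Hypothetical voice and style names, for illustration only.
ssml = build_styled_ssml(
    "MyCustomVoiceNeural",
    "my-custom-style",
    "Thanks for calling & welcome back!",
)
```

Because the style name ends up inside application markup like this, a short, stable name is easier to maintain than one you expect to rename later.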