Commit de78e78

Merge pull request #167006 from sally-baolian/message_script_update
2 parents f07cfc4 + 9b4e3ce commit de78e78

2 files changed

+48
-38
lines changed

articles/cognitive-services/Speech-Service/how-to-custom-voice-create-voice.md

Lines changed: 27 additions & 27 deletions
Original file line number | Diff line number | Diff line change
@@ -38,9 +38,9 @@ The following steps assume you've prepared the voice talent verbal consent files
3838

3939
1. Navigate to **Text-to-Speech** > **Custom Voice** > **select a project** > **Set up voice talent**.
4040

41-
2. Click **Add voice talent**.
41+
2. Select **Add voice talent**.
4242

43-
3. Next, to define voice characteristics, click **Target scenario** to be used. Then describe your **Voice characteristics**.
43+
3. Next, to define voice characteristics, select **Target scenario** to be used. Then describe your **Voice characteristics**.
4444

4545
> [!NOTE]
4646
> The scenarios you provide must be consistent with what you've applied for in the application form.
@@ -50,37 +50,37 @@ The following steps assume you've prepared the voice talent verbal consent files
5050
> [!NOTE]
5151
> Make sure the verbal statement is recorded in the same settings as your training data, including the recording environment and speaking style.
5252
53-
5. Finally, go to **Review and submit**, you can review the settings and click **Submit**.
53+
5. Finally, go to **Review and create**, you can review the settings and select **Submit**.
5454

55-
## Upload your datasets
55+
## Upload your data
5656

57-
When you're ready to upload your data, go to the **Prepare training data** tab to add your first training set and upload data. A training set is a set of audio utterances and their mapping scripts used for training a voice model. You can use a training set to organize your training data. Data readiness checking will be done per each training set. You can import multiple datasets to a training set.
57+
When you're ready to upload your data, go to the **Prepare training data** tab to add your first training set and upload data. A training set is a set of audio utterances and their mapping scripts used for training a voice model. You can use a training set to organize your training data. Data readiness checking will be done per each training set. You can import multiple data to a training set.
5858

5959
You can do the following to create and review your training data.
6060

61-
1. On the **Prepare training data** tab, click **Add training set** to enter **Name** and **Description** > **Create** to add a new training set.
61+
1. On the **Prepare training data** tab, select **Add training set** to enter **Name** and **Description** > **Create** to add a new training set.
6262

6363
When the training set is successfully created, you can start to upload your data.
6464

65-
2. To upload data, click **Upload data** > **Choose data type** > **Upload data** and **Specify the target training set** > Enter **Name** and **Description** for your dataset > review the settings and click **Upload**.
65+
2. To upload data, select **Upload data** > **Choose data type** > **Upload data** and **Specify the target training set** > Enter **Name** and **Description** for your data > review the settings and select **Submit**.
6666

6767
> [!NOTE]
68-
>- Duplicate audio names will be removed from the training. Make sure the datasets you select don't contain the same audio names within the .zip file or across multiple .zip files. If utterance IDs (either in audio or script files) are duplicate, they'll be rejected.
69-
>- If you've created datasets in the previous version of Speech Studio, you must specify a training set for your datasets in advance to use them. Or else, an exclamation mark will be appended to the dataset name, and the dataset could not be used.
68+
>- Duplicate audio names will be removed from the training. Make sure the data you select don't contain the same audio names within the .zip file or across multiple .zip files. If utterance IDs (either in audio or script files) are duplicate, they'll be rejected.
69+
>- If you've created data files in the previous version of Speech Studio, you must specify a training set for your data in advance to use them. Or else, an exclamation mark will be appended to the data name, and the data could not be used.
7070
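
The duplicate-name rule in the note above can be checked locally before uploading. The following is a minimal sketch, not part of Speech Studio: the function name `find_duplicate_audio_names`, the `.wav` suffix, and the case-insensitive comparison on the base file name are all assumptions made for illustration.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_audio_names(zip_paths, audio_suffix=".wav"):
    """Return {audio name: [archives containing it]} for names seen more than once.

    Hypothetical helper: names are compared by base file name, lowercased,
    which is an assumption about how duplicates are detected by the service.
    """
    seen = defaultdict(list)
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                # Ignore any folder prefix inside the archive.
                name = PurePosixPath(member).name.lower()
                if name.endswith(audio_suffix):
                    seen[name].append(str(zip_path))
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```

An empty result means no audio name repeats within or across the archives passed in.
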
71-
Each dataset you upload must meet the requirements for the data type that you choose. It's important to correctly format your data before it's uploaded, which ensures the data will be accurately processed by the Custom Neural Voice service. Go to [Prepare training data](how-to-custom-voice-prepare-data.md) and make sure your data has been rightly formatted.
71+
Each data you upload must meet the requirements for the data type that you choose. It's important to correctly format your data before it's uploaded, which ensures the data will be accurately processed by the Custom Neural Voice service. Go to [Prepare training data](how-to-custom-voice-prepare-data.md) and make sure your data has been rightly formatted.
7272

7373
> [!NOTE]
74-
> - Standard subscription (S0) users can upload five datasets simultaneously. If you reach the limit, wait until at least one of your datasets finishes importing. Then try again.
75-
> - The maximum number of datasets allowed to be imported per subscription is 10 .zip files for free subscription (F0) users and 500 for standard subscription (S0) users.
74+
> - Standard subscription (S0) users can upload five data files simultaneously. If you reach the limit, wait until at least one of your data files finishes importing. Then try again.
75+
> - The maximum number of data files allowed to be imported per subscription is 10 .zip files for free subscription (F0) users and 500 for standard subscription (S0) users.
7676
77-
Datasets are automatically validated once you hit the **Upload** button. Data validation includes series of checks on the audio files to verify their file format, size, and sampling rate. Fix the errors if any and submit again.
77+
Data files are automatically validated once you hit the **Submit** button. Data validation includes series of checks on the audio files to verify their file format, size, and sampling rate. Fix the errors if any and submit again.
7878

79-
Once the data is uploaded, you can check the details in the training set detail view. On the **Overview** tab, you can further check the pronunciation scores and the noise level for each of your datasets. The pronunciation score ranges from 0 to 100. A score below 70 normally indicates a speech error or script mismatch. A heavy accent can reduce your pronunciation score and impact the generated digital voice.
79+
Once the data is uploaded, you can check the details in the training set detail view. On the **Overview** tab, you can further check the pronunciation scores and the noise level for each of your data. The pronunciation score ranges from 0 to 100. A score below 70 normally indicates a speech error or script mismatch. A heavy accent can reduce your pronunciation score and affect the generated digital voice.
8080

8181
A higher signal-to-noise ratio (SNR) indicates lower noise in your audio. You can typically reach a 50+ SNR by recording at professional studios. Audio with an SNR below 20 can result in obvious noise in your generated voice.
8282

83-
Consider re-recording any utterances with low pronunciation scores or poor signal-to-noise ratios. If you can't re-record, consider excluding those utterances from your dataset.
83+
Consider re-recording any utterances with low pronunciation scores or poor signal-to-noise ratios. If you can't re-record, consider excluding those utterances from your data.
8484
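
The two thresholds mentioned above (a pronunciation score below 70, an SNR below 20 dB) can be applied as a simple filter when deciding which utterances to re-record or exclude. This is a rough sketch under stated assumptions: the `UtteranceQuality` record and `split_by_quality` function are hypothetical, and the scores are assumed to have been copied by hand from the training set detail view, since this article does not describe an export format.

```python
from dataclasses import dataclass

# Thresholds taken from the text: scores below 70 normally indicate a speech
# error or script mismatch; SNR below 20 dB can cause obvious noise.
MIN_PRONUNCIATION_SCORE = 70
MIN_SNR_DB = 20


@dataclass
class UtteranceQuality:
    """Hypothetical record holding the per-utterance values shown in Speech Studio."""
    utterance_id: str
    pronunciation_score: float
    snr_db: float


def split_by_quality(utterances):
    """Split utterances into (keep, review) lists using the two thresholds."""
    keep, review = [], []
    for utterance in utterances:
        too_low = (
            utterance.pronunciation_score < MIN_PRONUNCIATION_SCORE
            or utterance.snr_db < MIN_SNR_DB
        )
        (review if too_low else keep).append(utterance)
    return keep, review
```

Utterances in the `review` list are candidates for re-recording or, failing that, for exclusion from the data.
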

8585
On the **Data details**, you can check the data details of the training set. If there are any typical issues with the data, follow the instructions in the message displayed to fix them before training.
8686

@@ -126,43 +126,43 @@ If the third type of errors listed in the table below aren't fixed, although the
126126
| Volume | Volume overflow| Overflowing volume is detected at {}s. Adjust the recording equipment to avoid the volume overflow at its peak value.|
127127
| Volume | Start silence issue | The first 100 ms silence isn't clean. Reduce the recording noise floor level and leave the first 100 ms at the start silent.|
128128
| Volume| End silence issue| The last 100 ms silence isn't clean. Reduce the recording noise floor level and leave the last 100 ms at the end silent.|
129-
| Mismatch | Script and audio mismatch|Review the script and the audio content to make sure they match and control the noise floor level. Reduce the length of long silence or split the audio into multiple utterances if it's too long.|
129+
| Mismatch | Low scored words|Review the script and the audio content to make sure they match and control the noise floor level. Reduce the length of long silence or split the audio into multiple utterances if it's too long.|
130130
| Mismatch | Start silence issue |Extra audio was heard before the first word. Review the script and the audio content to make sure they match, control the noise floor level, and make the first 100 ms silent.|
131131
| Mismatch | End silence issue| Extra audio was heard after the last word. Review the script and the audio content to make sure they match, control the noise floor level, and make the last 100 ms silent.|
132132
| Mismatch | Low signal-noise ratio | Audio SNR level is lower than 20 dB. At least 35 dB is recommended.|
133133
| Mismatch | No score available |Failed to recognize speech content in this audio. Check the audio and the script content to make sure the audio is valid, and matches the script.|
134134

135135
## Train your custom neural voice model
136136

137-
After your dataset has been validated, you can use it to build your custom neural voice model.
137+
After your data files have been validated, you can use them to build your custom neural voice model.
138138

139-
1. On the **Train model** tab, click **Train model** to create a voice model with the data you have uploaded.
139+
1. On the **Train model** tab, select **Train model** to create a voice model with the data you have uploaded.
140140

141141
2. Select the neural training method for your model and target language.
142142

143143
By default, your voice model is trained in the same language of your training data. You can also select to create a secondary language (preview) for your voice model. Check the languages supported for custom neural voice and cross-lingual feature: [language for customization](language-support.md#customization).
144144

145-
3. Next, choose the dataset you want to use for training, and specify a speaker file.
145+
3. Next, choose the data you want to use for training, and specify a speaker file.
146146

147147
>[!NOTE]
148148
>- You need to select at least 300 utterances to create a custom neural voice.
149149
>- To train a neural voice, you must specify a voice talent profile with the audio consent file provided of the voice talent acknowledging to use his/her speech data to train a custom voice model. Custom Neural Voice is available with limited access. Make sure you understand the [responsible AI requirements](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext) and [apply the access here](https://aka.ms/customneural).
150-
>- On this page you can also select to upload your script for testing. The testing script must be a txt file, less than 1Mb. Supported encoding format includes ANSI/ASCII, UTF-8, UTF-8-BOM, UTF-16-LE, or UTF-16-BE. Each paragraph of the utterance will result in a separate audio. If you want to combine all sentences into one audio, make them in one paragraph.
150+
>- On this page you can also select to upload your script for testing. The testing script must be a txt file, less than 1 Mb. Supported encoding format includes ANSI/ASCII, UTF-8, UTF-8-BOM, UTF-16-LE, or UTF-16-BE. Each paragraph of the utterance will result in a separate audio. If you want to combine all sentences into one audio, make them in one paragraph.
151151
152152
4. Then, enter a **Name** and **Description** to help you identify this model.
153153

154-
Choose a name carefully. The name you enter here will be the name you use to specify the voice in your request for speech synthesis as part of the SSML input. Only letters, numbers, and a few punctuation characters such as -, \_, and (', ') are allowed. Use different names for different neural voice models.
154+
Choose a name carefully. The name you enter here will be the name you use to specify the voice in your request for speech synthesis as part of the SSML input. Only letters, numbers, and a few punctuation characters such as -, _, and (', ') are allowed. Use different names for different neural voice models.
155155

156-
A common use of the **Description** field is to record the names of the datasets that were used to create the model.
156+
A common use of the **Description** field is to record the names of the data that were used to create the model.
157157

158-
5. Review the settings, then click **Submit** to start training the model.
158+
5. Review the settings, then select **Submit** to start training the model.
159159

160160
> [!NOTE]
161-
> Duplicate audio names will be removed from the training. Make sure the datasets you select don't contain the same audio names across multiple .zip files.
161+
> Duplicate audio names will be removed from the training. Make sure the data you select don't contain the same audio names across multiple .zip files.
162162
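
The test-script constraints from the note in step 3 (a txt file under the size limit, one audio per paragraph) can be checked before uploading. The sketch below makes several assumptions that the article does not confirm: the limit is read as 1 MiB, a paragraph is taken to be one non-empty line, and only BOM-marked UTF-16 is detected. ANSI files containing non-ASCII characters would need an explicit code page and are not handled here.

```python
import codecs

# Assumption: the size limit in the note is interpreted as 1 MiB.
MAX_SCRIPT_BYTES = 1024 * 1024


def decode_test_script(raw: bytes) -> str:
    """Decode a test script, rejecting files at or above the assumed size limit."""
    if len(raw) >= MAX_SCRIPT_BYTES:
        raise ValueError("test script must be smaller than the size limit")
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")  # strips the UTF-8 BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")  # BOM selects the byte order
    return raw.decode("utf-8")  # plain ASCII is valid UTF-8


def split_paragraphs(text: str):
    """Assumption: each non-empty line is one paragraph, so one output audio."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

The length of the list returned by `split_paragraphs` gives a rough idea of how many test audio files to expect under these assumptions.
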
163163
The **Train model** table displays a new entry that corresponds to this newly created model. The table also displays the status: Processing, Succeeded, Failed.
164164

165-
The status that's shown reflects the process of converting your dataset to a voice model, as shown here.
165+
The status that's shown reflects the process of converting your data to a voice model, as shown here.
166166

167167
| State | Meaning |
168168
| ----- | ------- |
@@ -187,10 +187,10 @@ After you've successfully created and tested your voice model, you deploy it in
187187

188188
You can do the following to create a custom neural voice endpoint.
189189

190-
1. On the **Deploy model** tab, click **Deploy models**.
190+
1. On the **Deploy model** tab, select **Deploy model**.
191191
2. Next, enter a **Name** and **Description** for your custom endpoint.
192192
3. Then, select a voice model you would like to associate with this endpoint.
193-
4. Finally, click **Deploy** to create your endpoint.
193+
4. Finally, select **Deploy** to create your endpoint.
194194

195195
After you've clicked the **Deploy** button, in the endpoint table, you'll see an entry for your new endpoint. It may take a few minutes to instantiate a new endpoint. When the status of the deployment is **Succeeded**, the endpoint is ready for use.
196196
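
As noted in the training steps, the model name is what a synthesis request uses to select the voice in its SSML input. The following sketch only builds that SSML string; `build_ssml` and the sample voice name are hypothetical, and sending the request to the deployed endpoint is out of scope here.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(voice_name: str, text: str, lang: str = "en-US") -> str:
    """Build a minimal SSML document that selects a voice by name.

    Hypothetical helper: voice_name is assumed to be the model name
    entered when the custom neural voice was trained.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # "MyCustomVoice" is a placeholder, not a real model name.
    print(build_ssml("MyCustomVoice", "Hello from my custom neural voice."))
```

Escaping the text and quoting the attributes keeps the SSML well-formed when the input contains characters such as `&` or `<`.
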
