Commit 1cc0d32

Update record-custom-voice-samples.md
Add the difference between the script for the voice talent (VT) to read (no text normalization, no-TN) and the script to upload as training data (text-normalized, TN)
1 parent 54a3034 commit 1cc0d32

1 file changed

+11
-3
lines changed

articles/cognitive-services/Speech-Service/record-custom-voice-samples.md

Lines changed: 11 additions & 3 deletions
@@ -103,13 +103,21 @@ Below are some general guidelines that you can follow to create a good corpus (r
103103

104104
With that, make sure your voice talent pronounces these words in the expected way. Keep your script and recordings consistently matched during the training process.
105105

106-
> [!NOTE]
107-
> The scripts prepared for your voice talent need to follow the native reading conventions, such as 50% and $45, while the scripts used for training need to be normalized to make sure that the scripts match the audio content, such as *fifty percent* and *forty-five dollars*. Check the scripts used for training against the recordings of your voice talent, to make sure they match.
108-
109106
- Your script should include many different words and sentences with different kinds of sentence lengths, structures, and moods.
110107

111108
- Check the script carefully for errors. If possible, have someone else check it too. When you run through the script with your talent, you'll probably catch a few more mistakes.
112109

110+
### Difference between script for voice talent and script for training
111+
112+
The sample scripts we provide on [GitHub](https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/CustomVoice/script) are intended only for voice talent. If you upload the sample scripts for training, you must first normalize them into their spoken form. The scripts prepared for voice talent need to follow the native reading conventions, such as 50% and $45, while the scripts used for training need to be normalized so that they match the audio content, such as *fifty percent* and *forty-five dollars*. Make sure the scripts used for training match the recordings of your voice talent, especially scripts containing digits, symbols, abbreviations, dates, and times. The following table gives a few examples of text normalization rules and shows how the script for voice talent differs from the script for training.
113+
114+
| Category | Script for voice talent<br> (non-normalized) | Script for training<br> (normalized) |
115+
| --------- | --------- | --------------------------- |
116+
| Digits, for example, '123'. | '123' | Normalize '123' according to the recordings, such as 'one hundred and twenty-three'. |
117+
| Symbols, for example, '50%'. | '50%' | Normalize '50%' according to the recordings, such as 'fifty percent'. |
118+
| Abbreviations, for example, 'ASAP'. | 'ASAP' | Normalize 'ASAP' according to the recordings, such as 'as soon as possible'. |
119+
| Date or time, for example, '2:30 PM'. | '2:30 PM' | Normalize '2:30 PM' according to the recordings, such as 'two thirty PM'. |
120+
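The normalization rules in the table can be illustrated with a minimal Python sketch. It is not part of the Speech service or of the sample scripts: the names `number_to_words` and `normalize`, the `ABBREVIATIONS` table, and the regular expressions are all hypothetical, and they cover only the patterns shown above (whole numbers up to 999, `%`, `$`, a small abbreviation list, and `h:mm` times). Because the training script must match what the voice talent actually said, treat the output as a first draft to check against the recordings.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical lookup table; extend it to match how your talent reads each term.
ABBREVIATIONS = {"ASAP": "as soon as possible"}


def number_to_words(n):
    """Spell out 0..999 in the style of the table ('one hundred and twenty-three')."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" and " + number_to_words(rest) if rest else "")


def _spoken_time(match):
    # Assumed reading: '2:30' -> 'two thirty', '2:05' -> 'two oh five', '2:00' -> 'two'.
    hour, minute = int(match.group(1)), int(match.group(2))
    if minute == 0:
        return number_to_words(hour)
    if minute < 10:
        return f"{number_to_words(hour)} oh {number_to_words(minute)}"
    return f"{number_to_words(hour)} {number_to_words(minute)}"


def _spoken_dollars(match):
    amount = int(match.group(1))
    unit = "dollar" if amount == 1 else "dollars"
    return f"{number_to_words(amount)} {unit}"


def normalize(text):
    """Rewrite a voice-talent script line into an assumed spoken form."""
    # Handle the most specific patterns first so bare digits are processed last.
    text = re.sub(r"\b(\d{1,2}):(\d{2})\b", _spoken_time, text)
    text = re.sub(r"\$(\d{1,3})\b", _spoken_dollars, text)
    text = re.sub(r"\b(\d{1,3})%",
                  lambda m: number_to_words(int(m.group(1))) + " percent", text)
    for short, spoken in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", spoken, text)
    # Numbers longer than three digits are left unchanged for manual review.
    text = re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)
    return text
```

For example, `normalize("50% and $45")` returns `fifty percent and forty-five dollars`. Anything the sketch does not recognize, such as decimals, dates, or longer numbers, still needs to be normalized by hand.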
113121
### Typical defects of a script
114122

115123
A poor-quality script can adversely affect the training results. To achieve high-quality training results, it's crucial to avoid these defects.
