Commit 63698a9

Update how-to-custom-voice-training-data.md
1 parent 34ab9ea commit 63698a9

1 file changed: +6 -6 lines changed

articles/ai-services/speech-service/how-to-custom-voice-training-data.md

Lines changed: 6 additions & 6 deletions
@@ -55,14 +55,14 @@ Follow these guidelines when preparing audio.
 | -------- | ----- |
 | File format | RIFF (.wav), grouped into a .zip file |
 | File name | File name characters supported by Windows OS, with .wav extension.<br>The characters `\ / : * ? " < > \|` aren't allowed. <br>It can't start or end with a space, and can't start with a dot. <br>No duplicate file names allowed. |
-| Sampling rate | When you create a custom neural voice, 24,000 Hz is required. |
+| Sampling rate | 24 KHz and higher required when creating a custom neural voice. |
 | Sample format | PCM, at least 16-bit |
 | Audio length | Shorter than 15 seconds |
 | Archive format | .zip |
 | Maximum archive size | 2048 MB |

 > [!NOTE]
-> The default sampling rate for a custom neural voice is 24,000 Hz. Audio files with a sampling rate lower than 16,000 Hz will be rejected. If a .zip file contains .wav files with different sample rates, only those equal to or higher than 16,000 Hz will be imported. Your audio files with a sampling rate higher than 16,000 Hz and lower than 24,000 Hz will be up-sampled to 24,000 Hz to train a neural voice. It's recommended that you should use a sample rate of 24,000 Hz for your training data.
+> The default sampling rate for a custom neural voice is 24 KHz. Audio files with a sampling rate lower than 16,000 Hz will be rejected. If a .zip file contains .wav files with different sample rates, only those equal to or higher than 16,000 Hz will be imported. Your audio files with a sampling rate higher than 16,000 Hz and lower than 24 KHz will be up-sampled to 24 KHz to train a neural voice. It's recommended that you should use a sample rate of 24 KHz and higher for your training data.

 ### Transcription data for Individual utterances + matching transcript
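
The sampling-rate rules in the note above (reject below 16,000 Hz, up-sample between 16,000 Hz and 24 kHz, keep 24 kHz and higher) can be sketched as a local pre-upload check. This is a minimal illustration using Python's standard `wave` module, not the Speech service's own validation. The function names and the returned labels are hypothetical, and treating exactly 16,000 Hz as "upsampled" is an assumption based on the note saying files "equal to or higher than 16,000 Hz" are imported.

```python
import io
import wave

MIN_ACCEPTED_HZ = 16_000  # per the note: lower rates are rejected
TARGET_HZ = 24_000        # per the note: default rate for a custom neural voice


def classify_sample_rate(rate_hz: int) -> str:
    """Return a hypothetical label for how a file at this rate would be handled."""
    if rate_hz < MIN_ACCEPTED_HZ:
        return "rejected"
    if rate_hz < TARGET_HZ:
        # Assumption: exactly 16,000 Hz is imported and up-sampled like
        # the rates strictly between 16,000 Hz and 24 kHz.
        return "upsampled"
    return "accepted"


def wav_sample_rate(path_or_file) -> int:
    """Read the sampling rate from a RIFF .wav file (path or file object)."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getframerate()
```

For a local file, `classify_sample_rate(wav_sample_rate("kingstory.wav"))` gives the label; the file name here is only an example.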

@@ -104,14 +104,14 @@ Follow these guidelines when preparing audio for segmentation.
 | -------- | ----- |
 | File format | RIFF (.wav) or .mp3, grouped into a .zip file |
 | File name | File name characters supported by Windows OS, with .wav extension. <br>The characters `\ / : * ? " < > \|` aren't allowed. <br>It can't start or end with a space, and can't start with a dot. <br>No duplicate file names allowed. |
-| Sampling rate | When you create a custom neural voice, 24,000 Hz is required. |
+| Sampling rate | 24 KHz and higher required when creating a custom neural voice. |
 | Sample format |RIFF(.wav): PCM, at least 16-bit.<br/><br/>mp3: At least 256 KBps bit rate.|
 | Audio length | Longer than 20 seconds |
 | Archive format | .zip |
 | Maximum archive size | 2048 MB, at most 1,000 audio files included |

 > [!NOTE]
-> The default sampling rate for a custom neural voice is 24,000 Hz. Audio files with a sampling rate lower than 16,000 Hz will be rejected. Your audio files with a sampling rate higher than 16,000 Hz and lower than 24,000 Hz will be up-sampled to 24,000 Hz to train a neural voice. It's recommended that you should use a sample rate of 24,000 Hz for your training data.
+> The default sampling rate for a custom neural voice is 24 KHz. Audio files with a sampling rate lower than 16,000 Hz will be rejected. Your audio files with a sampling rate higher than 16,000 Hz and lower than 24 KHz will be up-sampled to 24 KHz to train a neural voice. It's recommended that you should use a sample rate of 24 KHz and higher for your training data.

 All audio files should be grouped into a zip file. It's OK to put .wav files and .mp3 files into the same zip file. For example, you can upload a 45-second audio file named 'kingstory.wav' and a 200-second long audio file named 'queenstory.mp3' in the same zip file. All .mp3 files will be transformed into the .wav format after processing.

@@ -147,14 +147,14 @@ Follow these guidelines when preparing audio.
 | -------- | ----- |
 | File format | RIFF (.wav) or .mp3, grouped into a .zip file |
 | File name | File name characters supported by Windows OS, with .wav extension. <br>The characters `\ / : * ? " < > \|` aren't allowed. <br>It can't start or end with a space, and can't start with a dot. <br>No duplicate file names allowed. |
-| Sampling rate | When you create a custom neural voice, 24,000 Hz is required. |
+| Sampling rate | 24 KHz and higher required when creating a custom neural voice. |
 | Sample format |RIFF(.wav): PCM, at least 16-bit<br>mp3: At least 256 KBps bit rate.|
 | Audio length | No limit |
 | Archive format | .zip |
 | Maximum archive size | 2048 MB, at most 1,000 audio files included |

 > [!NOTE]
-> The default sampling rate for a custom neural voice is 24,000 Hz. Your audio files with a sampling rate higher than 16,000 Hz and lower than 24,000 Hz will be up-sampled to 24,000 Hz to train a neural voice. It's recommended that you should use a sample rate of 24,000 Hz for your training data.
+> The default sampling rate for a custom neural voice is 24 KHz. Your audio files with a sampling rate higher than 16,000 Hz and lower than 24 KHz will be up-sampled to 24 KHz to train a neural voice. It's recommended that you should use a sample rate of 24 KHz and higher for your training data.

 All audio files should be grouped into a zip file. Once your dataset is successfully uploaded, the Speech service helps you segment the audio file into utterances based on our speech batch transcription service. Unique IDs are assigned to the segmented utterances automatically. Matching transcripts are generated through speech recognition. All .mp3 files will be transformed into the .wav format after processing. You can check the segmented utterances and the matching transcripts by downloading the dataset.
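
The file-name and archive rules in the tables above lend themselves to a quick check before zipping. The sketch below is an illustration only: the function names and message strings are hypothetical. Accepting both `.wav` and `.mp3` follows the "File format" row of the last two tables (the "File name" row mentions only `.wav`), and comparing duplicate names by exact match is an assumption, since the doc does not say whether case matters.

```python
FORBIDDEN_CHARS = set('\\/:*?"<>|')  # characters the table lists as not allowed
ALLOWED_EXTENSIONS = (".wav", ".mp3")  # assumption: taken from the "File format" row
MAX_AUDIO_FILES = 1000  # "at most 1,000 audio files included"


def file_name_problems(name: str) -> list[str]:
    """List the file-name rules from the table that this name violates."""
    problems = []
    if any(ch in FORBIDDEN_CHARS for ch in name):
        problems.append("forbidden character")
    if name.startswith(" ") or name.endswith(" "):
        problems.append("leading or trailing space")
    if name.startswith("."):
        problems.append("leading dot")
    if not name.lower().endswith(ALLOWED_EXTENSIONS):
        problems.append("unsupported extension")
    return problems


def archive_problems(names: list[str]) -> list[str]:
    """Check a planned list of archive members against the table's limits."""
    problems = []
    if len(names) > MAX_AUDIO_FILES:
        problems.append(f"{len(names)} files exceeds the limit of {MAX_AUDIO_FILES}")
    seen = set()
    for name in names:
        for issue in file_name_problems(name):
            problems.append(f"{name}: {issue}")
        # Assumption: duplicates are detected by exact string match.
        if name in seen:
            problems.append(f"{name}: duplicate file name")
        seen.add(name)
    return problems
```

Calling `archive_problems(["kingstory.wav", "queenstory.mp3"])` on the example names from the doc returns an empty list. The 2048 MB size limit is not checked here because it depends on the finished archive.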
