 |**Individual utterances + matching transcript**| A collection (.zip) of audio files (.wav) as individual utterances. Each audio file should be 15 seconds or less in length, paired with a formatted transcript (.txt). | Professional recordings with matching transcripts | Ready for training. |
-|**Long audio + transcript (beta)**| A collection (.zip) of long, unsegmented audio files (.wav or .mp3, longer than 20 seconds), paired with a collection (.zip) of transcripts that contains all spoken words. | You have audio files and matching transcripts, but they aren't segmented into utterances. | Segmentation (using batch transcription).<br>Audio format transformation where required. |
-|**Audio only (beta)**| A collection (.zip) of audio files (.wav or .mp3) without a transcript. | You only have audio files available, without transcripts. | Segmentation + transcript generation (using batch transcription).<br>Audio format transformation where required.|
+|**Long audio + transcript (beta)**| A collection (.zip) of long, unsegmented audio files (.wav or .mp3, longer than 20 seconds, at most 1000 audio files), paired with a collection (.zip) of transcripts that contains all spoken words. | You have audio files and matching transcripts, but they aren't segmented into utterances. | Segmentation (using batch transcription).<br>Audio format transformation where required. |
+|**Audio only (beta)**| A collection (.zip) of audio files (.wav or .mp3, at most 1000 audio files) without a transcript. | You only have audio files available, without transcripts. | Segmentation + transcript generation (using batch transcription).<br>Audio format transformation where required.|

 Files should be grouped by type into a dataset and uploaded as a zip file. Each dataset can only contain a single data type.

@@ -118,7 +118,7 @@ Follow these guidelines when preparing audio for segmentation.
 | Sample format |RIFF(.wav): PCM, at least 16-bit<br>mp3: at least 256 KBps bit rate|
 | Audio length | Longer than 20 seconds |
 | Archive format | .zip |
-| Maximum archive size | 2048 MB |
+| Maximum archive size | 2048 MB, at most 1000 audio files included|

 > [!NOTE]
 > The default sampling rate for a custom neural voice is 24,000 Hz. Audio files with a sampling rate lower than 16,000 Hz will be rejected. Your audio files with a sampling rate higher than 16,000 Hz and lower than 24,000 Hz will be up-sampled to 24,000 Hz to train a neural voice. It’s recommended that you should use a sample rate of 24,000 Hz for your training data.
@@ -158,7 +158,7 @@ Follow these guidelines when preparing audio.
 | Sample format |RIFF(.wav): PCM, at least 16-bit<br>mp3: at least 256 KBps bit rate|
 | Audio length | No limit |
 | Archive format | .zip |
-| Maximum archive size | 2048 MB |
+| Maximum archive size | 2048 MB, at most 1000 audio files included|

 > [!NOTE]
 > The default sampling rate for a custom neural voice is 24,000 Hz. Your audio files with a sampling rate higher than 16,000 Hz and lower than 24,000 Hz will be up-sampled to 24,000 Hz to train a neural voice. It’s recommended that you should use a sample rate of 24,000 Hz for your training data.
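
The archive limits documented in this change (a .zip of at most 2048 MB holding at most 1000 .wav or .mp3 files) can be checked locally before upload. The following is a minimal sketch, not part of any service tooling: `check_archive` and its constants are hypothetical names, "2048 MB" is assumed to mean 2048 × 1024 × 1024 bytes, and only entries with a .wav or .mp3 extension are assumed to count as audio files.

```python
import zipfile
from pathlib import Path

# Assumption: "2048 MB" is read as 2048 * 1024 * 1024 bytes.
MAX_ARCHIVE_BYTES = 2048 * 1024 * 1024
MAX_AUDIO_FILES = 1000
# Assumption: only these extensions count toward the audio-file limit.
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def check_archive(zip_path):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    path = Path(zip_path)

    # Check the size of the archive itself, not of its extracted contents.
    size = path.stat().st_size
    if size > MAX_ARCHIVE_BYTES:
        problems.append(
            f"archive is {size} bytes, over the limit of {MAX_ARCHIVE_BYTES}"
        )

    # Count audio entries by extension, ignoring directories and other files.
    with zipfile.ZipFile(path) as archive:
        audio_files = [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and Path(info.filename).suffix.lower() in AUDIO_EXTENSIONS
        ]
    if len(audio_files) > MAX_AUDIO_FILES:
        problems.append(
            f"archive holds {len(audio_files)} audio files, "
            f"over the limit of {MAX_AUDIO_FILES}"
        )

    return problems
```

Calling `check_archive("dataset.zip")` on a hypothetical dataset returns an empty list when neither limit is exceeded. The sketch does not inspect sample rate, bit depth, or audio length.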