HuggingFace audio dataset
Albert Zeyer edited this page Sep 29, 2025 · 8 revisions
(Generic) HuggingFace dataset in RETURNN: https://github.com/rwth-i6/returnn/issues/1257
The underlying format is Arrow. Parquet is another supported binary format, but it is not optimized for reading speed.
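HuggingFace audio datasets have a special structure: each example carries an 'audio' field holding the decoded samples ('array') and the 'sampling_rate', and sometimes also a 'path'. See https://huggingface.co/docs/hub/en/datasets-audio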
Examples: https://huggingface.co/datasets/openslr/librispeech_asr
{
    'chapter_id': 141231,
    'file': '/home/albert/.cache/.../dev_clean/1272/141231/1272-141231-0000.flac',
    'audio': {
        'array': array([-0.00048828, -0.00018311, -0.00137329, ..., 0.00079346, 0.00091553, 0.00085449], dtype=float32),
        'sampling_rate': 16000
    },
    'id': '1272-141231-0000',
    'speaker_id': 1272,
    'text': 'A MAN SAID TO THE UNIVERSE SIR I EXIST'
}
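As a rough illustration of how such a record can be consumed, the following sketch derives the audio duration from the 'audio' field. It assumes only the two keys shown above ('array' and 'sampling_rate'). The helper name `audio_duration_seconds` and the mock record are hypothetical, and a plain Python list stands in for the NumPy array so that the snippet has no dependencies.

```python
def audio_duration_seconds(example: dict) -> float:
    """Return the length of the decoded audio in seconds."""
    audio = example["audio"]
    # Number of samples divided by samples per second.
    return len(audio["array"]) / audio["sampling_rate"]


# Mock record shaped like the example above (values are made up):
# two seconds of silence at 16 kHz. A plain list replaces the
# float32 NumPy array that a real HuggingFace dataset would return.
mock_example = {
    "id": "mock-0000",  # hypothetical ID
    "audio": {"array": [0.0] * 32000, "sampling_rate": 16000},
    "text": "MOCK TRANSCRIPT",
}

print(audio_duration_seconds(mock_example))
```

The same function should work unchanged on a real record, since `len()` also applies to a one-dimensional NumPy array.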
https://huggingface.co/datasets/speechcolab/gigaspeech
{
    'segment_id': 'YOU0000000315_S0000660',
    'speaker': 'N/A',
    'text': "AS THEY'RE LEAVING <COMMA> CAN KASH PULL ZAHRA ASIDE REALLY QUICKLY <QUESTIONMARK>",
    'audio': {
        # in streaming mode 'path' will be 'xs_chunks_0000/YOU0000000315_S0000660.wav'
        'path': '/home/user/.cache/huggingface/datasets/downloads/extracted/9d48cf31/xs_chunks_0000/YOU0000000315_S0000660.wav',
        'array': array([0.0005188 , 0.00085449, 0.00012207, ..., 0.00125122, 0.00076294, 0.00036621], dtype=float32),
        'sampling_rate': 16000
    },
    'begin_time': 2941.889892578125,
    'end_time': 2945.070068359375,
    'audio_id': 'YOU0000000315',
    'title': 'Return to Vasselheim | Critical Role: VOX MACHINA | Episode 43',
    'url': 'https://www.youtube.com/watch?v=zr2n1fLVasU',
    'source': 2,
    'category': 24,
    'original_full_path': 'audio/youtube/P0004/YOU0000000315.opus'
}
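The GigaSpeech transcript above encodes punctuation as tags such as `<COMMA>` and `<QUESTIONMARK>`. A minimal sketch of turning these tags back into punctuation characters could look like this. The function name `restore_punctuation` and the tag table are assumptions: only `<COMMA>` and `<QUESTIONMARK>` appear in the example above, and the other two entries are assumed analogues that should be checked against the dataset card.

```python
# Tag-to-symbol table. Only <COMMA> and <QUESTIONMARK> are visible in the
# example record; <PERIOD> and <EXCLAMATIONPOINT> are assumed here.
PUNCTUATION_TAGS = {
    "<COMMA>": ",",
    "<PERIOD>": ".",
    "<QUESTIONMARK>": "?",
    "<EXCLAMATIONPOINT>": "!",
}


def restore_punctuation(text: str) -> str:
    """Replace punctuation tags with their symbols, attached to the previous word."""
    for tag, symbol in PUNCTUATION_TAGS.items():
        # First handle the usual case with a space before the tag,
        # then any remaining tag (e.g. at the very start of the text).
        text = text.replace(" " + tag, symbol).replace(tag, symbol)
    return text


example_text = "AS THEY'RE LEAVING <COMMA> CAN KASH PULL ZAHRA ASIDE REALLY QUICKLY <QUESTIONMARK>"
print(restore_punctuation(example_text))
```

Whether to restore or strip these tags depends on the target vocabulary of the model being trained, so this is one possible choice rather than a required preprocessing step.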