HuggingFace audio dataset
Albert Zeyer edited this page Sep 29, 2025 · 8 revisions
(Generic) HuggingFace dataset in RETURNN: https://github.com/rwth-i6/returnn/issues/1257
(The underlying format is Arrow. Parquet is another supported binary format, but it is not optimized for reading speed.)
HuggingFace audio datasets have a special structure, described at https://huggingface.co/docs/hub/en/datasets-audio
Example: https://huggingface.co/datasets/openslr/librispeech_asr, where a single record looks like this:
{'chapter_id': 141231,
 'file': '/home/albert/.cache/.../dev_clean/1272/141231/1272-141231-0000.flac',
 'audio': {
     'array': array([-0.00048828, -0.00018311, -0.00137329, ..., 0.00079346, 0.00091553, 0.00085449], dtype=float32),
     'sampling_rate': 16000
 },
 'id': '1272-141231-0000',
 'speaker_id': 1272,
 'text': 'A MAN SAID TO THE UNIVERSE SIR I EXIST'}
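To illustrate how such a record might be consumed, here is a minimal sketch that reads the nested `audio` entry and derives the duration from the sample count and sampling rate. The helper name `describe_audio_example` and the `toy_example` record are hypothetical and not part of RETURNN or the HuggingFace `datasets` library. The sketch uses a plain Python list in place of the NumPy array so that it runs without any download, and it assumes the waveform is one-dimensional (mono), as in the record above.

```python
from typing import Any, Dict


def describe_audio_example(example: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize one record that follows the audio layout shown above.

    Hypothetical helper for illustration; assumes a mono (1-D) waveform.
    """
    audio = example["audio"]  # nested dict with 'array' and 'sampling_rate'
    num_samples = len(audio["array"])  # number of waveform samples
    sampling_rate = audio["sampling_rate"]  # samples per second, e.g. 16000
    return {
        "id": example["id"],
        "num_samples": num_samples,
        "sampling_rate": sampling_rate,
        "duration_sec": num_samples / sampling_rate,
        "text": example["text"],
    }


# Hypothetical stand-in record: 32000 zero samples at 16 kHz (2 seconds).
# A real record would come from the dataset itself rather than being built by hand.
toy_example = {
    "id": "toy-0000",
    "audio": {"array": [0.0] * 32000, "sampling_rate": 16000},
    "text": "A MAN SAID TO THE UNIVERSE SIR I EXIST",
}

summary = describe_audio_example(toy_example)
print(summary["num_samples"], summary["duration_sec"])  # 32000 2.0
```

The same function should work unchanged on a record whose `array` is a 1-D NumPy array, since it only relies on `len()` and key access.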