HuggingFace audio dataset

Albert Zeyer edited this page Sep 29, 2025 · 8 revisions

(Generic) HuggingFace dataset in RETURNN: https://github.com/rwth-i6/returnn/issues/1257

The underlying format is Arrow. Parquet is another supported binary format, but it is not optimized for reading speed.

HuggingFace audio datasets have a special structure; see https://huggingface.co/docs/hub/en/datasets-audio

Example: https://huggingface.co/datasets/openslr/librispeech_asr. A single item from this dataset looks like this:

{'chapter_id': 141231,
 'file': '/home/albert/.cache/.../dev_clean/1272/141231/1272-141231-0000.flac',
 'audio': {
    'array': array([-0.00048828, -0.00018311, -0.00137329, ...,  0.00079346, 0.00091553,  0.00085449], dtype=float32),
    'sampling_rate': 16000
 },
 'id': '1272-141231-0000',
 'speaker_id': 1272,
 'text': 'A MAN SAID TO THE UNIVERSE SIR I EXIST'}
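The item above nests the decoded waveform and its sampling rate under the `audio` key. As a minimal sketch of reading that structure, the helper below computes the duration of one item. The function name `audio_duration_seconds` and the stand-in `example` dict (silence instead of real samples) are hypothetical and only for illustration. It assumes the item is a plain dict with the keys shown above; depending on the `datasets` version, the `audio` field may be a decoder object rather than a dict.

```python
from typing import Any, Dict, Sequence


def audio_duration_seconds(example: Dict[str, Any]) -> float:
    """Return the duration of one dataset item in seconds.

    Assumes example["audio"] is a dict with "array" (1-D waveform)
    and "sampling_rate" (samples per second), as in the item above.
    """
    audio = example["audio"]
    samples: Sequence[float] = audio["array"]
    return len(samples) / audio["sampling_rate"]


# Hypothetical stand-in for a real item: 32000 samples of silence at 16 kHz.
example = {
    "id": "1272-141231-0000",
    "audio": {"array": [0.0] * 32000, "sampling_rate": 16000},
    "text": "A MAN SAID TO THE UNIVERSE SIR I EXIST",
}

print(audio_duration_seconds(example))  # 32000 / 16000 = 2.0
```

The helper only uses `len()` and division, so it works the same whether `array` is a Python list or a NumPy array.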