HuggingFace audio dataset

(Generic) HuggingFace dataset in RETURNN: https://github.com/rwth-i6/returnn/issues/1257

(The underlying format is Arrow. Parquet is another supported binary format, but it is not optimized for reading speed.)

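HuggingFace audio datasets have some special structure: each record has an 'audio' entry holding the decoded waveform ('array') and its 'sampling_rate'. See https://huggingface.co/docs/hub/en/datasets-audio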

Example: https://huggingface.co/datasets/openslr/librispeech_asr, where a single record looks like this:

{'chapter_id': 141231,
 'file': '/home/albert/.cache/.../dev_clean/1272/141231/1272-141231-0000.flac',
 'audio': {
    'array': array([-0.00048828, -0.00018311, -0.00137329, ...,  0.00079346, 0.00091553,  0.00085449], dtype=float32),
    'sampling_rate': 16000
 },
 'id': '1272-141231-0000',
 'speaker_id': 1272,
 'text': 'A MAN SAID TO THE UNIVERSE SIR I EXIST'}
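
To make the record layout above concrete, here is a minimal sketch of how such a record could be consumed. It assumes only the keys shown in the example ('id', 'text', and 'audio' with 'array' and 'sampling_rate'). The helper name `summarize_audio_record` and the stand-in record are hypothetical, for illustration only. They are not part of RETURNN or of the HuggingFace `datasets` library.

```python
from typing import Any, Dict, Sequence


def summarize_audio_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Extract a few fields from a record laid out like the example above.

    Hypothetical helper: it assumes the 'audio' entry is a dict with a 1-D
    waveform under 'array' and an integer 'sampling_rate'.
    """
    audio = record["audio"]
    samples: Sequence[float] = audio["array"]  # 1-D waveform (e.g. a NumPy array)
    sampling_rate: int = audio["sampling_rate"]  # samples per second
    return {
        "id": record["id"],
        "num_samples": len(samples),
        "duration_sec": len(samples) / sampling_rate,
        "text": record["text"],
    }


# Stand-in record with the same keys as the example; a plain list of zeros
# replaces the real waveform so that the sketch needs no download.
example_record = {
    "id": "1272-141231-0000",
    "audio": {"array": [0.0] * 32000, "sampling_rate": 16000},
    "text": "A MAN SAID TO THE UNIVERSE SIR I EXIST",
}

summary = summarize_audio_record(example_record)
print(summary["id"], summary["num_samples"], summary["duration_sec"])
```

The helper only uses `len()` and key lookups, so it should work the same way whether 'array' is a Python list or a NumPy array as in the example record.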
