INTERSPEECH-2025-MLC-SLM-Challenge-Dataset

Description

The INTERSPEECH 2025 MLC-SLM Challenge Dataset, curated by Datatang, is derived from fifteen proprietary conversational speech corpora. The dataset offers high annotation accuracy and reliability, and is designed to address key challenges in multilingual automatic speech recognition (ASR) and long-context comprehension. It captures real-world conversational complexities, including spontaneous interruptions and speaker overlaps, across 11 languages (1,500 hours in total), providing robust training resources for developing ASR systems intended for real-world use.

For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1892?source=Github

Specifications

Format

16 kHz, 16-bit, uncompressed WAV, mono channel;
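
As a minimal sketch of how a downloaded file could be checked against the format stated above, the following uses Python's standard `wave` module. The function name `check_wav_format` and the constant names are hypothetical and are not part of the dataset or any official tooling; only the expected values (16 kHz, 16-bit, mono) come from this README.

```python
import wave

# Expected values taken from the Format specification in this README.
EXPECTED_SAMPLE_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of mismatch descriptions; an empty list means the file matches.

    Hypothetical helper for illustration only.
    """
    problems = []
    # wave.open raises wave.Error if the file is not a PCM WAV file it can read.
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems
```

Calling `check_wav_format("some_recording.wav")` on a conforming file should return an empty list; the file name here is a placeholder, since the README does not describe the dataset's directory layout.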

Recording Environment

quiet indoor environment, without echo;

Recording content

dozens of topics are specified, and the speakers converse on those topics while being recorded;

Annotation

transcription text, speaker identification, and gender are annotated;

Device

Android mobile phone, iPhone;

Language

American English/British English/Filipino English/Australian English/Indian English/French/German/Italian/Japanese/Korean/Portuguese (Europe)/Russian/Spanish (Spain)/Thai/Vietnamese.
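
The list above has fifteen entries, while the description mentions 11 languages. The short sketch below groups the entries by base language to show how the two counts line up. It assumes that the five English accents count as one language, which this README does not state explicitly. The names `VARIANTS` and `base_language` are hypothetical.

```python
from collections import Counter

# The fifteen entries from the Language list in this README.
VARIANTS = [
    "American English", "British English", "Filipino English",
    "Australian English", "Indian English", "French", "German",
    "Italian", "Japanese", "Korean", "Portuguese (Europe)",
    "Russian", "Spanish (Spain)", "Thai", "Vietnamese",
]


def base_language(variant):
    """Drop any regional qualifier, e.g. 'American English' -> 'English'."""
    # Remove a parenthesised region, then keep the last remaining word.
    return variant.split("(")[0].split()[-1]


variants_per_language = Counter(base_language(v) for v in VARIANTS)

print(len(VARIANTS), "variants across", len(variants_per_language), "languages")
for language, count in sorted(variants_per_language.items()):
    print(f"{language}: {count}")
```

Under that assumption, English accounts for five of the fifteen entries and every other language contributes one.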

Licensing Information

Commercial License
