Nexdata-AI/INTERSPEECH-2025-MLC-SLM-Challenge-Dataset

INTERSPEECH-2025-MLC-SLM-Challenge-Dataset

Description

The INTERSPEECH 2025 MLC-SLM Challenge Dataset, curated by Datatang, is derived from fifteen proprietary conversational speech corpora. Annotated with high accuracy, it is designed to address two challenges: multilingual automatic speech recognition (ASR) and long-context comprehension. The recordings preserve real-world conversational phenomena such as spontaneous interruptions and speaker overlap, and cover 11 languages with a total duration of 1,500 hours, providing training material for ASR systems intended for real-world use.

For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1892?source=Github

Specifications

Format

16 kHz, 16-bit, uncompressed WAV, mono channel;

Recording Environment

quiet indoor environment, without echo;

Recording content

dozens of topics are specified, and the speakers hold recorded conversations on those topics;

Annotation

transcription text, speaker identity, and gender;

Device

Android mobile phone, iPhone;

Language

American English, British English, Filipino English, Australian English, Indian English, French, German, Italian, Japanese, Korean, Portuguese (Europe), Russian, Spanish (Spain), Thai, Vietnamese.
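
The audio format listed under Format (16 kHz, 16-bit, mono, uncompressed WAV) can be checked before training. The following is a minimal sketch using only Python's standard `wave` module. The function name `check_wav_format` and the constants are illustrative assumptions and are not part of the dataset or any official tooling.

```python
import wave

# Expected values taken from the "Format" specification above.
EXPECTED_SAMPLE_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of mismatches against the expected format.

    An empty list means the file header matches the specification.
    Hypothetical helper for illustration only. The standard `wave`
    module reads uncompressed PCM and raises `wave.Error` for other
    encodings, so a successful open already implies uncompressed audio.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems
```

Calling `check_wav_format` on each file and logging any non-empty result is one way to catch files that were resampled or converted after download.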

Licensing Information

Commercial License
