The 672-hour Multi-person Meeting Multi-channel Speech Dataset covers meetings with 3 to 6 participants, recorded in a variety of conference rooms to mirror real-world meeting interactions. The recordings are transcribed and annotated with the text content, speaker ID, gender, location, and other attributes.
For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1203?source=Github
Format:
48 kHz, 16-bit, WAV, 16 channels;
8 kHz, 16-bit, WAV, 8 channels;
48 kHz, 16-bit, WAV, mono;
16 kHz, 16-bit, WAV, mono.
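A minimal sketch for checking that a downloaded file matches one of the four formats listed above, using Python's standard `wave` module. The set of expected formats is transcribed from the list; the function names `wav_format` and `matches_spec` are hypothetical. Multichannel files sometimes use a WAVE_FORMAT_EXTENSIBLE header, which the `wave` module only reads in recent Python versions (3.12+), so older interpreters may need a different reader.

```python
import wave

# Formats from the specification: (sample rate in Hz, bytes per sample, channels).
# 16-bit PCM corresponds to a sample width of 2 bytes.
EXPECTED_FORMATS = {
    (48000, 2, 16),
    (8000, 2, 8),
    (48000, 2, 1),
    (16000, 2, 1),
}


def wav_format(path):
    """Return (sample_rate, sample_width_bytes, channels) read from a WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getsampwidth(), wf.getnchannels()


def matches_spec(path):
    """True if the file's format is one of the formats listed for the dataset."""
    return wav_format(path) in EXPECTED_FORMATS
```

Checking the header only is cheap, so a loop over the whole corpus can flag mislabeled or corrupted files before any audio is decoded.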
Recording environment: four sizes of conference room, with three different rooms of each size (12 rooms in total);
Recording content: simulated real-world meetings;
Speakers: 984 Chinese speakers;
Annotation: individual sentences are segmented and annotated with their start and end timestamps, speaker ID, and transcribed text;
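The annotation file layout and field names are not specified here, so the sketch below assumes a hypothetical in-memory record per sentence: start and end timestamps in seconds, a speaker ID, and the text. It illustrates two typical uses of such segments in multi-person meeting data: total speech time per speaker and detection of overlapping speech between different speakers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """One annotated sentence (hypothetical schema; field names are illustrative)."""
    start: float      # start timestamp in seconds
    end: float        # end timestamp in seconds
    speaker_id: str   # speaker identifier
    text: str         # transcribed text

    @property
    def duration(self) -> float:
        return self.end - self.start


def speech_time_per_speaker(segments):
    """Sum annotated speech duration (seconds) for each speaker ID."""
    totals = defaultdict(float)
    for seg in segments:
        if seg.end < seg.start:
            raise ValueError(f"segment ends before it starts: {seg}")
        totals[seg.speaker_id] += seg.duration
    return dict(totals)


def overlapping_pairs(segments):
    """Return pairs of segments from different speakers whose time spans overlap."""
    ordered = sorted(segments, key=lambda s: s.start)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Segments are sorted by start time, so once one starts after
            # `a` ends, no later segment can overlap `a` either.
            if b.start >= a.end:
                break
            if b.speaker_id != a.speaker_id:
                pairs.append((a, b))
    return pairs
```

For example, segments `(0.0, 2.5, "S1")` and `(2.0, 4.0, "S2")` form one overlapping pair, which is the kind of cross-talk that meeting transcription systems must handle.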
Devices: 16-microphone array, 8-microphone array, high-fidelity microphone, mobile phone;
Language: Mandarin;
Application scenarios: speech recognition, voiceprint recognition;
Accuracy: sentence accuracy rate of 97%;
License: commercial license.
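How the 97% sentence accuracy rate is measured is not stated. One common reading, assumed here, is the share of annotated sentences whose text exactly matches a verified reference. A minimal sketch of that computation follows; the function name `sentence_accuracy` is hypothetical, sentences are assumed to be aligned one-to-one, and whitespace is ignored because Mandarin text is written without spaces.

```python
def sentence_accuracy(annotated, reference):
    """Fraction of sentences whose annotated text exactly matches the reference.

    Assumes the two lists are aligned one-to-one; whitespace is removed
    before comparison.
    """
    if len(annotated) != len(reference):
        raise ValueError("annotated and reference lists must be aligned")
    if not reference:
        raise ValueError("no sentences to score")

    def normalize(s):
        # Drop all whitespace so spacing differences are not counted as errors.
        return "".join(s.split())

    correct = sum(normalize(a) == normalize(r) for a, r in zip(annotated, reference))
    return correct / len(reference)
```

Under this reading, a 97% rate means that about 3 sentences in every 100 contain at least one transcription error. A character-level metric would give a different, typically higher, figure for the same data.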