
## 🔊 Audio Understanding

Audio Understanding covers tasks that require comprehending the content and context of an audio signal, including ambient sounds, music, speech, and pitch. This task category aims to capture the unique characteristics of audio inputs.

### Sample Config

```yaml
dataset_metric:
  # All datasets within the **Audio Understanding** task category
  - all

  # All datasets within the **scene_understanding** sub-task category
  - ['scene_understanding', 'llm_judge_detailed']

  # An individual dataset
  - ["mu_chomusic_test", "llm_judge_binary"]
```
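As a rough illustration of how the three `dataset_metric` entry forms might be resolved, here is a minimal Python sketch. The `SUB_TASKS` registry, `DEFAULT_METRIC`, and `expand` helper are hypothetical assumptions for this example, not part of this repository; actual dataset and metric names come from the supported-datasets table below.

```python
# Hypothetical registry mapping sub-task categories to dataset names
# (names assumed for illustration only).
SUB_TASKS = {
    "scene_understanding": [
        "audiocaps", "audiocaps_qa", "clotho_aqa", "wavcaps_qa", "wavcaps",
    ],
    "music_understanding": ["mu_chomusic_test"],
}
# Assumed default metric applied when an entry is the bare string "all".
DEFAULT_METRIC = "llm_judge_binary"


def expand(entries):
    """Normalize config entries into a list of (dataset, metric) pairs."""
    pairs = []
    for entry in entries:
        if entry == "all":
            # Every dataset in the task category, with the default metric.
            for datasets in SUB_TASKS.values():
                pairs += [(d, DEFAULT_METRIC) for d in datasets]
        elif entry[0] in SUB_TASKS:
            # [sub_task_category, metric]: all datasets in that sub-task.
            pairs += [(d, entry[1]) for d in SUB_TASKS[entry[0]]]
        else:
            # [dataset_name, metric]: a single dataset.
            pairs.append((entry[0], entry[1]))
    return pairs


config = [
    ["scene_understanding", "llm_judge_detailed"],
    ["mu_chomusic_test", "llm_judge_binary"],
]
print(expand(config))
```

Under these assumptions, the sample entry `['scene_understanding', 'llm_judge_detailed']` expands to one pair per scene-understanding dataset, while an individual-dataset entry yields a single pair.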

## 📊 Supported Datasets for Audio Understanding

| Dataset Name | Task Type | Config | Description | License |
|---|---|---|---|---|
| MU_CHOMUSIC | Music Understanding | `music_understanding/mu_chomusic` | Benchmark designed to evaluate music understanding in multimodal audio-language models | CC-BY-SA-4.0 |
| AUDIOCAPS | Scene Understanding | `scene_understanding/audiocaps` | Large-scale audio captioning dataset for sounds in the wild | MIT |
| AUDIOCAPS_QA | Scene Understanding | `scene_understanding/audiocaps_qa` | Audio question answering dataset for evaluating interactive audio understanding | MIT |
| CLOTHO_AQA | Scene Understanding | `scene_understanding/clotho_aqa` | Audio question answering task in which a system analyzes an audio signal and a natural-language question to generate a natural-language answer | MIT |
| WAVCAPS_QA | Scene Understanding | `scene_understanding/wavcaps_qa` | Large-scale audio question answering dataset for evaluating interactive audio understanding | CC-BY-NC 4.0 |
| WAVCAPS | Scene Understanding | `scene_understanding/wavcaps` | Large-scale weakly-labelled audio captioning dataset | CC-BY-NC 4.0 |