🔊 Audio Understanding

Audio Understanding refers to the task category involving the process of comprehending the content and context of an audio signal. These signals include ambient sounds, music, speaker, pitches, etc. This task category aims to capture the unique characteristics of the audio inputs.

Sample Config

dataset_metric:
  # All datasets within **Audio Understanding** task category
  - all
  
  # All datasets within **scene_understanding** sub-task category
  - ['scene_understanding', 'llm_judge_detailed']

  # Individual dataset
  - ["mu_chomusic_test", "llm_judge_binary"]

📊 Supported Datasets for Audio Understanding

Dataset Name	Task type	config	Description	License
MU_CHOMUSIC	Music Understanding	music_understanding/mu_chomusic	Benchmark designed to evaluate music understanding in multimodal audio-language models	CC-BY-SA-4.0
AUDIOCAPS	Scene Understanding	scene_understanding/audiocaps	Large-scale audio captioning dataset for sound in the wild	MIT
AUDIOCAPS_QA	Scene Understanding	scene_understanding/audiocaps_qa	Audio Question Answering dataset for evaluating interactive audio understanding	MIT
CLOTHO_AQA	Scene Understanding	scene_understanding/clotho_aqa	Multimodal translation task where a system analyzes an audio signal and a natural language question, to generate a desirable natural language answer	MIT
WAVCAPS_QA	Scene Understanding	scene_understanding/wavcaps_qa	Large-scale Audio Question Answering dataset for evaluating interactive audio understanding	CC-BY-NC 4.0
WAVCAPS	Scene Understanding	scene_understanding/wavcaps	Large-scale weakly-labelled audio captioning dataset	CC-BY-NC 4.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🔊 Audio Understanding

Sample Config

📊 Supported Datasets for Audio Understanding

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

🔊 Audio Understanding

Sample Config

📊 Supported Datasets for Audio Understanding