Question about ID mapping between AudioSkills and original AudioSet #50

Hi, thanks for releasing the great [nvidia/AudioSkills](https://huggingface.co/datasets/nvidia/AudioSkills) dataset!

I’m currently exploring the dataset (e.g., `audioskills_xl/AudioSet.json`) and noticed that the `"id"` fields (e.g., `"YJRaxh5RfawI"`, `"YFHuxuM-iRo4"`) don’t seem to correspond to any YouTube IDs in the official AudioSet CSVs (`balanced_train_segments.csv`, `unbalanced_train_segments.csv`, `eval_segments.csv`) from [AudioSet](https://research.google.com/audioset/).

I checked using:

```bash
grep -E "YJRaxh5RfawI|YFHuxuM-iRo4" Audioset/{eval_segments.csv,unbalanced_train_segments.csv,balanced_train_segments.csv}
```

No matches were found.
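
One thing I noticed but have not been able to confirm: both example IDs are 12 characters long and start with "Y", whereas YouTube video IDs are 11 characters. If the leading "Y" is a prefix added during preprocessing (purely an assumption on my part, nothing in the dataset card says so), then stripping it might recover the original YTID. A minimal sketch of the check I have in mind is below. The helper names `strip_assumed_prefix` and `load_segment_labels` are my own, and the CSV layout assumed is the one used by the official segment files (comment lines starting with `#`, then `YTID, start_seconds, end_seconds, positive_labels`).

```python
import csv

YOUTUBE_ID_LENGTH = 11  # standard length of a YouTube video ID


def strip_assumed_prefix(sample_id: str) -> str:
    """Drop a leading "Y" if doing so leaves an 11-character string.

    ASSUMPTION: the "Y" is a prefix added during preprocessing. This is
    a guess, not documented behavior of the AudioSkills dataset.
    """
    if len(sample_id) == YOUTUBE_ID_LENGTH + 1 and sample_id.startswith("Y"):
        return sample_id[1:]
    return sample_id


def load_segment_labels(lines):
    """Map YTID -> list of label MIDs from an AudioSet segments CSV.

    `lines` is any iterable of text lines, for example an open file.
    Lines beginning with "#" are treated as header comments and skipped.
    """
    rows = csv.reader(
        (line for line in lines if not line.startswith("#")),
        skipinitialspace=True,  # fields are separated by ", "
    )
    # Rows with fewer than four fields (e.g. blank lines) are ignored.
    return {row[0]: row[3].split(",") for row in rows if len(row) >= 4}
```

With these, something like `load_segment_labels(open("Audioset/eval_segments.csv")).get(strip_assumed_prefix("YJRaxh5RfawI"))` would return the label MIDs if the hypothesis holds for that file, and `None` otherwise.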

My questions:
- Are these IDs derived from original AudioSet YouTube IDs?
- If not, what’s the correct way to align AudioSkills samples with entries from the original AudioSet (for example, to obtain labels or categories)?
- Is there a mapping file between AudioSkills IDs and original AudioSet segment IDs?

Any clarification would be greatly appreciated. Thanks again for this excellent work!
