ACV-001

public male singing voice dataset

Dataset Info:

Format and Specs:

The dataset is manually labeled with the following systems ("native" and "non-native" refer to the singer's level of fluency):

Cantonese (language tag: ca/, non-native)
Mandarin Chinese (language tag: cn/, native)
English via ARPABET (language tag: en/, native)
Japanese via romaji (language tag: ja/, non-native)
Korean via Team Coda's system (language tag: ko/, non-native)
Taiwanese (language tag: tw/, native)

Around 3/5 of the dataset was labeled fully by hand; the remaining 2/5 or so was generated via WFL and then manually corrected.
All labels have been manually checked and should be correct, apart from a few mislabeled phonemes.

To train exclusively on a specific language, you can use a tool such as Notepad++ and its "Find in Files" function to search for / in the files, which produces a list of every sample that has a language tag.
Searching for / rather than a full language tag is recommended because Chinese and English samples can contain more than one additional language.
Cantonese and Taiwanese do not have extra language tags.
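
The same search can be scripted instead of done by hand. The following is a minimal sketch, not part of the dataset's tooling. It assumes each .lab line ends with the phoneme (the usual "start end phoneme" layout) and that a language tag appears as a prefix on the phoneme, such as en/ah; both are assumptions, so check them against the actual files. The function names are hypothetical.

```python
from pathlib import Path


def language_tags(label_text):
    """Collect language tags (e.g. 'en/') found in label text.

    Assumes the phoneme is the last whitespace-separated field on each
    line and that tags are prefixes ending in '/'.
    """
    tags = set()
    for line in label_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        phoneme = fields[-1]
        if "/" in phoneme:
            tags.add(phoneme.split("/", 1)[0] + "/")
    return tags


def find_tagged_samples(root, pattern="*.lab"):
    """Map each label file under root to the set of tags it contains."""
    results = {}
    for path in sorted(Path(root).rglob(pattern)):
        tags = language_tags(path.read_text(encoding="utf-8"))
        if tags:
            results[path] = tags
    return results
```

Calling find_tagged_samples on the dataset folder returns only the files that contain at least one tag, which can then be filtered by the tags you want to keep or exclude.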

The dataset is recorded as 16-bit, 44.1 kHz WAV files and labeled in HTK label format (.lab).
The audio has been dereverbed, denoised, and partially normalized for better consistency.
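
If you want to confirm these specs before training, a small check like the one below may help. It is a sketch using Python's standard wave module; the function names are hypothetical. The time conversion assumes the labels use the standard HTK unit of 100 ns per tick, which is worth verifying on a sample file.

```python
import wave


def wav_matches_spec(path, rate=44100, sample_width_bytes=2):
    """Return True if the WAV file is 16-bit (2 bytes per sample) at 44.1 kHz."""
    with wave.open(str(path), "rb") as w:
        return (w.getframerate() == rate
                and w.getsampwidth() == sample_width_bytes)


def htk_to_seconds(ticks):
    """Convert an HTK label time to seconds, assuming 100 ns units."""
    return ticks / 10_000_000
```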

The dataset is released in two versions: full length and segmented.
The full-length dataset includes only wav and lab files, whereas the segmented dataset also includes ds files and a transcription.csv for use with DiffSinger.
The ds files contain f0 and note slur data. Slurs were cut manually for every language except English; the English samples were processed via SOME.

Additional Info:

In the full-length version of the dataset, humming is labeled MM instead of m to prevent the converter from merging it with SP and AP phonemes.
In English, consonant clusters can fall in different places within a note. For example, in "drought" the r is treated as a vowel, but in "had read" the r is a consonant.
To prevent placement issues, in consonant clusters where the semivowel (r, w, y, l) becomes a vowel, the consonant before it is labeled in all caps (e.g. "straight": s T r ey t).
In Korean, the sequence eu i consists of two vowels, so it is labeled eu ii instead to keep both phonemes in the same note. Before using the data, make sure to convert all instances of ii in the Korean language segment back to i to prevent issues.

The above applies only to the full-length samples; the segmented samples have already been converted back.
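
For anyone working from the full-length labels, the conversion back might look like the sketch below. It is not the converter used to build the segmented set. It assumes that undoing the conventions amounts to renaming MM to m, lowercasing the all-caps consonants, and renaming Korean ii to i, and that a language tag, if present, is a prefix such as ko/. The restore_phoneme name and the korean flag are hypothetical.

```python
# SP and AP are genuine uppercase phonemes and must not be lowercased.
KEEP_UPPER = {"SP", "AP"}


def restore_phoneme(ph, korean=False):
    """Undo the full-length labeling conventions for one phoneme.

    Pass korean=True for untagged phonemes from a Korean sample; a
    'ko/' prefix (assumed tag layout) is detected automatically.
    """
    tag, sep, base = ph.rpartition("/")  # tag and sep are '' if untagged
    if base == "MM":
        base = "m"  # humming marker back to plain m
    elif base == "ii" and (korean or tag == "ko"):
        base = "i"  # Korean eu ii back to eu i
    elif base.isupper() and base not in KEEP_UPPER:
        base = base.lower()  # all-caps cluster consonant, e.g. T to t
    return tag + sep + base


# "straight" as labeled in the full-length set
print([restore_phoneme(p) for p in ["s", "T", "r", "ey", "t"]])
```

The ii rule is applied only to Korean so that an ii phoneme in any other language's system, if one exists, is left alone.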

The dataset includes the following global phonemes: [exh, vf, cl, mlem, hx, axh]. exh/axh are used for unvoiced and voiced exhales, vf for vocal fry, cl for stops, mlem for mouth clicks, and hx for unvoiced sounds (e.g. the h sound in "white").

Song List:

Moved to song list

Credits:

Voice provided by Jonathan Huang 黃奕晨, owner of ArchiVoice, X/Twitter

License:

ACV-001 © 2025 by YiChen Huang is licensed under CC BY-SA 4.0

The license applies only to direct use of the dataset and to models that mainly feature the voice of ACV-001; it does not apply to models trained via parallel training.
Models trained using ACV-001 as supplementary data can follow their own license.
