# TTS Datasets
<!--
see https://openslr.org/
-->
## Mandarin
- [CSMSC](https://www.data-baker.com/open_source.html): Chinese Standard Mandarin Speech Corpus
  - Duration/h: 12
  - Number of Sentences: 10,000
  - Size: 2.14 GB
  - Speaker: 1 female, age 20~30
  - Sample Rate: 48 kHz, 16-bit
  - Mean Words per Clip: 16
- [AISHELL-3](http://www.aishelltech.com/aishell_3)
  - Duration/h: 85
  - Number of Sentences: 88,035
  - Size: 17.75 GB
  - Speaker: 218
  - Sample Rate: 44.1 kHz, 16-bit (a resampling sketch follows this list)
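
Both Mandarin corpora above ship audio at a higher rate than many acoustic models train on, so a common first preprocessing step is offline resampling. A minimal sketch, not official tooling for either corpus; it assumes the `soundfile` and `scipy` packages, a hypothetical unpack layout, and an assumed 22.05 kHz target rate:

```python
# Resample 48 kHz CSMSC (or 44.1 kHz AISHELL-3) wavs to a model's working rate.
from fractions import Fraction
from pathlib import Path

import soundfile as sf                   # pip install soundfile
from scipy.signal import resample_poly   # pip install scipy

TARGET_SR = 22050  # assumed model sample rate


def resample_wav(src: Path, dst: Path, target_sr: int = TARGET_SR) -> None:
    """Read a wav at its native rate and write it back at target_sr, 16-bit PCM."""
    audio, orig_sr = sf.read(src)               # numpy array + native sample rate
    ratio = Fraction(target_sr, orig_sr)        # e.g. 22050/48000 -> 147/320
    audio = resample_poly(audio, ratio.numerator, ratio.denominator, axis=0)
    sf.write(dst, audio, target_sr, subtype="PCM_16")


if __name__ == "__main__":
    out_dir = Path("CSMSC_22k")                 # hypothetical output directory
    out_dir.mkdir(exist_ok=True)
    for wav in sorted(Path("CSMSC/Wave").glob("*.wav")):  # hypothetical corpus path
        resample_wav(wav, out_dir / wav.name)
```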

## English
- [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
  - Duration/h: 24
  - Number of Sentences: 13,100
  - Size: 2.56 GB
  - Speaker: 1, age 20~30
  - Sample Rate: 22.05 kHz, 16-bit
  - Mean Words per Clip: 17.23 (a metadata-parsing sketch follows this list)
- [VCTK](https://datashare.ed.ac.uk/handle/10283/3443)
  - Number of Sentences: 44,583
  - Size: 10.94 GB
  - Speaker: 110
  - Sample Rate: 48 kHz, 16-bit
  - Mean Words per Clip: 17.23
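
LJSpeech's `metadata.csv` is pipe-delimited with no header (`id|raw transcript|normalized transcript`), so per-clip statistics such as the mean-words-per-clip figure quoted above can be recomputed directly from it. A minimal sketch, assuming the archive is unpacked to a hypothetical `LJSpeech-1.1/` directory:

```python
# Recompute "Mean Words per Clip" from LJSpeech's metadata.csv.
import csv
from pathlib import Path

metadata = Path("LJSpeech-1.1/metadata.csv")  # hypothetical unpack location

word_counts = []
with metadata.open(encoding="utf-8") as f:
    # The file uses '|' as the field separator and no quoting.
    for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
        normalized_text = row[-1]             # last field: normalized transcript
        word_counts.append(len(normalized_text.split()))

print(f"{len(word_counts)} clips, "
      f"mean words per clip = {sum(word_counts) / len(word_counts):.2f}")
```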

## Japanese
<!--
see https://sites.google.com/site/shinnosuketakamichi/publication/corpus
-->

- [tri-jek](https://sites.google.com/site/shinnosuketakamichi/research-topics/tri-jek_corpus): Japanese-English-Korean tri-lingual corpus
- [JSSS-misc](https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss-misc_corpus): miscellaneous tasks of the JSSS corpus
- [JTubeSpeech](https://github.com/sarulab-speech/jtubespeech): corpus of Japanese speech collected from YouTube
- [J-MAC](https://sites.google.com/site/shinnosuketakamichi/research-topics/j-mac_corpus): Japanese multi-speaker audiobook corpus
- [J-KAC](https://sites.google.com/site/shinnosuketakamichi/research-topics/j-kac_corpus): Japanese Kamishibai and audiobook corpus
- [JMD](https://sites.google.com/site/shinnosuketakamichi/research-topics/jmd_corpus): Japanese multi-dialect corpus
- [JSSS](https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss_corpus): Japanese multi-style (summarization and simplification) corpus
- [RWCP-SSD-Onomatopoeia](https://www.ksuke.net/dataset/rwcp-ssd-onomatopoeia): onomatopoeic word dataset for environmental sounds
- [Life-m](https://sites.google.com/site/shinnosuketakamichi/research-topics/life-m_corpus): landmark image-themed music corpus
- [PJS](https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus): phoneme-balanced Japanese singing voice corpus
- [JVS-MuSiC](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_music): Japanese multi-speaker singing-voice corpus
- [JVS](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus): Japanese multi-speaker voice corpus
- [JSUT-book](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-book): audiobook corpus by a single Japanese speaker
- [JSUT-vi](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-vi): vocal imitation corpus by a single Japanese speaker
- [JSUT-song](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-song): singing voice corpus by a single Japanese singer
- [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut): a large-scale corpus of reading-style Japanese speech by a single speaker

## Emotions
### English
- [CREMA-D](https://github.com/CheyneyComputerScience/CREMA-D)
- [Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset](https://kunzhou9646.github.io/controllable-evc/)
  - paper: [Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset](https://arxiv.org/abs/2010.14794)
### Mandarin
- [EMOVIE Dataset](https://viem-ccy.github.io/EMOVIE/dataset_release)
  - paper: [EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model](https://arxiv.org/abs/2106.09317)
- MASC
  - paper: [MASC: A Speech Corpus in Mandarin for Emotion Analysis and Affective Speaker Recognition](https://ieeexplore.ieee.org/document/4013501)
### English & Mandarin
- [ESD (Emotional Speech Data)](https://github.com/HLTSingapore/Emotional-Speech-Data)
  - paper: [Emotional Voice Conversion: Theory, Databases and ESD](https://arxiv.org/abs/2105.14762)

## Music
- [GiantMIDI-Piano](https://github.com/bytedance/GiantMIDI-Piano)
- [MAESTRO Dataset](https://magenta.tensorflow.org/datasets/maestro) (a MIDI-loading sketch follows this list)
  - [TensorFlow music generation tutorial](https://www.tensorflow.org/tutorials/audio/music_generation)
- [Opencpop](https://wenet.org.cn/opencpop/)
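
The MIDI corpora above (GiantMIDI-Piano, MAESTRO) are usually consumed as note-event lists rather than audio. A minimal sketch of extracting piano notes, assuming the `pretty_midi` package and a hypothetical file path:

```python
# Turn one MIDI file into a sorted (start, end, pitch, velocity) note list,
# a common input representation for piano transcription and generation work.
import pretty_midi

midi_path = "maestro-v3.0.0/2018/example_performance.midi"  # hypothetical path
pm = pretty_midi.PrettyMIDI(midi_path)

notes = sorted(
    (note.start, note.end, note.pitch, note.velocity)
    for instrument in pm.instruments
    if not instrument.is_drum
    for note in instrument.notes
)

print(f"{len(notes)} notes, total duration {pm.get_end_time():.1f} s")
```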