Commit 320bb0f

Merge pull request #1363 from yt605155624/add_datasets
[doc]add tts datasets doc, test=doc
2 parents c8a5d1d + 867faa3 commit 320bb0f

1 file changed, +75 -0 lines changed

docs/source/tts/tts_datasets.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# TTS Datasets
<!--
see https://openslr.org/
-->
## Mandarin
- [CSMSC](https://www.data-baker.com/open_source.html): Chinese Standard Mandarin Speech Corpus
    - Duration (hours): 12
    - Number of Sentences: 10,000
    - Size: 2.14 GB
    - Speaker: 1 female, aged 20 to 30
    - Sample Rate: 48 kHz, 16-bit
    - Mean Words per Clip: 16
- [AISHELL-3](http://www.aishelltech.com/aishell_3)
    - Duration (hours): 85
    - Number of Sentences: 88,035
    - Size: 17.75 GB
    - Speaker: 218
    - Sample Rate: 44.1 kHz, 16-bit

## English
- [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
    - Duration (hours): 24
    - Number of Sentences: 13,100
    - Size: 2.56 GB
    - Speaker: 1, aged 20 to 30
    - Sample Rate: 22050 Hz, 16-bit
    - Mean Words per Clip: 17.23
- [VCTK](https://datashare.ed.ac.uk/handle/10283/3443)
    - Number of Sentences: 44,583
    - Size: 10.94 GB
    - Speaker: 110
    - Sample Rate: 48 kHz, 16-bit
    - Mean Words per Clip: 17.23

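The duration and sentence counts listed for CSMSC, AISHELL-3, and LJSpeech imply a rough average clip length, which can help when choosing batch sizes or maximum sequence lengths. The following is a minimal sketch, not part of the original document. It assumes one audio clip per sentence and uses the rounded hour figures from the lists, so the results are approximate. The names `DATASETS` and `mean_clip_seconds` are illustrative only.

```python
# Figures copied from the Mandarin and English lists: (duration in hours, sentences).
# VCTK is omitted because the list gives no duration for it.
DATASETS = {
    "CSMSC": (12, 10_000),
    "AISHELL-3": (85, 88_035),
    "LJSpeech": (24, 13_100),
}


def mean_clip_seconds(hours: float, sentences: int) -> float:
    """Average clip length in seconds, assuming one clip per sentence."""
    return hours * 3600 / sentences


for name, (hours, sentences) in DATASETS.items():
    print(f"{name}: about {mean_clip_seconds(hours, sentences):.2f} s per clip")
```

Because the listed durations are rounded to whole hours, treat these averages as estimates rather than exact corpus statistics.
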
## Japanese
<!--
see https://sites.google.com/site/shinnosuketakamichi/publication/corpus
-->

- [tri-jek](https://sites.google.com/site/shinnosuketakamichi/research-topics/tri-jek_corpus): Japanese-English-Korean tri-lingual corpus
- [JSSS-misc](https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss-misc_corpus): misc tasks of JSSS corpus
- [JTubeSpeech](https://github.com/sarulab-speech/jtubespeech): Corpus of Japanese speech collected from YouTube
- [J-MAC](https://sites.google.com/site/shinnosuketakamichi/research-topics/j-mac_corpus): Japanese multi-speaker audiobook corpus
- [J-KAC](https://sites.google.com/site/shinnosuketakamichi/research-topics/j-kac_corpus): Japanese Kamishibai and audiobook corpus
- [JMD](https://sites.google.com/site/shinnosuketakamichi/research-topics/jmd_corpus): Japanese multi-dialect corpus
- [JSSS](https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss_corpus): Japanese multi-style (summarization and simplification) corpus
- [RWCP-SSD-Onomatopoeia](https://www.ksuke.net/dataset/rwcp-ssd-onomatopoeia): onomatopoeic word dataset for environmental sounds
- [Life-m](https://sites.google.com/site/shinnosuketakamichi/research-topics/life-m_corpus): landmark image-themed music corpus
- [PJS](https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus): Phoneme-balanced Japanese singing voice corpus
- [JVS-MuSiC](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_music): Japanese multi-speaker singing-voice corpus
- [JVS](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus): Japanese multi-speaker voice corpus
- [JSUT-book](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-book): audiobook corpus by a single Japanese speaker
- [JSUT-vi](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-vi): vocal imitation corpus by a single Japanese speaker
- [JSUT-song](https://sites.google.com/site/shinnosuketakamichi/publication/jsut-song): singing voice corpus by a single Japanese singer
- [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut): a large-scale corpus of reading-style Japanese speech by a single speaker

## Emotions
### English
- [CREMA-D](https://github.com/CheyneyComputerScience/CREMA-D)
- [Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset](https://kunzhou9646.github.io/controllable-evc/)
    - paper: [Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset](https://arxiv.org/abs/2010.14794)
### Mandarin
- [EMOVIE Dataset](https://viem-ccy.github.io/EMOVIE/dataset_release)
    - paper: [EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model](https://arxiv.org/abs/2106.09317)
- MASC
    - paper: [MASC: A Speech Corpus in Mandarin for Emotion Analysis and Affective Speaker Recognition](https://ieeexplore.ieee.org/document/4013501)
### English and Mandarin
- [Emotional Voice Conversion: Theory, Databases and ESD](https://github.com/HLTSingapore/Emotional-Speech-Data)
    - paper: [Emotional Voice Conversion: Theory, Databases and ESD](https://arxiv.org/abs/2105.14762)

## Music
- [GiantMIDI-Piano](https://github.com/bytedance/GiantMIDI-Piano)
- [MAESTRO Dataset](https://magenta.tensorflow.org/datasets/maestro)
    - [tf code](https://www.tensorflow.org/tutorials/audio/music_generation)
- [Opencpop](https://wenet.org.cn/opencpop/)
