**README.md** (+13 -7)
Prepare a dataset in the following format:

```
|- [NAME_DATASET]/
|  |- metadata.csv
|  |- wav/
|      |- file1.wav
|      |- ...
```
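The layout above can also be created programmatically. The sketch below builds a skeleton with made-up names (`ljspeech`, `file1`); the placeholder `.wav` is empty and only stands in for a real recording:

```python
from pathlib import Path

# Sketch: create an LJSpeech-like dataset skeleton (names are illustrative).
root = Path("ljspeech")
(root / "wav").mkdir(parents=True, exist_ok=True)

# metadata.csv holds one "id|transcription" pair per line.
(root / "metadata.csv").write_text("file1|A first example sentence.\n", encoding="utf-8")

# Placeholder for a real audio file.
(root / "wav" / "file1.wav").touch()
```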
where `metadata.csv` has the following format: `id|transcription`. This is an LJSpeech-like format; if your dataset uses a different format, you can skip these preprocessing steps.

Note that `NAME_DATASET` should be one of `ljspeech`, `kss`, `baker`, or `libritts`.
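For illustration (this is not part of the repo's tooling), such a file can be parsed with a few lines of Python; the utterance ids and text below are made up:

```python
from pathlib import Path

def load_metadata(path):
    """Parse an LJSpeech-like metadata.csv: one `id|transcription` pair per line."""
    entries = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():  # skip blank lines
            utt_id, transcription = line.split("|", 1)
            entries.append((utt_id, transcription))
    return entries

# Made-up sample file for demonstration.
Path("metadata.csv").write_text(
    "LJ001-0001|A first example sentence.\n"
    "LJ001-0002|A second example sentence.\n",
    encoding="utf-8",
)
print(load_metadata("metadata.csv"))
```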
## Preprocessing
The preprocessing has two steps:
Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar) and [`libritts`](http://www.openslr.org/60/) for the dataset argument. In the future, we intend to support more datasets.

**Note**: To run `libritts` preprocessing, please first read the instructions in [examples/fastspeech2_libritts](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/fastspeech2_libritts). The dataset needs to be reformatted before running preprocessing.
After preprocessing, the structure of the project folder should be:
```
|- [NAME_DATASET]/
|  |- metadata.csv
|  |- wav/
|      |- file1.wav
|      |- ...
|- dump_[ljspeech/kss/baker/libritts]/
|  |- train/
|      |- ids/
|          |- LJ001-0001-ids.npy
|          |- ...
```
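Each file in the dump folder is a per-utterance NumPy array. As a sketch, the snippet below writes a stand-in `ids` array into the layout above and reads it back; real dump files are produced by the preprocessing step, and the token values here are invented:

```python
import numpy as np
from pathlib import Path

# Stand-in for one preprocessed utterance (real files come from preprocessing).
ids_dir = Path("dump_ljspeech/train/ids")
ids_dir.mkdir(parents=True, exist_ok=True)
np.save(ids_dir / "LJ001-0001-ids.npy", np.array([12, 7, 33], dtype=np.int32))

# Loading works the same way for real dump files.
ids = np.load(ids_dir / "LJ001-0001-ids.npy")
print(ids.dtype, ids.shape)
```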
We use suffixes (`ids`, `raw-feats`, `raw-energy`, `raw-f0`, `norm-feats` and `wav`) for each feature type.

**IMPORTANT NOTES**:
- This preprocessing step is based on [ESPnet](https://github.com/espnet/espnet), so you can combine all models here with other models from the ESPnet repository.
- Regardless of how your dataset is formatted, the final structure of the `dump` folder **SHOULD** follow the structure above in order to use the training script, or you can modify it yourself 😄.

## Training models
To learn how to train a model from scratch or fine-tune it with other datasets/languages:

- For the Tacotron-2 tutorial, please see [example/tacotron2](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/tacotron2)
- For the FastSpeech tutorial, please see [example/fastspeech](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech)
- For the FastSpeech2 tutorial, please see [example/fastspeech2](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech2)
- For the FastSpeech2 + MFA tutorial, please see [example/fastspeech2_libritts](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech2_libritts)
- For the MelGAN tutorial, please see [example/melgan](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/melgan)
- For the MelGAN + STFT Loss tutorial, please see [example/melgan.stft](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/melgan.stft)
- For the Multiband-MelGAN tutorial, please see [example/multiband_melgan](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan)

Overall, almost all models here are licensed under [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) in all countries of the world, except in **Viet Nam**, where this framework cannot be used for production in any way without permission from TensorflowTTS's authors. There is one exception: Tacotron-2 can be used for any purpose. So, if you are Vietnamese and want to use this framework for production, you **must** contact us in advance.
**examples/fastspeech2_libritts/README.md** (+11 -11)
## Prepare

Everything is done from the main repo folder, i.e. TensorflowTTS/.

0. Optional: [Download](http://www.openslr.org/60/) and prepare LibriTTS (a helper to prepare LibriTTS is in examples/fastspeech2_libritts/libri_experiment/prepare_libri.ipynb).

   Dataset structure after finishing this step:

   ```
   |- TensorFlowTTS/
   |  |- LibriTTS/
   |  |- |- train-clean-100/
   |  |- |- SPEAKERS.txt
   |  |- |- ...
   |  |- libritts/
   |  |- |- 200/
   |  |- |- |- 200_124139_000001_000000.txt
   |  |- |- |- 200_124139_000001_000000.wav
   |  |- |- |- ...
   ```

1. Extract durations (use examples/mfa_extraction or a pretrained Tacotron-2).

6. Change the CharactorDurationF0EnergyMelDataset speaker mapper in fastspeech2_dataset to match your dataset (if you use LibriTTS with mfa_extraction, you don't need to change anything).

7. Change train_libri.sh to match your dataset and run:
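The speaker mapper mentioned in step 6 is, conceptually, just a mapping from speaker names to contiguous integer indices. A minimal sketch follows; the speaker ids are made up, and the real mapper lives in the repo's fastspeech2_dataset:

```python
# Minimal sketch of a speaker mapper: speaker name -> integer index.
# The speaker ids below are made up; a real mapper would be built from the
# speakers actually present in your dataset (e.g. LibriTTS SPEAKERS.txt).
speakers = ["200", "250", "1088"]
speakers_map = {name: idx for idx, name in enumerate(sorted(speakers, key=int))}
print(speakers_map)  # {'200': 0, '250': 1, '1088': 2}
```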