Commit 3921fc7: Merge branch 'master' into cppinference
2 parents a0e0a66 + 5f21de9

53 files changed: +1819 -1249 lines

.gitignore

Lines changed: 7 additions & 1 deletion
@@ -35,4 +35,10 @@ ljspeech
 LibriTTS/
 dataset/
 mfa/
-kss
+kss/
+baker/
+libritts/
+dump_baker/
+dump_ljspeech/
+dump_kss/
+dump_libritts/

README.md

Lines changed: 16 additions & 10 deletions
@@ -19,6 +19,7 @@
 :zany_face: TensorflowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, Melgan, Multiband-Melgan, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training/inference, optimize further using [fake-quantize aware](https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide) training and [pruning](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras), and make TTS models run faster than real-time and deployable on mobile devices or embedded systems.

 ## What's new
+- 2020/08/18 **(NEW!)** Update [new base processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/base_processor.py). Add [AutoProcessor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/inference/auto_processor.py) and [pretrained processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/pretrained/) json files.
 - 2020/08/14 **(NEW!)** Support Chinese TTS. Pls see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thank [@azraelkuan](https://github.com/azraelkuan).
 - 2020/08/05 **(NEW!)** Support Korean TTS. Pls see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thank [@crux153](https://github.com/crux153).
 - 2020/07/17 Support MultiGPU for all Trainer.
@@ -93,7 +94,7 @@ Here in an audio samples on valid set. [tacotron-2](https://drive.google.com/ope

 Prepare a dataset in the following format:
 ```
-|- datasets/
+|- [NAME_DATASET]/
 | |- metadata.csv
 | |- wav/
 | |- file1.wav
@@ -102,6 +103,8 @@ Prepare a dataset in the following format:

 where `metadata.csv` has the following format: `id|transcription`. This is a ljspeech-like format, you can ignore preprocessing steps if you have other format dataset.

+Note that `NAME_DATASET` should be `[ljspeech/kss/baker/libritts]` for example.
+
 ## Preprocessing

 The preprocessing has two steps:
@@ -116,20 +119,22 @@ The preprocessing has two steps:

 To reproduce the steps above:
 ```
-tensorflow-tts-preprocess --rootdir ./datasets --outdir ./dump --config preprocess/[ljspeech/kss/baker]_preprocess.yaml --dataset [ljspeech/kss/baker]
-tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --config preprocess/[ljspeech/kss/baker]_preprocess.yaml --dataset [ljspeech/kss/baker]
+tensorflow-tts-preprocess --rootdir ./[ljspeech/kss/baker/libritts] --outdir ./dump_[ljspeech/kss/baker/libritts] --config preprocess/[ljspeech/kss/baker/libritts]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts]
+tensorflow-tts-normalize --rootdir ./dump_[ljspeech/kss/baker/libritts] --outdir ./dump_[ljspeech/kss/baker/libritts] --config preprocess/[ljspeech/kss/baker/libritts]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts]
 ```

-Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar) for dataset argument. In the future, we intend to support more datasets.
+Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar) and [`libritts`](http://www.openslr.org/60/) for the dataset argument. In the future, we intend to support more datasets.
+
+**Note**: To run `libritts` preprocessing, please first read the instructions in [examples/fastspeech2_libritts](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/fastspeech2_libritts). The dataset must be reformatted before running preprocessing.

 After preprocessing, the structure of the project folder should be:
 ```
-|- datasets/
+|- [NAME_DATASET]/
 | |- metadata.csv
 | |- wav/
 | |- file1.wav
 | |- ...
-|- dump/
+|- dump_[ljspeech/kss/baker/libritts]/
 | |- train/
 | |- ids/
 | |- LJ001-0001-ids.npy
@@ -190,6 +195,7 @@ We use suffix (`ids`, `raw-feats`, `raw-energy`, `raw-f0`, `norm-feats` and `wav

 **IMPORTANT NOTES**:
 - This preprocessing step is based on [ESPnet](https://github.com/espnet/espnet) so you can combine all models here with other models from ESPnet repository.
+- Regardless of how your dataset is formatted, the final structure of the `dump` folder **SHOULD** follow the structure above to be able to use the training script, or you can modify it yourself 😄.

 ## Training models

@@ -198,6 +204,7 @@ To know how to training model from scratch or fine-tune with other datasets/lang
 - For Tacotron-2 tutorial, pls see [example/tacotron2](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/tacotron2)
 - For FastSpeech tutorial, pls see [example/fastspeech](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech)
 - For FastSpeech2 tutorial, pls see [example/fastspeech2](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech2)
+- For FastSpeech2 + MFA tutorial, pls see [example/fastspeech2_libritts](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech2_libritts)
 - For MelGAN tutorial, pls see [example/melgan](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/melgan)
 - For MelGAN + STFT Loss tutorial, pls see [example/melgan.stft](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/melgan.stft)
 - For Multiband-MelGAN tutorial, pls see [example/multiband_melgan](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan)
@@ -241,10 +248,9 @@ import yaml

 import tensorflow as tf

-from tensorflow_tts.processor import LJSpeechProcessor
-
 from tensorflow_tts.inference import AutoConfig
 from tensorflow_tts.inference import TFAutoModel
+from tensorflow_tts.inference import AutoProcessor

 # initialize fastspeech model.
 fs_config = AutoConfig.from_pretrained('/examples/fastspeech/conf/fastspeech.v1.yaml')
@@ -263,7 +269,7 @@ melgan = TFAutoModel.from_pretrained(


 # inference
-processor = LJSpeechProcessor(None, cleaner_names="english_cleaners")
+processor = AutoProcessor.from_pretrained(pretrained_path="./test/files/ljspeech_mapper.json")

 ids = processor.text_to_sequence("Recent research at Harvard has shown meditating for as little as 8 weeks, can actually increase the grey matter in the parts of the brain responsible for emotional regulation, and learning.")
 ids = tf.expand_dims(ids, 0)
@@ -285,7 +291,7 @@ sf.write('./audio_after.wav', audio_after, 22050, "PCM_16")
 ```

 # Contact
-[Minh Nguyen Quan Anh](https://github.com/dathudeptrai): [email protected], [erogol](https://github.com/erogol): [email protected], [Kuan Chen](https://github.com/azraelkuan): [email protected], [Takuya Ebata](https://github.com/MokkeMeguru): [email protected], [Trinh Le Quang](https://github.com/l4zyf9x): [email protected]
+[Minh Nguyen Quan Anh](https://github.com/dathudeptrai): [email protected], [erogol](https://github.com/erogol): [email protected], [Kuan Chen](https://github.com/azraelkuan): [email protected], [Dawid Kobus](https://github.com/machineko): [email protected], [Takuya Ebata](https://github.com/MokkeMeguru): [email protected], [Trinh Le Quang](https://github.com/l4zyf9x): trinhle.cse@gmail.com, [Yunchao He](https://github.com/candlewill): [email protected], [Alejandro Miguel Velasquez](https://github.com/ZDisket): xml506ok@gmail.com

 # License
 Overall, most models here are licensed under [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) for all countries in the world, except in **Viet Nam**, where this framework cannot be used for production in any way without permission from TensorflowTTS's authors. There is one exception: Tacotron-2 can be used for any purpose. So, if you are Vietnamese and want to use this framework for production, you **must** contact us in advance.
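The `id|transcription` metadata format described in the README diff above is straightforward to parse. Below is a minimal illustrative sketch (not part of the TensorFlowTTS API; `load_metadata` is a hypothetical helper name):

```python
def load_metadata(path):
    """Parse an ljspeech-like metadata.csv where each line is `id|transcription`."""
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split only on the first '|' so the transcription itself may contain '|'.
            utt_id, transcription = line.split("|", 1)
            entries[utt_id] = transcription
    return entries

# Tiny demo with a hypothetical two-line metadata file.
with open("metadata.csv", "w", encoding="utf-8") as f:
    f.write("LJ001-0001|Printing, in the only sense with which we are at present concerned.\n")
    f.write("LJ001-0002|In being comparatively modern.\n")

meta = load_metadata("metadata.csv")
print(len(meta))  # 2
```

Per the folder layout above, each returned `id` would then pair with an audio file at `wav/<id>.wav`.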

examples/fastspeech2_multispeaker/README.md renamed to examples/fastspeech2_libritts/README.md

Lines changed: 11 additions & 11 deletions
@@ -3,15 +3,15 @@
 ## Prepare
 Everything is done from main repo folder so TensorflowTTS/

-0. Optional* [Download](http://www.openslr.org/60/) and prepare libritts (helper to prepare libri in examples/fastspeech2_multispeaker/libri_experiment/prepare_libri.ipynb)
+0. Optional* [Download](http://www.openslr.org/60/) and prepare libritts (helper to prepare libri in examples/fastspeech2_libritts/libri_experiment/prepare_libri.ipynb)
 - Dataset structure after finishing this step:
 ```
 |- TensorFlowTTS/
 | |- LibriTTS/
 | |- |- train-clean-100/
 | |- |- SPEAKERS.txt
 | |- |- ...
-| |- dataset/
+| |- libritts/
 | |- |- 200/
 | |- |- |- 200_124139_000001_000000.txt
 | |- |- |- 200_124139_000001_000000.wav
@@ -25,32 +25,32 @@ Everything is done from main repo folder so TensorflowTTS/
 1. Extract Duration (use examples/mfa_extraction or pretrained tacotron2)
 2. Optional* build docker
 - ```
-bash examples/fastspeech2_multispeaker/scripts/build.sh
+bash examples/fastspeech2_libritts/scripts/build.sh
 ```
 3. Optional* run docker
 - ```
-bash examples/fastspeech2_multispeaker/scripts/interactive.sh
+bash examples/fastspeech2_libritts/scripts/interactive.sh
 ```
 4. Preprocessing:
 - ```
-tensorflow-tts-preprocess --rootdir ./dataset \
-  --outdir ./dump \
+tensorflow-tts-preprocess --rootdir ./libritts \
+  --outdir ./dump_libritts \
   --config preprocess/preprocess_libritts.yaml \
-  --dataset multispeaker
+  --dataset libritts
 ```

 5. Normalization:
 - ```
-tensorflow-tts-normalize --rootdir ./dump \
-  --outdir ./dump \
+tensorflow-tts-normalize --rootdir ./dump_libritts \
+  --outdir ./dump_libritts \
   --config preprocess/preprocess_libritts.yaml \
-  --dataset multispeaker
+  --dataset libritts
 ```

 6. Change CharactorDurationF0EnergyMelDataset speaker mapper in fastspeech2_dataset to match your dataset (if you use libri with mfa_extraction you don't need to change anything)
 7. Change train_libri.sh to match your dataset and run:
 - ```
-bash examples/fastspeech2_multispeaker/scripts/train_libri.sh
+bash examples/fastspeech2_libritts/scripts/train_libri.sh
 ```
 8. Optional* If you have problems with tensor size mismatches, check step 5 in the `examples/mfa_extraction` directory

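Step 6 above refers to a speaker mapper. One simple way such a mapping could be derived is from the speaker subfolders of the reformatted `libritts/` directory; this is an illustrative sketch under that assumption (`build_speaker_map` is a hypothetical helper, not the dataset class's actual implementation):

```python
import os

def build_speaker_map(dataset_root):
    """Map each speaker directory name (e.g. '200') to a dense integer id."""
    speakers = sorted(
        d for d in os.listdir(dataset_root)
        if os.path.isdir(os.path.join(dataset_root, d))
    )
    return {speaker: idx for idx, speaker in enumerate(speakers)}

# Demo against a stub of the layout shown in step 0.
for speaker in ("200", "250", "103"):
    os.makedirs(os.path.join("libritts", speaker), exist_ok=True)

speakers_map = build_speaker_map("libritts")
print(speakers_map)  # {'103': 0, '200': 1, '250': 2}
```

Sorting before enumerating keeps the mapping stable across runs, which matters once speaker ids are baked into a trained checkpoint.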

examples/fastspeech2_multispeaker/conf/fastspeech2libritts.yaml renamed to examples/fastspeech2_libritts/conf/fastspeech2libritts.yaml

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ format: "npy"
 model_type: fastspeech2

 fastspeech2_params:
+    dataset: "libritts"
     n_speakers: 20
     encoder_hidden_size: 384
     encoder_num_hidden_layers: 4
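The added `dataset` key sits under `fastspeech2_params` alongside the existing fields. A sketch of reading it with PyYAML (the fragment below is inlined from the diff; this assumes PyYAML is installed, as the README's own snippets already `import yaml`):

```python
import yaml  # PyYAML

config_text = """
format: "npy"
model_type: fastspeech2

fastspeech2_params:
    dataset: "libritts"
    n_speakers: 20
    encoder_hidden_size: 384
    encoder_num_hidden_layers: 4
"""

config = yaml.safe_load(config_text)
params = config["fastspeech2_params"]
print(params["dataset"], params["n_speakers"])  # libritts 20
```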

examples/fastspeech2_multispeaker/fastspeech2_dataset.py renamed to examples/fastspeech2_libritts/fastspeech2_dataset.py

File renamed without changes.

examples/fastspeech2_multispeaker/libri_experiment/prepare_libri.ipynb renamed to examples/fastspeech2_libritts/libri_experiment/prepare_libri.ipynb

Lines changed: 10 additions & 2 deletions
@@ -9,9 +9,10 @@
 "import os\n",
 "import random\n",
 "import shutil\n",
+"import sys\n",
 "\n",
-"libri_path = \"...../TensorflowTTS/LibriTTS\" # absolute path to TensorFlowTTS.\n",
-"dataset_path = \"...../TensorflowTTS/dataset\" # Change to your paths\n",
+"libri_path = \"....../LibriTTS\" # absolute path to LibriTTS.\n",
+"dataset_path = \"....../libritts\" # Change to your paths. This is the output of the re-formatted dataset.\n",
 "subset = \"train-clean-100\""
 ]
 },
@@ -122,6 +123,13 @@
 " shutil.copy(j, os.path.join(dataset_path, sp_id, f_name))\n",
 " shutil.copy(j.replace(\".wav\", \".normalized.txt\"), os.path.join(dataset_path, sp_id, text_f_name))"
 ]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
 }
 ],
 "metadata": {
File renamed without changes.

examples/fastspeech2_multispeaker/scripts/docker/Dockerfile renamed to examples/fastspeech2_libritts/scripts/docker/Dockerfile

File renamed without changes.

examples/fastspeech2_multispeaker/scripts/interactive.sh renamed to examples/fastspeech2_libritts/scripts/interactive.sh

File renamed without changes.

examples/fastspeech2_multispeaker/scripts/train_libri.sh renamed to examples/fastspeech2_libritts/scripts/train_libri.sh

File renamed without changes.
