:zany_face: TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, MelGAN, Multiband-MelGAN, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training/inference and optimize further using [fake-quantize aware training](https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide) and [pruning](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras), making TTS models run faster than real-time and deployable on mobile devices or embedded systems.
## What's new
- 2020/08/18 **(NEW!)** Update the [new base processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/base_processor.py). Add [AutoProcessor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/inference/auto_processor.py) and [pretrained processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/pretrained/) json files (see the sketch after this list).
- 2020/08/14 **(NEW!)** Support Chinese TTS. Please see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thanks to [@azraelkuan](https://github.com/azraelkuan).
- 2020/08/05 **(NEW!)** Support Korean TTS. Please see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thanks to [@crux153](https://github.com/crux153).
- 2020/07/17 Support multi-GPU training for all trainers.
- 2020/07/05 Support converting Tacotron-2 and FastSpeech to TFLite. Please see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thanks to @jaeyoo from the TFLite team for his support.
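
A minimal sketch of the new AutoProcessor API, assuming one of the pretrained mapper json files shipped in `tensorflow_tts/processor/pretrained/` (the exact file name below is an example, not pinned by this excerpt):

```python
from tensorflow_tts.inference import AutoProcessor

# Load a text processor from a pretrained mapper json file.
# The path is an example; use any mapper json shipped in
# tensorflow_tts/processor/pretrained/.
processor = AutoProcessor.from_pretrained(
    pretrained_path="tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
)

# Convert raw text into the ID sequence the models consume.
ids = processor.text_to_sequence("Hello world.")
```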
## Features

- Mixed precision to speed up training if possible.
- Support both single-GPU and multi-GPU training in the base trainer class.
- TFLite conversion for all supported models.
- Android example.
- Support many languages (currently Chinese, Korean, and English).

Different TensorFlow versions should work but have not been tested yet. This repo tries to work with the latest stable TensorFlow version. **We recommend installing TensorFlow 2.3.0 for training if you want to use multi-GPU.**
Prepare a dataset in the following format:
```
|- [NAME_DATASET]/
| |- metadata.csv
| |- wav/
| |- file1.wav
| |- ...
```
where `metadata.csv` has the following format: `id|transcription`. This is an LJSpeech-like format; you can skip these preprocessing steps if your dataset is in another format.
Note that `NAME_DATASET` should be one of `ljspeech`, `kss`, `baker`, or `libritts`.
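
To make the `id|transcription` convention concrete, here is a minimal parsing sketch (the dataset path is an example):

```python
# Minimal sketch: read an LJSpeech-like metadata.csv where each line is
# "id|transcription". The path below is an example.
with open("ljspeech/metadata.csv", encoding="utf-8") as f:
    for line in f:
        utt_id, transcription = line.strip().split("|", 1)
        print(utt_id, transcription)
```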
## Preprocessing
The preprocessing has two steps: feature extraction and feature normalization.
Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar), and [`libritts`](http://www.openslr.org/60/) for the dataset argument. In the future, we intend to support more datasets.
**Note**: To run `libritts` preprocessing, please first read the instructions in [examples/fastspeech2_libritts](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/fastspeech2_libritts). The dataset needs to be reformatted before running preprocessing.
After preprocessing, the structure of the project folder should be:
```
|- [NAME_DATASET]/
| |- metadata.csv
| |- wav/
| |- file1.wav
| |- ...
|- dump_[ljspeech/kss/baker/libritts]/
| |- train/
| |- ids/
| |- LJ001-0001-ids.npy
| |- ...
```
We use suffixes (`ids`, `raw-feats`, `raw-energy`, `raw-f0`, `norm-feats`, and `wave`) for each type of input.
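
For example, a minimal sketch of loading the dumped features of one utterance with NumPy (the file names follow the suffix convention above, but the exact directory paths are assumptions):

```python
import numpy as np

# Each utterance is dumped as one .npy file per input type, with the
# type encoded in the file-name suffix. Paths below are example
# assumptions for an LJSpeech dump.
ids = np.load("dump_ljspeech/train/ids/LJ001-0001-ids.npy")
norm_feats = np.load("dump_ljspeech/train/norm-feats/LJ001-0001-norm-feats.npy")

print(ids.shape)         # length of the character/phoneme ID sequence
print(norm_feats.shape)  # (frames, mel bins) of the normalized spectrogram
```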
**IMPORTANT NOTES**:
- This preprocessing step is based on [ESPnet](https://github.com/espnet/espnet), so you can combine all models here with other models from the ESPnet repository.
- Regardless of how your dataset is formatted, the final structure of the `dump` folder **SHOULD** follow the structure above to be able to use the training scripts, or you can modify them yourself 😄.
## Training models
To learn how to train a model from scratch or fine-tune with other datasets/languages, please see the details in the example directories:

- For the Tacotron-2 tutorial, please see [example/tacotron2](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/tacotron2)
- For the FastSpeech tutorial, please see [example/fastspeech](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech)
- For the FastSpeech2 tutorial, please see [example/fastspeech2](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech2)
- For the FastSpeech2 + MFA tutorial, please see [example/fastspeech2_libritts](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech2_libritts)
- For the MelGAN tutorial, please see [example/melgan](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/melgan)
- For the MelGAN + STFT Loss tutorial, please see [example/melgan.stft](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/melgan.stft)
- For the Multiband-MelGAN tutorial, please see [example/multiband_melgan](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan)
A minimal inference sketch: the imports and the `text_to_sequence` call come from this excerpt; the processor construction is an assumption added to make it runnable (the mapper json path is an example).

```python
import yaml

import tensorflow as tf

from tensorflow_tts.inference import AutoConfig
from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

# Assumption: build a text processor from a pretrained mapper json
# (see tensorflow_tts/processor/pretrained/ for the shipped mappers).
processor = AutoProcessor.from_pretrained(
    pretrained_path="tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
)

# Convert text to a sequence of IDs for the synthesis models.
ids = processor.text_to_sequence(
    "Recent research at Harvard has shown meditating for as little as "
    "8 weeks, can actually increase the grey matter in the parts of the "
    "brain responsible for emotional regulation, and learning."
)
```
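
A possible continuation under the same assumptions, loading a config and model with `AutoConfig`/`TFAutoModel`; the yaml path, checkpoint file name, and exact keyword arguments are illustrative assumptions:

```python
# Assumption: the config path and checkpoint name below are examples only.
config = AutoConfig.from_pretrained("examples/tacotron2/conf/tacotron2.v1.yaml")
tacotron2 = TFAutoModel.from_pretrained(
    config=config,
    pretrained_path="tacotron2.h5",
    name="tacotron2",
)
```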
## License

Overall, almost all models here are licensed under [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) for all countries in the world, except in **Viet Nam**, where this framework cannot be used in production in any way without permission from TensorflowTTS's authors. There is one exception: Tacotron-2 can be used for any purpose. So, if you are Vietnamese and want to use this framework for production, you **must** contact us in advance.
## FastSpeech2 + MFA (LibriTTS) example

Everything is done from the main repo folder, i.e. `TensorFlowTTS/`.
0. (Optional) [Download](http://www.openslr.org/60/) and prepare LibriTTS (a helper for preparing libri is in examples/fastspeech2_libritts/libri_experiment/prepare_libri.ipynb).
- Dataset structure after finishing this step:
```
|- TensorFlowTTS/
| |- LibriTTS/
| |- |- train-clean-100/
| |- |- SPEAKERS.txt
| |- |- ...
| |- libritts/
| |- |- 200/
| |- |- |- 200_124139_000001_000000.txt
| |- |- |- 200_124139_000001_000000.wav
| |- |- |- ...
| |- |- 250/
| |- |- ...
| |- tensorflow_tts/
| |- models/
| |- ...
```
1. Extract durations (use examples/mfa_extraction or a pretrained Tacotron-2).
6. Change the CharactorDurationF0EnergyMelDataset speaker mapper in fastspeech2_dataset to match your dataset (if you use LibriTTS with mfa_extraction, you don't need to change anything).
7. Change train_libri.sh to match your dataset and run it.