:zany_face: TensorFlowTTS provides real-time, state-of-the-art speech synthesis architectures such as Tacotron-2, MelGAN, Multi-band MelGAN, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training and inference, optimize models further using [quantization-aware training](https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide) and [pruning](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras), and make TTS models run faster than real time and deployable on mobile devices or embedded systems.

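As a rough illustration of the pruning mentioned above (the actual workflow uses the TensorFlow Model Optimization toolkit linked here, e.g. its `prune_low_magnitude` wrapper; this pure-NumPy sketch only shows the core idea of zeroing the smallest-magnitude weights):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Illustrative only: real pruning schedules (as in tfmot) prune gradually
    during training rather than in one shot.
    """
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([[0.5, -0.1], [0.02, -0.9]])
pruned = magnitude_prune(w, 0.5)  # half of the weights are zeroed
```

Note that sparse weights alone do not speed up inference; the gains come when a runtime such as TFLite exploits the sparsity.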
## What's new
- 2020/08/14 **(NEW!)** Support Chinese TTS. Please see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thanks to [@azraelkuan](https://github.com/azraelkuan).
- 2020/08/05 **(NEW!)** Support Korean TTS. Please see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thanks to [@crux153](https://github.com/crux153).
- 2020/07/17 Support multi-GPU training for all trainers.
- 2020/07/05 Support converting Tacotron-2 and FastSpeech to TFLite. Please see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thanks to @jaeyoo from the TFLite team for his support.

- Mixed precision to speed up training where possible.
- Support both single- and multi-GPU in the base trainer class.
- TFLite conversion for all supported models.
- Android example.
- Support many languages (currently Chinese, Korean, and English).

## Requirements
This repository is tested on Ubuntu 18.04 with:

- Python 3.7+
- CUDA 10.1
- CuDNN 7.6.5
- TensorFlow 2.2/2.3
- [TensorFlow Addons](https://github.com/tensorflow/addons) >= 0.10.0

Different TensorFlow versions should work but have not been tested yet. This repo will try to work with the latest stable TensorFlow version. **We recommend installing TensorFlow 2.3.0 for training if you want to use multi-GPU.**

The preprocessing has two steps: feature extraction and feature normalization.

To reproduce the steps above:
```
tensorflow-tts-preprocess --rootdir ./datasets --outdir ./dump --config preprocess/[ljspeech/kss/baker]_preprocess.yaml --dataset [ljspeech/kss/baker]
tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --config preprocess/[ljspeech/kss/baker]_preprocess.yaml --dataset [ljspeech/kss/baker]
```

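Conceptually, the normalize step standardizes each mel-feature dimension using mean/variance statistics computed over the training set. A minimal NumPy sketch of that idea (the shapes and the per-dimension convention here are illustrative assumptions, not the tool's exact implementation):

```python
import numpy as np

def normalize_mel(mel, mean, std, eps=1e-8):
    """Standardize a (frames x n_mels) mel spectrogram per feature dimension."""
    return (mel - mean) / (std + eps)

# Toy data: 100 frames of 80-dim log-mel features.
rng = np.random.default_rng(0)
mel = rng.normal(loc=-4.0, scale=2.0, size=(100, 80))

# In the real pipeline these statistics come from the whole training set.
mean, std = mel.mean(axis=0), mel.std(axis=0)
norm = normalize_mel(mel, mean, std)  # now roughly zero-mean, unit-variance
```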
Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), and [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar) for the dataset argument. In the future, we intend to support more datasets.

After preprocessing, the structure of the project folder should be:
```
...
```

We use the suffixes (`ids`, `raw-feats`, `raw-energy`, `raw-f0`, `norm-feats`, and `wave`) for each type of input.

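Each per-utterance feature file can then be located from its utterance id and suffix. A small round-trip sketch under an assumed layout of `<dump>/<split>/<suffix>/<utt_id>-<suffix>.npy` (this path scheme and the `LJ001-0001` id are hypothetical illustrations, not necessarily the repo's exact layout):

```python
import os
import tempfile

import numpy as np

# Hypothetical layout: <dump>/<split>/<suffix>/<utt_id>-<suffix>.npy
def feature_path(dump_dir, split, utt_id, suffix):
    return os.path.join(dump_dir, split, suffix, f"{utt_id}-{suffix}.npy")

# Round-trip demo using a temporary dump directory.
dump = tempfile.mkdtemp()
os.makedirs(os.path.join(dump, "train", "norm-feats"))
np.save(feature_path(dump, "train", "LJ001-0001", "norm-feats"),
        np.zeros((10, 80), dtype=np.float32))
mel = np.load(feature_path(dump, "train", "LJ001-0001", "norm-feats"))
```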
**IMPORTANT NOTES**:
- This preprocessing step is based on [ESPnet](https://github.com/espnet/espnet), so you can combine all models here with other models from the ESPnet repository.