Commit 3c388e7

✍ Update README, add Chinese TTS Colab.
1 parent 39185be commit 3c388e7

File tree: 1 file changed

README.md

Lines changed: 8 additions & 27 deletions
@@ -19,6 +19,7 @@
 :zany_face: TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, MelGAN, Multiband-MelGAN, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training/inference and optimize further using [fake-quantize aware training](https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide) and [pruning](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras), so TTS models can run faster than real time and be deployed on mobile devices or embedded systems.

 ## What's new
+- 2020/08/14 **(NEW!)** Support Chinese TTS. Please see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thanks to [@azraelkuan](https://github.com/azraelkuan).
 - 2020/08/05 **(NEW!)** Support Korean TTS. Please see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thanks to [@crux153](https://github.com/crux153).
 - 2020/07/17 Support multi-GPU for all trainers.
 - 2020/07/05 Support converting Tacotron-2 and FastSpeech to TFLite. Please see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thanks to @jaeyoo from the TFLite team for his support.
@@ -35,15 +36,17 @@
 - Mixed precision to speed up training if possible.
 - Support both single/multi GPU in the base trainer class.
 - TFLite conversion for all supported models.
+- Android example.
+- Support many languages (currently Chinese, Korean, and English).

 ## Requirements
 This repository is tested on Ubuntu 18.04 with:

-- Python 3.6+
+- Python 3.7+
 - CUDA 10.1
 - CuDNN 7.6.5
 - TensorFlow 2.2/2.3
-- [TensorFlow Addons](https://github.com/tensorflow/addons) 0.10.0
+- [TensorFlow Addons](https://github.com/tensorflow/addons) >= 0.10.0

 Different TensorFlow versions should work but have not been tested yet. This repo aims to track the latest stable TensorFlow version. **We recommend installing TensorFlow 2.3.0 for training if you want to use multi-GPU.**
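The hunk above raises the minimum Python version from 3.6 to 3.7. A minimal sketch of a version guard mirroring the new floor (the helper name is hypothetical, not part of the repository):

```python
import sys

# Minimum interpreter version per the updated requirements (Python 3.7+).
MIN_PYTHON = (3, 7)

def meets_python_requirement(version_info=sys.version_info):
    """Return True if the interpreter satisfies the README's Python 3.7+ floor."""
    return tuple(version_info[:2]) >= MIN_PYTHON
```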

@@ -113,11 +116,11 @@ The preprocessing has two steps:

 To reproduce the steps above:
 ```
-tensorflow-tts-preprocess --rootdir ./datasets --outdir ./dump --config preprocess/ljspeech_preprocess.yaml --dataset ljspeech
-tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --config preprocess/ljspeech_preprocess.yaml --dataset ljspeech
+tensorflow-tts-preprocess --rootdir ./datasets --outdir ./dump --config preprocess/[ljspeech/kss/baker]_preprocess.yaml --dataset [ljspeech/kss/baker]
+tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --config preprocess/[ljspeech/kss/baker]_preprocess.yaml --dataset [ljspeech/kss/baker]
 ```
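The bracketed `[ljspeech/kss/baker]` placeholder must be filled in consistently in both the `--config` path and the `--dataset` flag. A small Python sketch (the helper function is hypothetical, not part of the package) that expands the placeholder into the two concrete command lines:

```python
# Datasets accepted by the --dataset flag, per the README.
SUPPORTED = ("ljspeech", "kss", "baker")

def preprocess_commands(dataset: str):
    """Build the preprocess and normalize commands for one supported dataset."""
    if dataset not in SUPPORTED:
        raise ValueError(f"unsupported dataset: {dataset}")
    config = f"preprocess/{dataset}_preprocess.yaml"
    return [
        f"tensorflow-tts-preprocess --rootdir ./datasets --outdir ./dump "
        f"--config {config} --dataset {dataset}",
        f"tensorflow-tts-normalize --rootdir ./dump --outdir ./dump "
        f"--config {config} --dataset {dataset}",
    ]
```

For example, `preprocess_commands("baker")` fills both slots with `baker`, matching the commands the README gives for the Chinese dataset.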
-Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/) and [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset) for the dataset argument. In the future, we intend to support more datasets.
+Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), and [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar) for the dataset argument. In the future, we intend to support more datasets.

 After preprocessing, the structure of the project folder should be:
 ```
@@ -184,28 +187,6 @@ After preprocessing, the structure of the project folder should be:

 We use the suffixes (`ids`, `raw-feats`, `raw-energy`, `raw-f0`, `norm-feats`, and `wave`) for each type of input.

-### Preprocessing Chinese Dataset
-Please download the open dataset from [Data-Baker](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar) and extract the data like this:
-```
-.
-├── PhoneLabeling
-│   ├── 000001.interval
-│   ├── ...
-│   └── 010000.interval
-├── ProsodyLabeling
-│   └── 000001-010000.txt
-└── Wave
-    ├── 000001.wav
-    ├── ...
-    └── 010000.wav
-```
-
-After installing tensorflowtts, you can process the data like this:
-```shell
-tensorflow-tts-preprocess --dataset baker --rootdir ./baker --outdir ./dump --config ./preprocess/baker_preprocess.yaml
-tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --config ./preprocess/baker_preprocess.yaml --dataset baker
-```

 **IMPORTANT NOTES**:
 - This preprocessing step is based on [ESPnet](https://github.com/espnet/espnet), so you can combine all models here with other models from the ESPnet repository.
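The deleted section above documented the expected Data-Baker folder layout (`PhoneLabeling/`, `ProsodyLabeling/`, `Wave/`). A minimal sketch, assuming that layout, of a check one could run on an extracted archive before preprocessing (the helper is hypothetical, not part of the repo):

```python
from pathlib import Path

# Top-level directories the baker preprocessor expects, per the README tree.
EXPECTED_DIRS = ("PhoneLabeling", "ProsodyLabeling", "Wave")

def missing_baker_dirs(root):
    """Return the expected Data-Baker subdirectories absent under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

An empty return value means the extracted archive matches the documented structure and can be passed as `--rootdir`.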

0 commit comments