
Commit 5b15bb9

🚩 Adjust configs and readmes
1 parent 2493011 commit 5b15bb9


5 files changed: +32, -28 lines changed


examples/multiband_melgan_hf/README.md

Lines changed: 25 additions & 25 deletions
@@ -1,23 +1,24 @@
-# Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
-Based on the script [`train_multiband_melgan.py`](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/train_multiband_melgan.py).
+
+# Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
+Based on the script [`train_multiband_melgan_hf.py`](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan_hf/train_multiband_melgan_hf.py).

## Training Multi-band MelGAN from scratch with LJSpeech dataset.
-This example code show you how to train MelGAN from scratch with Tensorflow 2 based on custom training loop and tf.function. The data used for this example is LJSpeech, you can download the dataset at [link](https://keithito.com/LJ-Speech-Dataset/).
+This example shows you how to train MelGAN from scratch with TensorFlow 2 using a custom training loop and tf.function. The data used for this example is LJSpeech Ultimate; you can download the dataset at [link](https://machineexperiments.tumblr.com/post/662408083204685824/ljspeech-ultimate).

### Step 1: Create Tensorflow based Dataloader (tf.dataset)
Please see details at [examples/melgan/](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/melgan#step-1-create-tensorflow-based-dataloader-tfdataset)

### Step 2: Training from scratch
-After you re-define your dataloader, pls modify an input arguments, train_dataset and valid_dataset from [`train_multiband_melgan.py`](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/train_multiband_melgan.py). Here is an example command line to training melgan-stft from scratch:
+After you re-define your dataloader, please modify the input arguments train_dataset and valid_dataset in [`train_multiband_melgan_hf.py`](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan_hf/train_multiband_melgan_hf.py). Here is an example command line to train melgan-stft from scratch:

First, you need to train the generator with only the stft loss:

```bash
-CUDA_VISIBLE_DEVICES=0 python examples/multiband_melgan/train_multiband_melgan.py \
+CUDA_VISIBLE_DEVICES=0 python examples/multiband_melgan_hf/train_multiband_melgan_hf.py \
  --train-dir ./dump/train/ \
  --dev-dir ./dump/valid/ \
-  --outdir ./examples/multiband_melgan/exp/train.multiband_melgan.v1/ \
-  --config ./examples/multiband_melgan/conf/multiband_melgan.v1.yaml \
+  --outdir ./examples/multiband_melgan_hf/exp/train.multiband_melgan_hf.v1/ \
+  --config ./examples/multiband_melgan_hf/conf/multiband_melgan_hf.lju.v1.yml \
  --use-norm 1 \
  --generator_mixed_precision 1 \
  --resume ""
@@ -26,27 +27,30 @@ CUDA_VISIBLE_DEVICES=0 python examples/multiband_melgan/train_multiband_melgan.p
Then resume and start training generator + discriminator:

```bash
-CUDA_VISIBLE_DEVICES=0 python examples/multiband_melgan/train_multiband_melgan.py \
+CUDA_VISIBLE_DEVICES=0 python examples/multiband_melgan_hf/train_multiband_melgan_hf.py \
  --train-dir ./dump/train/ \
  --dev-dir ./dump/valid/ \
-  --outdir ./examples/multiband_melgan/exp/train.multiband_melgan.v1/ \
-  --config ./examples/multiband_melgan/conf/multiband_melgan.v1.yaml \
+  --outdir ./examples/multiband_melgan_hf/exp/train.multiband_melgan_hf.v1/ \
+  --config ./examples/multiband_melgan_hf/conf/multiband_melgan_hf.lju.v1.yml \
  --use-norm 1 \
-  --resume ./examples/multiband_melgan/exp/train.multiband_melgan.v1/checkpoints/ckpt-200000
+  --resume ./examples/multiband_melgan_hf/exp/train.multiband_melgan_hf.v1/checkpoints/ckpt-200000
```

If you want to use multiple GPUs for training, you can replace `CUDA_VISIBLE_DEVICES=0` with `CUDA_VISIBLE_DEVICES=0,1,2,3`, for example; a minimal multi-GPU sketch follows below. You also need to tune the `batch_size` for each GPU (in the config file) yourself to maximize performance. Note that multi-GPU is currently supported for training but not yet for decoding.
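As a hedged sketch (not part of this commit), the multi-GPU variant only changes the `CUDA_VISIBLE_DEVICES` list; every other flag is the one shown above:

```bash
# Hedged sketch: identical to the single-GPU command above, only the visible devices change.
# Tune batch_size in the config file for your GPUs; decoding remains single-GPU.
CUDA_VISIBLE_DEVICES=0,1,2,3 python examples/multiband_melgan_hf/train_multiband_melgan_hf.py \
  --train-dir ./dump/train/ \
  --dev-dir ./dump/valid/ \
  --outdir ./examples/multiband_melgan_hf/exp/train.multiband_melgan_hf.v1/ \
  --config ./examples/multiband_melgan_hf/conf/multiband_melgan_hf.lju.v1.yml \
  --use-norm 1 \
  --resume ./examples/multiband_melgan_hf/exp/train.multiband_melgan_hf.v1/checkpoints/ckpt-200000
```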

In case you want to resume training, please follow the example command line below:

```bash
-  --resume ./examples/multiband_melgan/exp/train.multiband_melgan.v1/checkpoints/ckpt-100000
+  --resume ./examples/multiband_melgan_hf/exp/train.multiband_melgan_hf.v1/checkpoints/ckpt-100000
```

-If you want to finetune a model, use `--pretrained` like this with the filename of the generator
+If you want to finetune a model, use `--pretrained` like this, with the filenames of the generator and discriminator separated by a comma:
```bash
-  --pretrained ptgenerator.h5
+  --pretrained ptgenerator.h5,ptdiscriminator.h5
```
+It is recommended that you first train the text2mel model and then extract its postnets, so that the vocoder learns to compensate for its flaws. If you do so, append `--postnets 1` to the arguments; a hedged sketch of the full command follows below.
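A hedged sketch of what the full finetuning command might look like once postnets have been extracted. The `--pretrained` filenames are the placeholders from the block above, the config is the `.lju.v1ft.yml` file changed in this commit, and the exact combination of flags is an assumption rather than something taken verbatim from the repo:

```bash
# Hedged sketch: finetune generator + discriminator on postnet features.
# ptgenerator.h5 / ptdiscriminator.h5 and the output directory are placeholders.
CUDA_VISIBLE_DEVICES=0 python examples/multiband_melgan_hf/train_multiband_melgan_hf.py \
  --train-dir ./dump/train/ \
  --dev-dir ./dump/valid/ \
  --outdir ./examples/multiband_melgan_hf/exp/finetune.multiband_melgan_hf.lju.v1ft/ \
  --config ./examples/multiband_melgan_hf/conf/multiband_melgan_hf.lju.v1ft.yml \
  --use-norm 1 \
  --resume "" \
  --pretrained ptgenerator.h5,ptdiscriminator.h5 \
  --postnets 1
```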
+
+

**IMPORTANT NOTES**:

@@ -58,20 +62,20 @@ If you want to finetune a model, use `--pretrained` like this with the filename
To run inference on a folder of mel-spectrograms (e.g. the valid folder), run the command line below:

```bash
-CUDA_VISIBLE_DEVICES=0 python examples/multiband_melgan/decode_mb_melgan.py \
+CUDA_VISIBLE_DEVICES=0 python examples/multiband_melgan_hf/decode_mb_melgan.py \
  --rootdir ./dump/valid/ \
-  --outdir ./prediction/multiband_melgan.v1/ \
-  --checkpoint ./examples/multiband_melgan/exp/train.multiband_melgan.v1/checkpoints/generator-940000.h5 \
-  --config ./examples/multiband_melgan/conf/multiband_melgan.v1.yaml \
+  --outdir ./prediction/multiband_melgan_hf.v1/ \
+  --checkpoint ./examples/multiband_melgan_hf/exp/train.multiband_melgan_hf.v1/checkpoints/generator-920000.h5 \
+  --config ./examples/multiband_melgan_hf/conf/multiband_melgan_hf.lju.v1.yml \
  --batch-size 32 \
  --use-norm 1
```

## Finetune MelGAN STFT with ljspeech pretrained on other languages
-Just load pretrained model and training from scratch with other languages. **DO NOT FORGET** re-preprocessing on your dataset if needed. A hop_size should be 256 if you want to use our pretrained.
+Just load the pretrained model and train from scratch with other languages. **DO NOT FORGET** to re-preprocess your dataset if needed. The hop_size should be 512 if you want to use our pretrained model; a hedged sketch of the matching preprocessing parameters follows below.
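As a hedged reference only: the pretrained `multiband_melgan_hf.lju.v1` model in the table below uses 44.1 kHz audio with 2048 / 512 / 2048 FFT / hop / window and a 20-11025 Hz mel range, so re-preprocessing should match those values. A sketch of the corresponding preprocessing parameters, with field names assumed from typical TensorFlowTTS preprocessing configs:

```yaml
# Hedged sketch -- field names are assumptions; values come from the pretrained-model table below.
sampling_rate: 44100   # Fs [Hz]
fft_size: 2048         # FFT size [pt]
hop_size: 512          # hop size [pt]; must match the pretrained vocoder
win_length: 2048       # window length [pt]
fmin: 20               # lower mel-range bound [Hz]
fmax: 11025            # upper mel-range bound [Hz]
```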

## Learning Curves
-Here is a learning curves of melgan based on this config [`multiband_melgan.v1.yaml`](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml)
+Here are the learning curves of melgan based on this config [`multiband_melgan_hf.v1.yaml`](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan_hf/conf/multiband_melgan_hf.v1.yaml)

<img src="fig/eval.png" height="300" width="850">

@@ -80,11 +84,7 @@ Here is a learning curves of melgan based on this config [`multiband_melgan.v1.y
## Pretrained Models and Audio samples
| Model | Conf | Lang | Fs [Hz] | Mel range [Hz] | FFT / Hop / Win [pt] | # iters | Notes |
| :------ | :---: | :---: | :----: | :--------: | :---------------: | :-----: | :-----: |
-| [multiband_melgan.v1](https://drive.google.com/drive/folders/1Hg82YnPbX6dfF7DxVs4c96RBaiFbh-cT?usp=sharing) | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml) | EN | 22.05k | 80-7600 | 1024 / 256 / None | 940K | -|
-| [multiband_melgan.v1](https://drive.google.com/drive/folders/199XCXER51PWf_VzUpOwxfY_8XDfeXuZl?usp=sharing) | [link](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml) | KO | 22.05k | 80-7600 | 1024 / 256 / None | 1000K | -|
-| [multiband_melgan.v1_24k](https://drive.google.com/drive/folders/14H6Oa8kGxlIhfZZFf6JFzWL5NVHDpKai?usp=sharing) | [link](https://drive.google.com/file/d/1l2jBwTWVVsRuT5FLDOIDToEhqWBmuCMe/view?usp=sharing) | EN | 24k | 80-7600 | 2048 / 300 / 1200 | 1000K | Converted from [kan-bayashi's model](https://drive.google.com/drive/folders/1jfB15igea6tOQ0hZJGIvnpf3QyNhTLnq?usp=sharing); good universal vocoder|
-
-
+| [multiband_melgan_hf.lju.v1](https://drive.google.com/drive/folders/1tOMzik_Nr4eY63gooKYSmNTJyXC6Pp55?usp=sharing) | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_melgan_hf/conf/multiband_melgan_hf.lju.v1.yml) | EN | 44.1k | 20-11025 | 2048 / 512 / 2048 | 920K | -|


## Reference

examples/multiband_melgan_hf/conf/multiband_melgan_hf.lju.v1ft.yml

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@

#This is the hyperparameter configuration file for finetuning MB-MelGAN + MPD. It is intended to be used for finetuning the generator and discriminator.
-#Trains fast, it is mostly done at 30k steps, although one can still see small improvements beyond that
+#Trains fast and adapts to a new voice within about 30k steps, although it is beneficial to keep training beyond that, up to roughly 200k steps, depending on dataset size.

###########################################################
# FEATURE EXTRACTION SETTING #
@@ -98,7 +98,7 @@ discriminator_optimizer_params:
    lr_fn: "PiecewiseConstantDecay"
    lr_params:
        boundaries: [100000, 200000]
-        values: [0.00003125, 0.00003125, 0.00003125]
+        values: [0.000015625, 0.000015625, 0.000015625]
    amsgrad: false

###########################################################

examples/tacotron2/README.md

Lines changed: 3 additions & 0 deletions
@@ -75,6 +75,8 @@ CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/extract_duration.py \
  --win-back 3
```

+To extract postnets for training the vocoder, follow the steps above but with `extract_postnets.py`; a hedged sketch follows below.
+
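A hedged sketch, assuming `extract_postnets.py` accepts the same style of arguments as `extract_duration.py` above; the checkpoint path, step count, and output directory are placeholders:

```bash
# Hedged sketch: the flag set and paths are assumptions mirroring extract_duration.py.
CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/extract_postnets.py \
  --rootdir ./dump/train/ \
  --outdir ./dump/train/postnets/ \
  --checkpoint ./examples/tacotron2/exp/train.tacotron2.v1/checkpoints/model-65000.h5 \
  --use-norm 1 \
  --config ./examples/tacotron2/conf/tacotron2.v1.yaml \
  --batch-size 32
```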
You can also download my extracted durations at 40k steps at [link](https://drive.google.com/drive/u/1/folders/1kaPXRdLg9gZrll9KtvH3-feOBMM8sn3_?usp=drive_open).

## Finetune Tacotron-2 with ljspeech pretrained on other languages
@@ -116,6 +118,7 @@ Here is a result of tacotron2 based on this config [`tacotron2.v1.yaml`](https:/
| :------ | :---: | :---: | :----: | :--------: | :---------------: | :-----: | :-----: |
| [tacotron2.v1](https://drive.google.com/open?id=1kaPXRdLg9gZrll9KtvH3-feOBMM8sn3_) | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/tacotron2/conf/tacotron2.v1.yaml) | EN | 22.05k | 80-7600 | 1024 / 256 / None | 65K | 1
| [tacotron2.v1](https://drive.google.com/drive/folders/1WMBe01BBnYf3sOxMhbvnF2CUHaRTpBXJ?usp=sharing) | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/tacotron2/conf/tacotron2.kss.v1.yaml) | KO | 22.05k | 80-7600 | 1024 / 256 / None | 100K | 1
+| [tacotron2.lju.v1](https://drive.google.com/drive/folders/1tOMzik_Nr4eY63gooKYSmNTJyXC6Pp55?usp=sharing) | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/tacotron2/conf/tacotron2.lju.v1.yaml) | EN | 44.1k | 20-11025 | 2048 / 512 / None | 126K | 1

## Reference

examples/tacotron2/conf/tacotron2.lju.v1.yaml

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ var_train_expr: null # trainable variable expr (eg. 'embeddings|decoder_cell' )
###########################################################
# INTERVAL SETTING #
###########################################################
-train_max_steps: 170000 # Number of training steps.
+train_max_steps: 200000 # Number of training steps.
save_interval_steps: 2000 # Interval steps to save checkpoint.
eval_interval_steps: 500 # Interval steps to evaluate the network.
log_interval_steps: 200 # Interval steps to record the training log.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
{"symbol_to_id": {"pad": 0, "-": 1, "!": 2, "'": 3, "(": 4, ")": 5, ",": 6, ".": 7, ":": 8, ";": 9, "?": 10, "@AA": 11, "@AA0": 12, "@AA1": 13, "@AA2": 14, "@AE": 15, "@AE0": 16, "@AE1": 17, "@AE2": 18, "@AH": 19, "@AH0": 20, "@AH1": 21, "@AH2": 22, "@AO": 23, "@AO0": 24, "@AO1": 25, "@AO2": 26, "@AW": 27, "@AW0": 28, "@AW1": 29, "@AW2": 30, "@AY": 31, "@AY0": 32, "@AY1": 33, "@AY2": 34, "@B": 35, "@CH": 36, "@D": 37, "@DH": 38, "@EH": 39, "@EH0": 40, "@EH1": 41, "@EH2": 42, "@ER": 43, "@ER0": 44, "@ER1": 45, "@ER2": 46, "@EY": 47, "@EY0": 48, "@EY1": 49, "@EY2": 50, "@F": 51, "@G": 52, "@HH": 53, "@IH": 54, "@IH0": 55, "@IH1": 56, "@IH2": 57, "@IY": 58, "@IY0": 59, "@IY1": 60, "@IY2": 61, "@JH": 62, "@K": 63, "@L": 64, "@M": 65, "@N": 66, "@NG": 67, "@OW": 68, "@OW0": 69, "@OW1": 70, "@OW2": 71, "@OY": 72, "@OY0": 73, "@OY1": 74, "@OY2": 75, "@P": 76, "@R": 77, "@S": 78, "@SH": 79, "@T": 80, "@TH": 81, "@UH": 82, "@UH0": 83, "@UH1": 84, "@UH2": 85, "@UW": 86, "@UW0": 87, "@UW1": 88, "@UW2": 89, "@V": 90, "@W": 91, "@Y": 92, "@Z": 93, "@ZH": 94, "eos": 95}, "id_to_symbol": {"0": "pad", "1": "-", "2": "!", "3": "'", "4": "(", "5": ")", "6": ",", "7": ".", "8": ":", "9": ";", "10": "?", "11": "@AA", "12": "@AA0", "13": "@AA1", "14": "@AA2", "15": "@AE", "16": "@AE0", "17": "@AE1", "18": "@AE2", "19": "@AH", "20": "@AH0", "21": "@AH1", "22": "@AH2", "23": "@AO", "24": "@AO0", "25": "@AO1", "26": "@AO2", "27": "@AW", "28": "@AW0", "29": "@AW1", "30": "@AW2", "31": "@AY", "32": "@AY0", "33": "@AY1", "34": "@AY2", "35": "@B", "36": "@CH", "37": "@D", "38": "@DH", "39": "@EH", "40": "@EH0", "41": "@EH1", "42": "@EH2", "43": "@ER", "44": "@ER0", "45": "@ER1", "46": "@ER2", "47": "@EY", "48": "@EY0", "49": "@EY1", "50": "@EY2", "51": "@F", "52": "@G", "53": "@HH", "54": "@IH", "55": "@IH0", "56": "@IH1", "57": "@IH2", "58": "@IY", "59": "@IY0", "60": "@IY1", "61": "@IY2", "62": "@JH", "63": "@K", "64": "@L", "65": "@M", "66": "@N", "67": "@NG", "68": "@OW", "69": "@OW0", "70": "@OW1", "71": "@OW2", "72": "@OY", "73": "@OY0", "74": "@OY1", "75": "@OY2", "76": "@P", "77": "@R", "78": "@S", "79": "@SH", "80": "@T", "81": "@TH", "82": "@UH", "83": "@UH0", "84": "@UH1", "85": "@UH2", "86": "@UW", "87": "@UW0", "88": "@UW1", "89": "@UW2", "90": "@V", "91": "@W", "92": "@Y", "93": "@Z", "94": "@ZH", "95": "eos"}, "speakers_map": {"ljspeech": 0}, "processor_name": "LJSpeechUltimateProcessor"}
