Skip to content

Commit f34ff2a

Browse files
authored
Merge pull request #300 from ZDisket/add-lbttspt
🍭 Add converted LibriTTS universal vocoder to pt model list and model notes
2 parents 5fa2995 + ffcc42c commit f34ff2a

File tree

1 file changed

+10
-6
lines changed

1 file changed

+10
-6
lines changed

examples/multiband_melgan/README.md

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
1+
# Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
22
Based on the script [`train_multiband_melgan.py`](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/train_multiband_melgan.py).
33

44
## Training Multi-band MelGAN from scratch with LJSpeech dataset.
@@ -78,13 +78,17 @@ Here is a learning curves of melgan based on this config [`multiband_melgan.v1.y
7878
<img src="fig/train.png" height="300" width="850">
7979

8080
## Pretrained Models and Audio samples
81-
| Model | Conf | Lang | Fs [Hz] | Mel range [Hz] | FFT / Hop / Win [pt] | # iters |
82-
| :------ | :---: | :---: | :----: | :--------: | :---------------: | :-----: |
83-
| [multiband_melgan.v1](https://drive.google.com/drive/folders/1Hg82YnPbX6dfF7DxVs4c96RBaiFbh-cT?usp=sharing) | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml) | EN | 22.05k | 80-7600 | 1024 / 256 / None | 940K |
84-
| [multiband_melgan.v1](https://drive.google.com/drive/folders/199XCXER51PWf_VzUpOwxfY_8XDfeXuZl?usp=sharing) | [link](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml) | KO | 22.05k | 80-7600 | 1024 / 256 / None | 1000K |
81+
| Model | Conf | Lang | Fs [Hz] | Mel range [Hz] | FFT / Hop / Win [pt] | # iters | Notes |
82+
| :------ | :---: | :---: | :----: | :--------: | :---------------: | :-----: | :-----: |
83+
| [multiband_melgan.v1](https://drive.google.com/drive/folders/1Hg82YnPbX6dfF7DxVs4c96RBaiFbh-cT?usp=sharing) | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml) | EN | 22.05k | 80-7600 | 1024 / 256 / None | 940K | -|
84+
| [multiband_melgan.v1](https://drive.google.com/drive/folders/199XCXER51PWf_VzUpOwxfY_8XDfeXuZl?usp=sharing) | [link](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml) | KO | 22.05k | 80-7600 | 1024 / 256 / None | 1000K | -|
85+
| [multiband_melgan.v1_24k](https://drive.google.com/drive/folders/14H6Oa8kGxlIhfZZFf6JFzWL5NVHDpKai?usp=sharing) | [link](https://drive.google.com/file/d/1l2jBwTWVVsRuT5FLDOIDToEhqWBmuCMe/view?usp=sharing) | EN | 24k | 80-7600 | 2048 / 300 / 1200 | 1000K | Converted from [kan-bayashi's model](https://drive.google.com/drive/folders/1jfB15igea6tOQ0hZJGIvnpf3QyNhTLnq?usp=sharing); good universal vocoder|
86+
87+
88+
8589

8690
## Reference
8791

8892
1. https://github.com/kan-bayashi/ParallelWaveGAN
8993
2. [Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480)
90-
3. [Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech](https://arxiv.org/abs/2005.05106)
94+
3. [Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech](https://arxiv.org/abs/2005.05106)

0 commit comments

Comments
 (0)