1 | | -# Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech |
| 1 | +# Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech |
2 | 2 | Based on the script [`train_multiband_melgan.py`](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/train_multiband_melgan.py). |
3 | 3 |
4 | 4 | ## Training Multi-band MelGAN from scratch with LJSpeech dataset. |
@@ -78,13 +78,17 @@ Here is a learning curves of melgan based on this config [`multiband_melgan.v1.y |
78 | 78 | <img src="fig/train.png" height="300" width="850"> |
79 | 79 |
80 | 80 | ## Pretrained Models and Audio samples |
81 | | -| Model | Conf | Lang | Fs [Hz] | Mel range [Hz] | FFT / Hop / Win [pt] | # iters | |
82 | | -| :------ | :---: | :---: | :----: | :--------: | :---------------: | :-----: | |
83 | | -| [multiband_melgan.v1](https://drive.google.com/drive/folders/1Hg82YnPbX6dfF7DxVs4c96RBaiFbh-cT?usp=sharing) | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml) | EN | 22.05k | 80-7600 | 1024 / 256 / None | 940K | |
84 | | -| [multiband_melgan.v1](https://drive.google.com/drive/folders/199XCXER51PWf_VzUpOwxfY_8XDfeXuZl?usp=sharing) | [link](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml) | KO | 22.05k | 80-7600 | 1024 / 256 / None | 1000K | |
| 81 | +| Model | Conf | Lang | Fs [Hz] | Mel range [Hz] | FFT / Hop / Win [pt] | # iters | Notes | |
| 82 | +| :------ | :---: | :---: | :----: | :--------: | :---------------: | :-----: | :-----: | |
| 83 | +| [multiband_melgan.v1](https://drive.google.com/drive/folders/1Hg82YnPbX6dfF7DxVs4c96RBaiFbh-cT?usp=sharing) | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml) | EN | 22.05k | 80-7600 | 1024 / 256 / None | 940K | -| |
| 84 | +| [multiband_melgan.v1](https://drive.google.com/drive/folders/199XCXER51PWf_VzUpOwxfY_8XDfeXuZl?usp=sharing) | [link](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml) | KO | 22.05k | 80-7600 | 1024 / 256 / None | 1000K | -| |
| 85 | +| [multiband_melgan.v1_24k](https://drive.google.com/drive/folders/14H6Oa8kGxlIhfZZFf6JFzWL5NVHDpKai?usp=sharing) | [link](https://drive.google.com/file/d/1l2jBwTWVVsRuT5FLDOIDToEhqWBmuCMe/view?usp=sharing) | EN | 24k | 80-7600 | 2048 / 300 / 1200 | 1000K | Converted from [kan-bayashi's model](https://drive.google.com/drive/folders/1jfB15igea6tOQ0hZJGIvnpf3QyNhTLnq?usp=sharing); good universal vocoder |
| 86 | + |
| 87 | + |
| 88 | + |
85 | 89 |
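The "Fs" and "FFT / Hop / Win" columns in the table above determine the frame rate of the mel features each pretrained vocoder expects, so features extracted with one setting will not line up with a model trained on another. The following is a minimal sketch of that arithmetic, not code from this repository: the `StftConfig` class and its method names are hypothetical, and it assumes that a window length of `None` falls back to the FFT size, which should be checked against the linked config files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StftConfig:
    """Hypothetical container for one row of the pretrained-models table."""

    sample_rate: int            # Fs [Hz]
    fft_size: int               # FFT size [pt]
    hop_size: int               # Hop size [pt]
    win_length: Optional[int]   # Window length [pt]; None as listed in the table

    def effective_win_length(self) -> int:
        # Assumption: a window length of None means "use the FFT size".
        return self.fft_size if self.win_length is None else self.win_length

    def frame_shift_ms(self) -> float:
        # Time between consecutive mel frames, in milliseconds.
        return 1000.0 * self.hop_size / self.sample_rate

    def frames_per_second(self) -> float:
        # Number of mel frames the vocoder consumes per second of audio.
        return self.sample_rate / self.hop_size

    def samples_for_frames(self, n_frames: int) -> int:
        # Each mel frame is upsampled to hop_size waveform samples.
        return n_frames * self.hop_size


# Values copied from the table rows (22.05 kHz rows and the 24 kHz row).
V1_22K = StftConfig(sample_rate=22050, fft_size=1024, hop_size=256, win_length=None)
V1_24K = StftConfig(sample_rate=24000, fft_size=2048, hop_size=300, win_length=1200)

for name, cfg in (("multiband_melgan.v1", V1_22K), ("multiband_melgan.v1_24k", V1_24K)):
    print(
        f"{name}: win={cfg.effective_win_length()} pt, "
        f"shift={cfg.frame_shift_ms():.2f} ms, "
        f"{cfg.frames_per_second():.2f} frames/s"
    )
```

Under these assumptions the 24 kHz model works at 80 frames per second with a 12.5 ms shift, while the 22.05 kHz models use a shift of roughly 11.6 ms, so mel features have to be extracted with the settings of the checkpoint being used.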
86 | 90 | ## Reference |
87 | 91 |
88 | 92 | 1. https://github.com/kan-bayashi/ParallelWaveGAN |
89 | 93 | 2. [Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480) |
90 | | -3. [Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech](https://arxiv.org/abs/2005.05106) |
| 94 | +3. [Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech](https://arxiv.org/abs/2005.05106) |