TensorSpeech
diff --git a/README.md b/README.md
Lines changed: 17 additions & 14 deletions
diff --git a/examples/fastspeech2/README.md b/examples/fastspeech2/README.md
Lines changed: 3 additions & 1 deletion
diff --git a/examples/fastspeech2/conf/fastspeech2.kss.v1.yaml b/examples/fastspeech2/conf/fastspeech2.kss.v1.yaml
Lines changed: 76 additions & 0 deletions
diff --git a/examples/fastspeech2/conf/fastspeech2.kss.v2.yaml b/examples/fastspeech2/conf/fastspeech2.kss.v2.yaml
Lines changed: 78 additions & 0 deletions
diff --git a/examples/multiband_melgan/README.md b/examples/multiband_melgan/README.md
Lines changed: 2 additions & 1 deletion
diff --git a/examples/multiband_melgan/decode_mb_melgan.py b/examples/multiband_melgan/decode_mb_melgan.py
Lines changed: 1 addition & 1 deletion
diff --git a/examples/tacotron2/README.md b/examples/tacotron2/README.md
Lines changed: 2 additions & 1 deletion
@@ -1,11 +1,11 @@
 <h2 align="center">
 <p> :yum: TensorflowTTS
 <p align="center">
-    <a href="https://github.com/dathudeptrai/TensorflowTTS/actions">
-        <img alt="Build" src="https://github.com/dathudeptrai/TensorflowTTS/workflows/CI/badge.svg?branch=master">
+    <a href="https://github.com/tensorspeech/TensorFlowTTS/actions">
+        <img alt="Build" src="https://github.com/tensorspeech/TensorFlowTTS/workflows/CI/badge.svg?branch=master">
     </a>
-    <a href="https://github.com/dathudeptrai/TensorflowTTS/blob/master/LICENSE">
-        <img alt="GitHub" src="https://img.shields.io/github/license/dathudeptrai/TensorflowTTS?color=red">
+    <a href="https://github.com/tensorspeech/TensorFlowTTS/blob/master/LICENSE">
+        <img alt="GitHub" src="https://img.shields.io/github/license/tensorspeech/TensorflowTTS?color=red">
     </a>
     <a href="https://colab.research.google.com/drive/1akxtrLZHKuMiQup00tzO2olCaN-y3KiD?usp=sharing">
         <img alt="Colab" src="https://colab.research.google.com/assets/colab-badge.svg">
@@ -19,8 +19,9 @@
 :zany_face: TensorflowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, Melgan, Multiband-Melgan, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training and inference, optimize further by using [fake-quantize aware](https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide) and [pruning](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras), and make TTS models run faster than real time and deployable on mobile devices or embedded systems.
 
 ## What's new
-- 2020/07/17 **(NEW!)** Support MultiGPU for all Trainer.
-- 2020/07/05 **(New!)** Support Convert Tacotron-2, FastSpeech to Tflite. Pls see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thank @jaeyoo from TFlite team for his support.
+- 2020/08/05 **(NEW!)** Support Korean TTS. Please see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thanks to [@crux153](https://github.com/crux153).
+- 2020/07/17 Support MultiGPU for all Trainers.
+- 2020/07/05 Support converting Tacotron-2 and FastSpeech to TFLite. Please see the [colab](https://colab.research.google.com/drive/1HudLLpT9CQdh2k04c06bHUwLubhGTWxA?usp=sharing). Thanks to @jaeyoo from the TFLite team for his support.
 - 2020/06/20 [FastSpeech2](https://arxiv.org/abs/2006.04558) implementation with Tensorflow is supported.
 - 2020/06/07 [Multi-band MelGAN (MB MelGAN)](https://github.com/dathudeptrai/TensorflowTTS/blob/master/examples/multiband_melgan/) implementation with Tensorflow is supported.
 
@@ -41,7 +42,7 @@ This repository is tested on Ubuntu 18.04 with:
 - Python 3.6+
 - Cuda 10.1
 - CuDNN 7.6.5
-- Tensorflow 2.2
+- Tensorflow 2.2/2.3
 - [Tensorflow Addons](https://github.com/tensorflow/addons) 0.10.0
 
 Other TensorFlow versions should work but have not been tested yet. This repo will try to work with the latest stable TensorFlow version. **We recommend installing TensorFlow 2.3.0 for training if you want to use MultiGPU.**
@@ -54,8 +55,8 @@ $ pip install TensorflowTTS
 ### From source
 Examples are included in the repository but are not shipped with the framework. Therefore, to run the latest version of the examples, you need to install from source as follows.
 ```bash
-$ git clone https://github.com/TensorSpeech/TensorflowTTS.git
-$ cd TensorflowTTS
+$ git clone https://github.com/TensorSpeech/TensorFlowTTS.git
+$ cd TensorFlowTTS
 $ pip install .
 ```
 If you want to upgrade the repository and its dependencies:
@@ -112,10 +113,12 @@ The preprocessing has two steps:
 
 To reproduce the steps above:
 ```
-tensorflow-tts-preprocess --rootdir ./datasets --outdir ./dump --config preprocess/ljspeech_preprocess.yaml
-tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --config preprocess/ljspeech_preprocess.yaml
+tensorflow-tts-preprocess --rootdir ./datasets --outdir ./dump --config preprocess/ljspeech_preprocess.yaml --dataset ljspeech
+tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --config preprocess/ljspeech_preprocess.yaml --dataset ljspeech
 ```
 
+Right now, the `--dataset` argument supports only [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/) and [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset). We intend to support more datasets in the future.
+
 After preprocessing, the structure of the project folder should be:
 ```
 |- datasets/
@@ -225,7 +228,7 @@ A detail implementation of base_trainer from [tensorflow_tts/trainer/base_traine
 All models on this repo are trained based-on **GanBasedTrainer** (see [train_melgan.py](https://github.com/dathudeptrai/TensorflowTTS/blob/master/examples/melgan/train_melgan.py), [train_melgan_stft.py](https://github.com/dathudeptrai/TensorflowTTS/blob/master/examples/melgan.stft/train_melgan_stft.py), [train_multiband_melgan.py](https://github.com/dathudeptrai/TensorflowTTS/blob/master/examples/multiband_melgan/train_multiband_melgan.py)) and **Seq2SeqBasedTrainer** (see [train_tacotron2.py](https://github.com/dathudeptrai/TensorflowTTS/blob/master/examples/tacotron2/train_tacotron2.py), [train_fastspeech.py](https://github.com/dathudeptrai/TensorflowTTS/blob/master/examples/fastspeech/train_fastspeech.py)).
 
 # End-to-End Examples
-You can know how to inference each model at [notebooks](https://github.com/dathudeptrai/TensorflowTTS/tree/master/notebooks) or see a [colab](https://colab.research.google.com/drive/1akxtrLZHKuMiQup00tzO2olCaN-y3KiD?usp=sharing). Here is an example code for end2end inference with fastspeech and melgan.
+You can learn how to run inference with each model in the [notebooks](https://github.com/dathudeptrai/TensorflowTTS/tree/master/notebooks), or see the [colab](https://colab.research.google.com/drive/1akxtrLZHKuMiQup00tzO2olCaN-y3KiD?usp=sharing) (for English) or the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing) (for Korean). Here is example code for end-to-end inference with FastSpeech and MelGAN.
 
 ```python
 import numpy as np
@@ -242,15 +245,15 @@ from tensorflow_tts.inference import TFAutoModel
 # initialize fastspeech model.
 fs_config = AutoConfig.from_pretrained('./examples/fastspeech/conf/fastspeech.v1.yaml')
 fastspeech = TFAutoModel.from_pretrained(
-    config=config,
+    config=fs_config,
     pretrained_path="./examples/fastspeech/pretrained/model-195000.h5"
 )
 
 
 # initialize melgan model
 melgan_config = AutoConfig.from_pretrained('./examples/melgan/conf/melgan.v1.yaml')
 melgan = TFAutoModel.from_pretrained(
-    config=config,
+    config=melgan_config,
     pretrained_path="./examples/melgan/checkpoint/generator-1500000.h5"
 )
 
 
@@ -52,7 +52,9 @@ CUDA_VISIBLE_DEVICES=0 python examples/fastspeech2/decode_fastspeech2.py \
 ## Pretrained Models and Audio samples
 | Model                                                                                                          | Conf                                                                                                                        | Lang  | Fs [Hz] | Mel range [Hz] | FFT / Hop / Win [pt] | # iters |
 | :------                                                                                                        | :---:                                                                                                                       | :---: | :----:  | :--------:     | :---------------:    | :-----: |
-| [fastspeech2.v1](https://drive.google.com/drive/folders/158vFyC2pxw9xKdxp-C5WPEtgtUiWZYE0?usp=sharing)             | [link](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech2/conf/fastspeech2.v1.yaml)          | EN    | 22.05k  | 80-7600        | 1024 / 256 / None    | 150k    |
+| [fastspeech2.v1](https://drive.google.com/drive/folders/158vFyC2pxw9xKdxp-C5WPEtgtUiWZYE0?usp=sharing)             | [link](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/fastspeech2/conf/fastspeech2.v1.yaml)          | EN    | 22.05k  | 80-7600        | 1024 / 256 / None    | 150k    |
+| [fastspeech2.kss.v1](https://drive.google.com/drive/folders/1DU952--jVnJ5SZDSINRs7dVVSpdB7tC_?usp=sharing)             | [link](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/fastspeech2/conf/fastspeech2.kss.v1.yaml)          | KO    | 22.05k  | 80-7600        | 1024 / 256 / None    | 200k    |
+| [fastspeech2.kss.v2](https://drive.google.com/drive/folders/1G3-AJnEsu2rYXYgo2iGIVJfCqqfbpwMu?usp=sharing)             | [link](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/fastspeech2/conf/fastspeech2.kss.v2.yaml)          | KO    | 22.05k  | 80-7600        | 1024 / 256 / None    | 200k    |
 
 ## Reference
 
 
@@ -0,0 +1,76 @@
+# This is the hyperparameter configuration file for FastSpeech2 v1.
+# Note that this is adjusted for the KSS dataset. If you want to
+# apply it to another dataset, you might need to carefully change some parameters.
+# This configuration runs for 200k iterations, but the best checkpoint is around 150k iterations.
+
+###########################################################
+#                FEATURE EXTRACTION SETTING               #
+###########################################################
+hop_size: 256            # Hop size.
+format: "npy"
+
+
+###########################################################
+#              NETWORK ARCHITECTURE SETTING               #
+###########################################################
+model_type: "fastspeech2"
+
+fastspeech2_params:
+    dataset: "kss"
+    n_speakers: 1
+    encoder_hidden_size: 384
+    encoder_num_hidden_layers: 4
+    encoder_num_attention_heads: 2
+    encoder_attention_head_size: 192  # hidden_size // num_attention_heads
+    encoder_intermediate_size: 1024
+    encoder_intermediate_kernel_size: 3
+    encoder_hidden_act: "mish"
+    decoder_hidden_size: 384
+    decoder_num_hidden_layers: 4
+    decoder_num_attention_heads: 2
+    decoder_attention_head_size: 192  # hidden_size // num_attention_heads
+    decoder_intermediate_size: 1024
+    decoder_intermediate_kernel_size: 3
+    decoder_hidden_act: "mish"
+    variant_prediction_num_conv_layers: 2
+    variant_predictor_filter: 256
+    variant_predictor_kernel_size: 3
+    variant_predictor_dropout_rate: 0.5
+    num_mels: 80
+    hidden_dropout_prob: 0.2
+    attention_probs_dropout_prob: 0.1
+    max_position_embeddings: 2048
+    initializer_range: 0.02
+    output_attentions: False
+    output_hidden_states: False
+
+###########################################################
+#                  DATA LOADER SETTING                    #
+###########################################################
+batch_size: 16              # Batch size.
+remove_short_samples: true  # Whether to remove samples whose length is less than batch_max_steps.
+allow_cache: true           # Whether to allow cache in dataset. If true, it requires cpu memory.
+mel_length_threshold: 32    # Remove all targets that have mel_length <= 32.
+is_shuffle: true            # shuffle dataset after each epoch.
+###########################################################
+#             OPTIMIZER & SCHEDULER SETTING               #
+###########################################################
+optimizer_params:
+    initial_learning_rate: 0.001
+    end_learning_rate: 0.00005
+    decay_steps: 150000          # A value < train_max_steps is recommended.
+    warmup_proportion: 0.02
+    weight_decay: 0.001
+    
+    
+###########################################################
+#                    INTERVAL SETTING                     #
+###########################################################
+train_max_steps: 200000               # Number of training steps.
+save_interval_steps: 5000             # Interval steps to save checkpoint.
+eval_interval_steps: 500              # Interval steps to evaluate the network.
+log_interval_steps: 200               # Interval steps to record the training log.
+###########################################################
+#                     OTHER SETTING                       #
+###########################################################
+num_save_intermediate_results: 1  # Number of batch to be saved as intermediate results.
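The optimizer settings in this config can be read as a learning-rate schedule. The sketch below is a minimal pure-Python illustration and not the trainer's actual code. It assumes a linear warmup over `train_max_steps * warmup_proportion` steps followed by a linear (power 1) polynomial decay from `initial_learning_rate` to `end_learning_rate` over `decay_steps`; the function name `learning_rate` and the exact hand-off between warmup and decay are assumptions made for illustration.

```python
def learning_rate(
    step,
    initial_lr=0.001,        # optimizer_params.initial_learning_rate
    end_lr=0.00005,          # optimizer_params.end_learning_rate
    decay_steps=150000,      # optimizer_params.decay_steps
    train_max_steps=200000,  # train_max_steps
    warmup_proportion=0.02,  # optimizer_params.warmup_proportion
):
    """Hypothetical schedule: linear warmup, then linear decay clamped at end_lr."""
    # Assumption: warmup length is a fraction of the total number of training steps.
    warmup_steps = int(train_max_steps * warmup_proportion)
    if step < warmup_steps:
        # Ramp linearly from 0 up to initial_lr during warmup.
        return initial_lr * step / warmup_steps
    # After decay_steps the rate stays at end_lr, which is why the config
    # recommends decay_steps < train_max_steps.
    progress = min(step, decay_steps) / decay_steps
    return (initial_lr - end_lr) * (1.0 - progress) + end_lr


if __name__ == "__main__":
    for s in (0, 2000, 4000, 75000, 150000, 200000):
        print(s, learning_rate(s))
```

Under these assumptions the last 50k steps train at the constant `end_learning_rate`.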
@@ -0,0 +1,78 @@
+# This is the hyperparameter configuration file for FastSpeech2 v2.
+# The difference between v2 and v1 is that v2 applies the linformer technique.
+# Note that this is adjusted for the KSS dataset. If you want to
+# apply it to another dataset, you might need to carefully change some parameters.
+# This configuration runs for 200k iterations, but the best checkpoint is around 150k iterations.
+
+###########################################################
+#                FEATURE EXTRACTION SETTING               #
+###########################################################
+hop_size: 256            # Hop size.
+format: "npy"
+
+
+###########################################################
+#              NETWORK ARCHITECTURE SETTING               #
+###########################################################
+model_type: "fastspeech2"
+
+fastspeech2_params:
+    dataset: "kss"
+    n_speakers: 1
+    encoder_hidden_size: 256
+    encoder_num_hidden_layers: 3
+    encoder_num_attention_heads: 2
+    encoder_attention_head_size: 16  # in v1, = 384//2
+    encoder_intermediate_size: 1024
+    encoder_intermediate_kernel_size: 3
+    encoder_hidden_act: "mish"
+    decoder_hidden_size: 256
+    decoder_num_hidden_layers: 3
+    decoder_num_attention_heads: 2
+    decoder_attention_head_size: 16  # in v1, = 384//2
+    decoder_intermediate_size: 1024
+    decoder_intermediate_kernel_size: 3
+    decoder_hidden_act: "mish"
+    variant_prediction_num_conv_layers: 2
+    variant_predictor_filter: 256
+    variant_predictor_kernel_size: 3
+    variant_predictor_dropout_rate: 0.5
+    num_mels: 80
+    hidden_dropout_prob: 0.2
+    attention_probs_dropout_prob: 0.1
+    max_position_embeddings: 2048
+    initializer_range: 0.02
+    output_attentions: False
+    output_hidden_states: False
+
+###########################################################
+#                  DATA LOADER SETTING                    #
+###########################################################
+batch_size: 16              # Batch size.
+remove_short_samples: true  # Whether to remove samples whose length is less than batch_max_steps.
+allow_cache: true           # Whether to allow cache in dataset. If true, it requires cpu memory.
+mel_length_threshold: 32    # Remove all targets that have mel_length <= 32.
+is_shuffle: true            # shuffle dataset after each epoch.
+###########################################################
+#             OPTIMIZER & SCHEDULER SETTING               #
+###########################################################
+optimizer_params:
+    initial_learning_rate: 0.001
+    end_learning_rate: 0.00005
+    decay_steps: 150000          # A value < train_max_steps is recommended.
+    warmup_proportion: 0.02
+    weight_decay: 0.001
+    
+    
+###########################################################
+#                    INTERVAL SETTING                     #
+###########################################################
+train_max_steps: 200000               # Number of training steps.
+save_interval_steps: 5000             # Interval steps to save checkpoint.
+eval_interval_steps: 500              # Interval steps to evaluate the network.
+log_interval_steps: 200               # Interval steps to record the training log.
+delay_f0_energy_steps: 3              # 2 steps use LR outputs only, then 1 step uses LR + F0 + Energy.
+###########################################################
+#                     OTHER SETTING                       #
+###########################################################
+num_save_intermediate_results: 1  # Number of batch to be saved as intermediate results.
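The two KSS configs differ in how the attention head size relates to the hidden size. As a small worked example, the sketch below copies the encoder values from both files (the decoder values are identical) and compares each `attention_head_size` with `hidden_size // num_attention_heads`. The helper name `attention_summary` and the derived "total attention width" are illustrative only; how the model consumes these values is not shown in this diff.

```python
# Encoder values copied from the two configs in this diff.
configs = {
    "fastspeech2.kss.v1": {
        "hidden_size": 384,
        "num_attention_heads": 2,
        "attention_head_size": 192,
    },
    "fastspeech2.kss.v2": {
        "hidden_size": 256,
        "num_attention_heads": 2,
        "attention_head_size": 16,
    },
}


def attention_summary(params):
    """Compare the configured head size with hidden_size // num_attention_heads."""
    default_head_size = params["hidden_size"] // params["num_attention_heads"]
    total_width = params["num_attention_heads"] * params["attention_head_size"]
    return {
        "default_head_size": default_head_size,
        "total_attention_width": total_width,
        "uses_default": params["attention_head_size"] == default_head_size,
    }


if __name__ == "__main__":
    for name, params in configs.items():
        print(name, attention_summary(params))
```

In v1 the head size equals `384 // 2 = 192`, so the heads together span the full hidden size. In v2 the head size is fixed at 16, well below `256 // 2 = 128`.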
@@ -74,7 +74,8 @@ Here is a learning curves of melgan based on this config [`multiband_melgan.v1.y
 ## Pretrained Models and Audio samples
 | Model                                                                                                          | Conf                                                                                                                        | Lang  | Fs [Hz] | Mel range [Hz] | FFT / Hop / Win [pt] | # iters |
 | :------                                                                                                        | :---:                                                                                                                       | :---: | :----:  | :--------:     | :---------------:    | :-----: |
-| [multiband_melgan.v1](https://drive.google.com/drive/folders/1Hg82YnPbX6dfF7DxVs4c96RBaiFbh-cT?usp=sharing)             | [link](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml)          | EN    | 22.05k  | 80-7600        | 1024 / 256 / None    | 940K    |
+| [multiband_melgan.v1](https://drive.google.com/drive/folders/1Hg82YnPbX6dfF7DxVs4c96RBaiFbh-cT?usp=sharing)             | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml)          | EN    | 22.05k  | 80-7600        | 1024 / 256 / None    | 940K    |
+| [multiband_melgan.v1](https://drive.google.com/drive/folders/199XCXER51PWf_VzUpOwxfY_8XDfeXuZl?usp=sharing)             | [link](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan/conf/multiband_melgan.v1.yaml)          | KO    | 22.05k  | 80-7600        | 1024 / 256 / None    | 1000K    |
 
 ## Reference
 
 
@@ -95,7 +95,7 @@ def main():
     config.update(vars(args))
 
     if config["format"] == "npy":
-        mel_query = "*-norm-feats.npy" if args.use_norm == 1 else "*-raw-feats.npy"
+        mel_query = "*-fs-after-feats.npy" if "fastspeech" in args.rootdir else "*-norm-feats.npy" if args.use_norm == 1 else "*-raw-feats.npy"
         mel_load_fn = np.load
     else:
         raise ValueError("Only npy is supported.")
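The new `mel_query` line chains two conditional expressions. Python parses `A if c1 else B if c2 else C` as `A if c1 else (B if c2 else C)`, so the `"fastspeech" in args.rootdir` check takes priority over `use_norm`. The sketch below spells out the same selection as a standalone function; the function names are hypothetical and exist only to show the precedence.

```python
def select_mel_query(rootdir, use_norm):
    """Explicit form of the chained conditional used for mel_query."""
    # The rootdir check wins regardless of use_norm.
    if "fastspeech" in rootdir:
        return "*-fs-after-feats.npy"
    if use_norm == 1:
        return "*-norm-feats.npy"
    return "*-raw-feats.npy"


def select_mel_query_inline(rootdir, use_norm):
    """Same expression as in the diff, with plain arguments instead of args."""
    return "*-fs-after-feats.npy" if "fastspeech" in rootdir else "*-norm-feats.npy" if use_norm == 1 else "*-raw-feats.npy"


if __name__ == "__main__":
    for rootdir in ("./prediction/fastspeech2", "./dump/valid"):
        for use_norm in (0, 1):
            print(rootdir, use_norm, select_mel_query(rootdir, use_norm))
```

Because the check is a substring match, any root directory whose path contains `fastspeech` (including `fastspeech2`) selects the `*-fs-after-feats.npy` pattern.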
 
@@ -109,7 +109,8 @@ Here is a result of tacotron2 based on this config [`tacotron2.v1.yaml`](https:/
 ## Pretrained Models and Audio samples
 | Model                                                                                                          | Conf                                                                                                                        | Lang  | Fs [Hz] | Mel range [Hz] | FFT / Hop / Win [pt] | # iters | reduction factor|
 | :------                                                                                                        | :---:                                                                                                                       | :---: | :----:  | :--------:     | :---------------:    | :-----: |  :-----: |
-| [tacotron2.v1](https://drive.google.com/open?id=1kaPXRdLg9gZrll9KtvH3-feOBMM8sn3_)             | [link](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/tacotron2/conf/tacotron2.v1.yaml)          | EN    | 22.05k  | 80-7600        | 1024 / 256 / None    | 65k    | 1
+| [tacotron2.v1](https://drive.google.com/open?id=1kaPXRdLg9gZrll9KtvH3-feOBMM8sn3_)             | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/tacotron2/conf/tacotron2.v1.yaml)          | EN    | 22.05k  | 80-7600        | 1024 / 256 / None    | 65K    | 1
+| [tacotron2.v1](https://drive.google.com/drive/folders/1WMBe01BBnYf3sOxMhbvnF2CUHaRTpBXJ?usp=sharing)             | [link](https://github.com/tensorspeech/TensorFlowTTS/tree/master/examples/tacotron2/conf/tacotron2.kss.v1.yaml)          | KO    | 22.05k  | 80-7600        | 1024 / 256 / None    | 100K    | 1
 
 ## Reference