Commit 3921fc7: Merge branch 'master' into cppinference
2 parents a0e0a66 + 5f21de9

53 files changed: +1819 -1249 lines

.gitignore

Lines changed: 7 additions & 1 deletion
@@ -35,4 +35,10 @@ ljspeech
 LibriTTS/
 dataset/
 mfa/
-kss
+kss/
+baker/
+libritts/
+dump_baker/
+dump_ljspeech/
+dump_kss/
+dump_libritts/

README.md

Lines changed: 16 additions & 10 deletions
@@ -19,6 +19,7 @@
 :zany_face: TensorflowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, Melgan, Multiband-Melgan, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2, we can speed up training/inference, optimize further using [fake-quantize aware](https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide) training and [pruning](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras), and make TTS models run faster than real-time and deployable on mobile devices or embedded systems.

 ## What's new
+- 2020/08/18 **(NEW!)** Update [new base processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/base_processor.py). Add [AutoProcessor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/inference/auto_processor.py) and [pretrained processor](https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/pretrained/) json files.
 - 2020/08/14 **(NEW!)** Support Chinese TTS. Pls see the [colab](https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing). Thank [@azraelkuan](https://github.com/azraelkuan).
 - 2020/08/05 **(NEW!)** Support Korean TTS. Pls see the [colab](https://colab.research.google.com/drive/1ybWwOS5tipgPFttNulp77P6DAB5MtiuN?usp=sharing). Thank [@crux153](https://github.com/crux153).
 - 2020/07/17 Support MultiGPU for all Trainer.
@@ -93,7 +94,7 @@ Here in an audio samples on valid set. [tacotron-2](https://drive.google.com/ope

 Prepare a dataset in the following format:
 ```
-|- datasets/
+|- [NAME_DATASET]/
 | |- metadata.csv
 | |- wav/
 | |- file1.wav
@@ -102,6 +103,8 @@ Prepare a dataset in the following format:

 where `metadata.csv` has the following format: `id|transcription`. This is a ljspeech-like format, you can ignore preprocessing steps if you have other format dataset.

+Note that `NAME_DATASET` should be `[ljspeech/kss/baker/libritts]` for example.
+
 ## Preprocessing

 The preprocessing has two steps:
@@ -116,20 +119,22 @@ The preprocessing has two steps:

 To reproduce the steps above:
 ```
-tensorflow-tts-preprocess --rootdir ./datasets --outdir ./dump --config preprocess/[ljspeech/kss/baker]_preprocess.yaml --dataset [ljspeech/kss/baker]
-tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --config preprocess/[ljspeech/kss/baker]_preprocess.yaml --dataset [ljspeech/kss/baker]
+tensorflow-tts-preprocess --rootdir ./[ljspeech/kss/baker/libritts] --outdir ./dump_[ljspeech/kss/baker/libritts] --config preprocess/[ljspeech/kss/baker/libritts]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts]
+tensorflow-tts-normalize --rootdir ./dump_[ljspeech/kss/baker/libritts] --outdir ./dump_[ljspeech/kss/baker/libritts] --config preprocess/[ljspeech/kss/baker/libritts]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts]
 ```

-Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar) for dataset argument. In the future, we intend to support more datasets.
+Right now we only support [`ljspeech`](https://keithito.com/LJ-Speech-Dataset/), [`kss`](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset), [`baker`](https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar) and [`libritts`](http://www.openslr.org/60/) for the dataset argument. In the future, we intend to support more datasets.
+
+**Note**: To run `libritts` preprocessing, please first read the instructions in [examples/fastspeech2_libritts](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/fastspeech2_libritts). The dataset must be reformatted before running preprocessing.

 After preprocessing, the structure of the project folder should be:
 ```
-|- datasets/
+|- [NAME_DATASET]/
 | |- metadata.csv
 | |- wav/
 | |- file1.wav
 | |- ...
-|- dump/
+|- dump_[ljspeech/kss/baker/libritts]/
 | |- train/
 | |- ids/
 | |- LJ001-0001-ids.npy
@@ -190,6 +195,7 @@ We use suffix (`ids`, `raw-feats`, `raw-energy`, `raw-f0`, `norm-feats` and `wav

 **IMPORTANT NOTES**:
 - This preprocessing step is based on [ESPnet](https://github.com/espnet/espnet) so you can combine all models here with other models from ESPnet repository.
+- Regardless of how your dataset is formatted, the final structure of the `dump` folder **SHOULD** follow the structure above to be able to use the training script, or you can modify it yourself 😄.

 ## Training models

@@ -198,6 +204,7 @@ To know how to training model from scratch or fine-tune with other datasets/lang
 - For Tacotron-2 tutorial, pls see [example/tacotron2](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/tacotron2)
 - For FastSpeech tutorial, pls see [example/fastspeech](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech)
 - For FastSpeech2 tutorial, pls see [example/fastspeech2](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech2)
+- For FastSpeech2 + MFA tutorial, pls see [example/fastspeech2_libritts](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/fastspeech2_libritts)
 - For MelGAN tutorial, pls see [example/melgan](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/melgan)
 - For MelGAN + STFT Loss tutorial, pls see [example/melgan.stft](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/melgan.stft)
 - For Multiband-MelGAN tutorial, pls see [example/multiband_melgan](https://github.com/dathudeptrai/TensorflowTTS/tree/master/examples/multiband_melgan)
@@ -241,10 +248,9 @@ import yaml

 import tensorflow as tf

-from tensorflow_tts.processor import LJSpeechProcessor
-
 from tensorflow_tts.inference import AutoConfig
 from tensorflow_tts.inference import TFAutoModel
+from tensorflow_tts.inference import AutoProcessor

 # initialize fastspeech model.
 fs_config = AutoConfig.from_pretrained('/examples/fastspeech/conf/fastspeech.v1.yaml')
@@ -263,7 +269,7 @@ melgan = TFAutoModel.from_pretrained(


 # inference
-processor = LJSpeechProcessor(None, cleaner_names="english_cleaners")
+processor = AutoProcessor.from_pretrained(pretrained_path="./test/files/ljspeech_mapper.json")

 ids = processor.text_to_sequence("Recent research at Harvard has shown meditating for as little as 8 weeks, can actually increase the grey matter in the parts of the brain responsible for emotional regulation, and learning.")
 ids = tf.expand_dims(ids, 0)
@@ -285,7 +291,7 @@ sf.write('./audio_after.wav', audio_after, 22050, "PCM_16")
 ```

 # Contact
-[Minh Nguyen Quan Anh](https://github.com/dathudeptrai): [email protected], [erogol](https://github.com/erogol): [email protected], [Kuan Chen](https://github.com/azraelkuan): [email protected], [Takuya Ebata](https://github.com/MokkeMeguru): [email protected], [Trinh Le Quang](https://github.com/l4zyf9x): [email protected]
+[Minh Nguyen Quan Anh](https://github.com/dathudeptrai): [email protected], [erogol](https://github.com/erogol): [email protected], [Kuan Chen](https://github.com/azraelkuan): [email protected], [Dawid Kobus](https://github.com/machineko): [email protected], [Takuya Ebata](https://github.com/MokkeMeguru): [email protected], [Trinh Le Quang](https://github.com/l4zyf9x): trinhle.cse@gmail.com, [Yunchao He](https://github.com/candlewill): [email protected], [Alejandro Miguel Velasquez](https://github.com/ZDisket): xml506ok@gmail.com

 # License
 Overall, most models here are licensed under [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0) for all countries in the world, except in **Viet Nam**, where this framework cannot be used for production in any way without permission from TensorflowTTS's authors. There is one exception: Tacotron-2 can be used for any purpose. So, if you are Vietnamese and want to use this framework for production, you **must** contact us in advance.
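The `id|transcription` metadata format described in the README diff above is straightforward to parse. Below is a minimal illustrative sketch (not part of the TensorFlowTTS API; `load_metadata` is a hypothetical helper name):

```python
def load_metadata(path):
    """Parse an ljspeech-like metadata.csv where each line is `id|transcription`."""
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split only on the first '|' so the transcription itself may contain '|'.
            utt_id, transcription = line.split("|", 1)
            entries[utt_id] = transcription
    return entries

# Tiny demo with a hypothetical two-line metadata file.
with open("metadata.csv", "w", encoding="utf-8") as f:
    f.write("LJ001-0001|Printing, in the only sense with which we are at present concerned.\n")
    f.write("LJ001-0002|In being comparatively modern.\n")

meta = load_metadata("metadata.csv")
print(len(meta))  # 2
```

Per the folder layout above, each returned `id` would then pair with an audio file at `wav/<id>.wav`.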

examples/fastspeech2_multispeaker/README.md renamed to examples/fastspeech2_libritts/README.md

Lines changed: 11 additions & 11 deletions
@@ -3,15 +3,15 @@
 ## Prepare
 Everything is done from main repo folder so TensorflowTTS/

-0. Optional* [Download](http://www.openslr.org/60/) and prepare libritts (helper to prepare libri in examples/fastspeech2_multispeaker/libri_experiment/prepare_libri.ipynb)
+0. Optional* [Download](http://www.openslr.org/60/) and prepare libritts (helper to prepare libri in examples/fastspeech2_libritts/libri_experiment/prepare_libri.ipynb)
 - Dataset structure after finishing this step:
 ```
 |- TensorFlowTTS/
 | |- LibriTTS/
 | |- |- train-clean-100/
 | |- |- SPEAKERS.txt
 | |- |- ...
-| |- dataset/
+| |- libritts/
 | |- |- 200/
 | |- |- |- 200_124139_000001_000000.txt
 | |- |- |- 200_124139_000001_000000.wav
@@ -25,32 +25,32 @@ Everything is done from main repo folder so TensorflowTTS/
 1. Extract Duration (use examples/mfa_extraction or pretrained tacotron2)
 2. Optional* build docker
 - ```
-bash examples/fastspeech2_multispeaker/scripts/build.sh
+bash examples/fastspeech2_libritts/scripts/build.sh
 ```
 3. Optional* run docker
 - ```
-bash examples/fastspeech2_multispeaker/scripts/interactive.sh
+bash examples/fastspeech2_libritts/scripts/interactive.sh
 ```
 4. Preprocessing:
 - ```
-tensorflow-tts-preprocess --rootdir ./dataset \
-  --outdir ./dump \
+tensorflow-tts-preprocess --rootdir ./libritts \
+  --outdir ./dump_libritts \
   --config preprocess/preprocess_libritts.yaml \
-  --dataset multispeaker
+  --dataset libritts
 ```

 5. Normalization:
 - ```
-tensorflow-tts-normalize --rootdir ./dump \
-  --outdir ./dump \
+tensorflow-tts-normalize --rootdir ./dump_libritts \
+  --outdir ./dump_libritts \
   --config preprocess/preprocess_libritts.yaml \
-  --dataset multispeaker
+  --dataset libritts
 ```

 6. Change CharactorDurationF0EnergyMelDataset speaker mapper in fastspeech2_dataset to match your dataset (if you use libri with mfa_extraction you don't need to change anything)
 7. Change train_libri.sh to match your dataset and run:
 - ```
-bash examples/fastspeech2_multispeaker/scripts/train_libri.sh
+bash examples/fastspeech2_libritts/scripts/train_libri.sh
 ```
 8. Optional* If you have problems with tensor size mismatches, check step 5 in the `examples/mfa_extraction` directory

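Step 6 above refers to a speaker mapper. One simple way such a mapping could be derived is from the speaker subfolders of the reformatted `libritts/` directory; this is an illustrative sketch under that assumption (`build_speaker_map` is a hypothetical helper, not the dataset class's actual implementation):

```python
import os

def build_speaker_map(dataset_root):
    """Map each speaker directory name (e.g. '200') to a dense integer id."""
    speakers = sorted(
        d for d in os.listdir(dataset_root)
        if os.path.isdir(os.path.join(dataset_root, d))
    )
    return {speaker: idx for idx, speaker in enumerate(speakers)}

# Demo against a stub of the layout shown in step 0.
for speaker in ("200", "250", "103"):
    os.makedirs(os.path.join("libritts", speaker), exist_ok=True)

speakers_map = build_speaker_map("libritts")
print(speakers_map)  # {'103': 0, '200': 1, '250': 2}
```

Sorting before enumerating keeps the mapping stable across runs, which matters once speaker ids are baked into a trained checkpoint.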

examples/fastspeech2_multispeaker/conf/fastspeech2libritts.yaml renamed to examples/fastspeech2_libritts/conf/fastspeech2libritts.yaml

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ format: "npy"
 model_type: fastspeech2

 fastspeech2_params:
+    dataset: "libritts"
     n_speakers: 20
     encoder_hidden_size: 384
     encoder_num_hidden_layers: 4
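The added `dataset` key sits under `fastspeech2_params` alongside the existing fields. A sketch of reading it with PyYAML (the fragment below is inlined from the diff; this assumes PyYAML is installed, as the README's own snippets already `import yaml`):

```python
import yaml  # PyYAML

config_text = """
format: "npy"
model_type: fastspeech2

fastspeech2_params:
    dataset: "libritts"
    n_speakers: 20
    encoder_hidden_size: 384
    encoder_num_hidden_layers: 4
"""

config = yaml.safe_load(config_text)
params = config["fastspeech2_params"]
print(params["dataset"], params["n_speakers"])  # libritts 20
```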

examples/fastspeech2_multispeaker/fastspeech2_dataset.py renamed to examples/fastspeech2_libritts/fastspeech2_dataset.py

File renamed without changes.

examples/fastspeech2_multispeaker/libri_experiment/prepare_libri.ipynb renamed to examples/fastspeech2_libritts/libri_experiment/prepare_libri.ipynb

Lines changed: 10 additions & 2 deletions
@@ -9,9 +9,10 @@
 "import os\n",
 "import random\n",
 "import shutil\n",
+"import sys\n",
 "\n",
-"libri_path = \"...../TensorflowTTS/LibriTTS\" # absolute path to TensorFlowTTS.\n",
-"dataset_path = \"...../TensorflowTTS/dataset\" # Change to your paths\n",
+"libri_path = \"....../LibriTTS\" # absolute path to LibriTTS.\n",
+"dataset_path = \"....../libritts\" # Change to your paths. This is the output of the re-formatted dataset.\n",
 "subset = \"train-clean-100\""
 ]
 },
@@ -122,6 +123,13 @@
 " shutil.copy(j, os.path.join(dataset_path, sp_id, f_name))\n",
 " shutil.copy(j.replace(\".wav\", \".normalized.txt\"), os.path.join(dataset_path, sp_id, text_f_name))"
 ]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
 }
 ],
 "metadata": {
File renamed without changes.

examples/fastspeech2_multispeaker/scripts/docker/Dockerfile renamed to examples/fastspeech2_libritts/scripts/docker/Dockerfile

File renamed without changes.

examples/fastspeech2_multispeaker/scripts/interactive.sh renamed to examples/fastspeech2_libritts/scripts/interactive.sh

File renamed without changes.

examples/fastspeech2_multispeaker/scripts/train_libri.sh renamed to examples/fastspeech2_libritts/scripts/train_libri.sh

File renamed without changes.
