sp-nitech/DHSMM-TTS


PyTorch Implementation of Deep HSMM-Based Text-to-Speech Synthesis

  • This software is distributed under the BSD 3-Clause license. Please see LICENSE for more details.
  • Demo samples

Requirements

Usage

This repository currently provides an implementation of an acoustic model only.
To run training and inference, you must prepare your own speech database and neural vocoder.
The included examples assume the use of the XIMERA Corpus, which follows this directory structure:
/db/ATR-TTS-JP-CORPUS/F009/AOZORAR/T01/000/F009_AOZORAR_00001_T01.wav

1. Data Preparation

Prepare the following files from your speech database:

  • List files: ./list/
  • Phoneme+accent label files: ./data/phn/phone_hl/
    👉 Refer to the example files provided in the repository for formatting and structure.
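As a starting point for the list files, utterance IDs can be collected from the corpus layout shown above. This is a hypothetical sketch: the actual list-file format expected by the scripts may differ, so compare against the examples under ./list/ before using it.

```python
from pathlib import Path

def write_utterance_list(corpus_root, out_path):
    """Collect utterance IDs (wav file stems) from an XIMERA-style corpus
    tree and write them, one per line, to a list file.

    Assumption: one ID per line (e.g. F009_AOZORAR_00001_T01); verify
    against the example list files shipped with the repository.
    """
    wav_files = sorted(Path(corpus_root).rglob("*.wav"))
    with open(out_path, "w") as f:
        for wav in wav_files:
            f.write(wav.stem + "\n")
    return len(wav_files)
```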

2. Generate Mel-Spectrogram Files

sh mkmel.sh

output: ./data/mel/.../filename.npz
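To sanity-check the generated .npz files, NumPy can list the arrays stored inside. A minimal sketch, assuming a single array under the key "mel" with a (frames, mel bins) layout; the actual key name and shape written by mkmel.sh may differ, so inspect a real output file.

```python
import numpy as np

# Create an illustrative mel-spectrogram and round-trip it through .npz.
mel = np.random.randn(120, 80).astype(np.float32)  # (frames, mel bins): assumed layout
np.savez("example_mel.npz", mel=mel)               # key name "mel" is an assumption

with np.load("example_mel.npz") as data:
    print(data.files)          # names of the stored arrays
    print(data["mel"].shape)   # (120, 80)
```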

3. Training

config: ./scripts/model/demo/config.py

sh train.sh

output: ./model/.../checkpoint_#####
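For inference you typically want the most recent checkpoint. A hypothetical helper that picks it by the numeric suffix in the "checkpoint_#####" names shown above; adjust the pattern if your config writes different file names.

```python
import re
from pathlib import Path

def latest_checkpoint(model_dir):
    """Return the path of the checkpoint_##### file with the largest
    numeric suffix in model_dir, or None if there is none."""
    pattern = re.compile(r"checkpoint_(\d+)$")
    best = None
    for p in Path(model_dir).iterdir():
        m = pattern.search(p.name)
        if m and (best is None or int(m.group(1)) > best[0]):
            best = (int(m.group(1)), p)
    return best[1] if best else None
```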

4. Inference

sh gen.sh

output: ./gen/.../filename.npz (mel-spectrogram files)
👉 The generated mel-spectrograms can be fed into a neural vocoder to synthesize the final waveform.
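Handing a generated mel-spectrogram to a vocoder usually means loading the array and reshaping it to the vocoder's expected layout. A sketch under stated assumptions: the key name "mel", the stored (frames, mel bins) layout, and the vocoder's (batch, mel bins, frames) input convention are all assumptions, since no vocoder is included in this repository.

```python
import numpy as np

def load_mel_for_vocoder(npz_path, key="mel"):
    """Load a generated mel-spectrogram and add a batch axis.

    Assumes the array is stored as (frames, mel bins); many vocoders
    expect (batch, mel bins, frames), hence the transpose.
    """
    with np.load(npz_path) as data:
        mel = data[key]
    return mel.T[np.newaxis]  # (1, mel bins, frames)

# waveform = vocoder(load_mel_for_vocoder("./gen/.../filename.npz"))  # vocoder not provided here
```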

Who we are
