## Feature Request: Add Tacotron 2 Mode to Skip Duration Processing in Preprocessing Pipeline
Is your feature request related to a problem? Please describe:
Yes.
The current preprocessing script `preprocess.py` forces duration processing (via `--dur-file`) even when training Tacotron 2, an attention-based autoregressive TTS model that does NOT require explicit phoneme durations.
This leads to:
- An unnecessary dependency on MFA alignment and `durations.txt`
- Wasted computation (duration parsing, alignment-based silence trimming, frame-length validation)
- Confusion for users who expect Tacotron 2 to work without forced alignment
- Inconsistency with the Tacotron 2 paper and common practice
Describe the feature you'd like:
Add a `--model-type` (or `--skip-duration`) flag to `preprocess.py` to optionally skip all duration-related processing when targeting Tacotron 2.
```bash
# For FastSpeech2 / VITS (current default)
python preprocess.py \
    --dataset=ljspeech \
    --rootdir=... \
    --dumpdir=... \
    --dur-file=durations.txt \
    --config=conf/fastspeech2.yaml
```
```bash
# For Tacotron 2 (new mode)
python preprocess.py \
    --dataset=ljspeech \
    --rootdir=... \
    --dumpdir=... \
    --model-type=tacotron2 \
    --config=conf/tacotron2.yaml
# ↑ --model-type is the NEW FLAG
```
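A minimal sketch of how such a flag could be wired in. The function names (`build_parser`, `needs_durations`) and the argument set are illustrative assumptions, not the project's actual API; the real `preprocess.py` would gate its existing duration-parsing, silence-trimming, and frame-length-validation steps behind the same check:

```python
# Hypothetical sketch: gate duration processing behind a --model-type flag.
# None of these names are taken from the actual preprocess.py.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Preprocess a TTS dataset")
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--rootdir", required=True)
    parser.add_argument("--dumpdir", required=True)
    parser.add_argument("--config", required=True)
    parser.add_argument(
        "--model-type",
        default="fastspeech2",
        choices=["fastspeech2", "vits", "tacotron2"],
        help="Target model; tacotron2 skips all duration-related processing.",
    )
    # --dur-file becomes optional: only duration-based models need it.
    parser.add_argument("--dur-file", default=None)
    return parser


def needs_durations(model_type: str) -> bool:
    """Tacotron 2 learns alignment via attention, so no explicit durations."""
    return model_type != "tacotron2"


def run_preprocess(args: argparse.Namespace) -> None:
    if needs_durations(args.model_type):
        if args.dur_file is None:
            raise SystemExit("--dur-file is required for duration-based models")
        # ... parse durations, trim silence by alignment, validate frame lengths
    # ... feature extraction (mel, etc.) proceeds for all model types
```

With this shape, `--model-type=tacotron2` simply bypasses the duration branch, while the FastSpeech2/VITS path keeps its current behavior and still errors out early if `--dur-file` is missing.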
Describe alternatives you've considered:
Use another model