Add Tacotron 2 Mode to Skip Duration Processing in Preprocessing Pipeline #4128

@HuangShaotong

## Feature Request: Add Tacotron 2 Mode to Skip Duration Processing in Preprocessing Pipeline

Is your feature request related to a problem? Please describe:
Yes.
The current preprocessing script, preprocess.py, always runs duration processing (via --dur-file), even when the target is Tacotron 2, an attention-based autoregressive TTS model that does not require explicit phoneme durations.

This leads to:

- An unnecessary dependency on MFA alignment and durations.txt
- Wasted computation (duration parsing, alignment-based silence trimming, frame-length validation)
- Confusion for users who expect Tacotron 2 to work without forced alignment
- Inconsistency with the Tacotron 2 paper and common practice

Describe the feature you'd like:
Add a --model-type (or --skip-duration) flag to preprocess.py to optionally skip all duration-related processing when targeting Tacotron 2.

```bash
# For FastSpeech2 / VITS (current default)
python preprocess.py \
    --dataset=ljspeech \
    --rootdir=... \
    --dumpdir=... \
    --dur-file=durations.txt \
    --config=conf/fastspeech2.yaml

# For Tacotron 2 (new mode); --model-type is the proposed new flag
python preprocess.py \
    --dataset=ljspeech \
    --rootdir=... \
    --dumpdir=... \
    --model-type=tacotron2 \
    --config=conf/tacotron2.yaml
```
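
To make the request concrete, here is a minimal sketch of how the argument handling could look. It is not taken from the existing preprocess.py: the `--model-type` flag, its choices, the `DURATION_FREE_MODELS` set, and the `needs_durations` helper are all assumptions made for illustration. The idea is that `--dur-file` stays mandatory for duration-based models and becomes optional only when the selected model type does not need durations.

```python
import argparse

# Hypothetical: model types whose preprocessing needs no duration file.
DURATION_FREE_MODELS = {"tacotron2"}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flags used in this issue (sketch only)."""
    parser = argparse.ArgumentParser(description="Preprocess a TTS dataset (sketch).")
    parser.add_argument("--dataset", default="ljspeech")
    parser.add_argument("--rootdir", required=True)
    parser.add_argument("--dumpdir", required=True)
    parser.add_argument("--config", required=True)
    # No longer unconditionally required; validated in parse_args below.
    parser.add_argument("--dur-file", default=None)
    # Proposed flag; the list of choices is an assumption.
    parser.add_argument(
        "--model-type",
        default="fastspeech2",
        choices=["fastspeech2", "vits", "tacotron2"],
    )
    return parser


def needs_durations(model_type: str) -> bool:
    """Return True when the model type relies on explicit phoneme durations."""
    return model_type not in DURATION_FREE_MODELS


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and require --dur-file only for duration-based models."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if needs_durations(args.model_type) and args.dur_file is None:
        parser.error("--dur-file is required for this --model-type")
    return args
```

With this shape, the rest of the pipeline could guard each duration-related step (duration parsing, alignment-based trimming, frame-length validation) behind `needs_durations(args.model_type)`, so the default behaviour for existing users would stay unchanged.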

Describe alternatives you've considered:
Switching to a different model instead of Tacotron 2.
