- Creating Package Environments
conda env create -f lrs2_preprocessing/environment_label.yml
conda activate label_env
conda env create -f environment_train_emv.yaml
conda activate train_env
- Build monotonic align module
cd model/monotonic_align; python setup.py build_ext --inplace; cd ../..
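To confirm the extension built, an import from the repo root should succeed (this assumes the package is importable as model.monotonic_align, as the path above suggests):

```python
# Fails with an ImportError if build_ext did not produce the compiled extension.
import model.monotonic_align  # noqa: F401
print('monotonic_align built OK')
```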
- Checkpoint handling
1.1 For Face-GAN-TTS, transfer the LRS2-trained model weights from the USB stick.
1.2 For FACE-TTS, download the LRS3-trained model weights from here.
1.3 Store the checkpoints in '.\ckpts\'.
1.4 Adjust 'resume_from', 'use_gan', 'infr_resume_from_orig', or 'infr_resume_gan' in config.py (a sketch follows below).
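A minimal sketch of what step 1.4 might look like inside config.py. The four option names come from this README; the values and checkpoint filenames below are hypothetical placeholders, not the shipped defaults:

```python
# config.py (sketch; the filenames are made-up placeholders)
use_gan = True                                     # enable the GAN branch of Face-GAN-TTS
resume_from = './ckpts/facegantts_lrs2.pt'         # checkpoint to resume training from
infr_resume_from_orig = './ckpts/facetts_lrs3.pt'  # FACE-TTS weights used at inference
infr_resume_gan = './ckpts/facegantts_lrs2.pt'     # Face-GAN-TTS weights used at inference
```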
- Download LRS2 into 'data/lrs2/'
- Extract and save audio as '*.wav' files in 'data/lrs2/wav'
conda activate label_env
python data/lrs2_preprocessing/lrs2_split/extract_audio.py
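For reference, this step amounts to pulling a mono 16 kHz track out of each clip. Here is a minimal ffmpeg-based sketch of what extract_audio.py presumably does; the 16 kHz rate is inferred from the HiFi-GAN-16k vocoder, and the 'main' subfolder layout is an assumption:

```python
import subprocess
from pathlib import Path

SRC = Path('data/lrs2/main')  # LRS2 clips (directory layout is an assumption)
DST = Path('data/lrs2/wav')

for mp4 in SRC.rglob('*.mp4'):
    wav = (DST / mp4.relative_to(SRC)).with_suffix('.wav')
    wav.parent.mkdir(parents=True, exist_ok=True)
    # -vn: drop the video stream, -ac 1: mono, -ar 16000: 16 kHz sample rate
    subprocess.run(['ffmpeg', '-y', '-i', str(mp4), '-vn', '-ac', '1',
                    '-ar', '16000', str(wav)], check=True)
```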
- Data Labeling and Preprocessing
conda activate label_env
python data/lrs2_preprocessing/labeling.py
❗ Faces in the video files should be cropped and aligned as in the LRS2 distribution. You can use 'syncnet_python/detectors' (see the sketch below).
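A hedged sketch of per-frame face detection with the S3FD detector bundled in syncnet_python (the detect_faces call mirrors syncnet_python's run_pipeline.py; the clip path, confidence threshold, and scale are illustrative, and the cropping/alignment step is left to you):

```python
import cv2
from detectors import S3FD  # run from inside syncnet_python; detector weights required

det = S3FD(device='cuda')

cap = cv2.VideoCapture('data/lrs2/main/clip.mp4')  # hypothetical clip path
ok, frame = cap.read()
while ok:
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Each row of bboxes is [x1, y1, x2, y2, confidence]
    bboxes = det.detect_faces(rgb, conf_th=0.9, scales=[0.25])
    # ...crop and align the frame around the best-scoring box here...
    ok, frame = cap.read()
cap.release()
```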
- Prepare the text description in a txt file.
echo "This is a test" > test/text.txt
- Run Face-TTS inference.
python inference.py
- Results will be saved in 'test/'.
⚡ To build the MOS test set, we use the LRS2 test set and randomly select faces from the CFD corpus.
- Check config.py
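Before training, a quick sanity check can print the settings this README tells you to adjust, assuming config.py is a flat module of plain variables as the steps above suggest:

```python
import config

# Names taken from the checkpoint-handling step above.
for name in ('use_gan', 'resume_from', 'infr_resume_from_orig', 'infr_resume_gan'):
    print(name, '=', getattr(config, name, '<not set>'))
```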
- Run
python train.py
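To pin training to a single GPU, the standard CUDA environment variable applies (an assumption; train.py may also expose its own device settings):
CUDA_VISIBLE_DEVICES=0 python train.py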
This repo is based on FACE-TTS, Grad-TTS, HiFi-GAN-16k, and SyncNet. Thanks!
Face-GAN-TTS
Copyright (c) 2025-present Cognitive Modeling Group University of Tübingen
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.