[Face-GAN-TTS] An Adversarial-Diffusion Framework for Generating High-Quality Voices from Faces

Installation

Creating Package Environments

conda env create -f lrs2_preprocessing/environment_label.yml
conda activate label_env

conda env create -f environment_train_env.yml
conda activate train_env

Build monotonic align module

cd model/monotonic_align; python setup.py build_ext --inplace; cd ../..
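
To confirm that the build produced a compiled extension, a check along the following lines can help. It is only a sketch: the helper name find_built_extensions is hypothetical, and it assumes the compiled module ends up somewhere below 'model/monotonic_align', which depends on the repository's setup.py.

```python
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path


def find_built_extensions(root: Path) -> list[Path]:
    """Return compiled extension modules (e.g. .so / .pyd files) found below `root`."""
    if not root.is_dir():
        return []
    # EXTENSION_SUFFIXES lists the file suffixes this interpreter can import
    # as compiled modules, such as ".so" on Linux or ".pyd" on Windows.
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and any(path.name.endswith(s) for s in EXTENSION_SUFFIXES)
    )
```

Calling find_built_extensions(Path("model/monotonic_align")) from the repository root returns an empty list if the build step did not produce anything importable.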

Preparation

Checkpoint handling

1.1 For Face-GAN-TTS, transfer the LRS2-trained model weights from the USB stick.

1.2 For FACE-TTS, download the LRS3-trained model weights from here.

1.3 Store the checkpoints here: '.\ckpts\'

1.4 Adjust 'resume_from', 'use_gan', 'infr_resume_from_orig' or 'infr_resume_gan' in config.py
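
How these four options are declared depends on config.py, which is not reproduced here. As a sketch only, assuming they end up as plain values (a boolean 'use_gan' and three checkpoint paths), a check like the following can catch a missing checkpoint before a long run starts. The file names and the helper missing_checkpoints are hypothetical.

```python
from pathlib import Path

# Hypothetical values for illustration; the real file names depend on the
# checkpoints you stored in the ckpts folder.
example_config = {
    "use_gan": True,
    "resume_from": "ckpts/example_train.ckpt",
    "infr_resume_from_orig": "ckpts/example_face_tts.ckpt",
    "infr_resume_gan": "ckpts/example_face_gan_tts.ckpt",
}

# The three options assumed to hold checkpoint paths.
CHECKPOINT_KEYS = ("resume_from", "infr_resume_from_orig", "infr_resume_gan")


def missing_checkpoints(config: dict) -> list[str]:
    """Return the config keys whose checkpoint path is set but is not a file."""
    return [
        key
        for key in CHECKPOINT_KEYS
        if config.get(key) and not Path(config[key]).is_file()
    ]
```

An empty return value means every configured checkpoint path points to an existing file; unset or empty options are skipped.
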

Download LRS2 into 'data/lrs2/'.
Extract the audio and save it as '*.wav' files in 'data/lrs2/wav'.

conda activate label_env

python data/lrs2_preprocessing/lrs2_split/extract_audio.py
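
The internals of extract_audio.py are not shown in this README. As a rough illustration of what such a step amounts to, the sketch below builds an ffmpeg command per video and mirrors the directory layout under the wav folder. It assumes that ffmpeg is installed, that the videos are '.mp4' files, and that 16 kHz mono output is wanted (a guess based on the HiFi-GAN-16k vocoder); the function names are hypothetical and the repository's script may behave differently.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video: Path, wav: Path, sample_rate: int = 16000) -> list[str]:
    """Return an ffmpeg command that extracts mono 16-bit PCM audio from `video`."""
    return [
        "ffmpeg", "-y",            # overwrite an existing output file
        "-i", str(video),          # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # target sample rate (16 kHz is an assumption)
        "-acodec", "pcm_s16le",    # 16-bit PCM, the usual WAV encoding
        str(wav),
    ]


def wav_path_for(video: Path, video_root: Path, wav_root: Path) -> Path:
    """Mirror the video's relative path under `wav_root` with a .wav suffix."""
    return wav_root / video.relative_to(video_root).with_suffix(".wav")


def extract_all(video_root: Path, wav_root: Path) -> int:
    """Extract audio for every .mp4 below `video_root`; return the number converted."""
    count = 0
    for video in sorted(video_root.rglob("*.mp4")):
        wav = wav_path_for(video, video_root, wav_root)
        wav.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_ffmpeg_cmd(video, wav), check=True)
        count += 1
    return count
```

With the layout described above, the call would be extract_all(Path("data/lrs2"), Path("data/lrs2/wav")). Mirroring the relative path avoids name clashes when clips in different folders share a file name.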

Data Labeling and Preprocessing

conda activate label_env

python data/lrs2_preprocessing/labeling.py

❗ Faces in the video files should be cropped and aligned to match the LRS2 distribution. You can use 'syncnet_python/detectors' for this.

Test

Prepare the text description in a txt file.

echo "This is test" > test/text.txt

Run Face-TTS inference.

python inference.py

The result will be saved in 'test/'.

⚡ To build the MOS test set, we randomly select faces from the LRS2 test set and the CFD corpus.
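
The selection script itself is not part of this README. A reproducible random draw could be sketched as follows, assuming the cropped faces are available as image files in local folders; the image extensions and the helper name sample_faces are hypothetical.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image formats


def sample_faces(folders: list[Path], n: int, seed: int = 0) -> list[Path]:
    """Draw `n` distinct face images at random from the given folders."""
    # Sorting the candidates first makes the draw independent of the
    # order in which the file system lists the files.
    candidates = sorted(
        path
        for folder in folders
        if folder.is_dir()
        for path in folder.rglob("*")
        if path.suffix.lower() in IMAGE_SUFFIXES
    )
    if n > len(candidates):
        raise ValueError(f"requested {n} faces but only {len(candidates)} found")
    # A fixed seed makes the selection repeatable across runs.
    return random.Random(seed).sample(candidates, n)
```

Passing one folder per corpus pools both sources into a single draw; sampling each corpus separately would be the alternative if a fixed number of faces per corpus is needed.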

Training

Check config.py, then run:

python train.py

Reference

This repo is based on FACE-TTS, Grad-TTS, HiFi-GAN-16k, and SyncNet. Thanks!

License

Face-GAN-TTS
Copyright (c) 2025-present Cognitive Modeling Group University of Tübingen

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
