CognitiveModeling/Face-GAN-TTS

[Face-GAN-TTS] An Adversarial-Diffusion Framework for Generating High-Quality Voices from Faces


Installation

  1. Create the conda environments
conda env create -f lrs2_preprocessing/environment_label.yml
conda activate label_env

conda env create -f environment_train_emv.yaml
conda activate train_env
  2. Build the monotonic align module
cd model/monotonic_align; python setup.py build_ext --inplace; cd ../..
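
A quick way to confirm that the build step produced something is to look for a compiled extension below 'model/monotonic_align'. The following is a minimal sketch, not part of the repository: the helper name `find_built_extensions` is hypothetical, and it only checks that a compiled file exists, not that it imports correctly.

```python
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path


def find_built_extensions(root):
    """Return compiled extension modules found anywhere below `root`.

    Hypothetical helper: it matches files whose names end with one of the
    extension suffixes of the running interpreter (e.g. '.so' or '.pyd').
    """
    root = Path(root)
    if not root.is_dir():
        return []
    suffixes = tuple(EXTENSION_SUFFIXES)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.name.endswith(suffixes)
    )


# Example, run from the repository root after the build step:
#   print(find_built_extensions("model/monotonic_align"))
```

An empty list suggests that the build did not run, or that it ran under a different Python interpreter than the one executing this check.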

Preparation

  1. Checkpoint handling

    1.1 For Face-GAN-TTS, copy the LRS2-trained model weights from the USB stick.

    1.2 For FACE-TTS, download the LRS3-trained model weights from here.

    1.3 Store the checkpoints in '.\ckpts\'.

    1.4 Adjust 'resume_from', 'use_gan', 'infr_resume_from_orig', or 'infr_resume_gan' in config.py to match the checkpoints you use.

  2. Download LRS2 into 'data/lrs2/'

  3. Extract and save audio as '*.wav' files in 'data/lrs2/wav'

conda activate label_env
python data/lrs2_preprocessing/lrs2_split/extract_audio.py
  4. Data labeling and preprocessing
conda activate label_env
python data/lrs2_preprocessing/labeling.py
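
For readers who want to see what the audio extraction step amounts to, here is a minimal sketch that converts video files to '*.wav' with ffmpeg. It is not the repository's extract_audio.py, which may work differently. The function names, the mirrored directory layout, the '.mp4' extension, and the 16 kHz mono format are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg call that writes the audio track as mono 16-bit PCM."""
    return [
        "ffmpeg", "-y",            # overwrite an existing output file
        "-i", str(video_path),     # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # mono
        "-ar", str(sample_rate),   # resample (16 kHz is an assumption)
        "-acodec", "pcm_s16le",    # uncompressed 16-bit PCM
        str(wav_path),
    ]


def wav_path_for(video_path, video_root, wav_root):
    """Mirror the video's relative path below `wav_root`, with a .wav suffix."""
    relative = Path(video_path).relative_to(video_root)
    return Path(wav_root) / relative.with_suffix(".wav")


def extract_all(video_root, wav_root, sample_rate=16000):
    """Convert every .mp4 below `video_root`; requires ffmpeg on PATH."""
    for video in sorted(Path(video_root).rglob("*.mp4")):
        target = wav_path_for(video, video_root, wav_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_ffmpeg_command(video, target, sample_rate), check=True)


# Example with a hypothetical video folder:
#   extract_all("data/lrs2/main", "data/lrs2/wav")
```

Building the command in its own function makes it possible to inspect the arguments before any file is touched.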
    

❗ Faces in the video files should be cropped and aligned to match the LRS2 distribution. You can use 'syncnet_python/detectors' for this.
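
Before moving on, it can help to confirm that the checkpoint files referenced in config.py actually exist. The sketch below is a hypothetical pre-flight check, not part of the repository. The helper names and the list of checkpoint file extensions are assumptions, and the directory is written as 'ckpts' so the example runs on any platform.

```python
from pathlib import Path

# Assumed checkpoint extensions; adjust to whatever the weight files use.
CHECKPOINT_SUFFIXES = (".ckpt", ".pt", ".pth")


def list_checkpoints(ckpt_dir="ckpts"):
    """Return checkpoint-like files directly inside `ckpt_dir`, sorted by name."""
    ckpt_dir = Path(ckpt_dir)
    if not ckpt_dir.is_dir():
        return []
    return sorted(
        path
        for path in ckpt_dir.iterdir()
        if path.is_file() and path.suffix in CHECKPOINT_SUFFIXES
    )


def missing_paths(paths):
    """Return the entries of `paths` that do not point at an existing file."""
    return [str(path) for path in paths if not Path(path).is_file()]


# Example: pass the values you set for 'resume_from' and the
# 'infr_resume_*' options in config.py.
#   print(list_checkpoints("ckpts"))
#   print(missing_paths(["ckpts/your_checkpoint.ckpt"]))  # placeholder name
```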


Test

  1. Prepare the text to synthesize in a txt file.
echo "This is test" > test/text.txt
  2. Run Face-TTS inference.
python inference.py
  3. Results are saved in 'test/'.
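
If you prepare prompts from a script rather than with echo, the same file can be written from Python. This is a minimal sketch with a hypothetical helper name. Collapsing whitespace and rejecting an empty prompt are choices made for this example, not requirements stated by the repository.

```python
from pathlib import Path


def write_prompt(text, path="test/text.txt"):
    """Write a single-line prompt file and return its path."""
    path = Path(path)
    cleaned = " ".join(text.split())  # collapse runs of whitespace and newlines
    if not cleaned:
        raise ValueError("prompt text is empty")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(cleaned + "\n", encoding="utf-8")
    return path


# Example, followed by `python inference.py` from the repository root:
#   write_prompt("This is test")
```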

⚡ To build the MOS test set, we randomly select faces from the LRS2 test set and the CFD corpus.
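
The random face selection can be made reproducible with a seeded sampler. The sketch below is an illustration only, since the authors' selection script is not shown here. It assumes the candidate faces are already available as image files in one folder per source. The function name, the folder paths in the example, and the image extensions are hypothetical.

```python
import random
from pathlib import Path

# Assumed image extensions for exported face crops.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sample_faces(source_dirs, n_per_source, seed=0):
    """Pick `n_per_source` distinct face images from each directory.

    Candidates are sorted before sampling so that the same seed gives
    the same selection regardless of filesystem ordering.
    """
    rng = random.Random(seed)
    selection = {}
    for source in source_dirs:
        candidates = sorted(
            path
            for path in Path(source).rglob("*")
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
        if len(candidates) < n_per_source:
            raise ValueError(
                f"{source} has {len(candidates)} images, need {n_per_source}"
            )
        selection[str(source)] = rng.sample(candidates, n_per_source)
    return selection


# Example with hypothetical folders:
#   sample_faces(["data/lrs2/test_faces", "data/cfd/faces"], n_per_source=10, seed=42)
```

Recording the seed alongside the selected file names lets the same listening-test set be rebuilt later.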


Training

  1. Check config.py

  2. Run

python train.py

Reference

This repo is based on FACE-TTS, Grad-TTS, HiFi-GAN-16k, and SyncNet. Thanks!


License

Face-GAN-TTS
Copyright (c) 2025-present Cognitive Modeling Group University of Tübingen

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
