This repository demonstrates an AI-powered system that generates soccer commentary with synchronized talking head videos using advanced speech synthesis and lip-sync technology.
The source video is extracted from here
Demo videos: pure_lip_sync.mp4, commentary_results.mp4, realtime_Ball_short.mp4, 2025-08-27.14-27-28.mov
- AI Soccer Commentary: Generate realistic soccer match commentary using a large language model (LLM)
- Talking Head Generation: Create synchronized talking head videos with lip-sync
- High-Quality Audio: Advanced TTS (Text-to-Speech) synthesis
- Real-time Processing: Optimized for GPU acceleration
- NVIDIA GPU with CUDA support (a single RTX 4060 is sufficient)
- Ubuntu 20.04 or higher
- Driver version >= 570.133
- CUDA version >= 12.0
- The environment must be created with Python 3.10 (CosyVoice-ttsfrd requires Python 3.10)
- A ModelScope API key is required for the LLM.
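Once the conda environment is activated, the Python and driver requirements above can be checked with a short script. This is a minimal sketch, not part of the repository: the file name check_env.py and its helper functions are hypothetical, it assumes nvidia-smi is on PATH, and it checks only the Python and driver versions (the CUDA version can be read from the nvidia-smi header or from torch.version.cuda once PyTorch is installed).

```python
# check_env.py -- hypothetical preflight check, not part of the repository.
# Compares the running Python and the NVIDIA driver against the minimums
# listed in the requirements. Assumes nvidia-smi is on PATH.
import shutil
import subprocess
import sys
from typing import Optional, Tuple

REQUIRED_PYTHON = (3, 10)   # CosyVoice-ttsfrd requires Python 3.10
MIN_DRIVER = "570.133"      # minimum driver version from the requirements


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '570.133.07' into (570, 133, 7)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Return True if version `found` is >= `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    # Pad with zeros so that '570.133' and '570.133.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def driver_version() -> Optional[str]:
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    # One line per GPU; all GPUs share the same driver, so take the first.
    fields = result.stdout.split()
    return fields[0] if result.returncode == 0 and fields else None


def main() -> bool:
    ok = True
    if sys.version_info[:2] != REQUIRED_PYTHON:
        print(f"Python {sys.version.split()[0]} found, but 3.10 is required")
        ok = False
    driver = driver_version()
    if driver is None:
        print("Could not query the NVIDIA driver (is nvidia-smi installed?)")
        ok = False
    elif not meets_minimum(driver, MIN_DRIVER):
        print(f"Driver {driver} found, but >= {MIN_DRIVER} is required")
        ok = False
    if ok:
        print("Python and driver requirements satisfied")
    return ok


if __name__ == "__main__":
    main()
```

Run python check_env.py inside the activated environment; it prints each requirement that is not met. Versions are compared as integer tuples rather than strings, so that, for example, 575.51.03 correctly counts as newer than 570.133.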
- Clone the repository:
git clone https://github.com/allanchan339/VLM_Soccer_Commentator_THG --depth 1
cd VLM_Soccer_Commentator_THG
git submodule update --init --recursive
- Install Miniconda or Anaconda, then run the following command:
conda env create -f environment_torch2.4.yml
- Activate the environment:
conda activate SoCommVoice2.4
# Install dependencies related to MuseTalk
pip install --no-cache-dir -U openmim
mim install mmengine
Next, install mmdet and mmpose from source, then relax the MMCV compatibility check in mmdet/__init__.py; otherwise, an assertion error will be raised.
cd mmdetection
# Relax the MMCV compatibility check in mmdet/__init__.py
nano {python_path}/lib/python3.10/site-packages/mmdet/__init__.py
Change line 17 from:
and mmcv_version < digit_version(mmcv_maximum_version)), \
to:
and mmcv_version <= digit_version(mmcv_maximum_version)), \
mim install "mmpose>=1.1.0"  # not available on conda-forge
# Download the MuseTalk model
sh ./download_THG_weight.sh
# Download the GPT-SoVITS models:
bash ./download_TTS_weight.sh
# Launch the web UI
python web_ui_all.py
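The manual edit of mmdet/__init__.py above can also be scripted, which helps when the environment is rebuilt. The following is a minimal sketch, not part of the repository; the file name patch_mmdet.py and its helpers are hypothetical. It assumes the check still reads exactly as quoted above, and it locates mmdet with importlib.util.find_spec, which finds the package without executing its __init__.py.

```python
# patch_mmdet.py -- hypothetical helper, not part of the repository.
# Automates the manual edit: relaxes the MMCV upper-bound check in
# mmdet/__init__.py from `<` to `<=`. Run it inside the activated environment.
import importlib.util
import shutil
from pathlib import Path
from typing import Optional

STRICT = "mmcv_version < digit_version(mmcv_maximum_version)"
RELAXED = "mmcv_version <= digit_version(mmcv_maximum_version)"


def relax_mmcv_check(source: str) -> str:
    """Return `source` with the strict upper-bound comparison relaxed.

    Idempotent: once patched, the text contains `<=`, which the strict
    pattern no longer matches, so a second run changes nothing.
    """
    return source.replace(STRICT, RELAXED, 1)


def locate_mmdet_init() -> Optional[Path]:
    """Find mmdet/__init__.py without importing mmdet (importing would trip the assertion)."""
    spec = importlib.util.find_spec("mmdet")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def main() -> None:
    path = locate_mmdet_init()
    if path is None:
        print("mmdet is not installed in the active environment")
        return
    original = path.read_text(encoding="utf-8")
    patched = relax_mmcv_check(original)
    if patched == original:
        print(f"{path}: nothing to change (already patched or pattern not found)")
        return
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the untouched file for reference
    path.write_text(patched, encoding="utf-8")
    print(f"{path}: MMCV upper-bound check relaxed; backup at {backup}")


if __name__ == "__main__":
    main()
```

Run python patch_mmdet.py after mmdet is installed and before installing mmpose. Matching on the exact comparison text rather than on line 17 keeps the patch working even if the line number shifts between mmdet releases.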