76 changes: 76 additions & 0 deletions README.md
@@ -17,9 +17,85 @@ Wenjiang Zhou
Lyra Lab, Tencent Music Entertainment

**[github](https://github.com/TMElyralab/MuseTalk)** **[huggingface](https://huggingface.co/TMElyralab/MuseTalk)** **[space](https://huggingface.co/spaces/TMElyralab/MuseTalk)** **[Technical report](https://arxiv.org/abs/2410.10122)**
**[colab](MuseTalkV15.ipynb)**

We introduce `MuseTalk`, a **real-time, high-quality** lip-syncing model (30fps+ on an NVIDIA Tesla V100). MuseTalk can be applied to input videos, e.g., ones generated by [MuseV](https://github.com/TMElyralab/MuseV), as a complete virtual-human solution.


## 🚀 What’s New In This Fork

- **Dockerized MuseTalk Service:** Fully packaged for easy deployment as an API service or in your own workflows.
- **FastAPI Integration:** Added `fastapi_service.py` for running MuseTalk inference through a modern REST API.
- **Automated Model Downloads:** `download_models.py` script auto-downloads all required weights from Hugging Face, Google Drive, and direct URLs (including S3FD).
- **Self-Installing Dependencies:** The model downloader installs missing Python packages automatically, giving new users a clean setup experience.
- **Submodule Friendly:** Designed to work as a submodule in larger projects. Clone, init, and run.
- **Improved Repository Structure:** All models organized under `/models/` for clear management and reproducibility.
- **Workflow Automation:** Minimal manual steps – just run one setup script, then launch your API or Docker container.

***

### 🛠️ How To Run this Fork

**Step 1. Prepare Environment**
Clone the main repo and initialize submodules:
```bash
git clone --recurse-submodules https://github.com/rafipatel/MuseTalk.git
```


**Step 2. If not using Docker, follow the installation steps in [Installations](#installation) up to the [download weights section](#setup-ffmpeg)**

**Step 3. Download models and weights using this fork's script (the upstream `download_weights.sh` throws a `huggingface-cli` error)**
Enter the MuseTalk folder and run the setup script:
```bash
cd MuseTalk
python download_models.py
```
This will:
- Install any missing Python dependencies
- Download all required weights/checkpoints into `/models/`
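
Once the script finishes, `models/` should contain roughly the following (a sketch based on what `download_models.py` fetches; exact contents may change as the script evolves):

```
models/
├── musetalk/            # V1.0: musetalk.json, pytorch_model.bin
├── musetalkV15/         # V1.5: musetalk.json, unet.pth
├── sd-vae/              # config.json, diffusion_pytorch_model.{bin,safetensors}
├── whisper/             # config.json, pytorch_model.bin, preprocessor_config.json
├── dwpose/              # dw-ll_ucoco_384.pth
├── syncnet/             # latentsync_syncnet.pt
└── face-parse-bisent/   # 79999_iter.pth, resnet18-5c106cde.pth
```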

**Step 4. Run the MuseTalk API**
You can launch the FastAPI server directly:
```bash
uvicorn fastapi_service:app --host 0.0.0.0 --port 8000
```
Or run inference directly via `scripts.inference` (edit `test.yaml` to point at your own audio and video):
```bash
python -m scripts.inference --inference_config configs/inference/test.yaml --result_dir results/test --unet_model_path models/musetalkV15/unet.pth --unet_config models/musetalkV15/musetalk.json --version v15 --ffmpeg_path ffmpeg-master-latest-win64-gpl-shared/bin
```

Or (recommended for production) launch via Docker after downloading the weights, mounting the volumes into the container as shown below:
```bash
docker build -t musetalk .

docker run \
-p 8000:8000 \
-v $(pwd)/models:/app/MuseTalk/models \
-v $(pwd)/results:/app/MuseTalk/results \
-v $(pwd)/data:/app/MuseTalk/data \
-v $(pwd)/configs:/app/MuseTalk/configs \
musetalk
```

**Step 5. API Usage**
Access the API at `http://localhost:8000` and POST your inference jobs.
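
A minimal client sketch for posting a job. The exact route and payload schema depend on `fastapi_service.py`, so treat `/inference` and the field names below as placeholder assumptions, not the service's documented contract:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/inference"  # assumed route; check fastapi_service.py

def build_job(video_path: str, audio_path: str) -> dict:
    """Assemble a minimal request payload (field names are assumptions)."""
    return {"video_path": video_path, "audio_path": audio_path}

def submit(job: dict) -> dict:
    """POST the job to the running MuseTalk API and return the JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `submit(build_job("data/video/demo.mp4", "data/audio/demo.wav"))` returns the service's JSON response.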

***

**Optional:**
- Edit `download_models.py` to download extra models if needed.
- Update requirements as new features are added.

***

## 📝 Note
- For GPU support, make sure your Docker and system configuration are compatible.
- All downloads are script-automated for plug-and-play use — no manual model hunting or setup required.
- For more details, see the commented sections in each main script.
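
For GPU deployments, the `docker run` command above can be expressed as a `docker-compose.yml`. This is a hedged sketch: the device reservation assumes the NVIDIA Container Toolkit is installed, and since this fork's Dockerfile currently installs the CPU/MPS PyTorch build (`torch==2.0.1`), real GPU inference would also require swapping in a CUDA-enabled base image and PyTorch wheel:

```yaml
services:
  musetalk:
    image: musetalk          # matches the `docker build -t musetalk .` step above
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/MuseTalk/models
      - ./results:/app/MuseTalk/results
      - ./data:/app/MuseTalk/data
      - ./configs:/app/MuseTalk/configs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```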



## 🔥 Updates
We're excited to unveil MuseTalk 1.5.
This version **(1)** integrates training with perceptual, GAN, and sync losses, significantly boosting overall performance, and **(2)** adopts a two-stage training strategy and a spatio-temporal data-sampling approach to balance visual quality and lip-sync accuracy.
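
The combined objective can be pictured as a weighted sum of the terms listed above. This is an illustrative sketch only — the 1.5 training code is not part of this snapshot, and the function shape and weight values are made-up placeholders:

```python
# Hypothetical combination of the loss terms mentioned in the 1.5 release notes.
# The weights (w_perc, w_gan, w_sync) are placeholders, not the paper's values.
def total_loss(recon: float, perceptual: float, gan: float, sync: float,
               w_perc: float = 0.01, w_gan: float = 0.01,
               w_sync: float = 0.05) -> float:
    """Weighted sum of reconstruction, perceptual, GAN, and sync losses."""
    return recon + w_perc * perceptual + w_gan * gan + w_sync * sync
```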
80 changes: 80 additions & 0 deletions dockerfile
@@ -0,0 +1,80 @@
FROM python:3.10-slim-bullseye

ENV PYTHONUNBUFFERED=1
ENV PYTORCH_ENABLE_MPS_FALLBACK=1

# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
wget \
ffmpeg \
libsndfile1 \
libgl1-mesa-glx \
libglib2.0-0 \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install PyTorch for CPU/MPS (Mac)
RUN pip3 install --no-cache-dir \
torch==2.0.1 \
torchvision==0.15.2 \
torchaudio==2.0.2

# Clone MuseTalk
# RUN git clone https://github.com/TMElyralab/MuseTalk.git /app/MuseTalk
RUN git clone https://github.com/rafipatel/MuseTalk.git /app/MuseTalk

WORKDIR /app/MuseTalk

# Install requirements
RUN pip3 install --no-cache-dir -r requirements.txt
# RUN pip3 install -r requirements.txt || true

# Install OpenMMLab packages (CPU version)
RUN pip3 install --no-cache-dir -U openmim && \
mim install mmengine && \
pip3 install mmcv==2.0.1 && \
mim install "mmdet==3.1.0" && \
mim install "mmpose==1.1.0"

# Download model weights
# RUN python3 -m pip install huggingface_hub && \
# python3 -c "from huggingface_hub import snapshot_download; \
# snapshot_download(repo_id='TMElyralab/MuseTalk', local_dir='./models', allow_patterns=['models/musetalkV15/*'])" || true

# Download additional model files


# RUN mkdir -p models/face-parse-bisent && \
# pip3 install gdown && \
# gdown --id 154JgKpzCPW82qINcVieuPH3fZ2e0P812 -O models/face-parse-bisent/79999_iter.pth && \
# curl -L https://download.pytorch.org/models/resnet18-5c106cde.pth -o models/face-parse-bisent/resnet18-5c106cde.pth


# # Set working directory
# WORKDIR /app/MuseTalk

# Create entrypoint script
# RUN echo '#!/bin/bash\n\
# python -m scripts.inference \\\n\
# --inference_config ${INFERENCE_CONFIG:-configs/inference/test.yaml} \\\n\
# --result_dir ${RESULT_DIR:-results/test} \\\n\
# --unet_model_path ${UNET_MODEL_PATH:-models/musetalkV15/unet.pth} \\\n\
# --unet_config ${UNET_CONFIG:-models/musetalkV15/musetalk.json} \\\n\
# --version ${VERSION:-v15} \\\n\
# --ffmpeg_path ${FFMPEG_PATH:-/usr/bin/ffmpeg}' > /app/run_inference.sh && \
# chmod +x /app/run_inference.sh

# RUN chmod +x /app/MuseTalk/download_weights.sh
# RUN /app/MuseTalk/download_weights.sh

# COPY inference.sh /app/MuseTalk/inference.sh

# RUN chmod +x /app/MuseTalk/inference.sh
# CMD ["/app/MuseTalk/inference.sh", "v1.5","normal"]
# CMD ["uvicorn", "fastapi_service:app","--host", "0.0.0.0" ,"--port", "8000"]
CMD ["python3", "-m", "uvicorn", "fastapi_service:app", "--host", "0.0.0.0" ,"--port", "8000"]
# CMD ["/app/run_inference.sh"]
188 changes: 188 additions & 0 deletions download_models.py
@@ -0,0 +1,188 @@
import os
import sys
import importlib
import subprocess

# List any packages needed for downloading
REQUIRED_PACKAGES = [
"huggingface_hub",
"gdown",
"requests"
]

PYTHON_EXEC = sys.executable

# Install missing packages BEFORE importing
for pkg in REQUIRED_PACKAGES:
    try:
        importlib.import_module(pkg.replace("-", "_"))
    except ImportError:
        print(f"Installing {pkg} ...")
        subprocess.run([PYTHON_EXEC, "-m", "pip", "install", pkg], check=True)


from huggingface_hub import hf_hub_download

# --- Configuration ---
CHECKPOINTS_DIR = "models"
HF_ENDPOINT = os.environ.get("HF_ENDPOINT", "https://huggingface.co") # Use mirror if set

# --- Directory Setup ---
DIRS = [
"musetalkV15", "syncnet", "dwpose",
"face-parse-bisent", "sd-vae", "whisper", "musetalk" # Ensure 'musetalk' is here if V1.0 is needed
]

for d in DIRS:
    os.makedirs(os.path.join(CHECKPOINTS_DIR, d), exist_ok=True)
print(f"✅ Created base directory: {CHECKPOINTS_DIR} and subdirectories.")

# --- Hugging Face Downloads ---

def download_hf_files(repo_id, filenames, subdir="", has_subpath=False):
    """
    Download a list of files from a Hugging Face repo.

    If has_subpath is True (e.g., MuseTalk), the filenames already contain
    the target subdirectory, so files are downloaded relative to CHECKPOINTS_DIR.
    If has_subpath is False (e.g., Whisper), files are downloaded directly
    into CHECKPOINTS_DIR/subdir.
    """
    target_local_dir = os.path.join(CHECKPOINTS_DIR, subdir)

    # If the filename contains the directory structure (e.g., "repo_name/file.bin"),
    # set local_dir to CHECKPOINTS_DIR to preserve the path. Otherwise, set
    # local_dir to the final destination (target_local_dir).
    final_local_dir = CHECKPOINTS_DIR if has_subpath else target_local_dir

    for filename in filenames:
        print(f"Downloading {filename} from {repo_id} to {target_local_dir}...")
        hf_hub_download(
            repo_id=repo_id,
            filename=filename,
            local_dir=final_local_dir,
            endpoint=HF_ENDPOINT,
        )
    print(f"✅ Finished downloading files for {repo_id} into {subdir}/.")


# 1. MuseTalk V1.0 & V1.5 Weights (Uses subpaths in filenames)
# NOTE: The repo files are structured like "musetalk/..." and "musetalkV15/..."
# Setting local_dir=CHECKPOINTS_DIR ensures this internal structure is preserved under "models/"

# V1.0 files (target: models/musetalk)
download_hf_files(
    repo_id="TMElyralab/MuseTalk",
    filenames=[
        "musetalk/musetalk.json",
        "musetalk/pytorch_model.bin",
    ],
    subdir="musetalk",
    has_subpath=True,  # filenames contain the subdir path
)

# V1.5 files (target: models/musetalkV15)
download_hf_files(
    repo_id="TMElyralab/MuseTalk",
    filenames=[
        "musetalkV15/musetalk.json",
        "musetalkV15/unet.pth",
    ],
    subdir="musetalkV15",
    has_subpath=True,  # filenames contain the subdir path
)

# 2. SD VAE weights (no subpaths in filenames)
# Target: models/sd-vae/
download_hf_files(
    repo_id="stabilityai/sd-vae-ft-mse",
    filenames=[
        "config.json",
        "diffusion_pytorch_model.bin",
        "diffusion_pytorch_model.safetensors",
    ],
    subdir="sd-vae",
    has_subpath=False,
)

# 3. Whisper weights (no subpaths in filenames)
# These download directly into models/whisper/
download_hf_files(
    repo_id="openai/whisper-tiny",
    filenames=[
        "config.json",
        "pytorch_model.bin",
        "preprocessor_config.json",
    ],
    subdir="whisper",
    has_subpath=False,
)

# 4. DWPose weights (no subpaths in filenames)
# Target: models/dwpose/
download_hf_files(
    repo_id="yzd-v/DWPose",
    filenames=["dw-ll_ucoco_384.pth"],
    subdir="dwpose",
    has_subpath=False,
)

# 5. SyncNet weights (no subpaths in filenames)
# Target: models/syncnet/
download_hf_files(
    repo_id="ByteDance/LatentSync",
    filenames=["latentsync_syncnet.pt"],
    subdir="syncnet",
    has_subpath=False,
)

print("--- Hugging Face downloads complete. ---")



# Download the BiSeNet face-parsing model (from Google Drive)
import gdown  # installed with the other required packages above

gdown.download(
    "https://drive.google.com/uc?id=154JgKpzCPW82qINcVieuPH3fZ2e0P812",
    os.path.join(CHECKPOINTS_DIR, "face-parse-bisent", "79999_iter.pth"),
    quiet=False,
)




# Download the ResNet-18 backbone used by the face-parsing model
import requests

url = "https://download.pytorch.org/models/resnet18-5c106cde.pth"
output_path = os.path.join(CHECKPOINTS_DIR, "face-parse-bisent", "resnet18-5c106cde.pth")
response = requests.get(url, stream=True, timeout=60)
response.raise_for_status()
with open(output_path, "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
print(f"✅ Downloaded {url} to {output_path}")



s3fd_url = "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth"
s3fd_dest_dir = os.path.expanduser("~/.cache/torch/hub/checkpoints")
os.makedirs(s3fd_dest_dir, exist_ok=True)
s3fd_dest_path = os.path.join(s3fd_dest_dir, "s3fd-619a316812.pth")

if not os.path.exists(s3fd_dest_path) or os.path.getsize(s3fd_dest_path) < 85_000_000:  # ~85MB expected
    print(f"Downloading S3FD weights from {s3fd_url} ...")
    response = requests.get(s3fd_url, stream=True, timeout=60)
    response.raise_for_status()
    with open(s3fd_dest_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"✅ Downloaded S3FD model to {s3fd_dest_path}")
else:
    print(f"✅ S3FD weights already present: {s3fd_dest_path}")

print("--- All model downloads complete. ---")
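
The direct HTTP downloads above repeat the same chunked-copy pattern. A small helper (hypothetical, not part of the script) would keep it DRY, and separating the copy loop from the network call makes it easy to unit-test against in-memory streams:

```python
import os
import urllib.request

def stream_copy(src, dst, chunk_size: int = 8192) -> int:
    """Copy src (any object with .read) to dst in chunks; return bytes written."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

def download(url: str, dest_path: str) -> str:
    """Stream a URL to dest_path, creating parent directories as needed."""
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as f:
        stream_copy(resp, f)
    return dest_path
```

Each of the three downloads above would then collapse to a single `download(url, path)` call.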