76 changes: 76 additions & 0 deletions README.md
@@ -17,9 +17,85 @@ Wenjiang Zhou
Lyra Lab, Tencent Music Entertainment

**[github](https://github.com/TMElyralab/MuseTalk)** **[huggingface](https://huggingface.co/TMElyralab/MuseTalk)** **[space](https://huggingface.co/spaces/TMElyralab/MuseTalk)** **[Technical report](https://arxiv.org/abs/2410.10122)**
**[colab](MuseTalkV15.ipynb)**

We introduce `MuseTalk`, a **real-time, high-quality** lip-syncing model (30fps+ on an NVIDIA Tesla V100). MuseTalk can be applied to input videos, e.g., ones generated by [MuseV](https://github.com/TMElyralab/MuseV), as a complete virtual-human solution.


## 🚀 What’s New In This Fork

- **Dockerized MuseTalk Service:** Fully packaged for easy deployment as an API service or in your own workflows.
- **FastAPI Integration:** Added `fastapi_service.py` for running MuseTalk inference through a modern REST API.
- **Automated Model Downloads:** `download_models.py` script auto-downloads all required weights from Hugging Face, Google Drive, and direct URLs (including S3FD).
- **Self-Installing Dependencies:** The model downloader installs missing Python packages automatically, giving new users a clean setup experience.
- **Submodule Friendly:** Designed to work as a submodule in larger projects. Clone, init, and run.
- **Improved Repository Structure:** All models organized under `/models/` for clear management and reproducibility.
- **Workflow Automation:** Minimal manual steps – just run one setup script, then launch your API or Docker container.

***

### 🛠️ How To Run this Fork

**Step 1. Prepare Environment**
Clone the main repo and initialize submodules:
```bash
git clone --recurse-submodules https://github.com/rafipatel/MuseTalk.git
```


**Step 2. If not using Docker, follow the installation steps in [Installations](#installation) up to the [download weights section](#setup-ffmpeg)**

**Step 3. Download models and weights using this fork's script (the upstream `download_weights.sh` throws a `huggingface-cli` error)**
Enter the MuseTalk folder and run the setup script:
```bash
cd MuseTalk
python download_models.py
```
This will:
- Install any missing Python dependencies
- Download all required weights/checkpoints into `/models/`
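
Once the script finishes, `models/` should contain roughly the following (a sketch based on what `download_models.py` fetches; exact contents may change as the script evolves):

```
models/
├── musetalk/            # V1.0: musetalk.json, pytorch_model.bin
├── musetalkV15/         # V1.5: musetalk.json, unet.pth
├── sd-vae/              # config.json, diffusion_pytorch_model.{bin,safetensors}
├── whisper/             # config.json, pytorch_model.bin, preprocessor_config.json
├── dwpose/              # dw-ll_ucoco_384.pth
├── syncnet/             # latentsync_syncnet.pt
└── face-parse-bisent/   # 79999_iter.pth, resnet18-5c106cde.pth
```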

**Step 4. Run the MuseTalk API**
You can launch the FastAPI server directly:
```bash
uvicorn fastapi_service:app --host 0.0.0.0 --port 8000
```
Or run inference directly via `scripts.inference` (edit `test.yaml` to point at your own audio and video):
```bash
python -m scripts.inference --inference_config configs/inference/test.yaml --result_dir results/test --unet_model_path models/musetalkV15/unet.pth --unet_config models/musetalkV15/musetalk.json --version v15 --ffmpeg_path ffmpeg-master-latest-win64-gpl-shared/bin
```

Or (recommended for production) launch via Docker after downloading the weights, mounting the volumes into the container as shown below:
```bash
docker build -t musetalk .

docker run \
-p 8000:8000 \
-v $(pwd)/models:/app/MuseTalk/models \
-v $(pwd)/results:/app/MuseTalk/results \
-v $(pwd)/data:/app/MuseTalk/data \
-v $(pwd)/configs:/app/MuseTalk/configs \
musetalk
```

**Step 5. API Usage**
Access the API at `http://localhost:8000` and POST your inference jobs.
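
A minimal client sketch for posting a job. The exact route and payload schema depend on `fastapi_service.py`, so treat `/inference` and the field names below as placeholder assumptions, not the service's documented contract:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/inference"  # assumed route; check fastapi_service.py

def build_job(video_path: str, audio_path: str) -> dict:
    """Assemble a minimal request payload (field names are assumptions)."""
    return {"video_path": video_path, "audio_path": audio_path}

def submit(job: dict) -> dict:
    """POST the job to the running MuseTalk API and return the JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `submit(build_job("data/video/demo.mp4", "data/audio/demo.wav"))` returns the service's JSON response.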

***

**Optional:**
- Edit `download_models.py` to download extra models if needed.
- Update requirements as new features are added.

***

## 📝 Note
- For GPU support, make sure your Docker and system configuration are compatible.
- All downloads are script-automated for plug-and-play use — no manual model hunting or setup required.
- For more details, see the commented sections in each main script.
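
For GPU deployments, the `docker run` command above can be expressed as a `docker-compose.yml`. This is a hedged sketch: the device reservation assumes the NVIDIA Container Toolkit is installed, and since this fork's Dockerfile currently installs the CPU/MPS PyTorch build (`torch==2.0.1`), real GPU inference would also require swapping in a CUDA-enabled base image and PyTorch wheel:

```yaml
services:
  musetalk:
    image: musetalk          # matches the `docker build -t musetalk .` step above
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/MuseTalk/models
      - ./results:/app/MuseTalk/results
      - ./data:/app/MuseTalk/data
      - ./configs:/app/MuseTalk/configs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```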



## 🔥 Updates
We're excited to unveil MuseTalk 1.5.
This version **(1)** integrates training with perceptual, GAN, and sync losses, significantly boosting overall performance, and **(2)** adopts a two-stage training strategy and a spatio-temporal data-sampling approach to balance visual quality and lip-sync accuracy.
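
The combined objective can be pictured as a weighted sum of the terms listed above. This is an illustrative sketch only — the 1.5 training code is not part of this snapshot, and the function shape and weight values are made-up placeholders:

```python
# Hypothetical combination of the loss terms mentioned in the 1.5 release notes.
# The weights (w_perc, w_gan, w_sync) are placeholders, not the paper's values.
def total_loss(recon: float, perceptual: float, gan: float, sync: float,
               w_perc: float = 0.01, w_gan: float = 0.01,
               w_sync: float = 0.05) -> float:
    """Weighted sum of reconstruction, perceptual, GAN, and sync losses."""
    return recon + w_perc * perceptual + w_gan * gan + w_sync * sync
```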
80 changes: 80 additions & 0 deletions dockerfile
@@ -0,0 +1,80 @@
FROM python:3.10-slim-bullseye

ENV PYTHONUNBUFFERED=1
ENV PYTORCH_ENABLE_MPS_FALLBACK=1

# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
wget \
ffmpeg \
libsndfile1 \
libgl1-mesa-glx \
libglib2.0-0 \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install PyTorch for CPU/MPS (Mac)
RUN pip3 install --no-cache-dir \
torch==2.0.1 \
torchvision==0.15.2 \
torchaudio==2.0.2

# Clone MuseTalk
# RUN git clone https://github.com/TMElyralab/MuseTalk.git /app/MuseTalk
RUN git clone https://github.com/rafipatel/MuseTalk.git /app/MuseTalk

WORKDIR /app/MuseTalk

# Install requirements
RUN pip3 install --no-cache-dir -r requirements.txt
# RUN pip3 install -r requirements.txt || true

# Install OpenMMLab packages (CPU version)
RUN pip3 install --no-cache-dir -U openmim && \
mim install mmengine && \
pip3 install mmcv==2.0.1 && \
mim install "mmdet==3.1.0" && \
mim install "mmpose==1.1.0"

# Download model weights
# RUN python3 -m pip install huggingface_hub && \
# python3 -c "from huggingface_hub import snapshot_download; \
# snapshot_download(repo_id='TMElyralab/MuseTalk', local_dir='./models', allow_patterns=['models/musetalkV15/*'])" || true

# Download additional model files


# RUN mkdir -p models/face-parse-bisent && \
# pip3 install gdown && \
# gdown --id 154JgKpzCPW82qINcVieuPH3fZ2e0P812 -O models/face-parse-bisent/79999_iter.pth && \
# curl -L https://download.pytorch.org/models/resnet18-5c106cde.pth -o models/face-parse-bisent/resnet18-5c106cde.pth


# # Set working directory
# WORKDIR /app/MuseTalk

# Create entrypoint script
# RUN echo '#!/bin/bash\n\
# python -m scripts.inference \\\n\
# --inference_config ${INFERENCE_CONFIG:-configs/inference/test.yaml} \\\n\
# --result_dir ${RESULT_DIR:-results/test} \\\n\
# --unet_model_path ${UNET_MODEL_PATH:-models/musetalkV15/unet.pth} \\\n\
# --unet_config ${UNET_CONFIG:-models/musetalkV15/musetalk.json} \\\n\
# --version ${VERSION:-v15} \\\n\
# --ffmpeg_path ${FFMPEG_PATH:-/usr/bin/ffmpeg}' > /app/run_inference.sh && \
# chmod +x /app/run_inference.sh

# RUN chmod +x /app/MuseTalk/download_weights.sh
# RUN /app/MuseTalk/download_weights.sh

# COPY inference.sh /app/MuseTalk/inference.sh

# RUN chmod +x /app/MuseTalk/inference.sh
# CMD ["/app/MuseTalk/inference.sh", "v1.5","normal"]
# CMD ["uvicorn", "fastapi_service:app","--host", "0.0.0.0" ,"--port", "8000"]
CMD ["python3", "-m", "uvicorn", "fastapi_service:app", "--host", "0.0.0.0" ,"--port", "8000"]
# CMD ["/app/run_inference.sh"]
188 changes: 188 additions & 0 deletions download_models.py
@@ -0,0 +1,188 @@
import os
import sys
import importlib
import subprocess

# List any packages needed for downloading
REQUIRED_PACKAGES = [
"huggingface_hub",
"gdown",
"requests"
]

PYTHON_EXEC = sys.executable

# Install missing packages BEFORE importing
for pkg in REQUIRED_PACKAGES:
    try:
        importlib.import_module(pkg.replace("-", "_"))
    except ImportError:
        print(f"Installing {pkg} ...")
        subprocess.run([PYTHON_EXEC, "-m", "pip", "install", pkg], check=True)


from huggingface_hub import hf_hub_download

# --- Configuration ---
CHECKPOINTS_DIR = "models"
HF_ENDPOINT = os.environ.get("HF_ENDPOINT", "https://huggingface.co") # Use mirror if set

# --- Directory Setup ---
DIRS = [
"musetalkV15", "syncnet", "dwpose",
"face-parse-bisent", "sd-vae", "whisper", "musetalk" # Ensure 'musetalk' is here if V1.0 is needed
]

for d in DIRS:
    os.makedirs(os.path.join(CHECKPOINTS_DIR, d), exist_ok=True)
print(f"✅ Created base directory: {CHECKPOINTS_DIR} and subdirectories.")

# --- Hugging Face Downloads ---

def download_hf_files(repo_id, filenames, subdir="", has_subpath=False):
    """
    Download a list of files from a Hugging Face repo.

    If has_subpath is True (e.g., MuseTalk), the filenames already contain
    the target subdirectory, so files are downloaded relative to CHECKPOINTS_DIR.
    If has_subpath is False (e.g., Whisper), files are downloaded directly
    into CHECKPOINTS_DIR/subdir.
    """
    target_local_dir = os.path.join(CHECKPOINTS_DIR, subdir)

    # If the filename contains the directory structure (e.g., "repo_name/file.bin"),
    # set local_dir to CHECKPOINTS_DIR to preserve the path. Otherwise, set
    # local_dir to the final destination (target_local_dir).
    final_local_dir = CHECKPOINTS_DIR if has_subpath else target_local_dir

    for filename in filenames:
        print(f"Downloading {filename} from {repo_id} to {target_local_dir}...")
        hf_hub_download(
            repo_id=repo_id,
            filename=filename,
            local_dir=final_local_dir,
            endpoint=HF_ENDPOINT,
        )
    print(f"✅ Finished downloading files for {repo_id} into {subdir}/.")


# 1. MuseTalk V1.0 & V1.5 Weights (Uses subpaths in filenames)
# NOTE: The repo files are structured like "musetalk/..." and "musetalkV15/..."
# Setting local_dir=CHECKPOINTS_DIR ensures this internal structure is preserved under "models/"

# V1.0 files (target: models/musetalk)
download_hf_files(
    repo_id="TMElyralab/MuseTalk",
    filenames=[
        "musetalk/musetalk.json",
        "musetalk/pytorch_model.bin",
    ],
    subdir="musetalk",
    has_subpath=True,  # filenames contain the subdir path
)

# V1.5 files (target: models/musetalkV15)
download_hf_files(
    repo_id="TMElyralab/MuseTalk",
    filenames=[
        "musetalkV15/musetalk.json",
        "musetalkV15/unet.pth",
    ],
    subdir="musetalkV15",
    has_subpath=True,  # filenames contain the subdir path
)

# 2. SD VAE weights (no subpaths in filenames)
# Target: models/sd-vae/
download_hf_files(
    repo_id="stabilityai/sd-vae-ft-mse",
    filenames=[
        "config.json",
        "diffusion_pytorch_model.bin",
        "diffusion_pytorch_model.safetensors",
    ],
    subdir="sd-vae",
    has_subpath=False,
)

# 3. Whisper weights (no subpaths in filenames)
# These download directly into models/whisper/
download_hf_files(
    repo_id="openai/whisper-tiny",
    filenames=[
        "config.json",
        "pytorch_model.bin",
        "preprocessor_config.json",
    ],
    subdir="whisper",
    has_subpath=False,
)

# 4. DWPose weights (no subpaths in filenames)
# Target: models/dwpose/
download_hf_files(
    repo_id="yzd-v/DWPose",
    filenames=["dw-ll_ucoco_384.pth"],
    subdir="dwpose",
    has_subpath=False,
)

# 5. SyncNet weights (no subpaths in filenames)
# Target: models/syncnet/
download_hf_files(
    repo_id="ByteDance/LatentSync",
    filenames=["latentsync_syncnet.pt"],
    subdir="syncnet",
    has_subpath=False,
)

print("--- Hugging Face downloads complete. ---")



# Download the BiSeNet face-parsing model (from Google Drive)
import gdown  # installed with the other required packages above

gdown.download(
    "https://drive.google.com/uc?id=154JgKpzCPW82qINcVieuPH3fZ2e0P812",
    os.path.join(CHECKPOINTS_DIR, "face-parse-bisent", "79999_iter.pth"),
    quiet=False,
)




# Download the ResNet-18 backbone used by the face-parsing model
import requests

url = "https://download.pytorch.org/models/resnet18-5c106cde.pth"
output_path = os.path.join(CHECKPOINTS_DIR, "face-parse-bisent", "resnet18-5c106cde.pth")
response = requests.get(url, stream=True, timeout=60)
response.raise_for_status()
with open(output_path, "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
print(f"✅ Downloaded {url} to {output_path}")



s3fd_url = "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth"
s3fd_dest_dir = os.path.expanduser("~/.cache/torch/hub/checkpoints")
os.makedirs(s3fd_dest_dir, exist_ok=True)
s3fd_dest_path = os.path.join(s3fd_dest_dir, "s3fd-619a316812.pth")

if not os.path.exists(s3fd_dest_path) or os.path.getsize(s3fd_dest_path) < 85_000_000:  # ~85MB expected
    print(f"Downloading S3FD weights from {s3fd_url} ...")
    response = requests.get(s3fd_url, stream=True, timeout=60)
    response.raise_for_status()
    with open(s3fd_dest_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"✅ Downloaded S3FD model to {s3fd_dest_path}")
else:
    print(f"✅ S3FD weights already present: {s3fd_dest_path}")

print("--- All model downloads complete. ---")
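
The direct HTTP downloads above repeat the same chunked-copy pattern. A small helper (hypothetical, not part of the script) would keep it DRY, and separating the copy loop from the network call makes it easy to unit-test against in-memory streams:

```python
import os
import urllib.request

def stream_copy(src, dst, chunk_size: int = 8192) -> int:
    """Copy src (any object with .read) to dst in chunks; return bytes written."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

def download(url: str, dest_path: str) -> str:
    """Stream a URL to dest_path, creating parent directories as needed."""
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as f:
        stream_copy(resp, f)
    return dest_path
```

Each of the three downloads above would then collapse to a single `download(url, path)` call.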