Commit 9bac434

Merge pull request #106 from warmshao/joyvasa
Joyvasa
2 parents c1a9d4d + 36efdf9 commit 9bac434

22 files changed (+1207 −149 lines)

README.md

Lines changed: 7 additions & 0 deletions

```diff
@@ -7,6 +7,7 @@
 * Achieved real-time running of LivePortrait on RTX 3090 GPU using TensorRT, reaching speeds of 30+ FPS. This is the speed for rendering a single frame, including pre- and post-processing, not just the model inference speed.
 * Implemented conversion of LivePortrait model to Onnx model, achieving inference speed of about 70ms/frame (~12 FPS) using onnxruntime-gpu on RTX 3090, facilitating cross-platform deployment.
 * Seamless support for native gradio app, with several times faster speed and support for simultaneous inference on multiple faces and Animal Model.
+* Added support for [JoyVASA](https://github.com/jdh-algo/JoyVASA), which can drive videos or images with audio.
 
 **If you find this project useful, please give it a star ✨✨**
 
@@ -15,6 +16,12 @@
 <video src="https://github.com/user-attachments/assets/716d61a7-41ae-483a-874d-ea1bf345bd1a" controls="controls" width="500" height="300">Your browser does not support playing this video!</video>
 
 **Changelog**
+- [x] **2024/12/16:** Added support for [JoyVASA](https://github.com/jdh-algo/JoyVASA), which can drive videos or images with audio. Very cool!
+  - Update the code, then download the models: `huggingface-cli download TencentGameMate/chinese-hubert-base --local-dir .\checkpoints\chinese-hubert-base` and `huggingface-cli download jdh-algo/JoyVASA --local-dir ./checkpoints/JoyVASA`
+  - After launching the webui, follow the tutorial below. When the source is a video, it is recommended to drive only the mouth movements.
+
+<video src="https://github.com/user-attachments/assets/42fb24be-0cde-4138-9671-e52eec95e7f5" controls="controls" width="500" height="400">Your browser does not support playing this video!</video>
+
 - [x] **2024/12/14:** Added pickle and image driving, as well as region driving `animation_region`.
   - Please update to the latest code. Windows users can directly double-click `update.bat` to update, but note that your local code will be overwritten.
   - Running `python run.py` now automatically saves the corresponding pickle to the same directory as the driving video, allowing for direct reuse.
```
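The two `huggingface-cli download` commands from the changelog entry can be scripted; a minimal sketch that builds the argument lists (shown without executing them — uncomment the `subprocess.run` line to actually download; forward-slash paths are used here in place of the Windows-style path the README shows for the first model):

```python
import subprocess  # used only if you choose to execute the commands

# Repo ids and target directories taken from the changelog entry above.
MODELS = {
    "TencentGameMate/chinese-hubert-base": "./checkpoints/chinese-hubert-base",
    "jdh-algo/JoyVASA": "./checkpoints/JoyVASA",
}

def download_cmds():
    """Build one `huggingface-cli download` argument list per model."""
    return [
        ["huggingface-cli", "download", repo, "--local-dir", local]
        for repo, local in MODELS.items()
    ]

for cmd in download_cmds():
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually download
```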

README_ZH.md

Lines changed: 7 additions & 0 deletions

```diff
@@ -7,6 +7,7 @@
 * Achieved **real-time** running of LivePortrait on an RTX 3090 GPU via TensorRT, at 30+ FPS. This is the measured speed of rendering one frame, not just the model inference time.
 * Converted the LivePortrait model to an Onnx model; inference with onnxruntime-gpu on an RTX 3090 takes about 70ms/frame (~12 FPS), facilitating cross-platform deployment.
 * Seamless support for the native gradio app, several times faster, with support for multiple faces and the Animal model.
+* Added support for [JoyVASA](https://github.com/jdh-algo/JoyVASA), which can drive videos or images with audio.
 
 **If you find this project useful, please give it a star ✨✨**
 
@@ -15,6 +16,12 @@
 <video src="https://github.com/user-attachments/assets/716d61a7-41ae-483a-874d-ea1bf345bd1a" controls="controls" width="500" height="300">Your browser does not support playing this video!</video>
 
 **Changelog**
+- [x] **2024/12/16:** Added support for [JoyVASA](https://github.com/jdh-algo/JoyVASA), which can drive videos or images with audio. Very cool!
+  - Update the code, then download the models: `huggingface-cli download TencentGameMate/chinese-hubert-base --local-dir .\checkpoints\chinese-hubert-base` and `huggingface-cli download jdh-algo/JoyVASA --local-dir ./checkpoints/JoyVASA`
+  - After launching the webui, follow the tutorial below. When the source is a video, it is recommended to drive only the mouth.
+
+<video src="https://github.com/user-attachments/assets/42fb24be-0cde-4138-9671-e52eec95e7f5" controls="controls" width="500" height="400">Your browser does not support playing this video!</video>
+
 - [x] **2024/12/14:** Added pickle and image driving, as well as region driving `animation_region`.
   - Please update to the latest code. Windows users can double-click `update.bat` to update, but note that local code will be overwritten.
   - Running `python run.py` with a driving video now automatically saves the corresponding pickle to the same directory as the driving video, so it can be reused directly.
```

assets/examples/driving/a-01.wav

329 KB · Binary file not shown.

configs/onnx_infer.yaml

Lines changed: 8 additions & 0 deletions

```diff
@@ -70,6 +70,11 @@ animal_models:
   - "./checkpoints/liveportrait_onnx/retinaface_det_static.onnx"
   - "./checkpoints/liveportrait_onnx/face_2dpose_106_static.onnx"
 
+joyvasa_models:
+  motion_model_path: "checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt"
+  audio_model_path: "checkpoints/chinese-hubert-base"
+  motion_template_path: "checkpoints/JoyVASA/motion_template/motion_template.pkl"
+
 crop_params:
   src_dsize: 512
   src_scale: 2.3
@@ -102,5 +107,8 @@ infer_params:
   driving_multiplier: 1.0
   animation_region: "all"
 
+  cfg_mode: "incremental"
+  cfg_scale: 1.2
+
   source_max_dim: 1280 # the max dim of height and width of source image
   source_division: 2 # make sure the height and width of source image can be divided by this number
```
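The new `joyvasa_models` block points the pipeline at three artifacts that must already be on disk. A small pre-flight check can fail fast with a clear message; this is a sketch, not repo code — the dict mirrors the YAML block above and `missing_joyvasa_paths` is a hypothetical helper:

```python
from pathlib import Path

# Mirrors the `joyvasa_models` block added to the configs above.
joyvasa_models = {
    "motion_model_path": "checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt",
    "audio_model_path": "checkpoints/chinese-hubert-base",
    "motion_template_path": "checkpoints/JoyVASA/motion_template/motion_template.pkl",
}

def missing_joyvasa_paths(cfg, root="."):
    """Return the config keys whose paths do not exist under `root`."""
    return [key for key, rel in cfg.items() if not (Path(root) / rel).exists()]

missing = missing_joyvasa_paths(joyvasa_models)
if missing:
    print("Missing JoyVASA artifacts — run the huggingface-cli downloads first:",
          ", ".join(missing))
```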

configs/onnx_mp_infer.yaml

Lines changed: 8 additions & 0 deletions

```diff
@@ -64,6 +64,11 @@ animal_models:
   name: "MediaPipeFaceModel"
   predict_type: "mp"
 
+joyvasa_models:
+  motion_model_path: "checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt"
+  audio_model_path: "checkpoints/chinese-hubert-base"
+  motion_template_path: "checkpoints/JoyVASA/motion_template/motion_template.pkl"
+
 crop_params:
   src_dsize: 512
   src_scale: 2.3
@@ -96,5 +101,8 @@ infer_params:
   driving_multiplier: 1.0
   animation_region: "all"
 
+  cfg_mode: "incremental"
+  cfg_scale: 1.2
+
   source_max_dim: 1280 # the max dim of height and width of source image
   source_division: 2 # make sure the height and width of source image can be divided by this number
```
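The new `cfg_mode`/`cfg_scale` settings read like classifier-free guidance parameters for the JoyVASA motion generator. As an illustration only, here is the standard CFG combination formula — a sketch of the usual technique, not this repo's implementation, and the "incremental" mode is project-specific:

```python
def cfg_combine(uncond, cond, cfg_scale=1.2):
    """Classifier-free guidance: push the conditional prediction away
    from the unconditional one by a factor of `cfg_scale`."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]

# cfg_scale=1.0 reproduces the conditional prediction unchanged;
# values > 1.0 (like the 1.2 in the config) amplify the audio conditioning.
print(cfg_combine([0.0, 0.0], [1.0, 2.0]))  # [1.2, 2.4]
```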

configs/trt_infer.yaml

Lines changed: 8 additions & 0 deletions

```diff
@@ -70,6 +70,11 @@ animal_models:
   - "./checkpoints/liveportrait_onnx/retinaface_det_static.trt"
   - "./checkpoints/liveportrait_onnx/face_2dpose_106_static.trt"
 
+joyvasa_models:
+  motion_model_path: "checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt"
+  audio_model_path: "checkpoints/chinese-hubert-base"
+  motion_template_path: "checkpoints/JoyVASA/motion_template/motion_template.pkl"
+
 crop_params:
   src_dsize: 512
   src_scale: 2.3
@@ -102,5 +107,8 @@ infer_params:
   driving_multiplier: 1.0
   animation_region: "all"
 
+  cfg_mode: "incremental"
+  cfg_scale: 1.2
+
   source_max_dim: 1280 # the max dim of height and width of source image
   source_division: 2 # make sure the height and width of source image can be divided by this number
```

configs/trt_mp_infer.yaml

Lines changed: 8 additions & 0 deletions

```diff
@@ -64,6 +64,11 @@ animal_models:
   name: "MediaPipeFaceModel"
   predict_type: "mp"
 
+joyvasa_models:
+  motion_model_path: "checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt"
+  audio_model_path: "checkpoints/chinese-hubert-base"
+  motion_template_path: "checkpoints/JoyVASA/motion_template/motion_template.pkl"
+
 crop_params:
   src_dsize: 512
   src_scale: 2.3
@@ -96,5 +101,8 @@ infer_params:
   mask_crop_path: "./assets/mask_template.png"
   driving_multiplier: 1.0
 
+  cfg_mode: "incremental"
+  cfg_scale: 1.2
+
   source_max_dim: 1280 # the max dim of height and width of source image
   source_division: 2 # make sure the height and width of source image can be divided by this number
```

requirements.txt

Lines changed: 2 additions & 1 deletion

```diff
@@ -9,4 +9,5 @@ scikit-image
 insightface
 huggingface_hub[cli]
 mediapipe
-torchgeometry
+torchgeometry
+soundfile
```
requirements_macos.txt

Lines changed: 2 additions & 1 deletion

```diff
@@ -9,4 +9,5 @@ scikit-image
 insightface
 huggingface_hub[cli]
 mediapipe
-torchgeometry
+torchgeometry
+soundfile
```

requirements_win.txt

Lines changed: 2 additions & 1 deletion

```diff
@@ -8,4 +8,5 @@ scikit-image
 insightface
 huggingface_hub[cli]
 mediapipe
-torchgeometry
+torchgeometry
+soundfile
```
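The new `soundfile` dependency handles reading the driving audio (e.g. `assets/examples/driving/a-01.wav`). As a dependency-free illustration of the underlying WAV format, here is a stdlib `wave` sketch that writes a one-second 16 kHz test tone and reads its header back; the tone, path, and sample rate are illustrative, not repo code:

```python
import math
import os
import struct
import tempfile
import wave

# Write a one-second 440 Hz tone as 16 kHz mono 16-bit PCM,
# standing in for a driving clip like a-01.wav.
path = os.path.join(tempfile.gettempdir(), "joyvasa_tone.wav")
RATE = 16000
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 16-bit PCM
    w.setframerate(RATE)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / RATE)))
        for i in range(RATE)
    )
    w.writeframes(frames)

# Read the header back, as an audio-driving pipeline would before feature extraction.
with wave.open(path, "rb") as r:
    n_frames, rate = r.getnframes(), r.getframerate()

print(rate, n_frames)  # 16000 16000
```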
