Commit 9bac434

Merge pull request #106 from warmshao/joyvasa
Joyvasa
2 parents c1a9d4d + 36efdf9 commit 9bac434

22 files changed (+1207 −149 lines)

README.md

Lines changed: 7 additions & 0 deletions

```diff
@@ -7,6 +7,7 @@
 * Achieved real-time running of LivePortrait on RTX 3090 GPU using TensorRT, reaching speeds of 30+ FPS. This is the speed for rendering a single frame, including pre- and post-processing, not just the model inference speed.
 * Implemented conversion of LivePortrait model to Onnx model, achieving inference speed of about 70ms/frame (~12 FPS) using onnxruntime-gpu on RTX 3090, facilitating cross-platform deployment.
 * Seamless support for native gradio app, with several times faster speed and support for simultaneous inference on multiple faces and Animal Model.
+* Added support for [JoyVASA](https://github.com/jdh-algo/JoyVASA), which can drive videos or images with audio.
 
 **If you find this project useful, please give it a star ✨✨**
 
@@ -15,6 +16,12 @@
 <video src="https://github.com/user-attachments/assets/716d61a7-41ae-483a-874d-ea1bf345bd1a" controls="controls" width="500" height="300">Your browser does not support playing this video!</video>
 
 **Changelog**
+- [x] **2024/12/16:** Added support for [JoyVASA](https://github.com/jdh-algo/JoyVASA), which can drive videos or images with audio. Very cool!
+  - Update the code, then download the models: `huggingface-cli download TencentGameMate/chinese-hubert-base --local-dir .\checkpoints\chinese-hubert-base` and `huggingface-cli download jdh-algo/JoyVASA --local-dir ./checkpoints/JoyVASA`
+  - After launching the webui, follow the tutorial below. When the source is a video, it is recommended to drive only the mouth movements.
+
+<video src="https://github.com/user-attachments/assets/42fb24be-0cde-4138-9671-e52eec95e7f5" controls="controls" width="500" height="400">Your browser does not support playing this video!</video>
+
 - [x] **2024/12/14:** Added pickle and image driving, as well as region driving `animation_region`.
   - Please update to the latest code. Windows users can directly double-click `update.bat` to update, but note that your local code will be overwritten.
   - Running `python run.py` now automatically saves the corresponding pickle to the same directory as the driving video, allowing for direct reuse.
```
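The two `huggingface-cli download` commands from the changelog entry can be scripted; a minimal sketch that builds the argument lists (shown without executing them — uncomment the `subprocess.run` line to actually download; forward-slash paths are used here in place of the Windows-style path the README shows for the first model):

```python
import subprocess  # used only if you choose to execute the commands

# Repo ids and target directories taken from the changelog entry above.
MODELS = {
    "TencentGameMate/chinese-hubert-base": "./checkpoints/chinese-hubert-base",
    "jdh-algo/JoyVASA": "./checkpoints/JoyVASA",
}

def download_cmds():
    """Build one `huggingface-cli download` argument list per model."""
    return [
        ["huggingface-cli", "download", repo, "--local-dir", local]
        for repo, local in MODELS.items()
    ]

for cmd in download_cmds():
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually download
```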

README_ZH.md

Lines changed: 7 additions & 0 deletions

```diff
@@ -7,6 +7,7 @@
 * Achieved **real-time** running of LivePortrait on an RTX 3090 GPU via TensorRT, at 30+ FPS. This is the measured speed of rendering one frame, not just the model inference time.
 * Converted the LivePortrait model to an Onnx model; inference with onnxruntime-gpu on an RTX 3090 takes about 70ms/frame (~12 FPS), facilitating cross-platform deployment.
 * Seamless support for the native gradio app, several times faster, with support for multiple faces and the Animal model.
+* Added support for [JoyVASA](https://github.com/jdh-algo/JoyVASA), which can drive videos or images with audio.
 
 **If you find this project useful, please give it a star ✨✨**
 
@@ -15,6 +16,12 @@
 <video src="https://github.com/user-attachments/assets/716d61a7-41ae-483a-874d-ea1bf345bd1a" controls="controls" width="500" height="300">Your browser does not support playing this video!</video>
 
 **Changelog**
+- [x] **2024/12/16:** Added support for [JoyVASA](https://github.com/jdh-algo/JoyVASA), which can drive videos or images with audio. Very cool!
+  - Update the code, then download the models: `huggingface-cli download TencentGameMate/chinese-hubert-base --local-dir .\checkpoints\chinese-hubert-base` and `huggingface-cli download jdh-algo/JoyVASA --local-dir ./checkpoints/JoyVASA`
+  - After launching the webui, follow the tutorial below. When the source is a video, it is recommended to drive only the mouth.
+
+<video src="https://github.com/user-attachments/assets/42fb24be-0cde-4138-9671-e52eec95e7f5" controls="controls" width="500" height="400">Your browser does not support playing this video!</video>
+
 - [x] **2024/12/14:** Added pickle and image driving, as well as region driving `animation_region`.
   - Please update to the latest code. Windows users can double-click `update.bat` to update, but note that local code will be overwritten.
   - Running `python run.py` with a driving video now automatically saves the corresponding pickle to the same directory as the driving video, so it can be reused directly.
```

assets/examples/driving/a-01.wav

329 KB · Binary file not shown.

configs/onnx_infer.yaml

Lines changed: 8 additions & 0 deletions

```diff
@@ -70,6 +70,11 @@ animal_models:
   - "./checkpoints/liveportrait_onnx/retinaface_det_static.onnx"
   - "./checkpoints/liveportrait_onnx/face_2dpose_106_static.onnx"
 
+joyvasa_models:
+  motion_model_path: "checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt"
+  audio_model_path: "checkpoints/chinese-hubert-base"
+  motion_template_path: "checkpoints/JoyVASA/motion_template/motion_template.pkl"
+
 crop_params:
   src_dsize: 512
   src_scale: 2.3
@@ -102,5 +107,8 @@ infer_params:
   driving_multiplier: 1.0
   animation_region: "all"
 
+  cfg_mode: "incremental"
+  cfg_scale: 1.2
+
   source_max_dim: 1280 # the max dim of height and width of source image
   source_division: 2 # make sure the height and width of source image can be divided by this number
```
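The new `joyvasa_models` block points the pipeline at three artifacts that must already be on disk. A small pre-flight check can fail fast with a clear message; this is a sketch, not repo code — the dict mirrors the YAML block above and `missing_joyvasa_paths` is a hypothetical helper:

```python
from pathlib import Path

# Mirrors the `joyvasa_models` block added to the configs above.
joyvasa_models = {
    "motion_model_path": "checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt",
    "audio_model_path": "checkpoints/chinese-hubert-base",
    "motion_template_path": "checkpoints/JoyVASA/motion_template/motion_template.pkl",
}

def missing_joyvasa_paths(cfg, root="."):
    """Return the config keys whose paths do not exist under `root`."""
    return [key for key, rel in cfg.items() if not (Path(root) / rel).exists()]

missing = missing_joyvasa_paths(joyvasa_models)
if missing:
    print("Missing JoyVASA artifacts — run the huggingface-cli downloads first:",
          ", ".join(missing))
```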

configs/onnx_mp_infer.yaml

Lines changed: 8 additions & 0 deletions

```diff
@@ -64,6 +64,11 @@ animal_models:
   name: "MediaPipeFaceModel"
   predict_type: "mp"
 
+joyvasa_models:
+  motion_model_path: "checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt"
+  audio_model_path: "checkpoints/chinese-hubert-base"
+  motion_template_path: "checkpoints/JoyVASA/motion_template/motion_template.pkl"
+
 crop_params:
   src_dsize: 512
   src_scale: 2.3
@@ -96,5 +101,8 @@ infer_params:
   driving_multiplier: 1.0
   animation_region: "all"
 
+  cfg_mode: "incremental"
+  cfg_scale: 1.2
+
   source_max_dim: 1280 # the max dim of height and width of source image
   source_division: 2 # make sure the height and width of source image can be divided by this number
```
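The new `cfg_mode`/`cfg_scale` settings read like classifier-free guidance parameters for the JoyVASA motion generator. As an illustration only, here is the standard CFG combination formula — a sketch of the usual technique, not this repo's implementation, and the "incremental" mode is project-specific:

```python
def cfg_combine(uncond, cond, cfg_scale=1.2):
    """Classifier-free guidance: push the conditional prediction away
    from the unconditional one by a factor of `cfg_scale`."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]

# cfg_scale=1.0 reproduces the conditional prediction unchanged;
# values > 1.0 (like the 1.2 in the config) amplify the audio conditioning.
print(cfg_combine([0.0, 0.0], [1.0, 2.0]))  # [1.2, 2.4]
```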

configs/trt_infer.yaml

Lines changed: 8 additions & 0 deletions

```diff
@@ -70,6 +70,11 @@ animal_models:
   - "./checkpoints/liveportrait_onnx/retinaface_det_static.trt"
   - "./checkpoints/liveportrait_onnx/face_2dpose_106_static.trt"
 
+joyvasa_models:
+  motion_model_path: "checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt"
+  audio_model_path: "checkpoints/chinese-hubert-base"
+  motion_template_path: "checkpoints/JoyVASA/motion_template/motion_template.pkl"
+
 crop_params:
   src_dsize: 512
   src_scale: 2.3
@@ -102,5 +107,8 @@ infer_params:
   driving_multiplier: 1.0
   animation_region: "all"
 
+  cfg_mode: "incremental"
+  cfg_scale: 1.2
+
   source_max_dim: 1280 # the max dim of height and width of source image
   source_division: 2 # make sure the height and width of source image can be divided by this number
```

configs/trt_mp_infer.yaml

Lines changed: 8 additions & 0 deletions

```diff
@@ -64,6 +64,11 @@ animal_models:
   name: "MediaPipeFaceModel"
   predict_type: "mp"
 
+joyvasa_models:
+  motion_model_path: "checkpoints/JoyVASA/motion_generator/motion_generator_hubert_chinese.pt"
+  audio_model_path: "checkpoints/chinese-hubert-base"
+  motion_template_path: "checkpoints/JoyVASA/motion_template/motion_template.pkl"
+
 crop_params:
   src_dsize: 512
   src_scale: 2.3
@@ -96,5 +101,8 @@ infer_params:
   mask_crop_path: "./assets/mask_template.png"
   driving_multiplier: 1.0
 
+  cfg_mode: "incremental"
+  cfg_scale: 1.2
+
   source_max_dim: 1280 # the max dim of height and width of source image
   source_division: 2 # make sure the height and width of source image can be divided by this number
```

requirements.txt

Lines changed: 2 additions & 1 deletion

```diff
@@ -9,4 +9,5 @@ scikit-image
 insightface
 huggingface_hub[cli]
 mediapipe
-torchgeometry
+torchgeometry
+soundfile
```
requirements_macos.txt

Lines changed: 2 additions & 1 deletion

```diff
@@ -9,4 +9,5 @@ scikit-image
 insightface
 huggingface_hub[cli]
 mediapipe
-torchgeometry
+torchgeometry
+soundfile
```

requirements_win.txt

Lines changed: 2 additions & 1 deletion

```diff
@@ -8,4 +8,5 @@ scikit-image
 insightface
 huggingface_hub[cli]
 mediapipe
-torchgeometry
+torchgeometry
+soundfile
```
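The new `soundfile` dependency handles reading the driving audio (e.g. `assets/examples/driving/a-01.wav`). As a dependency-free illustration of the underlying WAV format, here is a stdlib `wave` sketch that writes a one-second 16 kHz test tone and reads its header back; the tone, path, and sample rate are illustrative, not repo code:

```python
import math
import os
import struct
import tempfile
import wave

# Write a one-second 440 Hz tone as 16 kHz mono 16-bit PCM,
# standing in for a driving clip like a-01.wav.
path = os.path.join(tempfile.gettempdir(), "joyvasa_tone.wav")
RATE = 16000
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 16-bit PCM
    w.setframerate(RATE)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / RATE)))
        for i in range(RATE)
    )
    w.writeframes(frames)

# Read the header back, as an audio-driving pipeline would before feature extraction.
with wave.open(path, "rb") as r:
    n_frames, rate = r.getnframes(), r.getframerate()

print(rate, n_frames)  # 16000 16000
```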
