The DeepFuze lipsync workflow in ComfyUI works perfectly on my Mac with an M1 chip, but sadly it is very slow because it writes many temp files.
Correct me if I'm wrong, but I think that is because it does not process frames in memory; it passes them through files on disk instead.
So, does anyone else want a command line like this:
# --ref-video : the reference video
# --audio     : the input audio file
# --output    : the lip-synced, enhanced output video (very high quality)
python lipsync.py --ref-video path/to/video.mp4 \
                  --audio path/to/input_audio.wav \
                  --output path/to/output_lipsynced_and_enhanced.mp4
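For what it's worth, parsing those flags would only take a few lines of argparse. This skeleton is purely hypothetical (lipsync.py does not exist yet); it just mirrors the command above:

```python
# Hypothetical CLI skeleton for the proposed lipsync.py; flag names match the command above.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="In-memory lip-sync + face enhancement")
    parser.add_argument("--ref-video", required=True, help="reference video to lip-sync")
    parser.add_argument("--audio", required=True, help="driving audio file (wav)")
    parser.add_argument("--output", required=True, help="path for the lip-synced output video")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"video={args.ref_video} audio={args.audio} output={args.output}")
```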
Key points:
- Uses GPU acceleration on Apple M1 chips via MPS (Metal Performance Shaders), with the ONNX models running through the CoreMLExecutionProvider
- Processes frames in memory to minimize disk I/O (this is the critical point, I think)
- Parallelizes operations where possible
- Uses ONNX Runtime for optimized model inference
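On the CoreMLExecutionProvider point: in ONNX Runtime, choosing that provider is just a session-creation argument. A minimal sketch, using the model file mentioned in step 4 below and the standard onnxruntime API:

```python
# Sketch: create an ONNX Runtime session that prefers CoreML on Apple Silicon,
# falling back to CPU if CoreML is unavailable.
import onnxruntime as ort

session = ort.InferenceSession(
    "wav2lip_gan.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers were actually loaded
```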
How It Works:
1. Input Processing: loads the video frames and audio into memory
2. Audio Analysis: converts the audio into mel spectrograms
3. Face Detection: locates faces in each frame
4. Lip Synchronization: adjusts lip movements to match the audio, using wav2lip_gan.onnx
5. Face Enhancement: improves the visual quality of facial features, using gfpgan_1.4.onnx
6. Output Generation: combines the processed frames with the given audio to produce the final output video
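To make the in-memory idea concrete, here is a rough sketch of steps 1 and 2 with everything held as numpy arrays. The library choices (OpenCV, librosa) are my assumptions, not necessarily what DeepFuze uses, and steps 3-5 are left as a comment since their tensor shapes depend on the specific model exports:

```python
# Sketch of steps 1-2 of the in-memory pipeline; cv2/librosa are assumed libraries.
import cv2
import librosa
import numpy as np

def load_frames(path: str) -> list[np.ndarray]:
    """Step 1: decode every frame into RAM instead of writing temp files."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)  # BGR uint8 array, kept in memory
    cap.release()
    return frames

def audio_to_mel(path: str, sr: int = 16000, n_mels: int = 80) -> np.ndarray:
    """Step 2: convert the driving audio into a mel spectrogram."""
    wav, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)

frames = load_frames("path/to/video.mp4")
mel = audio_to_mel("path/to/input_audio.wav")
# Steps 3-5 (face detection, wav2lip_gan.onnx, gfpgan_1.4.onnx) would consume
# `frames` and `mel` directly as numpy arrays, with no temp files in between.
```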
From step 5, it would be easy to pipe the enhanced frames straight into another process for upscaling; see the sketch below.
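For instance (a sketch, not DeepFuze code): raw frames can be written to an ffmpeg subprocess's stdin, and ffmpeg can apply an upscaling filter and mux in the audio in one pass. Frame size, fps, and the scale filter here are placeholder assumptions:

```python
# Sketch: pipe in-memory frames to ffmpeg for upscaling + muxing with the audio.
import subprocess
import numpy as np

w, h, fps = 1280, 720, 25
frames = [np.zeros((h, w, 3), np.uint8)] * fps  # stand-in for the enhanced frames from step 5

cmd = [
    "ffmpeg", "-y",
    "-f", "rawvideo", "-pix_fmt", "bgr24", "-s", f"{w}x{h}", "-r", str(fps),
    "-i", "-",                       # raw frames arrive on stdin
    "-i", "path/to/input_audio.wav",
    "-vf", "scale=iw*2:ih*2",        # example upscale; swap in any upscaler here
    "-c:v", "libx264", "-pix_fmt", "yuv420p",
    "-c:a", "aac", "-shortest",
    "path/to/output_lipsynced_and_enhanced.mp4",
]
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
for frame in frames:
    proc.stdin.write(frame.tobytes())
proc.stdin.close()
proc.wait()
```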
I'm a newbie with Python and don't know how to work out the detailed solution, but this approach could be very practical.
Correct me if I'm wrong.
Does anyone have ideas on adapting components from [DeepFuze](https://github.com/SamKhoze/ComfyUI-DeepFuze) and [VideoHelperSuite](https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite) to create a more efficient command-line lip-syncing solution with reduced disk I/O?
Thank you