add WanDMDPipeline #219
Conversation
update lora converter
Summary of Changes
Hello @akaitsuki-ii, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the video generation capabilities by introducing the WanDMDPipeline.
Code Review
This pull request introduces a new WanDMDPipeline for video generation, which appears to be based on the Denoising Motion Diffusion method. It also adds support for a new LoRA format from diffusers and includes some minor refactoring. My review focuses on the new WanDMDPipeline. I've pointed out that the new class and its main __call__ method lack docstrings, which are important for usability and maintainability. I also noted a hardcoded cfg_scale that limits the pipeline's flexibility, and an unhelpful comment. Additionally, I've identified code duplication in the new LoRA conversion logic in wan_video.py. Addressing these points will improve the quality and clarity of the new pipeline.
from diffsynth_engine.pipelines.wan_video import WanVideoPipeline

class WanDMDPipeline(WanVideoPipeline):
The class WanDMDPipeline lacks a docstring. It's important to document what the pipeline does, how it differs from its parent WanVideoPipeline, and what "DMD" stands for (presumably Denoising Motion Diffusion). A good docstring improves code readability and maintainability.
For example:
class WanDMDPipeline(WanVideoPipeline):
    """
    A pipeline for Denoising Motion Diffusion (DMD) for video generation, inheriting from WanVideoPipeline.

    This pipeline uses a specific denoising schedule controlled by `denoising_step_list`
    and does not use classifier-free guidance (negative prompts are ignored).
    """

def __call__(
    self,
    prompt,
    input_image: Image.Image | None = None,
    seed=None,
    height=480,
    width=832,
    num_frames=81,
    denoising_step_list: List[int] = None,
    progress_callback: Optional[Callable] = None,  # def progress_callback(current, total, status)
):
The __call__ method is the main entry point for the pipeline but lacks a docstring. Please add one to explain the parameters, especially denoising_step_list, and what the method returns. It's also crucial to mention that this pipeline variant does not support negative prompts and uses a fixed cfg_scale of 1.0.
Here is a suggestion:
@torch.no_grad()
def __call__(
    self,
    prompt,
    input_image: Image.Image | None = None,
    seed=None,
    height=480,
    width=832,
    num_frames=81,
    denoising_step_list: List[int] = None,
    progress_callback: Optional[Callable] = None,  # def progress_callback(current, total, status)
):
    """
    Generates a video based on a prompt and an optional input image using the DMD method.

    Args:
        prompt (str): The text prompt to guide video generation.
        input_image (Image.Image | None, optional): An optional input image for image-to-video generation. Defaults to None.
        seed (int, optional): Random seed for noise generation. Defaults to None.
        height (int, optional): Height of the output video. Defaults to 480.
        width (int, optional): Width of the output video. Defaults to 832.
        num_frames (int, optional): Number of frames in the output video. Must be `4*k + 1`. Defaults to 81.
        denoising_step_list (List[int], optional): A list of timesteps for the denoising process, selected from a 1000-step schedule. Defaults to `[1000, 750, 500, 250]`.
        progress_callback (Optional[Callable], optional): A callback function for progress updates. Defaults to None.

    Returns:
        List[Image.Image]: A list of PIL Images representing the generated video frames.

    Note:
        This pipeline does not use classifier-free guidance; `cfg_scale` is fixed to 1.0 and negative prompts are ignored.
    """

self.load_models_to_device(["dit2"])
model = self.dit2

timestep = timestep * mask[:, :, :, ::2, ::2].flatten()  # seq_len
The comment # seq_len is not very descriptive. It doesn't explain why the timestep is being multiplied by the flattened mask. Please provide a more informative comment that clarifies the purpose of this operation, which appears to be for applying a spatially-varying denoising schedule.
timestep = timestep * mask[:, :, :, ::2, ::2].flatten()  # seq_len
timestep = timestep * mask[:, :, :, ::2, ::2].flatten()  # Apply mask for a spatially-varying denoising schedule
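For context, here is a small standalone sketch of how a scalar timestep becomes a per-token schedule through such a mask. The tensor shapes, downsampling factor, and conditioning pattern below are illustrative assumptions, not taken from this PR:

import torch

batch, frames, height, width = 1, 3, 8, 8
timestep = torch.tensor(750.0)                      # scalar denoising step

# mask == 1 for tokens that should be denoised, 0 for tokens that are
# already clean (e.g. conditioning frames in image-to-video).
mask = torch.ones(batch, 1, frames, height, width)
mask[:, :, 0] = 0                                   # keep the first frame clean

# Downsample spatially (::2, ::2) so the mask matches the patchified token
# grid, then flatten to one value per token and scale the timestep.
per_token_timestep = timestep * mask[:, :, :, ::2, ::2].flatten()
print(per_token_timestep.shape)                     # torch.Size([48]) -> one value per token (seq_len)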
negative_prompt_emb=prompt_emb_nega,
image_clip_feature=image_clip_feature,
image_y=image_y,
cfg_scale=1.0,
The cfg_scale is hardcoded to 1.0. This disables classifier-free guidance and means negative prompts have no effect. While this might be intentional for this specific pipeline, it reduces flexibility. Consider making cfg_scale a parameter of the __call__ method with a default of 1.0, and document this behavior. This would make the pipeline more versatile and its behavior clearer to users. If it must be fixed, a comment explaining why would be beneficial.
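As a standalone illustration of that suggestion (the function below is a toy, not the pipeline's real API), defaulting cfg_scale to 1.0 preserves today's behaviour while still letting callers opt into guidance:

import torch

def predict_noise(cond: torch.Tensor, uncond: torch.Tensor | None, cfg_scale: float = 1.0) -> torch.Tensor:
    # cfg_scale == 1.0 disables classifier-free guidance, so the negative
    # (unconditional) prediction is never needed and can be skipped entirely.
    if cfg_scale == 1.0 or uncond is None:
        return cond
    return uncond + cfg_scale * (cond - uncond)

cond = torch.randn(2, 4)
uncond = torch.randn(2, 4)
print(torch.equal(predict_noise(cond, uncond), cond))    # True: the default keeps the current no-CFG behaviour
print(predict_noise(cond, uncond, cfg_scale=5.0).shape)  # torch.Size([2, 4])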
def _from_diffusers(self, state_dict):
    dit_dict = {}
    for key, param in state_dict.items():
        if ".lora_down.weight" not in key:
            continue

        lora_args = {}
        lora_args["up"] = state_dict[key.replace(".lora_down.weight", ".lora_up.weight")]
        lora_args["down"] = param
        lora_args["rank"] = lora_args["up"].shape[1]
        if key.replace(".lora_down.weight", ".alpha") in state_dict:
            lora_args["alpha"] = state_dict[key.replace(".lora_down.weight", ".alpha")]
        else:
            lora_args["alpha"] = lora_args["rank"]
        key = key.replace("diffusion_model.", "").replace(".lora_down.weight", "")
        dit_dict[key] = lora_args
    return {"dit": dit_dict}
The new method _from_diffusers contains logic for parsing LoRA arguments (lines 52-59) that is identical to the logic in the existing _from_fun method (lines 88-95 in the full file). This code duplication can make maintenance harder. Consider refactoring this common logic into a private helper method to improve code reuse and readability.
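For illustration, a rough sketch of one way that refactor could look. The helper name is hypothetical, and the code is shown as plain functions for brevity; in the pipeline they would remain private methods, and _from_fun could call the same helper with its own key handling:

def _parse_lora_args(state_dict: dict, key: str, param):
    """Extract the up/down weights, rank, and alpha for one LoRA entry."""
    lora_args = {}
    lora_args["up"] = state_dict[key.replace(".lora_down.weight", ".lora_up.weight")]
    lora_args["down"] = param
    lora_args["rank"] = lora_args["up"].shape[1]
    alpha_key = key.replace(".lora_down.weight", ".alpha")
    lora_args["alpha"] = state_dict.get(alpha_key, lora_args["rank"])
    return lora_args

def _from_diffusers(state_dict: dict):
    dit_dict = {}
    for key, param in state_dict.items():
        if ".lora_down.weight" not in key:
            continue
        lora_args = _parse_lora_args(state_dict, key, param)
        stripped = key.replace("diffusion_model.", "").replace(".lora_down.weight", "")
        dit_dict[stripped] = lora_args
    return {"dit": dit_dict}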