
Conversation

@akaitsuki-ii (Contributor)

No description provided.

@akaitsuki-ii akaitsuki-ii requested a review from a team January 6, 2026 07:50
@gemini-code-assist (Contributor)

Summary of Changes

Hello @akaitsuki-ii, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the video generation capabilities by introducing the WanDMDPipeline, designed for both text-to-video and image-to-video tasks. It streamlines the configuration of noise schedulers by moving the update logic to the base pipeline and enhances compatibility with different LoRA model formats. The addition of new examples provides clear guidance on how to leverage these new features.

Highlights

  • New Video Generation Pipeline: Introduced WanDMDPipeline, a new pipeline for advanced video generation, inheriting from WanVideoPipeline and supporting both text-to-video and image-to-video functionalities.
  • Centralized Scheduler Configuration: The apply_scheduler_config method in BasePipeline was updated to directly call self.noise_scheduler.update_config, centralizing scheduler configuration logic for all pipelines.
  • Enhanced LoRA Compatibility: Added support for converting LoRA state dictionaries from the 'diffusers' format within the WanVideoLoRAConverter, improving flexibility for model loading.
  • New Example Scripts: Included two new example scripts (wan_dmd_image_to_video.py and wan_dmd_text_to_video.py) to demonstrate the usage of the new WanDMDPipeline for various video generation tasks.
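
To make the highlights concrete, here is a minimal text-to-video sketch. The call signature and defaults come from the pipeline's __call__ as shown in the review below; the import path and the loading step are assumptions, and the actual construction code lives in the bundled wan_dmd_text_to_video.py example:

    from diffsynth_engine.pipelines.wan_dmd import WanDMDPipeline  # module path assumed, not confirmed by the diff

    # Placeholder for pipeline construction (model paths, device, dtype); see the
    # example scripts for the real loading code. `load_wan_dmd_pipeline` is hypothetical.
    pipe: WanDMDPipeline = load_wan_dmd_pipeline()

    frames = pipe(
        prompt="a corgi running along a beach at sunset",
        seed=42,
        height=480,
        width=832,
        num_frames=81,                              # must satisfy 4*k + 1
        denoising_step_list=[1000, 750, 500, 250],  # the documented default few-step schedule
    )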



@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new WanDMDPipeline for video generation, which appears to be based on Distribution Matching Distillation (DMD). It also adds support for a new LoRA format from diffusers and includes some minor refactoring. My review focuses on the new WanDMDPipeline. I've pointed out that the new class and its main __call__ method lack docstrings, which hurts usability and maintainability. I also noted a hardcoded cfg_scale that limits the pipeline's flexibility, and an uninformative inline comment. Additionally, I've identified code duplication in the new LoRA conversion logic in wan_video.py. Addressing these points will improve the quality and clarity of the new pipeline.

from diffsynth_engine.pipelines.wan_video import WanVideoPipeline


class WanDMDPipeline(WanVideoPipeline):

Severity: medium

The class WanDMDPipeline lacks a docstring. It's important to document what the pipeline does, how it differs from its parent WanVideoPipeline, and what "DMD" stands for (presumably Distribution Matching Distillation, given the few-step schedule and fixed cfg_scale). A good docstring improves code readability and maintainability.

For example:

class WanDMDPipeline(WanVideoPipeline):
    """
    A video generation pipeline based on Distribution Matching Distillation (DMD), inheriting from WanVideoPipeline.

    This pipeline uses a specific denoising schedule controlled by `denoising_step_list`
    and does not use classifier-free guidance (negative prompts are ignored).
    """

Comment on lines +26 to +36
def __call__(
    self,
    prompt,
    input_image: Image.Image | None = None,
    seed=None,
    height=480,
    width=832,
    num_frames=81,
    denoising_step_list: List[int] = None,
    progress_callback: Optional[Callable] = None,  # def progress_callback(current, total, status)
):

Severity: medium

The __call__ method is the main entry point for the pipeline but lacks a docstring. Please add one to explain the parameters, especially denoising_step_list, and what the method returns. It's also crucial to mention that this pipeline variant does not support negative prompts and uses a fixed cfg_scale of 1.0.

Here is a suggestion:

    @torch.no_grad()
    def __call__(
        self,
        prompt,
        input_image: Image.Image | None = None,
        seed=None,
        height=480,
        width=832,
        num_frames=81,
        denoising_step_list: List[int] = None,
        progress_callback: Optional[Callable] = None,  # def progress_callback(current, total, status)
    ):
        """
        Generates a video based on a prompt and an optional input image using the DMD method.

        Args:
            prompt (str): The text prompt to guide video generation.
            input_image (Image.Image | None, optional): An optional input image for image-to-video generation. Defaults to None.
            seed (int, optional): Random seed for noise generation. Defaults to None.
            height (int, optional): Height of the output video. Defaults to 480.
            width (int, optional): Width of the output video. Defaults to 832.
            num_frames (int, optional): Number of frames in the output video. Must be `4*k + 1`. Defaults to 81.
            denoising_step_list (List[int], optional): A list of timesteps for the denoising process, selected from a 1000-step schedule. Defaults to `[1000, 750, 500, 250]`.
            progress_callback (Optional[Callable], optional): A callback function for progress updates. Defaults to None.

        Returns:
            List[Image.Image]: A list of PIL Images representing the generated video frames.
        
        Note:
            This pipeline does not use classifier-free guidance; `cfg_scale` is fixed to 1.0 and negative prompts are ignored.
        """

self.load_models_to_device(["dit2"])
model = self.dit2

timestep = timestep * mask[:, :, :, ::2, ::2].flatten() # seq_len

Severity: medium

The comment # seq_len is not very descriptive. It doesn't explain why the timestep is being multiplied by the flattened mask. Please provide a more informative comment that clarifies the purpose of this operation, which appears to be for applying a spatially-varying denoising schedule.

Suggested change
timestep = timestep * mask[:, :, :, ::2, ::2].flatten() # seq_len
timestep = timestep * mask[:, :, :, ::2, ::2].flatten() # Apply mask for a spatially-varying denoising schedule
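
For context on why the flattened mask lines up with the sequence length: Wan's DiT patchifies latents with a spatial patch size of 2, so subsampling the mask with ::2, ::2 before flattening yields exactly one timestep per token. A rough shape check follows, assuming Wan2.1's usual 8x spatial / 4x temporal VAE compression and a (1, 2, 2) patch size (assumptions inferred from the stride, not stated in this diff):

    # Illustrative bookkeeping for a 480x832, 81-frame generation.
    lat_t = 1 + (81 - 1) // 4          # 21 latent frames (4x temporal compression)
    lat_h, lat_w = 480 // 8, 832 // 8  # 60 x 104 latent pixels (8x spatial compression)
    seq_len = lat_t * (lat_h // 2) * (lat_w // 2)  # 21 * 30 * 52 = 32760 DiT tokens
    # mask[:, :, :, ::2, ::2].flatten() then has seq_len entries:
    # one denoising timestep per token, enabling a spatially-varying schedule.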

negative_prompt_emb=prompt_emb_nega,
image_clip_feature=image_clip_feature,
image_y=image_y,
cfg_scale=1.0,

Severity: medium

The cfg_scale is hardcoded to 1.0. This disables classifier-free guidance and means negative prompts have no effect. While this might be intentional for this specific pipeline, it reduces flexibility. Consider making cfg_scale a parameter of the __call__ method with a default of 1.0, and document this behavior. This would make the pipeline more versatile and its behavior clearer to users. If it must be fixed, a comment explaining why would be beneficial.
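
For concreteness, a sketch of that parameterization, reusing the signature shown above (placement of the new argument is illustrative):

    def __call__(
        self,
        prompt,
        input_image: Image.Image | None = None,
        seed=None,
        height=480,
        width=832,
        num_frames=81,
        denoising_step_list: List[int] = None,
        cfg_scale: float = 1.0,  # default 1.0 preserves current behavior: CFG off, negative prompts ignored
        progress_callback: Optional[Callable] = None,
    ):
        ...
        # and at the model call site, forward the parameter instead of the literal:
        #     cfg_scale=cfg_scale,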

Comment on lines +46 to +62
def _from_diffusers(self, state_dict):
    dit_dict = {}
    for key, param in state_dict.items():
        if ".lora_down.weight" not in key:
            continue

        lora_args = {}
        lora_args["up"] = state_dict[key.replace(".lora_down.weight", ".lora_up.weight")]
        lora_args["down"] = param
        lora_args["rank"] = lora_args["up"].shape[1]
        if key.replace(".lora_down.weight", ".alpha") in state_dict:
            lora_args["alpha"] = state_dict[key.replace(".lora_down.weight", ".alpha")]
        else:
            lora_args["alpha"] = lora_args["rank"]
        key = key.replace("diffusion_model.", "").replace(".lora_down.weight", "")
        dit_dict[key] = lora_args
    return {"dit": dit_dict}

Severity: medium

The new method _from_diffusers contains logic for parsing LoRA arguments (lines 52-59) that is identical to the logic in the existing _from_fun method (lines 88-95 in the full file). This code duplication can make maintenance harder. Consider refactoring this common logic into a private helper method to improve code reuse and readability.
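
A possible shape for that refactor; the helper name _parse_lora_args is hypothetical, and _from_fun's own key renaming (not shown here) would stay in its method:

    def _parse_lora_args(self, state_dict, key, param):
        # Common logic shared by _from_diffusers and _from_fun: collect the
        # up/down matrices, derive the rank, and fall back to rank when no alpha is stored.
        lora_args = {
            "up": state_dict[key.replace(".lora_down.weight", ".lora_up.weight")],
            "down": param,
        }
        lora_args["rank"] = lora_args["up"].shape[1]
        alpha_key = key.replace(".lora_down.weight", ".alpha")
        lora_args["alpha"] = state_dict.get(alpha_key, lora_args["rank"])
        return lora_args

    def _from_diffusers(self, state_dict):
        dit_dict = {}
        for key, param in state_dict.items():
            if ".lora_down.weight" not in key:
                continue
            lora_args = self._parse_lora_args(state_dict, key, param)
            key = key.replace("diffusion_model.", "").replace(".lora_down.weight", "")
            dit_dict[key] = lora_args
        return {"dit": dit_dict}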

@qzzz95 qzzz95 merged commit e0712a2 into main Jan 7, 2026
@qzzz95 qzzz95 deleted the wan_dmd branch January 7, 2026 07:25
