add WanDMDPipeline #219
Conversation
update lora converter
Summary of Changes
Hello @akaitsuki-ii, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the video generation capabilities by introducing the WanDMDPipeline.
Code Review
This pull request introduces a new WanDMDPipeline for video generation, which appears to be based on the Denoising Motion Diffusion method. It also adds support for a new LoRA format from diffusers and includes some minor refactoring. My review focuses on the new WanDMDPipeline. I've pointed out that the new class and its main __call__ method lack docstrings, which are important for usability and maintainability. I also noted a hardcoded cfg_scale that limits the pipeline's flexibility, and an unhelpful comment. Additionally, I've identified code duplication in the new LoRA conversion logic in wan_video.py. Addressing these points will improve the quality and clarity of the new pipeline.
from diffsynth_engine.pipelines.wan_video import WanVideoPipeline

class WanDMDPipeline(WanVideoPipeline):
The class WanDMDPipeline lacks a docstring. It's important to document what the pipeline does, how it differs from its parent WanVideoPipeline, and what "DMD" stands for (presumably Denoising Motion Diffusion). A good docstring improves code readability and maintainability.
For example:
class WanDMDPipeline(WanVideoPipeline):
    """
    A pipeline for Denoising Motion Diffusion (DMD) for video generation, inheriting from WanVideoPipeline.

    This pipeline uses a specific denoising schedule controlled by `denoising_step_list`
    and does not use classifier-free guidance (negative prompts are ignored).
    """

def __call__(
    self,
    prompt,
    input_image: Image.Image | None = None,
    seed=None,
    height=480,
    width=832,
    num_frames=81,
    denoising_step_list: List[int] = None,
    progress_callback: Optional[Callable] = None,  # def progress_callback(current, total, status)
):
The __call__ method is the main entry point for the pipeline but lacks a docstring. Please add one to explain the parameters, especially denoising_step_list, and what the method returns. It's also crucial to mention that this pipeline variant does not support negative prompts and uses a fixed cfg_scale of 1.0.
Here is a suggestion:
@torch.no_grad()
def __call__(
    self,
    prompt,
    input_image: Image.Image | None = None,
    seed=None,
    height=480,
    width=832,
    num_frames=81,
    denoising_step_list: List[int] = None,
    progress_callback: Optional[Callable] = None,  # def progress_callback(current, total, status)
):
    """
    Generates a video based on a prompt and an optional input image using the DMD method.

    Args:
        prompt (str): The text prompt to guide video generation.
        input_image (Image.Image | None, optional): An optional input image for image-to-video generation. Defaults to None.
        seed (int, optional): Random seed for noise generation. Defaults to None.
        height (int, optional): Height of the output video. Defaults to 480.
        width (int, optional): Width of the output video. Defaults to 832.
        num_frames (int, optional): Number of frames in the output video. Must be `4*k + 1`. Defaults to 81.
        denoising_step_list (List[int], optional): A list of timesteps for the denoising process, selected from a 1000-step schedule. Defaults to `[1000, 750, 500, 250]`.
        progress_callback (Optional[Callable], optional): A callback function for progress updates. Defaults to None.

    Returns:
        List[Image.Image]: A list of PIL Images representing the generated video frames.

    Note:
        This pipeline does not use classifier-free guidance; `cfg_scale` is fixed to 1.0 and negative prompts are ignored.
    """

self.load_models_to_device(["dit2"])
model = self.dit2

timestep = timestep * mask[:, :, :, ::2, ::2].flatten()  # seq_len
The comment # seq_len is not very descriptive. It doesn't explain why the timestep is being multiplied by the flattened mask. Please provide a more informative comment that clarifies the purpose of this operation, which appears to be for applying a spatially-varying denoising schedule.
timestep = timestep * mask[:, :, :, ::2, ::2].flatten()  # seq_len
timestep = timestep * mask[:, :, :, ::2, ::2].flatten()  # Apply mask for a spatially-varying denoising schedule
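For context, here is a small standalone sketch of how a scalar timestep becomes a per-token schedule through such a mask. The tensor shapes, downsampling factor, and conditioning pattern below are illustrative assumptions, not taken from this PR:

import torch

batch, frames, height, width = 1, 3, 8, 8
timestep = torch.tensor(750.0)                      # scalar denoising step

# mask == 1 for tokens that should be denoised, 0 for tokens that are
# already clean (e.g. conditioning frames in image-to-video).
mask = torch.ones(batch, 1, frames, height, width)
mask[:, :, 0] = 0                                   # keep the first frame clean

# Downsample spatially (::2, ::2) so the mask matches the patchified token
# grid, then flatten to one value per token and scale the timestep.
per_token_timestep = timestep * mask[:, :, :, ::2, ::2].flatten()
print(per_token_timestep.shape)                     # torch.Size([48]) -> one value per token (seq_len)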
negative_prompt_emb=prompt_emb_nega,
image_clip_feature=image_clip_feature,
image_y=image_y,
cfg_scale=1.0,
The cfg_scale is hardcoded to 1.0. This disables classifier-free guidance and means negative prompts have no effect. While this might be intentional for this specific pipeline, it reduces flexibility. Consider making cfg_scale a parameter of the __call__ method with a default of 1.0, and document this behavior. This would make the pipeline more versatile and its behavior clearer to users. If it must be fixed, a comment explaining why would be beneficial.
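As a standalone illustration of that suggestion (the function below is a toy, not the pipeline's real API), defaulting cfg_scale to 1.0 preserves today's behaviour while still letting callers opt into guidance:

import torch

def predict_noise(cond: torch.Tensor, uncond: torch.Tensor | None, cfg_scale: float = 1.0) -> torch.Tensor:
    # cfg_scale == 1.0 disables classifier-free guidance, so the negative
    # (unconditional) prediction is never needed and can be skipped entirely.
    if cfg_scale == 1.0 or uncond is None:
        return cond
    return uncond + cfg_scale * (cond - uncond)

cond = torch.randn(2, 4)
uncond = torch.randn(2, 4)
print(torch.equal(predict_noise(cond, uncond), cond))    # True: the default keeps the current no-CFG behaviour
print(predict_noise(cond, uncond, cfg_scale=5.0).shape)  # torch.Size([2, 4])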
def _from_diffusers(self, state_dict):
    dit_dict = {}
    for key, param in state_dict.items():
        if ".lora_down.weight" not in key:
            continue

        lora_args = {}
        lora_args["up"] = state_dict[key.replace(".lora_down.weight", ".lora_up.weight")]
        lora_args["down"] = param
        lora_args["rank"] = lora_args["up"].shape[1]
        if key.replace(".lora_down.weight", ".alpha") in state_dict:
            lora_args["alpha"] = state_dict[key.replace(".lora_down.weight", ".alpha")]
        else:
            lora_args["alpha"] = lora_args["rank"]
        key = key.replace("diffusion_model.", "").replace(".lora_down.weight", "")
        dit_dict[key] = lora_args
    return {"dit": dit_dict}
The new method _from_diffusers contains logic for parsing LoRA arguments (lines 52-59) that is identical to the logic in the existing _from_fun method (lines 88-95 in the full file). This code duplication can make maintenance harder. Consider refactoring this common logic into a private helper method to improve code reuse and readability.
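For illustration, a rough sketch of one way that refactor could look. The helper name is hypothetical, and the code is shown as plain functions for brevity; in the pipeline they would remain private methods, and _from_fun could call the same helper with its own key handling:

def _parse_lora_args(state_dict: dict, key: str, param):
    """Extract the up/down weights, rank, and alpha for one LoRA entry."""
    lora_args = {}
    lora_args["up"] = state_dict[key.replace(".lora_down.weight", ".lora_up.weight")]
    lora_args["down"] = param
    lora_args["rank"] = lora_args["up"].shape[1]
    alpha_key = key.replace(".lora_down.weight", ".alpha")
    lora_args["alpha"] = state_dict.get(alpha_key, lora_args["rank"])
    return lora_args

def _from_diffusers(state_dict: dict):
    dit_dict = {}
    for key, param in state_dict.items():
        if ".lora_down.weight" not in key:
            continue
        lora_args = _parse_lora_args(state_dict, key, param)
        stripped = key.replace("diffusion_model.", "").replace(".lora_down.weight", "")
        dit_dict[stripped] = lora_args
    return {"dit": dit_dict}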