[WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) #12526

dg845 · 2025-10-21T23:02:11Z

What does this PR do?

This PR is a continuation of #12442 by @tolgacangoz. It adds a pipeline for the Wan2.2-Animate-14B model (project page, paper, code, weights), a SOTA character animation and replacement video model.

Fixes #12441 (the original requesting issue).

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@yiyixuxu
@sayakpaul
@tolgacangoz

- Introduced WanAnimateTransformer3DModel and WanAnimatePipeline.
- Updated get_transformer_config to handle the new model type.
- Modified convert_transformer to instantiate the correct transformer based on model type.
- Adjusted main execution logic to accommodate the new Animate model type.


…work for character animation and replacement
- Added Wan 2.2 Animate 14B model to the documentation.
- Introduced the Wan-Animate framework, detailing its capabilities for character animation and replacement.
- Included example usage for the WanAnimatePipeline with preprocessing steps and guidance on input requirements.


- Introduced `EncoderApp`, `Encoder`, `Direction`, `Synthesis`, and `Generator` classes for enhanced motion and appearance encoding.
- Added `FaceEncoder`, `FaceBlock`, and `FaceAdapter` classes to integrate facial motion processing.
- Updated `WanTimeTextImageMotionEmbedding` to utilize the new `Generator` for motion embedding.
- Enhanced `WanAnimateTransformer3DModel` with additional face adapter and pose patch embedding for improved model functionality.

- Introduced `pad_video` method to handle padding of video frames to a target length.
- Updated video processing logic to utilize the new padding method for `pose_video`, `face_video`, and conditionally for `background_video` and `mask_video`.
- Ensured compatibility with existing preprocessing steps for video inputs.
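To illustrate the padding step described in this commit, here is a minimal sketch of what a `pad_video`-style helper could look like. The PR text does not say how the padding is filled in, so repeating the last frame is purely an assumption, as are the signature and the choice to leave longer videos untouched.

```python
from typing import List, TypeVar

Frame = TypeVar("Frame")


def pad_video(frames: List[Frame], target_len: int) -> List[Frame]:
    """Pad a list of frames to `target_len`.

    Assumption: padding repeats the last frame. The actual pipeline
    may use a different policy (e.g. reflection or looping).
    """
    if not frames:
        raise ValueError("cannot pad an empty video")
    if len(frames) >= target_len:
        # Assumption: videos that are already long enough are returned as-is.
        return list(frames)
    missing = target_len - len(frames)
    return list(frames) + [frames[-1]] * missing


# Usage: frames are stand-in strings here; real inputs would be images or tensors.
padded = pad_video(["f0", "f1"], 4)
```

With a helper like this, `pose_video` and `face_video` could be brought to the same length before encoding, which is what the commit message suggests the real method is for.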

…roved video processing
- Added optional parameters: `conditioning_pixel_values`, `refer_pixel_values`, `refer_t_pixel_values`, `bg_pixel_values`, and `mask_pixel_values` to the `prepare_latents` method.
- Updated the logic in the denoising loop to accommodate the new parameters, enhancing the flexibility and functionality of the pipeline.

…eneration
- Updated the calculation of `num_latent_frames` and adjusted the shape of latent tensors to accommodate changes in frame processing.
- Enhanced the `get_i2v_mask` method for better mask generation, ensuring compatibility with new tensor shapes.
- Improved handling of pixel values and device management for better performance and clarity in the video processing pipeline.
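For readers unfamiliar with how `num_latent_frames` relates to the pixel-space frame count, the sketch below shows the relation commonly used for causal video VAEs with temporal compression. The compression factor of 4 and the exact formula are assumptions here; the commit message does not spell out the calculation it changed.

```python
def num_latent_frames(num_frames: int, temporal_scale: int = 4) -> int:
    """Number of latent frames for a video of `num_frames` pixel frames.

    Assumption: the first frame is encoded on its own and every further
    group of `temporal_scale` frames maps to one latent frame.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    return (num_frames - 1) // temporal_scale + 1


# Usage: frame counts of the form 4k + 1 map cleanly onto k + 1 latent frames.
latent_len = num_latent_frames(77)
```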

…and mask generation
- Consolidated the handling of `pose_latents_no_ref` to improve clarity and efficiency in latent tensor calculations.
- Updated the `get_i2v_mask` method to accept batch size and adjusted tensor shapes accordingly for better compatibility.
- Enhanced the logic for mask pixel values in the replacement mode, ensuring consistent processing across different scenarios.

…nced processing
- Introduced custom QR decomposition and fused leaky ReLU functions for improved tensor operations.
- Implemented upsampling and downsampling functions with native support for better performance.
- Added new classes: `FusedLeakyReLU`, `Blur`, `ScaledLeakyReLU`, `EqualConv2d`, `EqualLinear`, and `RMSNorm` for advanced neural network layers.
- Refactored `EncoderApp`, `Generator`, and `FaceBlock` classes to integrate new functionalities and improve modularity.
- Updated attention mechanism to utilize `dispatch_attention_fn` for enhanced flexibility in processing.

…annotations
- Removed over-abstracted functions such as `custom_qr`, `fused_leaky_relu`, and `make_kernel` to streamline the codebase.
- Updated class constructors and method signatures to include type hints for better clarity and type checking.
- Refactored the `FusedLeakyReLU`, `Blur`, `EqualConv2d`, and `EqualLinear` classes to enhance readability and maintainability.
- Simplified the `Generator` and `Encoder` classes by removing redundant parameters and improving initialization logic.

- Added new key mappings for the Animate model's transformer architecture.
- Implemented weight conversion functions for `EqualLinear` and `EqualConv2d` to standard layers.
- Updated `WanAnimatePipeline` to handle reference image encoding and conditioning properly.
- Refactored the `WanAnimateTransformer3DModel` to include a new `motion_encoder_dim` parameter for improved flexibility.
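As a rough picture of what "key mappings" means in a checkpoint conversion script, the sketch below renames state-dict keys by substring replacement. The rule table and the key names in it are hypothetical placeholders, not the mappings actually added in this commit.

```python
from typing import Any, Dict

# Hypothetical rename rules: original-checkpoint substring -> converted substring.
RENAME_RULES: Dict[str, str] = {
    "pose_patch_embedding": "pose_patch_embed",
    "face_adapter.fuser_blocks": "face_adapter",
}


def rename_key(key: str, rules: Dict[str, str]) -> str:
    """Apply every substring replacement in `rules` to a single key."""
    for old, new in rules.items():
        key = key.replace(old, new)
    return key


def convert_state_dict(state_dict: Dict[str, Any], rules: Dict[str, str]) -> Dict[str, Any]:
    """Return a new state dict with renamed keys; values are left untouched."""
    return {rename_key(k, rules): v for k, v in state_dict.items()}


# Usage with stand-in values instead of real tensors.
converted = convert_state_dict({"pose_patch_embedding.weight": 1}, RENAME_RULES)
```

Keeping the rules in a plain dictionary makes it easy to review which original names map to which converted names, at the cost of rules being order-sensitive when one replacement could feed into another.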

…proved model integration
- Updated key mappings in `convert_wan_to_diffusers.py` for the Animate model's transformer architecture.
- Implemented weight scaling for `EqualLinear` and `EqualConv2d` layers.
- Refactored `WanAnimateMotionEmbedder` and `WanAnimateFaceBlock` for better parameter handling.
- Modified `WanAnimatePipeline` to support new reference image encoding and conditioning logic.
- Switched scheduler to `UniPCMultistepScheduler` for improved performance.

… conditioning logic
- Added parameters `y_ref` and `calculate_noise_latents_only` to improve flexibility in processing.
- Streamlined the encoding of reference images and conditioning videos.
- Adjusted tensor concatenation and masking logic for better clarity.
- Updated return values to accommodate new processing paths based on `mask_reft_len` and `calculate_noise_latents_only` flags.

- Added checks to skip unnecessary transformations for specific keys, including blur kernels and biases.
- Implemented renaming of sequential indices to named components for better clarity in weight handling.
- Introduced scaling for `EqualLinear` and `EqualConv2d` weights, ensuring compatibility with the Animate model's architecture.
- Added comments and TODOs for future verification and simplification of the conversion process.
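The weight scaling mentioned above can be sketched as follows, assuming `EqualLinear` and `EqualConv2d` follow the StyleGAN2-style equalized learning rate convention, where a constant derived from the fan-in is multiplied into the weight at runtime. Under that assumption, folding the constant into the stored weight once lets a standard layer reproduce the same output. The function names are illustrative and nested lists stand in for tensors.

```python
import math


def equal_linear_scale(in_features: int, lr_mul: float = 1.0) -> float:
    """Runtime scale of an equalized linear layer (assumed convention)."""
    return lr_mul / math.sqrt(in_features)


def equal_conv2d_scale(in_channels: int, kernel_size: int) -> float:
    """Runtime scale of an equalized conv layer (assumed convention)."""
    return 1.0 / math.sqrt(in_channels * kernel_size**2)


def fold_scale(weight, scale: float):
    """Multiply every element of a nested-list weight by `scale`."""
    if isinstance(weight, list):
        return [fold_scale(w, scale) for w in weight]
    return weight * scale


# Usage: a 1x2 "linear" weight with in_features=4 gets scaled by 0.5.
folded = fold_scale([[2.0, 4.0]], equal_linear_scale(4))
```

Biases and fixed blur kernels would be left alone in such a scheme, which matches the commit's note about skipping those keys.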


HuggingFaceDocBuilderDev · 2025-10-21T23:10:51Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

tolgacangoz · 2025-10-22T06:06:41Z

src/diffusers/models/transformers/transformer_wan_animate.py

+        input = self.conv2d(input)
+
+        if self.activate:
+            input = self.act(input + self.bias_leaky_relu) * 2**0.5


Hi. If fused leaky ReLU is supposed to be used that way, then this line should also be updated accordingly. I was trying to reduce the number of abstractions by removing the FusedLeakyReLU class.

Also, congrats on joining the diffusers team ㊗️!
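For context on the snippet under review, the fused leaky ReLU pattern (add a bias, apply leaky ReLU, multiply by `2**0.5`) can be written out element-wise as below. This is only a sketch on plain Python lists; the negative slope of 0.2 is an assumption taken from common StyleGAN2-style implementations and is not stated in the diff.

```python
import math
from typing import List


def fused_leaky_relu(
    x: List[float],
    bias: List[float],
    negative_slope: float = 0.2,  # assumed default, not confirmed by the diff
    scale: float = math.sqrt(2.0),
) -> List[float]:
    """Element-wise leaky_relu(x + bias) * scale."""
    out = []
    for xi, bi in zip(x, bias):
        v = xi + bi
        activated = v if v >= 0 else v * negative_slope
        out.append(activated * scale)
    return out


# Usage: one positive and one negative pre-activation, zero bias.
activations = fused_leaky_relu([1.0, -1.0], [0.0, 0.0])
```

The `2**0.5` factor compensates for the reduction in signal magnitude caused by the leaky ReLU, which is why dropping the `FusedLeakyReLU` class requires every call site to apply both the bias and the scale.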

a-free-a · 2025-10-22T09:36:54Z

@tolgacangoz @dg845 Congratulations on releasing Wan2.2-Animate-14B-Diffusers!
I attempted to use it for video generation but encountered a runtime failure. Could you please confirm whether additional code changes to the preprocessing stage are required to complete the full workflow?


tolgacangoz and others added 30 commits October 6, 2025 21:46

template1

3529a0a

temp2

4f2ee5e

up

778fb54

up

d77b6ba

fix-copies

2fc6ac2

style

6182d44

Refactor WanAnimate model components

8c9fd89

Enhance WanAnimatePipeline with new parameters for mode and tempora…

d01e941

…l guidance

Update WanAnimatePipeline to require additional video inputs and im…

7af953b

…prove error handling for undefined parameters

Add unit test template for WanAnimatePipeline functionality

05a01c6

Add unit tests for WanAnimateTransformer3DModel in GGUF format

22b83ce

- Introduced `WanAnimateGGUFSingleFileTests` to validate functionality.
- Added dummy input generation for testing model behavior.

style

7fb6732

Update WanAnimatePipeline

624a314

style

fc0edb5

Refactor test for WanAnimatePipeline to include new input structure

eb7eedd

from einops to torch

8968b42

Merge branch 'main' into integrations/wan2.2-animate

dce83a8

style

802896e

up

84768f6

style

b8337c6

Merge branch 'main' into integrations/wan2.2-animate

4e6651b

tolgacangoz and others added 18 commits October 17, 2025 14:45

refactor transformer

6a8662d

Merge branch 'main' into integrations/wan2.2-animate

050b313

simplify

fe02c25

Enhance documentation and tests for WanAnimatePipeline, adding exampl…

7092a28

…es for animation and replacement modes, and improving test coverage for various scenarios.

Merge branch 'main' into integrations/wan2.2-animate

5d01574

Clarify contribution of M. Tolga Cangöz

9c0a65d

Updated contribution attribution for the Wan-Animate model.

Update face_embedder key mappings in convert_wan_to_diffusers.py

28ac516

- Reverted the order of face_embedder norms to their original configuration for improved clarity.
- Introduced a placeholder for `face_encoder.norm2` to maintain compatibility with the existing architecture.

up

b71d3a9

up

5818d71

Fix image embedding extraction in WanAnimatePipeline to return the la…

bfda25d

…st hidden state

Adjust default parameters in WanAnimatePipeline for num_frames, num_i…

0ac259c

…nference_steps, and guidance_scale

Update example docstring parameters for num_frames and guidance_scale…

e2e95ed

… in WanAnimatePipeline

Refactor tests in WanAnimatePipeline: remove redundant assertions and…

7146bb0

… simplify expected output validation

Add fused relu for Wan animate activations

6ffdb99

tolgacangoz reviewed Oct 22, 2025


yiyixuxu changed the title ~~Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz)~~ [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) Oct 22, 2025

dg845 marked this pull request as draft October 22, 2025 22:01

dg845 added 7 commits October 23, 2025 02:50

Refactor motion encoder to use custom Conv2d and Linear with weight s…

4556730

…caling

Refactor WanAnimateFaceEncoder to make it easier to understand

c3e69fc

Refactor Wan Animate transformer to reuse WanTimeTextImageEmbedding

7f4dde9

Refactor Wan Animate face blocks to use an attention processor

4f204ec

Refactor Wan Animate transformer, taking into account previous changes

57e9ea3

Remove unused imports in transformer_wan_animate

091b7ce

Merge branch 'main' into add-wan2.2-animate-pipeline

8216aef


4 participants
