
Conversation

@yiyixuxu yiyixuxu commented Feb 27, 2025

for #10921

yiyixuxu commented Feb 28, 2025

You can use this for now:

import torch
from transformers import AutoTokenizer, UMT5EncoderModel
from diffusers import AutoencoderKLWan, UniPCMultistepScheduler, WanPipeline, WanTransformer3DModel
from diffusers.utils import export_to_video

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
seed = 0

# local path to a diffusers-format conversion of the Wan 2.1 VAE
vae_repo = "/raid/yiyi/wan2.1_vae_diffusers"
vae = AutoencoderKLWan.from_pretrained(vae_repo)
vae = vae.to(device)

# TODO: impl FlowDPMSolverMultistepScheduler
scheduler = UniPCMultistepScheduler(prediction_type='flow_prediction', use_flow_sigmas=True, num_train_timesteps=1000, flow_shift=1.0)

text_encoder = UMT5EncoderModel.from_pretrained("google/umt5-xxl", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("google/umt5-xxl")

# 14B
# transformer = WanTransformer3DModel.from_pretrained('StevenZhang/Wan2.1-T2V-14B-Diff', torch_dtype=torch.bfloat16)
transformer = WanTransformer3DModel.from_pretrained('StevenZhang/Wan2.1-T2V-1.3B-Diff', torch_dtype=torch.bfloat16)

components = {
    "transformer": transformer,
    "vae": vae,
    "scheduler": scheduler,
    "text_encoder": text_encoder,
    "tokenizer": tokenizer,
}
pipe = WanPipeline(**components)

pipe.to(device)

# roughly: "vivid colors, overexposed, static, blurred details, subtitles, style, artwork, painting, frame, still,
# overall gray, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn
# hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered
# background, three legs, crowded background, walking backwards"
negative_prompt = '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走'

generator = torch.Generator(device=device).manual_seed(seed)
inputs = {
    "prompt": "两只拟人化的猫咪身穿舒适的拳击装备,戴着鲜艳的手套,在聚光灯照射的舞台上激烈对战",
    "negative_prompt": negative_prompt, # TODO
    "generator": generator,
    "num_inference_steps": 50,
    "flow_shift": 3.0,
    "guidance_scale": 5.0,
    "height": 480,
    "width": 832,
    "num_frames": 81,
    "max_sequence_length": 512,
    "output_type": "np"
}

video = pipe(**inputs).frames[0]

print(video.shape)

export_to_video(video, "output.mp4", fps=16)

yiyixuxu and others added 4 commits February 28, 2025 09:59
* update

* update

* refactor rope

* refactor pipeline

* make fix-copies

* add transformer test

* update

* update
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yiyixuxu commented Mar 2, 2025

@bot /style

@a-r-r-o-w a-r-r-o-w added the roadmap Add to current release roadmap label Mar 2, 2025
@a-r-r-o-w a-r-r-o-w merged commit 2d8a41c into main Mar 2, 2025
26 of 30 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Diffusers Roadmap 0.36 Mar 2, 2025
@a-r-r-o-w a-r-r-o-w deleted the yiyi-refactor-wan-vae branch March 2, 2025 11:54
X-niper commented Mar 5, 2025

In the WanVAE encoder, the code offsets and scales the log variance with the same factors used for mu. However, it is the Gaussian standard deviation that should be scaled by that factor, so the log variance should only be shifted by twice the log of the scale.

Is the implementation on the main branch correct? @a-r-r-o-w
mu = (mu - scale[0].view(1, self.z_dim, 1, 1, 1)) * scale[1].view(1, self.z_dim, 1, 1, 1)
# the original logvar = (logvar - scale[0].view(1, self.z_dim, 1, 1, 1)) * scale[1].view(1, self.z_dim, 1, 1, 1)
logvar = logvar + 2 * torch.log(scale[1].view(1, self.z_dim, 1, 1, 1)) # the proposal
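
For reference, a minimal numerical sketch of the argument (assuming the encoder parameterizes the latent as N(mu, exp(logvar)) and the normalization is meant to act on the sampled latent): if z ~ N(mu, sigma^2) and z' = (z - shift) * s, then z' ~ N((mu - shift) * s, (s * sigma)^2), i.e. logvar' = logvar + 2 * log(s).

import torch

torch.manual_seed(0)
mu, logvar = torch.tensor(0.7), torch.tensor(-1.2)
shift, s = torch.tensor(0.3), torch.tensor(2.5)

# Sample z ~ N(mu, exp(logvar)), then normalize the samples.
z = mu + torch.exp(0.5 * logvar) * torch.randn(1_000_000)
z_scaled = (z - shift) * s

# Empirical mean matches (mu - shift) * s; empirical std matches exp(0.5 * (logvar + 2 * log(s))),
# i.e. the proposed logvar shift, not the offset-and-scale transform applied to mu.
print(z_scaled.mean().item(), ((mu - shift) * s).item())
print(z_scaled.std().item(), torch.exp(0.5 * (logvar + 2 * torch.log(s))).item())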
