
Conversation

@linoytsaban
Collaborator

@linoytsaban linoytsaban commented Sep 12, 2025

https://huggingface.co/alibaba-pai/Wan2.2-VACE-Fun-A14B

diffusers format: https://huggingface.co/linoyts/Wan2.2-VACE-Fun-14B-diffusers

Example with Reference(s)-to-Video:
Notes:

  1. the boundary_ratio is set to 0.875 by default; I didn't experiment with other values (a sketch of overriding it follows this list)
  2. the attached videos were generated with Wan2.2 VACE using the lightx2v LoRA for 8-step inference
  3. all other VACE use cases should also be applicable (see Wan VACE #11582 for more examples)
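
If you want to experiment with boundary_ratio, here is a minimal sketch of overriding it at load time. It assumes from_pretrained forwards boundary_ratio to the pipeline's __init__; 0.9 is just an illustrative value:

import torch
from diffusers import WanVACEPipeline

# boundary_ratio sets the timestep boundary between the high-noise and low-noise experts (0.875 by default here)
pipe = WanVACEPipeline.from_pretrained(
    "linoyts/Wan2.2-VACE-Fun-14B-diffusers", boundary_ratio=0.9, torch_dtype=torch.bfloat16
)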
import torch
from diffusers import AutoencoderKLWan, WanVACEPipeline
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from diffusers.utils import export_to_video

model_id = "linoyts/Wan2.2-VACE-Fun-14B-diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanVACEPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
# model CPU offloading keeps VRAM usage down; it also handles device placement,
# so there is no need to call pipe.to("cuda") afterwards
pipe.enable_model_cpu_offload()


import torch
import PIL.Image
from diffusers import AutoencoderKLWan, WanVACEPipeline
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from diffusers.utils import export_to_video, load_image


def prepare_video_and_mask(height: int, width: int, num_frames: int, img: PIL.Image.Image = None):
    # builds the conditioning frames (gray placeholders) and per-frame masks;
    # if img is given, it is used as a fixed first frame (mask 0 = keep, 255 = generate)
    if img is not None:
        img = img.resize((width, height))
        frames = [img]
        # Ideally, this should be 127.5 to match original code, but they perform computation on numpy arrays
        # whereas we are passing PIL images. If you choose to pass numpy arrays, you can set it to 127.5 to
        # match the original code.
        frames.extend([PIL.Image.new("RGB", (width, height), (128, 128, 128))] * (num_frames - 1))
        mask_black = PIL.Image.new("L", (width, height), 0)
        mask_white = PIL.Image.new("L", (width, height), 255)
        mask = [mask_black, *[mask_white] * (num_frames - 1)]
    else:
        frames = []
        # Ideally, this should be 127.5 to match original code, but they perform computation on numpy arrays
        # whereas we are passing PIL images. If you choose to pass numpy arrays, you can set it to 127.5 to
        # match the original code.
        frames.extend([PIL.Image.new("RGB", (width, height), (128, 128, 128))] * (num_frames))
        mask_white = PIL.Image.new("L", (width, height), 255)
        mask = [mask_white] * (num_frames)
    return frames, mask

prompt = "the robot is wearing the sunglasses and the hat that reads 'GPU poor' and playfully moves around"  
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, typos, style, works, paintings, spelling mistakes, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"

height = 480
width = 832
num_frames = 45
video, mask = prepare_video_and_mask(height, width, num_frames)
reference_images = [load_image("reachy.jpg"), load_image("sunglasses.jpg"), load_image("gpu_hat.png")]

output = pipe(
    video=video,
    mask=mask,
    prompt=prompt,
    reference_images=reference_images,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=num_frames,
    num_inference_steps=30,
    guidance_scale=5.0,
    generator=torch.Generator().manual_seed(42),
).frames[0]
export_to_video(output, "output_VACE_ref.mp4", fps=16)

To use it with the fast-inference (lightx2v) LoRA:

# load the distilled lightx2v LoRA into both transformers (Wan2.2 uses a high-noise and a low-noise expert)
pipe.load_lora_weights(
    "Kijai/WanVideo_comfy",
    weight_name="Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16.safetensors",
    adapter_name="lightx2v",
)
kwargs_lora = {"load_into_transformer_2": True}
pipe.load_lora_weights(
    "Kijai/WanVideo_comfy",
    weight_name="Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16.safetensors",
    adapter_name="lightx2v_2", **kwargs_lora
)
pipe.set_adapters(["lightx2v", "lightx2v_2"], adapter_weights=[1., 1.])
# fuse the LoRA into the transformer weights, then drop the adapter parameters
pipe.fuse_lora(adapter_names=["lightx2v"], lora_scale=3., components=["transformer"])
pipe.fuse_lora(adapter_names=["lightx2v_2"], lora_scale=1., components=["transformer_2"])
pipe.unload_lora_weights()

output = pipe(
    video=video,
    mask=mask,
    prompt=prompt,
    reference_images=reference_images,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=num_frames,
    num_inference_steps=8, # 6-10 is probably a good range
    guidance_scale=1.0, # advised to use 1.0
    generator=torch.Generator().manual_seed(42),
).frames[0]
export_to_video(output, "output_VACE_ref.mp4", fps=16)
Attached results: output_video-6.mp4, output_video-8.mp4

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@linoytsaban linoytsaban marked this pull request as ready for review September 12, 2025 17:07
Collaborator

@yiyixuxu yiyixuxu left a comment

thanks @linoytsaban !

@linoytsaban
Collaborator Author

@bot /style

@github-actions
Contributor

github-actions bot commented Sep 13, 2025

Style bot fixed some files and pushed the changes.

@sayakpaul
Member

Could we check that the failing test is not being introduced by this PR?

@J4BEZ
Contributor

J4BEZ commented Sep 15, 2025

Very Awesome!
I really appreciate your hard work🙇‍♂️

@linoytsaban
Collaborator Author

linoytsaban commented Sep 15, 2025

@sayakpaul @yiyixuxu I think the current failing test is not related.

@sayakpaul
Member

Indeed. The failure I pointed out has now gone 👍 Thanks for the work, Linoy!

@sayakpaul sayakpaul merged commit b500140 into huggingface:main Sep 15, 2025
9 of 10 checks passed
@bhack

bhack commented Sep 16, 2025

@linoytsaban Does this support Masked V2V?

@luke14free

@linoytsaban I noticed that using the lightx2v LoRA causes a lot of warnings about mismatched layers in the console and also produces much worse results than yours. Maybe it's the wrong LoRA link?

@linoytsaban linoytsaban deleted the vace_22 branch September 16, 2025 12:35
@00Neil

00Neil commented Sep 17, 2025

Thank you for your hard work on this! I'm wondering if this model supports multi-GPU inference. The reason I ask is that I currently have 8 RTX 4090 graphics cards available, and using a single 4090 leads to an out-of-memory (OOM) error.

@sayakpaul
Member

@00Neil we don't yet support exotic forms of parallelism within the library. #11941 is in the works.

We have some guidance on how to reduce memory consumption and other speed-up techniques we support in the library:
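
For reference, a minimal sketch of two memory-reduction options, assuming the WanVACEPipeline setup from the example above and that AutoencoderKLWan exposes enable_tiling() (illustrative only, not the full guide):

import torch
from diffusers import AutoencoderKLWan, WanVACEPipeline

model_id = "linoyts/Wan2.2-VACE-Fun-14B-diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanVACEPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)

# keep submodules on CPU and move them to the GPU only while they run
pipe.enable_model_cpu_offload()
# or, for even lower VRAM at the cost of speed:
# pipe.enable_sequential_cpu_offload()

# decode latents in tiles to reduce VAE memory peaks
pipe.vae.enable_tiling()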

@bhack

bhack commented Sep 18, 2025

@sayakpaul @linoytsaban MV2V was just committed upstream:
aigc-apps/VideoX-Fun#328
