[wip] Text to Image Pipeline for wan #12093
Conversation
[wip] I need to figure out why images are being generated as oversaturated/very red; I think I am missing a video processing step after generation.
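(A likely culprit, for reference: before decoding, the stock WanPipeline denormalizes the latents using the VAE's per-channel `latents_mean`/`latents_std` and then post-processes the decoded frames; skipping the denormalization typically produces exactly this kind of oversaturated, color-shifted output. A minimal sketch of that step, assuming a denoised `latents` tensor and a loaded `AutoencoderKLWan` named `vae`:)

import torch

# Undo the per-channel latent normalization before decoding (a sketch of
# what the stock WanPipeline does; skipping it causes color shifts).
latents = latents.to(vae.dtype)
latents_mean = torch.tensor(vae.config.latents_mean).view(1, vae.config.z_dim, 1, 1, 1).to(latents.device, latents.dtype)
latents_std = 1.0 / torch.tensor(vae.config.latents_std).view(1, vae.config.z_dim, 1, 1, 1).to(latents.device, latents.dtype)
latents = latents / latents_std + latents_mean
video = vae.decode(latents, return_dict=False)[0]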
Hi @luke14free, what would be the difference between using the normal pipeline with a single frame (num_frames=1)?
hey @asomoza, it's pretty much the same, except you don't also load the transformer (the high-noise one). You can 100% make it work with the existing pipelines. I just thought it would have been cleaner to have a separate pipeline, but if it's not, I'll close.
@luke14free not completely sure (I haven't tested it yet), but I think you can pass a None transformer_2 to the existing pipeline.
Indeed you can, I tested it myself and it's what I am doing here. I think you are right that it makes little sense to have this; I'll close it anyway.
import torch
from diffusers import AutoencoderKLWan, WanPipeline, WanTransformer3DModel

# Load the VAE in float32 for decoding quality
vae = AutoencoderKLWan.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers",
    subfolder="vae",
    torch_dtype=torch.float32,
)
# Load only the low-noise expert (stored under "transformer_2") and use it
# as the pipeline's main transformer
transformer_low_noise = WanTransformer3DModel.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers",
    subfolder="transformer_2",
    torch_dtype=torch.bfloat16,
)
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers",
    vae=vae,
    transformer=transformer_low_noise,
    boundary_ratio=None,  # disable the high/low-noise two-stage switch
    transformer_2=None,   # don't load the high-noise transformer at all
    torch_dtype=torch.bfloat16,
)
output = pipe(
    prompt=input_data.prompt,
    negative_prompt=input_data.negative_prompt,
    height=height,
    width=width,
    num_frames=1,  # required for text-to-image, to create a proper temporal dimension
    guidance_scale=input_data.guidance_scale,
    num_inference_steps=input_data.num_inference_steps,
).frames[0]

For anyone interested, this is how you do it with the existing pipelines :)
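(Since num_frames=1, the returned clip holds a single frame. A small sketch for turning it into an image, assuming the default output_type="np", which returns float arrays in [0, 1]:)

import numpy as np
from PIL import Image

# `output` is the (num_frames, height, width, 3) float array from .frames[0];
# take the single frame and convert it to a PIL image.
image = Image.fromarray((output[0] * 255).round().astype(np.uint8))
image.save("wan_t2i.png")  # hypothetical output filename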
What does this PR do?
Adds a new pipeline to support image generation with only the Wan low-noise transformer, which is getting a lot of traction (especially with new LoRAs on Civitai).
Since this new pipeline is just a reshuffle of the components of the t2v one, I am not sure whether overriding from_pretrained the way I did makes sense or is acceptable (one possible shape is sketched below). I also had to add a new return type, which again might not follow your style; feel free to ask for changes.
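(For illustration only, and not necessarily how this PR implements it: one way such a from_pretrained override could look. The class name WanTextToImagePipeline is a placeholder.)

import torch
from diffusers import WanPipeline, WanTransformer3DModel

class WanTextToImagePipeline(WanPipeline):  # placeholder name, not the PR's
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
        # Load the low-noise expert (subfolder "transformer_2") and install
        # it as the main transformer, skipping the high-noise expert.
        if "transformer" not in kwargs:
            kwargs["transformer"] = WanTransformer3DModel.from_pretrained(
                pretrained_model_name_or_path,
                subfolder="transformer_2",
                torch_dtype=kwargs.get("torch_dtype", torch.bfloat16),
            )
        kwargs.setdefault("transformer_2", None)   # never load the high-noise expert
        kwargs.setdefault("boundary_ratio", None)  # single-stage denoising
        return super().from_pretrained(pretrained_model_name_or_path, **kwargs)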
The general idea of the pipeline is that I only use transformer_2 (the low-noise one), loading it as the main transformer. Then I predict just one frame and return that. It works with GGUFs as expected by passing a GGUF transformer to the pipeline, as sketched below.
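(For the GGUF path, a sketch of loading a quantized low-noise transformer; the checkpoint filename is a placeholder, and the assumption is that diffusers loads GGUF checkpoints via from_single_file with a GGUFQuantizationConfig:)

import torch
from diffusers import GGUFQuantizationConfig, WanTransformer3DModel

ckpt_path = "wan2.2_t2v_low_noise_14B_Q4_K_M.gguf"  # placeholder path
transformer = WanTransformer3DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
# `transformer` can then be passed to the pipeline exactly as in the snippet above.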
Who can review?
Maybe @yiyixuxu? Not sure who the right person is.