Video quality degrades significantly when calling the pipe for 50 different prompts in Wan 2.2 with the Lightning LoRA #12952
Replies: 4 comments
-
Hi @chaowenguo0, I analyzed your code and found the issue. Looking at this part:

```python
def generate(image, prompt, negative):
    pipe = diffusers.WanImageToVideoPipeline.from_pretrained(...)  # new pipeline created here
    pipe.load_lora_weights(...)
    pipe.load_lora_weights(...)
    # pipe setup
    return frames

for prompt in prompts:
    sys.modules.get(__name__).generate(image, prompt, '')  # called this 50 times
    torch.cuda.empty_cache()
    torch.cuda.synchronize()
```

Problem

Your `generate()` function creates a new pipeline instance every time it is called. Since you are calling it 50 times (once per prompt), you are:

- loading the model from disk 50 times
- loading LoRA weights 100 times (2 LoRAs × 50 calls)
- allocating and deallocating GPU memory 50 times

Solution

Move the pipeline initialization outside the loop and reuse the same instance.
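For reference, a minimal sketch of what that could look like. The checkpoint ID, LoRA path, starting image, and prompt list are placeholders to fill in from your original script:

```python
import diffusers
import torch
from diffusers.utils import load_image

# Placeholder inputs; swap in the real image and the 50 prompts from your script.
image = load_image("<start_frame.png>")
prompts = ["<prompt 1>", "<prompt 2>"]  # ... up to 50 prompts

# Build the pipeline and load the LoRA once, outside the loop.
pipe = diffusers.WanImageToVideoPipeline.from_pretrained(
    "<wan-2.2-i2v-checkpoint>", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("<lightning-lora-repo-or-path>")  # repeat for any additional LoRA
pipe.to("cuda")

def generate(image, prompt, negative):
    # Reuse the already-loaded pipeline on every call.
    return pipe(image=image, prompt=prompt, negative_prompt=negative).frames[0]

all_frames = []
for prompt in prompts:
    all_frames.extend(generate(image, prompt, ''))
```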
-
I already tried creating the pipe just once and calling it 50 times; nothing helped. Now I am trying to use Qwen-Image to create key frames and FLF2V to connect the key frames. But where is the Wan 2.2 FLF2V?
-
Hi @chaowenguo0, you're right that recreating the pipe doesn't help; the issue isn't memory, it's error accumulation. When you chain 50 I2V generations, small artifacts compound with each iteration, and by around the 12th generation the quality has degraded significantly. This is a fundamental limitation of chaining I2V.

FLF2V Approach

Using keyframes + FLF2V is the correct solution, and Wan 2.2 does have FLF2V available.

How to use FLF2V

Instead of chaining 50 I2V calls:

```python
# Generate keyframes first (using text-to-image or selective I2V)
keyframes = []
for i in range(0, len(prompts), 10):  # every 10th prompt
    # Generate a keyframe (use Qwen-Image as you mentioned, or SDXL)
    keyframe = generate_keyframe(prompts[i])
    keyframes.append(keyframe)

# Use FLF2V to connect keyframes
pipe_flf2v = diffusers.WanFirstLastFrameToVideoPipeline.from_pretrained(...)  # check exact class name

final_video = []
for i in range(len(keyframes) - 1):
    segment = pipe_flf2v(
        first_frame=keyframes[i],
        last_frame=keyframes[i + 1],
        prompt=prompts[i],  # or combined prompts for that segment
        # rest of the code
    )
    final_video.append(segment)
```

This will maintain quality across all 50 prompts without degradation. Check the ComfyUI workflow above to see the exact parameters and model loading; you may need to adapt it to the Diffusers API.
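As a follow-up, a minimal sketch of stitching the segments into one file. It assumes each entry appended to `final_video` has already been unpacked to its list of frames (for example via `.frames[0]`), and the fps value is only a placeholder:

```python
from diffusers.utils import export_to_video

# Flatten the per-segment frame lists into one sequence, dropping the first frame
# of each later segment so the shared keyframe is not duplicated at the joins.
stitched = list(final_video[0])
for segment in final_video[1:]:
    stitched.extend(segment[1:])

export_to_video(stitched, "long_video.mp4", fps=16)  # fps is a placeholder value
```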
-
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
-
I need to call WanImageToVideoPipeline 50 times for 50 different prompts to generate a long video. In each call I delete the pipe, reconstruct it, and use torch.cuda.empty_cache() and torch.cuda.synchronize() to clean the GPU, but it does not help. The video quality degrades badly from about 1:30 onward: https://huggingface.co/datasets/exact-railcar/wan?image-viewer=video-0-9E4B78C5298EEFC10A3F4A949155FF8DC0B00866 The first 12 prompts work great, but the rest degrade very much. I use the last frame of the previous prompt's clip as the image input to the next prompt.
@yiyixuxu @DN6
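For context, a condensed sketch of the chaining pattern described above, with a placeholder checkpoint ID and without the LoRA loading and per-call cleanup; the key detail is that each clip's last frame seeds the next call:

```python
import diffusers
import torch
from diffusers.utils import load_image

pipe = diffusers.WanImageToVideoPipeline.from_pretrained(
    "<wan-2.2-i2v-checkpoint>", torch_dtype=torch.bfloat16
).to("cuda")

prompts = ["<prompt 1>", "<prompt 2>"]  # ... 50 prompts in total
image = load_image("<start_frame.png>")

clips = []
for prompt in prompts:
    # output_type="pil" so the last frame can be fed straight back in as the next image
    frames = pipe(image=image, prompt=prompt, output_type="pil").frames[0]
    clips.append(frames)
    image = frames[-1]  # last frame of this clip becomes the next prompt's input image
```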