Video quality degrades significantly when calling the pipe for 50 different prompts in Wan 2.2 with the Lightning LoRA #12952
Replies: 4 comments
-
Hi @chaowenguo0, I analyzed your code and found the issue. Looking at this part:

```python
def generate(image, prompt, negative):
    pipe = diffusers.WanImageToVideoPipeline.from_pretrained(...)  # new pipeline created here
    pipe.load_lora_weights(...)
    pipe.load_lora_weights(...)
    # pipe setup
    return frames

for prompt in prompts:
    sys.modules.get(__name__).generate(image, prompt, '')  # called this 50 times
    torch.cuda.empty_cache()
    torch.cuda.synchronize()
```

Problem

Your `generate()` function creates a new pipeline instance every time it is called. Since you are calling it 50 times (once per prompt), you are:

- loading the model from disk 50 times
- loading LoRA weights 100 times (2 LoRAs × 50 calls)
- allocating and deallocating GPU memory 50 times

Solution

Move the pipeline initialization outside the loop and reuse the same instance.
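For reference, a minimal sketch of what that could look like. The checkpoint ID, LoRA path, starting image, and prompt list are placeholders to fill in from your original script:

```python
import diffusers
import torch
from diffusers.utils import load_image

# Placeholder inputs; swap in the real image and the 50 prompts from your script.
image = load_image("<start_frame.png>")
prompts = ["<prompt 1>", "<prompt 2>"]  # ... up to 50 prompts

# Build the pipeline and load the LoRA once, outside the loop.
pipe = diffusers.WanImageToVideoPipeline.from_pretrained(
    "<wan-2.2-i2v-checkpoint>", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("<lightning-lora-repo-or-path>")  # repeat for any additional LoRA
pipe.to("cuda")

def generate(image, prompt, negative):
    # Reuse the already-loaded pipeline on every call.
    return pipe(image=image, prompt=prompt, negative_prompt=negative).frames[0]

all_frames = []
for prompt in prompts:
    all_frames.extend(generate(image, prompt, ''))
```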
-
I already tried creating the pipe just once and calling it 50 times; nothing helped. Now I am trying to use Qwen-Image to create key frames and FLF2V to connect the key frames. But where is the Wan 2.2 FLF2V?
-
Hi @chaowenguo0, you're right that recreating the pipe doesn't help; the issue isn't memory, it's error accumulation. When you chain 50 I2V generations, small artifacts compound with each iteration, and by around the 12th generation the quality has degraded significantly. This is a fundamental limitation of chaining I2V.

FLF2V Approach

Using keyframes + FLF2V is the correct solution, and Wan 2.2 does have FLF2V available.

How to use FLF2V

Instead of chaining 50 I2V calls:

```python
# Generate keyframes first (using text-to-image or selective I2V)
keyframes = []
for i in range(0, len(prompts), 10):  # every 10th prompt
    # Generate a keyframe (use Qwen-Image as you mentioned, or SDXL)
    keyframe = generate_keyframe(prompts[i])
    keyframes.append(keyframe)

# Use FLF2V to connect keyframes
pipe_flf2v = diffusers.WanFirstLastFrameToVideoPipeline.from_pretrained(...)  # check exact class name

final_video = []
for i in range(len(keyframes) - 1):
    segment = pipe_flf2v(
        first_frame=keyframes[i],
        last_frame=keyframes[i + 1],
        prompt=prompts[i],  # or combined prompts for that segment
        # rest of the code
    )
    final_video.append(segment)
```

This will maintain quality across all 50 prompts without degradation. Check the ComfyUI workflow above to see the exact parameters and model loading; you may need to adapt it to the Diffusers API.
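As a follow-up, a minimal sketch of stitching the segments into one file. It assumes each entry appended to `final_video` has already been unpacked to its list of frames (for example via `.frames[0]`), and the fps value is only a placeholder:

```python
from diffusers.utils import export_to_video

# Flatten the per-segment frame lists into one sequence, dropping the first frame
# of each later segment so the shared keyframe is not duplicated at the joins.
stitched = list(final_video[0])
for segment in final_video[1:]:
    stitched.extend(segment[1:])

export_to_video(stitched, "long_video.mp4", fps=16)  # fps is a placeholder value
```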
-
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
-
I need to call WanImageToVideoPipeline 50 times for 50 different prompts to generate a long video. In each call I delete the pipe, reconstruct it, and use torch.cuda.empty_cache() and torch.cuda.synchronize() to clean the GPU, but it does not help. The video quality degrades badly from about 1:30 onward: https://huggingface.co/datasets/exact-railcar/wan?image-viewer=video-0-9E4B78C5298EEFC10A3F4A949155FF8DC0B00866 The first 12 prompts work great, but the rest degrade very much. I use the last frame of the previous prompt's clip as the image input to the next prompt.
@yiyixuxu @DN6
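For context, a condensed sketch of the chaining pattern described above, with a placeholder checkpoint ID and without the LoRA loading and per-call cleanup; the key detail is that each clip's last frame seeds the next call:

```python
import diffusers
import torch
from diffusers.utils import load_image

pipe = diffusers.WanImageToVideoPipeline.from_pretrained(
    "<wan-2.2-i2v-checkpoint>", torch_dtype=torch.bfloat16
).to("cuda")

prompts = ["<prompt 1>", "<prompt 2>"]  # ... 50 prompts in total
image = load_image("<start_frame.png>")

clips = []
for prompt in prompts:
    # output_type="pil" so the last frame can be fed straight back in as the next image
    frames = pipe(image=image, prompt=prompt, output_type="pil").frames[0]
    clips.append(frames)
    image = frames[-1]  # last frame of this clip becomes the next prompt's input image
```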