Hi,
I am grateful for this amazing repository for video-conditioned Wan2.2 models. I am trying to use Wan2.2-Fun-A14B-Control with a depth video as the control input. On a single GPU (A6000, 48 GB VRAM) I ran out of memory, so I switched to four A6000s. There I encountered two problems:
- Timeout when downloading the weights: this is solved by downloading the weights manually and passing the local path to `ModelConfig`.
- Hang when running the pipeline: VAE encoding works normally (a `VAE_encoding` progress bar completes its 24 steps), but denoising gets stuck at the first of its 50 steps. GPU utilization is 100%, yet VRAM usage is extremely low (around 7–8 GB per GPU); see the sanity check below.
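In case it helps localize the hang, here is a minimal NCCL sanity check (an illustrative script of my own, not from this repo; the file name `nccl_check.py` is arbitrary) that can be launched the same way as the generation script:

```python
# nccl_check.py -- minimal collective sanity check, launched with:
#   CUDA_VISIBLE_DEVICES=1,2,3,4 torchrun --nproc_per_node=4 nccl_check.py
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")     # torchrun provides RANK/WORLD_SIZE/MASTER_ADDR
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun for each process
torch.cuda.set_device(local_rank)

# One all_reduce (sum across ranks). If this also hangs with 100% GPU
# utilization, the problem is in the NCCL/P2P setup rather than in DiffSynth.
x = torch.ones(1, device="cuda")
dist.all_reduce(x)
print(f"rank {dist.get_rank()}: all_reduce ok, value={x.item()}")  # expect 4.0 on 4 GPUs
dist.destroy_process_group()
```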
The command I used for running my code is:

```bash
CUDA_VISIBLE_DEVICES=1,2,3,4 torchrun --nproc_per_node=4 generate_video_wan.py
```

The code I run is as follows:

```python
import torch
import torch.distributed as dist
from PIL import Image
import os
import json
from diffsynth import save_video, VideoData
from diffsynth.pipelines.wan_video_new import WanVideoPipeline, ModelConfig
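# Standard Wan Chinese negative prompt; rough English gloss: vivid tones, overexposed,
# static, blurry details, subtitles, style/artwork/painting/frame, overall gray, worst/low
# quality, JPEG artifacts, ugly, incomplete, extra fingers, poorly drawn hands/faces,
# deformed/disfigured, malformed limbs, fused fingers, still frame, cluttered background,
# three legs, crowded background, walking backwards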
NEGATIVE_PROMPT = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
if __name__ == "__main__":
    pipe = WanVideoPipeline.from_pretrained(
        torch_dtype=torch.bfloat16,
        device="cuda",
        use_usp=True,
        model_configs=[
            ModelConfig(model_id="PAI/Wan2.2-Fun-A14B-Control", origin_file_pattern="high_noise_model/diffusion_pytorch_model*.safetensors", offload_device="cpu", path="models/PAI/Wan2.2-Fun-A14B-Control/high_noise_model/diffusion_pytorch_model.safetensors"),
            ModelConfig(model_id="PAI/Wan2.2-Fun-A14B-Control", origin_file_pattern="low_noise_model/diffusion_pytorch_model*.safetensors", offload_device="cpu", path="models/PAI/Wan2.2-Fun-A14B-Control/low_noise_model/diffusion_pytorch_model.safetensors"),
            ModelConfig(model_id="PAI/Wan2.2-Fun-A14B-Control", origin_file_pattern="models_t5_umt5-xxl-enc-bf16.pth", offload_device="cpu", path="models/PAI/Wan2.2-Fun-A14B-Control/models_t5_umt5-xxl-enc-bf16.pth"),
            ModelConfig(model_id="PAI/Wan2.2-Fun-A14B-Control", origin_file_pattern="Wan2.1_VAE.pth", offload_device="cpu", path="models/PAI/Wan2.2-Fun-A14B-Control/Wan2.1_VAE.pth"),
        ],
        tokenizer_config=ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B", origin_file_pattern="google/*", path="models/Wan-AI/Wan2.1-T2V-1.3B/google/umt5-xxl"),
    )
    pipe.enable_vram_management()

    reference_image = Image.open("myimage.png")  # the image size is 768 x 1024
    control_video = VideoData("myvideo.mp4", height=768, width=1024)  # the video has 121 frames
    prompt = "My prompt"
    video = pipe(
        prompt=prompt,
        negative_prompt=NEGATIVE_PROMPT,
        reference_image=reference_image,
        control_video=control_video,
        height=768,
        width=1024,
        num_frames=121,
        seed=1,
        tiled=True,
    )
    if dist.get_rank() == 0:
        # SCENE and INSTANCE_ID are defined elsewhere in my script
        save_video(video, os.path.join(SCENE, "video", f"{INSTANCE_ID:03d}_wan.mp4"), fps=24, quality=5)
```

I have looked over the issues posted in this repo, but this seems to be a new problem. Thank you for your time; I would really appreciate a response.