Closed
Labels
bug: Something isn't working
Description
Describe the bug
The LTXConditionPipeline documentation example produces a poor-quality video.
Reproduction
Running the example code (included below) from the documentation: https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video
Results in this video:
https://github.com/user-attachments/assets/374b82de-2d4f-4929-89d0-74ea98121bea
Is this really supposed to be the correct output?
By the way, this feature could use more extensive documentation, including for the text-prompt-only mode.
import torch
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXConditionPipeline, LTXVideoCondition
from diffusers.utils import export_to_video, load_video, load_image
pipe = LTXConditionPipeline.from_pretrained("Lightricks/LTX-Video-0.9.5", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# Load input image and video
video = load_video(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cosmos/cosmos-video2world-input-vid.mp4"
)
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cosmos/cosmos-video2world-input.jpg"
)
# Create conditioning objects
condition1 = LTXVideoCondition(
    image=image,
    frame_index=0,
)
condition2 = LTXVideoCondition(
    video=video,
    frame_index=80,
)
prompt = "The video depicts a long, straight highway stretching into the distance, flanked by metal guardrails. The road is divided into multiple lanes, with a few vehicles visible in the far distance. The surrounding landscape features dry, grassy fields on one side and rolling hills on the other. The sky is mostly clear with a few scattered clouds, suggesting a bright, sunny day. And then the camera switch to a winding mountain road covered in snow, with a single vehicle traveling along it. The road is flanked by steep, rocky cliffs and sparse vegetation. The landscape is characterized by rugged terrain and a river visible in the distance. The scene captures the solitude and beauty of a winter drive through a mountainous region."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
# Generate video
generator = torch.Generator("cuda").manual_seed(0)
# Text-only conditioning is also supported without the need to pass `conditions`
video = pipe(
    conditions=[condition1, condition2],
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=161,
    num_inference_steps=40,
    generator=generator,
).frames[0]
export_to_video(video, "output.mp4", fps=24)
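As a quick sanity check on the call above (this is not part of the documentation example), the size arguments can be validated before running the pipeline. The sketch below assumes the constraints described for LTX-Video, namely that width and height should be divisible by 32 and that the frame count should have the form 8k + 1. The helper name `check_ltx_shape` is hypothetical and is not a diffusers API.

```python
def check_ltx_shape(width: int, height: int, num_frames: int) -> list[str]:
    """Return a list of problems with the requested size; empty means none found.

    Assumed constraints (from the LTX-Video model description, not verified
    against the pipeline source): spatial dims divisible by 32, and
    num_frames of the form 8 * k + 1.
    """
    problems = []
    if width % 32 != 0:
        problems.append(f"width={width} is not divisible by 32")
    if height % 32 != 0:
        problems.append(f"height={height} is not divisible by 32")
    if (num_frames - 1) % 8 != 0:
        problems.append(f"num_frames={num_frames} is not of the form 8*k + 1")
    return problems


# Values used in the reproduction above
print(check_ltx_shape(width=768, height=512, num_frames=161))  # prints []
```

The values in this report pass the check, so under these assumptions the poor output does not appear to come from an unsupported resolution or frame count.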
Logs
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:13<00:00, 3.43s/it]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:14<00:00, 2.81s/it]
Token indices sequence length is longer than the specified maximum sequence length for this model (166 > 128). Running this sequence through the model will result in indexing errors
System Info
Diffusers 0.33.0
Who can help?
No response