LTXConditionPipeline Doc example results in poor video

### Describe the bug

The `LTXConditionPipeline` example from the documentation produces a poor-quality video.

### Reproduction

Running the code below, taken from the documentation (https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video), results in this video:
https://github.com/user-attachments/assets/374b82de-2d4f-4929-89d0-74ea98121bea

Is this really supposed to be the correct output?

By the way, this feature could use more extensive documentation, including the text-prompt-only mode.

```python
import torch
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXConditionPipeline, LTXVideoCondition
from diffusers.utils import export_to_video, load_video, load_image

pipe = LTXConditionPipeline.from_pretrained("Lightricks/LTX-Video-0.9.5", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Load input image and video
video = load_video(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cosmos/cosmos-video2world-input-vid.mp4"
)
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cosmos/cosmos-video2world-input.jpg"
)

# Create conditioning objects
condition1 = LTXVideoCondition(
    image=image,
    frame_index=0,
)
condition2 = LTXVideoCondition(
    video=video,
    frame_index=80,
)

prompt = "The video depicts a long, straight highway stretching into the distance, flanked by metal guardrails. The road is divided into multiple lanes, with a few vehicles visible in the far distance. The surrounding landscape features dry, grassy fields on one side and rolling hills on the other. The sky is mostly clear with a few scattered clouds, suggesting a bright, sunny day. And then the camera switch to a winding mountain road covered in snow, with a single vehicle traveling along it. The road is flanked by steep, rocky cliffs and sparse vegetation. The landscape is characterized by rugged terrain and a river visible in the distance. The scene captures the solitude and beauty of a winter drive through a mountainous region."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

# Generate video
generator = torch.Generator("cuda").manual_seed(0)
# Text-only conditioning is also supported without the need to pass `conditions`
video = pipe(
    conditions=[condition1, condition2],
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=161,
    num_inference_steps=40,
    generator=generator,
).frames[0]

export_to_video(video, "output.mp4", fps=24)
```
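As a first sanity check when triaging this, the size arguments in the call above can be validated before running the pipeline. The sketch below assumes that LTX-Video expects `width` and `height` to be divisible by 32 and `num_frames` to have the form `8 * k + 1`; that constraint is my reading of the model documentation, and `check_ltx_shape` is a hypothetical helper, not part of diffusers.

```python
from typing import List


def check_ltx_shape(width: int, height: int, num_frames: int) -> List[str]:
    """Return a list of problems with the requested output shape.

    Assumption (not confirmed in this issue): spatial sizes must be
    divisible by 32 and num_frames must equal 8 * k + 1.
    """
    problems = []
    if width % 32 != 0:
        problems.append(f"width={width} is not divisible by 32")
    if height % 32 != 0:
        problems.append(f"height={height} is not divisible by 32")
    if (num_frames - 1) % 8 != 0:
        problems.append(f"num_frames={num_frames} is not of the form 8*k + 1")
    return problems


if __name__ == "__main__":
    # Values taken from the reproduction script above.
    print(check_ltx_shape(width=768, height=512, num_frames=161))
```

Under that assumption, the values used in the reproduction (768, 512, 161) pass, so the output shape is probably not the cause of the poor result.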

### Logs

```shell
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:13<00:00,  3.43s/it]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:14<00:00,  2.81s/it]
Token indices sequence length is longer than the specified maximum sequence length for this model (166 > 128). Running this sequence through the model will result in indexing errors
```
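The tokenizer warning in the log reports a 166-token prompt against a 128-token maximum. Whether that contributes to the poor output is not established here, but the prompt length is easy to measure up front. The sketch below is a hypothetical helper (`exceeds_token_limit` is not a diffusers function); it takes any tokenizing callable, and the whitespace tokenizer is only a stand-in so the snippet runs without downloading a model.

```python
from typing import Callable, Sequence, Tuple


def exceeds_token_limit(
    prompt: str,
    tokenize: Callable[[str], Sequence],
    limit: int = 128,
) -> Tuple[int, bool]:
    """Return (token_count, is_over_limit) for a prompt.

    `limit` defaults to 128, the maximum quoted in the warning above.
    """
    n_tokens = len(tokenize(prompt))
    return n_tokens, n_tokens > limit


def whitespace_tokenize(text: str) -> Sequence[str]:
    # Stand-in tokenizer for illustration only; real token counts differ.
    return text.split()


if __name__ == "__main__":
    count, too_long = exceeds_token_limit("a winding mountain road", whitespace_tokenize)
    print(count, too_long)
```

With the real pipeline, the callable could be something like `lambda p: pipe.tokenizer(p)["input_ids"]`, assuming the pipeline exposes its text tokenizer as `pipe.tokenizer`.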

### System Info

Diffusers 0.33.0

### Who can help?

_No response_

LTXConditionPipeline Doc example results in poor video #11278
