Commit e22fc2f

committed
update docs
1 parent 817b360 commit e22fc2f


docs/source/en/api/pipelines/ltx_video.md

Lines changed: 90 additions & 3 deletions
@@ -31,12 +31,93 @@ Available models:

| Model name | Recommended dtype |
|:-------------:|:-----------------:|
| [`LTX Video 2B 0.9.0`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.safetensors) | `torch.bfloat16` |
| [`LTX Video 2B 0.9.1`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.1.safetensors) | `torch.bfloat16` |
| [`LTX Video 2B 0.9.5`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.5.safetensors) | `torch.bfloat16` |
| [`LTX Video 13B 0.9.7`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-13b-0.9.7-dev.safetensors) | `torch.bfloat16` |
| [`LTX Video Spatial Upscaler 0.9.7`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-spatial-upscaler-0.9.7.safetensors) | `torch.bfloat16` |

Note: The recommended dtype applies to the transformer component. The VAE and text encoders can be `torch.float32`, `torch.bfloat16`, or `torch.float16`, but the recommended dtype is `torch.bfloat16`, as used in the original repository.
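
For example, a pipeline can be loaded with the recommended dtype as follows. This is a minimal sketch; the `Lightricks/LTX-Video` repository id and the use of [`LTXPipeline`] are illustrative choices, not taken from the table above.

```python
import torch
from diffusers import LTXPipeline

# Illustrative checkpoint; swap in the repository id of the model you want to use.
# `torch_dtype=torch.bfloat16` loads the transformer (and the other components) in the recommended dtype.
pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")
```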

## Using LTX Video 13B 0.9.7

LTX Video 0.9.7 comes with a 13B-parameter transformer and a spatial latent upscaler. Inference first generates a low-resolution video, which is very fast, and then upscales and refines the generated video.

```python
import torch
from diffusers import LTXConditionPipeline, LTXLatentUpsamplePipeline
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import export_to_video, load_video

# NOTE: these are local placeholder paths; point them at the corresponding
# LTX Video 0.9.7 and spatial-upscaler checkpoints you want to use.
pipe = LTXConditionPipeline.from_pretrained("/raid/aryan/diffusers-ltx/ltx_pipeline", torch_dtype=torch.bfloat16)
pipe_upsample = LTXLatentUpsamplePipeline.from_pretrained("/raid/aryan/diffusers-ltx/ltx_upsample_pipeline", vae=pipe.vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")
pipe_upsample.to("cuda")
pipe.vae.enable_tiling()

def round_to_nearest_resolution_acceptable_by_vae(height, width):
    # Height and width must be divisible by the VAE's spatial compression ratio
    height = height - (height % pipe.vae_spatial_compression_ratio)
    width = width - (width % pipe.vae_spatial_compression_ratio)
    return height, width

video = load_video(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cosmos/cosmos-video2world-input-vid.mp4"
)[:21]  # Use only the first 21 frames as conditioning
condition1 = LTXVideoCondition(video=video, frame_index=0)

prompt = "The video depicts a winding mountain road covered in snow, with a single vehicle traveling along it. The road is flanked by steep, rocky cliffs and sparse vegetation. The landscape is characterized by rugged terrain and a river visible in the distance. The scene captures the solitude and beauty of a winter drive through a mountainous region."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
expected_height, expected_width = 768, 1152
downscale_factor = 2 / 3
num_frames = 161

# Part 1. Generate video at smaller resolution
# Text-only conditioning is also supported without the need to pass `conditions`
downscaled_height, downscaled_width = int(expected_height * downscale_factor), int(expected_width * downscale_factor)
downscaled_height, downscaled_width = round_to_nearest_resolution_acceptable_by_vae(downscaled_height, downscaled_width)
latents = pipe(
    conditions=[condition1],
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=downscaled_width,
    height=downscaled_height,
    num_frames=num_frames,
    num_inference_steps=30,
    generator=torch.Generator().manual_seed(0),
    output_type="latent",
).frames

# Part 2. Upscale generated video using latent upsampler with fewer inference steps
# The available latent upsampler upscales the height/width by 2x
upscaled_height, upscaled_width = downscaled_height * 2, downscaled_width * 2
upscaled_latents = pipe_upsample(
    latents=latents,
    output_type="latent"
).frames

# Part 3. Denoise the upscaled video with few steps to improve texture (optional, but recommended)
# No extra conditioning is passed, so this effectively is a low-step refinement of the upscaled video
video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=upscaled_width,
    height=upscaled_height,
    num_frames=num_frames,
    denoise_strength=0.4,  # Effectively, 4 inference steps out of 10
    num_inference_steps=10,
    latents=upscaled_latents,
    decode_timestep=0.05,
    image_cond_noise_scale=0.025,
    generator=torch.Generator().manual_seed(0),
    output_type="pil",
).frames[0]

# Part 4. Downscale the video to the expected resolution
video = [frame.resize((expected_width, expected_height)) for frame in video]

export_to_video(video, "output.mp4", fps=24)
```
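
If the 13B transformer and the upsampler do not fit in GPU memory together, the generic Diffusers offloading helpers can be used instead of moving both pipelines to `"cuda"`. The snippet below is a sketch of one possible setup extending the example above, not part of the original snippet.

```python
# Use instead of `pipe.to("cuda")` / `pipe_upsample.to("cuda")` above:
# each component is moved to the GPU only while it is executing.
pipe.enable_model_cpu_offload()
pipe_upsample.enable_model_cpu_offload()
pipe.vae.enable_tiling()  # tiled VAE decoding further reduces peak memory
```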
## Loading Single Files

Loading the original LTX Video checkpoints is also possible with [`~ModelMixin.from_single_file`]. We recommend `from_single_file` for the Lightricks series of models, as they plan to release multiple future models in the single-file format.
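
A minimal sketch of this workflow is shown below; the checkpoint URL (taken from the table above) and the `Lightricks/LTX-Video` repository id used for the remaining components are illustrative choices.

```python
import torch
from diffusers import AutoencoderKLLTXVideo, LTXPipeline, LTXVideoTransformer3DModel

# Single-file checkpoint (assumed here to be the 2B 0.9.0 release listed above).
ckpt_url = "https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.safetensors"

transformer = LTXVideoTransformer3DModel.from_single_file(ckpt_url, torch_dtype=torch.bfloat16)
vae = AutoencoderKLLTXVideo.from_single_file(ckpt_url, torch_dtype=torch.bfloat16)

# The text encoder, tokenizer, and scheduler still come from a diffusers-format repository.
pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video",
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
```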
@@ -204,6 +285,12 @@ export_to_video(video, "ship.mp4", fps=24)
- all
- __call__

## LTXLatentUpsamplePipeline

[[autodoc]] LTXLatentUpsamplePipeline
- all
- __call__

## LTXPipelineOutput

[[autodoc]] pipelines.ltx.pipeline_output.LTXPipelineOutput
