Commit 6ff652c

committed
Models: Release LTXV-13B-distilled
1 parent 832e043 commit 6ff652c

File tree: 2 files changed (+59 −16 lines)

README.md

Lines changed: 31 additions & 16 deletions

@@ -6,7 +6,7 @@ This is the official repository for LTX-Video.
 [Website](https://www.lightricks.com/ltxv) |
 [Model](https://huggingface.co/Lightricks/LTX-Video) |
-[Demo](https://app.ltx.studio/ltx-video) |
+[Demo](https://app.ltx.studio/motion-workspace?videoModel=ltxv-13b) |
 [Paper](https://arxiv.org/abs/2501.00103) |
 [Trainer](https://github.com/Lightricks/LTX-Video-Trainer) |
 [Discord](https://discord.gg/Mn8BRgUKKy)
@@ -57,6 +57,16 @@ The model supports text-to-image, image-to-video, keyframe-based animation, vide
 # News

+## May 14th, 2025: New distilled model 13B v0.9.7:
+- Released a new 13B distilled model, [ltxv-13b-0.9.7-distilled](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-13b-0.9.7-distilled.safetensors)
+  * Ideal for iterative work: generates HD videos in 10 seconds, with a low-res preview after just 3 seconds (on H100)!
+  * Does not require classifier-free guidance or spatio-temporal guidance.
+  * Supports sampling with 8 (recommended) or fewer diffusion steps.
+  * Also released a LoRA version of the distilled model, [ltxv-13b-0.9.7-distilled-lora128](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-13b-0.9.7-distilled-lora128.safetensors)
+    - Requires only 1GB of VRAM
+    - Can be used with the full 13B model for fast inference
+- Released a new quantized distilled model, [ltxv-13b-0.9.7-distilled-fp8](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-13b-0.9.7-distilled-fp8.safetensors), for *real-time* generation (on H100) with even less VRAM (supported in the [official ComfyUI workflow](https://github.com/Lightricks/ComfyUI-LTXVideo/))
+
 ## May, 5th, 2025: New model 13B v0.9.7:
 - Release a new 13B model [ltxv-13b-0.9.7-dev](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-13b-0.9.7-dev.safetensors)
 - Release a new quantized model [ltxv-13b-0.9.7-dev-fp8](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-13b-0.9.7-dev-fp8.safetensors) for faster inference with less VRAM (supported in the [official ComfyUI workflow](https://github.com/Lightricks/ComfyUI-LTXVideo/))
@@ -72,7 +82,7 @@ The model supports text-to-image, image-to-video, keyframe-based animation, vide
 - Release a new distilled model [ltxv-2b-0.9.6-distilled-04-25](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-2b-0.9.6-distilled-04-25.safetensors)
   * 15x faster inference than the non-distilled model.
   * Does not require classifier-free guidance or spatio-temporal guidance.
-  * Supports sampling with 8 (recommended), 4, 2 or 1 diffusion steps.
+  * Supports sampling with 8 (recommended) or fewer diffusion steps.
 - Improved prompt adherence, motion quality and fine details.
 - New default resolution and FPS: 1216 × 704 pixels at 30 FPS
   * Still real time on H100 with the distilled model.
@@ -114,21 +124,26 @@ The model supports text-to-image, image-to-video, keyframe-based animation, vide
 - Support text-to-video and image-to-video generation

-# Models
+# Models & Workflows

-| Model | Version | Notes | inference.py config | ComfyUI workflow (Recommended) |
-|-------|---------|-------|---------------------|--------------------------------|
-| ltxv-13b | 0.9.7 | Highest quality, requires more VRAM | [ltxv-13b-0.9.7-dev.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.7-dev.yaml) | [ltxv-13b-i2v-base.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base.json) |
-| ltxv-13b-fp8 | 0.9.7 | Quantized version of ltxv-13b | Coming soon | [ltxv-13b-i2v-base-fp8.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base-fp8.json) |
-| ltxv-2b | 0.9.6 | Good quality, lower VRAM requirement than ltxv-13b | [ltxv-2b-0.9.6-dev.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.6-dev.yaml) | [ltxvideo-i2v.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/low_level/ltxvideo-i2v.json) |
-| ltxv-2b-distilled | 0.9.6 | 15× faster, real-time capable, fewer steps needed, no STG/CFG required | [ltxv-2b-0.9.6-distilled.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.6-distilled.yaml) | [ltxvideo-i2v-distilled.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/low_level/ltxvideo-i2v-distilled.json) |
+| Name | Notes | inference.py config | ComfyUI workflow (Recommended) |
+|------|-------|---------------------|--------------------------------|
+| ltxv-13b-0.9.7-dev | Highest quality, requires more VRAM | [ltxv-13b-0.9.7-dev.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.7-dev.yaml) | [ltxv-13b-i2v-base.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base.json) |
+| [ltxv-13b-0.9.7-mix](https://app.ltx.studio/motion-workspace?videoModel=ltxv-13b) | Mixes ltxv-13b-dev and ltxv-13b-distilled in the same multi-scale rendering workflow for a balanced speed-quality trade-off | N/A | [ltxv-13b-i2v-mix.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv13b-i2v-mixed-multiscale.json) |
+| [ltxv-13b-0.9.7-distilled](https://app.ltx.studio/motion-workspace?videoModel=ltxv) | Faster, lower VRAM usage, slight quality reduction compared to 13b-dev; ideal for rapid iterations | [ltxv-13b-0.9.7-distilled.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.7-distilled.yaml) | [ltxv-13b-dist-i2v-base.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/13b-distilled/ltxv-13b-dist-i2v-base.json) |
+| [ltxv-13b-0.9.7-distilled-lora128](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-13b-0.9.7-distilled-lora128.safetensors) | LoRA that makes ltxv-13b-dev behave like the distilled model | N/A | N/A |
+| ltxv-13b-0.9.7-fp8 | Quantized version of ltxv-13b | Coming soon | [ltxv-13b-i2v-base-fp8.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base-fp8.json) |
+| ltxv-13b-0.9.7-distilled-fp8 | Quantized version of ltxv-13b-distilled | Coming soon | [ltxv-13b-dist-fp8-i2v-base.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/13b-distilled/ltxv-13b-dist-fp8-i2v-base.json) |
+| ltxv-2b-0.9.6 | Good quality, lower VRAM requirement than ltxv-13b | [ltxv-2b-0.9.6-dev.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.6-dev.yaml) | [ltxvideo-i2v.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/low_level/ltxvideo-i2v.json) |
+| ltxv-2b-0.9.6-distilled | 15× faster, real-time capable, fewer steps needed, no STG/CFG required | [ltxv-2b-0.9.6-distilled.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.6-distilled.yaml) | [ltxvideo-i2v-distilled.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/low_level/ltxvideo-i2v-distilled.json) |

 # Quick Start Guide

 ## Online inference
 The model is accessible right away via the following links:
-- [LTX-Studio image-to-video](https://app.ltx.studio/ltx-video)
+- [LTX-Studio image-to-video (13B-mix)](https://app.ltx.studio/motion-workspace?videoModel=ltxv-13b)
+- [LTX-Studio image-to-video (13B distilled)](https://app.ltx.studio/motion-workspace?videoModel=ltxv)
 - [Fal.ai text-to-video](https://fal.ai/models/fal-ai/ltx-video)
 - [Fal.ai image-to-video](https://fal.ai/models/fal-ai/ltx-video/image-to-video)
 - [Replicate text-to-video and image-to-video](https://replicate.com/lightricks/ltx-video)
@@ -158,13 +173,13 @@ To use our model, please follow the inference code in [inference.py](./inference
 #### For text-to-video generation:

 ```bash
-python inference.py --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-dev.yaml
+python inference.py --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
 ```

 #### For image-to-video generation:

 ```bash
-python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_PATH --conditioning_start_frames 0 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-dev.yaml
+python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_PATH --conditioning_start_frames 0 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
 ```
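The text-to-video and image-to-video invocations above differ only in the conditioning flags, so it can be convenient to assemble the argument list programmatically before launching `inference.py`. A minimal sketch, using only the flags shown above; the helper name and the placeholder prompt, path, and sizes are illustrative, not from the repository:

```python
import shlex

def build_ltxv_cmd(prompt, config, height=704, width=1216,
                   num_frames=121, seed=0, image_path=None):
    """Assemble an inference.py command line; image_path switches to i2v mode."""
    cmd = ["python", "inference.py",
           "--prompt", prompt,
           "--height", str(height), "--width", str(width),
           "--num_frames", str(num_frames),
           "--seed", str(seed),
           "--pipeline_config", config]
    if image_path is not None:
        # image-to-video: condition the generation on the image at frame 0
        cmd += ["--conditioning_media_paths", image_path,
                "--conditioning_start_frames", "0"]
    return cmd

t2v = build_ltxv_cmd("a cat surfing", "configs/ltxv-13b-0.9.7-distilled.yaml")
i2v = build_ltxv_cmd("a cat surfing", "configs/ltxv-13b-0.9.7-distilled.yaml",
                     image_path="cat.png")
print(shlex.join(t2v))
```

The list form can be passed directly to `subprocess.run`, which avoids shell-quoting issues with prompts containing spaces or quotes.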
#### Extending a video:
@@ -173,7 +188,7 @@ python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_PATH --co

 ```bash
-python inference.py --prompt "PROMPT" --conditioning_media_paths VIDEO_PATH --conditioning_start_frames START_FRAME --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-dev.yaml
+python inference.py --prompt "PROMPT" --conditioning_media_paths VIDEO_PATH --conditioning_start_frames START_FRAME --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
 ```

 #### For video generation with multiple conditions:
@@ -182,7 +197,7 @@ You can now generate a video conditioned on a set of images and/or short video s
 Simply provide a list of paths to the images or video segments you want to condition on, along with their target frame numbers in the generated video. You can also specify the conditioning strength for each item (default: 1.0).

 ```bash
-python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_OR_VIDEO_PATH_1 IMAGE_OR_VIDEO_PATH_2 --conditioning_start_frames TARGET_FRAME_1 TARGET_FRAME_2 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-dev.yaml
+python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_OR_VIDEO_PATH_1 IMAGE_OR_VIDEO_PATH_2 --conditioning_start_frames TARGET_FRAME_1 TARGET_FRAME_2 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
 ```
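Each conditioning item pairs a media path with a target start frame and an optional strength (default 1.0), so the three lists must line up one-to-one and every start frame must fall inside the generated clip. A sketch of how such a list might be checked before building the command; the helper is hypothetical, not part of `inference.py`:

```python
def pair_conditions(media_paths, start_frames, num_frames, strengths=None):
    """Zip conditioning media with target frames; default strength is 1.0."""
    if strengths is None:
        strengths = [1.0] * len(media_paths)
    if not (len(media_paths) == len(start_frames) == len(strengths)):
        raise ValueError("need one start frame (and strength) per media path")
    for frame in start_frames:
        if not 0 <= frame < num_frames:
            raise ValueError(f"start frame {frame} outside [0, {num_frames})")
    return list(zip(media_paths, start_frames, strengths))

# condition on an image at frame 0 and a short clip starting at frame 60
items = pair_conditions(["first.png", "clip.mp4"], [0, 60], num_frames=121)
```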
## ComfyUI Integration
@@ -268,8 +283,8 @@ please let us know by opening an issue or pull request.
 # ⚡️ Training

-We provide an open-source repository for fine-tuning the LTX-Video model: [LTX-Video-Trainer](https://github.com/Lightricks/LTX-Video-Trainer).
-This repository supports both the 2B and 13B model variants, enabling full fine-tuning as well as LoRA (Low-Rank Adaptation) fine-tuning for more efficient training.
+We provide an open-source repository for fine-tuning the LTX-Video model: [LTX-Video-Trainer](https://github.com/Lightricks/LTX-Video-Trainer).
+This repository supports both the 2B and 13B model variants, enabling full fine-tuning as well as LoRA (Low-Rank Adaptation) fine-tuning for more efficient training.

 Explore the repository to customize the model for your specific use cases!
 More information and training instructions can be found in the [README](https://github.com/Lightricks/LTX-Video-Trainer/blob/main/README.md).
Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
+pipeline_type: multi-scale
+checkpoint_path: "ltxv-13b-0.9.7-distilled.safetensors"
+downscale_factor: 0.6666666
+spatial_upscaler_model_path: "ltxv-spatial-upscaler-0.9.7.safetensors"
+stg_mode: "attention_values" # options: "attention_values", "attention_skip", "residual", "transformer_block"
+decode_timestep: 0.05
+decode_noise_scale: 0.025
+text_encoder_model_name_or_path: "PixArt-alpha/PixArt-XL-2-1024-MS"
+precision: "bfloat16"
+sampler: "from_checkpoint" # options: "uniform", "linear-quadratic", "from_checkpoint"
+prompt_enhancement_words_threshold: 120
+prompt_enhancer_image_caption_model_name_or_path: "MiaoshouAI/Florence-2-large-PromptGen-v2.0"
+prompt_enhancer_llm_model_name_or_path: "unsloth/Llama-3.2-3B-Instruct"
+stochastic_sampling: false
+
+first_pass:
+  timesteps: [1.0000, 0.9937, 0.9875, 0.9812, 0.9750, 0.9094, 0.7250]
+  guidance_scale: 1
+  stg_scale: 0
+  rescaling_scale: 1
+  skip_block_list: [42]
+
+second_pass:
+  timesteps: [0.9094, 0.7250, 0.4219]
+  guidance_scale: 1
+  stg_scale: 0
+  rescaling_scale: 1
+  skip_block_list: [42]
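The config above drives a two-pass multi-scale render: a 7-step first pass at roughly two-thirds of the target resolution, a spatial upscale with the upscaler checkpoint, then a 3-step second pass whose timesteps overlap the tail of the first. A sketch of the implied resolution math for the default 1216 × 704 output; snapping to a multiple of 32 is an assumption for illustration, not taken from this commit:

```python
def first_pass_size(width, height, downscale_factor=0.6666666, multiple=32):
    """Downscale the target size for the first pass, snapping to a pixel grid."""
    def snap(x):
        # scale down, then round down to the nearest multiple (at least one tile)
        return max(multiple, int(x * downscale_factor) // multiple * multiple)
    return snap(width), snap(height)

w, h = first_pass_size(1216, 704)
# the first pass renders at (w, h); the spatial upscaler then brings the
# latents back up before the second pass refines with its three timesteps
```

Note that both passes run with `guidance_scale: 1` and `stg_scale: 0`, consistent with the distilled model needing neither classifier-free nor spatio-temporal guidance.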
