
Commit bdc8f01
Merge pull request #221 from LightricksResearch/release/v0-9-8
LTXV-0.9.8: Long shot generation
2 parents 20799e5 + 02844ce

File tree: 8 files changed (+137 / -19 lines)

README.md

Lines changed: 26 additions & 11 deletions
```diff
@@ -59,6 +59,20 @@ The model supports image-to-video, keyframe-based animation, video extension (bo
 
 # News
 
+## July 16th, 2025: New Distilled Models v0.9.8 with up to 60 seconds of video
+- Long shot generation in LTXV-13B!
+  * LTX-Video now supports up to 60 seconds of video.
+  * Also compatible with the official IC-LoRAs.
+  * Try it now in [ComfyUI](https://github.com/Lightricks/ComfyUI-LTXVideo/tree/master/example_workflows/ltxv-13b-i2v-long-multi-prompt.json).
+- Released new distilled models:
+  * 13B distilled model [ltxv-13b-0.9.8-distilled](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.8-distilled.yaml)
+  * 2B distilled model [ltxv-2b-0.9.8-distilled](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.8-distilled.yaml)
+  * Both models are distilled from the same base model [ltxv-13b-0.9.8-dev](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.8-dev.yaml) and can be used together in the same multiscale pipeline.
+  * Improved prompt understanding and detail generation.
+  * Includes corresponding FP8 weights and workflows.
+- Released a new detailer model [LTX-Video-ICLoRA-detailer-13B-0.9.8](https://huggingface.co/Lightricks/LTX-Video-ICLoRA-detailer-13b-0.9.8).
+  * Available in [ComfyUI](https://github.com/Lightricks/ComfyUI-LTXVideo/tree/master/example_workflows/ltxv-13b-upscale.json).
+
 ## July, 8th, 2025: New Control Models Released!
 - Released three new control models for LTX-Video on HuggingFace:
   * **Depth Control**: [LTX-Video-ICLoRA-depth-13b-0.9.7](https://huggingface.co/Lightricks/LTX-Video-ICLoRA-depth-13b-0.9.7)
```
```diff
@@ -137,12 +151,13 @@ The model supports image-to-video, keyframe-based animation, video extension (bo
 
 | Name | Notes | inference.py config | ComfyUI workflow (Recommended) |
 |------|-------|---------------------|--------------------------------|
-| ltxv-13b-0.9.7-dev | Highest quality, requires more VRAM | [ltxv-13b-0.9.7-dev.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.7-dev.yaml) | [ltxv-13b-i2v-base.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base.json) |
-| [ltxv-13b-0.9.7-mix](https://app.ltx.studio/motion-workspace?videoModel=ltxv-13b) | Mix ltxv-13b-dev and ltxv-13b-distilled in the same multi-scale rendering workflow for balanced speed-quality | N/A | [ltxv-13b-i2v-mixed-multiscale.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-mixed-multiscale.json) |
-| [ltxv-13b-0.9.7-distilled](https://app.ltx.studio/motion-workspace?videoModel=ltxv) | Faster, less VRAM usage, slight quality reduction compared to 13b. Ideal for rapid iterations | [ltxv-13b-0.9.7-distilled.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.7-dev.yaml) | [ltxv-13b-dist-i2v-base.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/13b-distilled/ltxv-13b-dist-i2v-base.json) |
-| [ltxv-13b-0.9.7-distilled-lora128](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltxv-13b-0.9.7-distilled-lora128.safetensors) | LoRA to make ltxv-13b-dev behave like the distilled model | N/A | N/A |
-| ltxv-13b-0.9.7-dev-fp8 | Quantized version of ltxv-13b | [ltxv-13b-0.9.7-dev-fp8.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.7-dev-fp8.yaml) | [ltxv-13b-i2v-base-fp8.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base-fp8.json) |
-| ltxv-13b-0.9.7-distilled-fp8 | Quantized version of ltxv-13b-distilled | [ltxv-13b-0.9.7-distilled-fp8.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.7-distilled-fp8.yaml) | [ltxv-13b-dist-i2v-base-fp8.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/13b-distilled/ltxv-13b-dist-i2v-base-fp8.json) |
+| ltxv-13b-0.9.8-dev | Highest quality, requires more VRAM | [ltxv-13b-0.9.8-dev.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.8-dev.yaml) | [ltxv-13b-i2v-base.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base.json) |
+| [ltxv-13b-0.9.8-mix](https://app.ltx.studio/motion-workspace?videoModel=ltxv-13b) | Mix ltxv-13b-dev and ltxv-13b-distilled in the same multi-scale rendering workflow for balanced speed-quality | N/A | [ltxv-13b-i2v-mixed-multiscale.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-mixed-multiscale.json) |
+| [ltxv-13b-0.9.8-distilled](https://app.ltx.studio/motion-workspace?videoModel=ltxv) | Faster, less VRAM usage, slight quality reduction compared to 13b. Ideal for rapid iterations | [ltxv-13b-0.9.8-distilled.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.8-distilled.yaml) | [ltxv-13b-dist-i2v-base.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/13b-distilled/ltxv-13b-dist-i2v-base.json) |
+| ltxv-2b-0.9.8-distilled | Smaller model, slight quality reduction compared to 13b distilled. Ideal for fast generation with light VRAM usage | [ltxv-2b-0.9.8-distilled.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.8-distilled.yaml) | N/A |
+| ltxv-13b-0.9.8-dev-fp8 | Quantized version of ltxv-13b | [ltxv-13b-0.9.8-dev-fp8.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.8-dev-fp8.yaml) | [ltxv-13b-i2v-base-fp8.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base-fp8.json) |
+| ltxv-13b-0.9.8-distilled-fp8 | Quantized version of ltxv-13b-distilled | [ltxv-13b-0.9.8-distilled-fp8.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.8-distilled-fp8.yaml) | [ltxv-13b-dist-i2v-base-fp8.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/13b-distilled/ltxv-13b-dist-i2v-base-fp8.json) |
+| ltxv-2b-0.9.8-distilled-fp8 | Quantized version of ltxv-2b-distilled | [ltxv-2b-0.9.8-distilled-fp8.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.8-distilled-fp8.yaml) | N/A |
 | ltxv-2b-0.9.6 | Good quality, lower VRAM requirement than ltxv-13b | [ltxv-2b-0.9.6-dev.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.6-dev.yaml) | [ltxvideo-i2v.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/low_level/ltxvideo-i2v.json) |
 | ltxv-2b-0.9.6-distilled | 15× faster, real-time capable, fewer steps needed, no STG/CFG required | [ltxv-2b-0.9.6-distilled.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.6-distilled.yaml) | [ltxvideo-i2v-distilled.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/low_level/ltxvideo-i2v-distilled.json) |
 
```

````diff
@@ -170,7 +185,7 @@ cd LTX-Video
 # create env
 python -m venv env
 source env/bin/activate
-python -m pip install -e \[inference\]
+python -m pip install -e .\[inference\]
 ```
 
 #### FP8 Kernels (optional)
````
````diff
@@ -186,7 +201,7 @@ To use our model, please follow the inference code in [inference.py](./inference
 #### For image-to-video generation:
 
 ```bash
-python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_PATH --conditioning_start_frames 0 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
+python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_PATH --conditioning_start_frames 0 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.8-distilled.yaml
 ```
 
 #### Extending a video:
````
````diff
@@ -195,7 +210,7 @@ python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_PATH --co
 
 
 ```bash
-python inference.py --prompt "PROMPT" --conditioning_media_paths VIDEO_PATH --conditioning_start_frames START_FRAME --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
+python inference.py --prompt "PROMPT" --conditioning_media_paths VIDEO_PATH --conditioning_start_frames START_FRAME --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.8-distilled.yaml
 ```
 
 #### For video generation with multiple conditions:
````
````diff
@@ -204,7 +219,7 @@ You can now generate a video conditioned on a set of images and/or short video s
 Simply provide a list of paths to the images or video segments you want to condition on, along with their target frame numbers in the generated video. You can also specify the conditioning strength for each item (default: 1.0).
 
 ```bash
-python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_OR_VIDEO_PATH_1 IMAGE_OR_VIDEO_PATH_2 --conditioning_start_frames TARGET_FRAME_1 TARGET_FRAME_2 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.7-distilled.yaml
+python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_OR_VIDEO_PATH_1 IMAGE_OR_VIDEO_PATH_2 --conditioning_start_frames TARGET_FRAME_1 TARGET_FRAME_2 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.8-distilled.yaml
 ```
 
 ### Using as a library
````
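A note on the multi-condition command above: media paths and target frames are paired positionally, so the i-th entry of `--conditioning_media_paths` lands at the i-th entry of `--conditioning_start_frames`. A minimal sketch of that pairing in Python (the `build_conditioning_args` helper is hypothetical, not part of the repo):

```python
def build_conditioning_args(items):
    """Build the paired CLI arguments for multi-condition generation.

    items: list of (media_path, target_start_frame) tuples; the i-th path
    is conditioned to begin at the i-th start frame, so order matters.
    """
    paths = [path for path, _ in items]
    frames = [str(frame) for _, frame in items]
    return [
        "--conditioning_media_paths", *paths,
        "--conditioning_start_frames", *frames,
    ]

# e.g. condition a still at frame 0 and a short clip starting at frame 48
args = build_conditioning_args([("first_frame.png", 0), ("clip.mp4", 48)])
print(args)
```

Keeping the two flag lists in lockstep this way avoids the most common mistake with this command: mismatched path/frame counts.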
```diff
@@ -214,7 +229,7 @@ from ltx_video.inference import infer, InferenceConfig
 
 infer(
     InferenceConfig(
-        pipeline_config="configs/ltxv-13b-0.9.7-distilled.yaml",
+        pipeline_config="configs/ltxv-13b-0.9.8-distilled.yaml",
         prompt=PROMPT,
         height=HEIGHT,
         width=WIDTH,
```
Lines changed: 2 additions & 2 deletions
```diff
@@ -1,7 +1,7 @@
 pipeline_type: multi-scale
-checkpoint_path: "ltxv-13b-0.9.7-dev-fp8.safetensors"
+checkpoint_path: "ltxv-13b-0.9.8-dev-fp8.safetensors"
 downscale_factor: 0.6666666
-spatial_upscaler_model_path: "ltxv-spatial-upscaler-0.9.7.safetensors"
+spatial_upscaler_model_path: "ltxv-spatial-upscaler-0.9.8.safetensors"
 stg_mode: "attention_values" # options: "attention_values", "attention_skip", "residual", "transformer_block"
 decode_timestep: 0.05
 decode_noise_scale: 0.025
```
Lines changed: 2 additions & 2 deletions
```diff
@@ -1,7 +1,7 @@
 pipeline_type: multi-scale
-checkpoint_path: "ltxv-13b-0.9.7-dev.safetensors"
+checkpoint_path: "ltxv-13b-0.9.8-dev.safetensors"
 downscale_factor: 0.6666666
-spatial_upscaler_model_path: "ltxv-spatial-upscaler-0.9.7.safetensors"
+spatial_upscaler_model_path: "ltxv-spatial-upscaler-0.9.8.safetensors"
 stg_mode: "attention_values" # options: "attention_values", "attention_skip", "residual", "transformer_block"
 decode_timestep: 0.05
 decode_noise_scale: 0.025
```
Lines changed: 29 additions & 0 deletions
```diff
@@ -0,0 +1,29 @@
+pipeline_type: multi-scale
+checkpoint_path: "ltxv-13b-0.9.8-distilled-fp8.safetensors"
+downscale_factor: 0.6666666
+spatial_upscaler_model_path: "ltxv-spatial-upscaler-0.9.8.safetensors"
+stg_mode: "attention_values" # options: "attention_values", "attention_skip", "residual", "transformer_block"
+decode_timestep: 0.05
+decode_noise_scale: 0.025
+text_encoder_model_name_or_path: "PixArt-alpha/PixArt-XL-2-1024-MS"
+precision: "float8_e4m3fn" # options: "float8_e4m3fn", "bfloat16", "mixed_precision"
+sampler: "from_checkpoint" # options: "uniform", "linear-quadratic", "from_checkpoint"
+prompt_enhancement_words_threshold: 120
+prompt_enhancer_image_caption_model_name_or_path: "MiaoshouAI/Florence-2-large-PromptGen-v2.0"
+prompt_enhancer_llm_model_name_or_path: "unsloth/Llama-3.2-3B-Instruct"
+stochastic_sampling: false
+
+first_pass:
+  timesteps: [1.0000, 0.9937, 0.9875, 0.9812, 0.9750, 0.9094, 0.7250]
+  guidance_scale: 1
+  stg_scale: 0
+  rescaling_scale: 1
+  skip_block_list: [42]
+
+second_pass:
+  timesteps: [0.9094, 0.7250, 0.4219]
+  guidance_scale: 1
+  stg_scale: 0
+  rescaling_scale: 1
+  skip_block_list: [42]
+tone_map_compression_ratio: 0.6
```
Lines changed: 29 additions & 0 deletions
```diff
@@ -0,0 +1,29 @@
+pipeline_type: multi-scale
+checkpoint_path: "ltxv-13b-0.9.8-distilled.safetensors"
+downscale_factor: 0.6666666
+spatial_upscaler_model_path: "ltxv-spatial-upscaler-0.9.8.safetensors"
+stg_mode: "attention_values" # options: "attention_values", "attention_skip", "residual", "transformer_block"
+decode_timestep: 0.05
+decode_noise_scale: 0.025
+text_encoder_model_name_or_path: "PixArt-alpha/PixArt-XL-2-1024-MS"
+precision: "bfloat16"
+sampler: "from_checkpoint" # options: "uniform", "linear-quadratic", "from_checkpoint"
+prompt_enhancement_words_threshold: 120
+prompt_enhancer_image_caption_model_name_or_path: "MiaoshouAI/Florence-2-large-PromptGen-v2.0"
+prompt_enhancer_llm_model_name_or_path: "unsloth/Llama-3.2-3B-Instruct"
+stochastic_sampling: false
+
+first_pass:
+  timesteps: [1.0000, 0.9937, 0.9875, 0.9812, 0.9750, 0.9094, 0.7250]
+  guidance_scale: 1
+  stg_scale: 0
+  rescaling_scale: 1
+  skip_block_list: [42]
+
+second_pass:
+  timesteps: [0.9094, 0.7250, 0.4219]
+  guidance_scale: 1
+  stg_scale: 0
+  rescaling_scale: 1
+  skip_block_list: [42]
+tone_map_compression_ratio: 0.6
```
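Both new distilled configs drive the same multi-scale flow: a first pass at roughly `downscale_factor` (about 2/3) of the target resolution, then a second pass on the spatially upscaled latents that re-solves only the final noise levels (`[0.9094, 0.7250, 0.4219]`, overlapping the tail of the first-pass schedule). As a rough sketch of the first-pass resolution arithmetic, assuming dimensions snap to multiples of 32 (the repo's exact rounding rule may differ, so treat this as illustrative):

```python
def first_pass_resolution(height, width, downscale_factor=0.6666666, multiple=32):
    """Estimate the low-resolution first-pass size for the multi-scale pipeline.

    Assumption: spatial dims are snapped to multiples of 32 by round-to-nearest;
    the actual rounding in the LTX-Video repo may differ.
    """
    def snap(dim):
        # scale down, then round to the nearest allowed multiple (never below it)
        return max(multiple, round(dim * downscale_factor / multiple) * multiple)
    return snap(height), snap(width)

# e.g. a 768x1152 target renders its first pass near 512x768, then the
# spatial upscaler brings the latents back up for the second pass.
print(first_pass_resolution(768, 1152))
```

The overlap between the two timestep schedules is the point of the design: the second pass re-noises the upscaled latents only down to ~0.9 and refines from there, rather than resampling from pure noise.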
Lines changed: 2 additions & 2 deletions
```diff
@@ -1,7 +1,7 @@
 pipeline_type: multi-scale
-checkpoint_path: "ltxv-13b-0.9.7-distilled-fp8.safetensors"
+checkpoint_path: "ltxv-2b-0.9.8-distilled-fp8.safetensors"
 downscale_factor: 0.6666666
-spatial_upscaler_model_path: "ltxv-spatial-upscaler-0.9.7.safetensors"
+spatial_upscaler_model_path: "ltxv-spatial-upscaler-0.9.8.safetensors"
 stg_mode: "attention_values" # options: "attention_values", "attention_skip", "residual", "transformer_block"
 decode_timestep: 0.05
 decode_noise_scale: 0.025
```
Lines changed: 2 additions & 2 deletions
```diff
@@ -1,7 +1,7 @@
 pipeline_type: multi-scale
-checkpoint_path: "ltxv-13b-0.9.7-distilled.safetensors"
+checkpoint_path: "ltxv-2b-0.9.8-distilled.safetensors"
 downscale_factor: 0.6666666
-spatial_upscaler_model_path: "ltxv-spatial-upscaler-0.9.7.safetensors"
+spatial_upscaler_model_path: "ltxv-spatial-upscaler-0.9.8.safetensors"
 stg_mode: "attention_values" # options: "attention_values", "attention_skip", "residual", "transformer_block"
 decode_timestep: 0.05
 decode_noise_scale: 0.025
```

ltx_video/pipelines/pipeline_ltx_video.py

Lines changed: 45 additions & 0 deletions
```diff
@@ -790,6 +790,7 @@ def __call__(
         text_encoder_max_tokens: int = 256,
         stochastic_sampling: bool = False,
         media_items: Optional[torch.Tensor] = None,
+        tone_map_compression_ratio: float = 0.0,
         **kwargs,
     ) -> Union[ImagePipelineOutput, Tuple]:
         """
```
```diff
@@ -871,6 +872,8 @@ def __call__(
                 If set to `True`, the sampling is stochastic. If set to `False`, the sampling is deterministic.
             media_items ('torch.Tensor', *optional*):
                 The input media item used for image-to-image / video-to-video.
+            tone_map_compression_ratio ('float', *optional*, defaults to 0.0):
+                Compression ratio for tone mapping. If set to 0.0, no tone mapping is applied; if set to 1.0, full compression is applied.
         Examples:
 
         Returns:
```
```diff
@@ -1320,6 +1323,7 @@ def __call__(
             )
         else:
             decode_timestep = None
+        latents = self.tone_map_latents(latents, tone_map_compression_ratio)
         image = vae_decode(
             latents,
             self.vae,
```
```diff
@@ -1741,6 +1745,47 @@ def trim_conditioning_sequence(
         num_frames = (num_frames - 1) // scale_factor * scale_factor + 1
         return num_frames
 
+    @staticmethod
+    def tone_map_latents(
+        latents: torch.Tensor,
+        compression: float,
+    ) -> torch.Tensor:
+        """
+        Applies a non-linear tone-mapping function to latent values to reduce their dynamic range
+        in a perceptually smooth way using a sigmoid-based compression.
+
+        This is useful for regularizing high-variance latents or for conditioning outputs
+        during generation, especially when controlling dynamic behavior with a `compression` factor.
+
+        Parameters
+        ----------
+        latents : torch.Tensor
+            Input latent tensor with arbitrary shape. Expected to be roughly in [-1, 1] or [0, 1] range.
+        compression : float
+            Compression strength in the range [0, 1].
+            - 0.0: No tone-mapping (identity transform)
+            - 1.0: Full compression effect
+
+        Returns
+        -------
+        torch.Tensor
+            The tone-mapped latent tensor of the same shape as input.
+        """
+        if not (0 <= compression <= 1):
+            raise ValueError("Compression must be in the range [0, 1]")
+
+        # Remap compression from [0, 1] to a scale factor in [0, 0.75]
+        scale_factor = compression * 0.75
+        abs_latents = torch.abs(latents)
+
+        # Sigmoid-based compression: large-magnitude latents are scaled down toward
+        # 1 - 0.8 * scale_factor (about 0.4x at full compression), while small-magnitude
+        # latents keep a scale close to 1.0. At scale_factor = 0 the correction term
+        # vanishes and the transform is the identity.
+        sigmoid_term = torch.sigmoid(4.0 * scale_factor * (abs_latents - 1.0))
+        scales = 1.0 - 0.8 * scale_factor * sigmoid_term
+
+        filtered = latents * scales
+        return filtered
+
 
 def adain_filter_latent(
     latents: torch.Tensor, reference_latents: torch.Tensor, factor=1.0
```
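To make the numbers in `tone_map_latents` concrete, here is a scalar pure-Python sketch of the same formula (illustrative only; the pipeline applies it element-wise to torch tensors):

```python
import math

def tone_map(value, compression):
    """Scalar version of the sigmoid tone-mapping used in the pipeline.

    compression=0.0 is the identity; compression=1.0 scales large-magnitude
    values down toward ~0.4x while leaving small values nearly untouched.
    """
    if not 0.0 <= compression <= 1.0:
        raise ValueError("Compression must be in the range [0, 1]")
    scale_factor = compression * 0.75
    # sigmoid(4 * scale_factor * (|x| - 1)): near 1 for |x| >> 1, near 0 for |x| << 1
    sigmoid = 1.0 / (1.0 + math.exp(-4.0 * scale_factor * (abs(value) - 1.0)))
    scale = 1.0 - 0.8 * scale_factor * sigmoid
    return value * scale

print(tone_map(0.5, 0.0))   # identity: 0.5
print(tone_map(2.0, 1.0))   # compressed below 2.0
```

Because the scale depends only on `abs(value)`, the mapping is odd-symmetric: positive and negative latents of the same magnitude are compressed identically.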

0 commit comments
