Commit 581f051

Merge branch 'main' into feature/group-offload-pinning

2 parents 335dca8 + 8600b4c

75 files changed: +13163 / -343 lines

.github/workflows/codeql.yml

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+---
+name: CodeQL Security Analysis For Github Actions
+
+on:
+  push:
+    branches: ["main"]
+  workflow_dispatch:
+  # pull_request:
+
+jobs:
+  codeql:
+    name: CodeQL Analysis
+    uses: huggingface/security-workflows/.github/workflows/codeql-reusable.yml@v1
+    permissions:
+      security-events: write
+      packages: read
+      actions: read
+      contents: read
+    with:
+      languages: '["actions","python"]'
+      queries: 'security-extended,security-and-quality'
+      runner: 'ubuntu-latest' #optional if need custom runner

.github/workflows/mirror_community_pipeline.yml

Lines changed: 14 additions & 11 deletions
@@ -24,7 +24,6 @@ jobs:
   mirror_community_pipeline:
     env:
       SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_COMMUNITY_MIRROR }}
-
     runs-on: ubuntu-22.04
     steps:
       # Checkout to correct ref
@@ -39,25 +38,29 @@ jobs:
       # If ref is 'refs/heads/main' => set 'main'
       # Else it must be a tag => set {tag}
       - name: Set checkout_ref and path_in_repo
+        env:
+          EVENT_NAME: ${{ github.event_name }}
+          EVENT_INPUT_REF: ${{ github.event.inputs.ref }}
+          GITHUB_REF: ${{ github.ref }}
         run: |
-          if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
-            if [ -z "${{ github.event.inputs.ref }}" ]; then
+          if [ "$EVENT_NAME" == "workflow_dispatch" ]; then
+            if [ -z "$EVENT_INPUT_REF" ]; then
              echo "Error: Missing ref input"
              exit 1
-            elif [ "${{ github.event.inputs.ref }}" == "main" ]; then
+            elif [ "$EVENT_INPUT_REF" == "main" ]; then
              echo "CHECKOUT_REF=refs/heads/main" >> $GITHUB_ENV
              echo "PATH_IN_REPO=main" >> $GITHUB_ENV
            else
-              echo "CHECKOUT_REF=refs/tags/${{ github.event.inputs.ref }}" >> $GITHUB_ENV
-              echo "PATH_IN_REPO=${{ github.event.inputs.ref }}" >> $GITHUB_ENV
+              echo "CHECKOUT_REF=refs/tags/$EVENT_INPUT_REF" >> $GITHUB_ENV
+              echo "PATH_IN_REPO=$EVENT_INPUT_REF" >> $GITHUB_ENV
            fi
-          elif [ "${{ github.ref }}" == "refs/heads/main" ]; then
-            echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
+          elif [ "$GITHUB_REF" == "refs/heads/main" ]; then
+            echo "CHECKOUT_REF=$GITHUB_REF" >> $GITHUB_ENV
            echo "PATH_IN_REPO=main" >> $GITHUB_ENV
          else
            # e.g. refs/tags/v0.28.1 -> v0.28.1
-            echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
-            echo "PATH_IN_REPO=$(echo ${{ github.ref }} | sed 's/^refs\/tags\///')" >> $GITHUB_ENV
+            echo "CHECKOUT_REF=$GITHUB_REF" >> $GITHUB_ENV
+            echo "PATH_IN_REPO=$(echo $GITHUB_REF | sed 's/^refs\/tags\///')" >> $GITHUB_ENV
          fi
       - name: Print env vars
         run: |
@@ -99,4 +102,4 @@ jobs:
       - name: Report failure status
         if: ${{ failure() }}
         run: |
-          pip install requests && python utils/notify_community_pipelines_mirror.py --status=failure
+          pip install requests && python utils/notify_community_pipelines_mirror.py --status=failure
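
For reference, the ref-to-path mapping this step implements (now reading event data from environment variables instead of inline `${{ }}` expressions) can be sketched in Python. This is an illustrative re-expression of the shell logic above, not part of the workflow; the function name `resolve_ref` is made up for the example.

```python
from typing import Optional, Tuple

def resolve_ref(event_name: str, event_input_ref: Optional[str], github_ref: str) -> Tuple[str, str]:
    """Mirror of the workflow's shell step: returns (CHECKOUT_REF, PATH_IN_REPO)."""
    if event_name == "workflow_dispatch":
        if not event_input_ref:
            raise ValueError("Missing ref input")
        if event_input_ref == "main":
            return "refs/heads/main", "main"
        return f"refs/tags/{event_input_ref}", event_input_ref
    if github_ref == "refs/heads/main":
        return github_ref, "main"
    # e.g. refs/tags/v0.28.1 -> v0.28.1
    return github_ref, github_ref.removeprefix("refs/tags/")

# A tag push resolves to the tag name as the path in the mirror repo.
print(resolve_ref("push", None, "refs/tags/v0.28.1"))  # ('refs/tags/v0.28.1', 'v0.28.1')
```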

docs/source/en/_toctree.yml

Lines changed: 8 additions & 0 deletions
@@ -367,6 +367,8 @@
     title: LatteTransformer3DModel
   - local: api/models/longcat_image_transformer2d
     title: LongCatImageTransformer2DModel
+  - local: api/models/ltx2_video_transformer3d
+    title: LTX2VideoTransformer3DModel
   - local: api/models/ltx_video_transformer3d
     title: LTXVideoTransformer3DModel
   - local: api/models/lumina2_transformer2d
@@ -443,6 +445,10 @@
     title: AutoencoderKLHunyuanVideo
   - local: api/models/autoencoder_kl_hunyuan_video15
     title: AutoencoderKLHunyuanVideo15
+  - local: api/models/autoencoderkl_audio_ltx_2
+    title: AutoencoderKLLTX2Audio
+  - local: api/models/autoencoderkl_ltx_2
+    title: AutoencoderKLLTX2Video
   - local: api/models/autoencoderkl_ltx_video
     title: AutoencoderKLLTXVideo
   - local: api/models/autoencoderkl_magvit
@@ -678,6 +684,8 @@
     title: Kandinsky 5.0 Video
   - local: api/pipelines/latte
     title: Latte
+  - local: api/pipelines/ltx2
+    title: LTX-2
   - local: api/pipelines/ltx_video
     title: LTXVideo
   - local: api/pipelines/mochi

docs/source/en/api/models/autoencoderkl_audio_ltx_2.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLLTX2Audio
+
+The 3D variational autoencoder (VAE) model with KL loss used in [LTX-2](https://huggingface.co/Lightricks/LTX-2) was introduced by Lightricks. It is used for encoding and decoding audio latent representations.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import AutoencoderKLLTX2Audio
+
+vae = AutoencoderKLLTX2Audio.from_pretrained("Lightricks/LTX-2", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+```
+
+## AutoencoderKLLTX2Audio
+
+[[autodoc]] AutoencoderKLLTX2Audio
+- encode
+- decode
+- all

docs/source/en/api/models/autoencoderkl_ltx_2.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLLTX2Video
+
+The 3D variational autoencoder (VAE) model with KL loss used in [LTX-2](https://huggingface.co/Lightricks/LTX-2) was introduced by Lightricks.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import AutoencoderKLLTX2Video
+
+vae = AutoencoderKLLTX2Video.from_pretrained("Lightricks/LTX-2", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+```
+
+## AutoencoderKLLTX2Video
+
+[[autodoc]] AutoencoderKLLTX2Video
+- decode
+- encode
+- all
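
As a rough orientation for how the new video VAE might be exercised once loaded, the snippet below round-trips a dummy video tensor through `encode` and `decode`. The `latent_dist.sample()` / `.sample` access pattern and the input shape follow other diffusers video VAEs such as `AutoencoderKLLTXVideo`; they are assumptions, not guarantees about this class.

```python
import torch
from diffusers import AutoencoderKLLTX2Video

vae = AutoencoderKLLTX2Video.from_pretrained(
    "Lightricks/LTX-2", subfolder="vae", torch_dtype=torch.float32
).to("cuda")

# Dummy batch: (batch, channels, frames, height, width); real inputs come from a pipeline.
video = torch.randn(1, 3, 9, 256, 256, dtype=torch.float32, device="cuda")

with torch.no_grad():
    # Assumed API, mirroring other diffusers video VAEs.
    latents = vae.encode(video).latent_dist.sample()
    reconstruction = vae.decode(latents).sample

print(latents.shape, reconstruction.shape)
```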

docs/source/en/api/models/ltx2_video_transformer3d.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# LTX2VideoTransformer3DModel
+
+A Diffusion Transformer model for 3D data from [LTX-2](https://huggingface.co/Lightricks/LTX-2) was introduced by Lightricks.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import LTX2VideoTransformer3DModel
+
+transformer = LTX2VideoTransformer3DModel.from_pretrained("Lightricks/LTX-2", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+```
+
+## LTX2VideoTransformer3DModel
+
+[[autodoc]] LTX2VideoTransformer3DModel

docs/source/en/api/pipelines/ltx2.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License. -->
+
+# LTX-2
+
+LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.
+
+You can find all the original LTX-2 checkpoints under the [Lightricks](https://huggingface.co/Lightricks) organization.
+
+The original codebase for LTX-2 can be found [here](https://github.com/Lightricks/LTX-2).
+
+## LTX2Pipeline
+
+[[autodoc]] LTX2Pipeline
+- all
+- __call__
+
+## LTX2ImageToVideoPipeline
+
+[[autodoc]] LTX2ImageToVideoPipeline
+- all
+- __call__
+
+## LTX2LatentUpsamplePipeline
+
+[[autodoc]] LTX2LatentUpsamplePipeline
+- all
+- __call__
+
+## LTX2PipelineOutput
+
+[[autodoc]] pipelines.ltx2.pipeline_output.LTX2PipelineOutput
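
The autodoc entries above do not yet include a usage example; the sketch below shows what text-to-video generation with `LTX2Pipeline` would plausibly look like if it follows the existing LTX pipelines. The parameter names (`num_frames`, `num_inference_steps`) and the `frames` output attribute are assumptions carried over from `LTXPipeline`, not confirmed by this diff.

```python
import torch
from diffusers import LTX2Pipeline
from diffusers.utils import export_to_video

# Loading pattern assumed to mirror the other LTX pipelines.
pipe = LTX2Pipeline.from_pretrained("Lightricks/LTX-2", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A woman walks down a rainy neon-lit street at night, reflections shimmering on the wet pavement"
video = pipe(prompt=prompt, num_frames=97, num_inference_steps=40).frames[0]

export_to_video(video, "output.mp4", fps=24)
```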

docs/source/en/api/pipelines/ltx_video.md

Lines changed: 8 additions & 2 deletions
@@ -136,7 +136,7 @@ export_to_video(video, "output.mp4", fps=24)
 - The recommended dtype for the transformer, VAE, and text encoder is `torch.bfloat16`. The VAE and text encoder can also be `torch.float32` or `torch.float16`.
 - For guidance-distilled variants of LTX-Video, set `guidance_scale` to `1.0`. The `guidance_scale` for any other model should be set higher, like `5.0`, for good generation quality.
 - For timestep-aware VAE variants (LTX-Video 0.9.1 and above), set `decode_timestep` to `0.05` and `image_cond_noise_scale` to `0.025`.
-- For variants that support interpolation between multiple conditioning images and videos (LTX-Video 0.9.5 and above), use similar images and videos for the best results. Divergence from the conditioning inputs may lead to abrupt transitionts in the generated video.
+- For variants that support interpolation between multiple conditioning images and videos (LTX-Video 0.9.5 and above), use similar images and videos for the best results. Divergence from the conditioning inputs may lead to abrupt transitions in the generated video.
 
 - LTX-Video 0.9.7 includes a spatial latent upscaler and a 13B parameter transformer. During inference, a low resolution video is quickly generated first and then upscaled and refined.
 
@@ -329,7 +329,7 @@ export_to_video(video, "output.mp4", fps=24)
 
 <details>
 <summary>Show example code</summary>
-
+
 ```python
 import torch
 from diffusers import LTXConditionPipeline, LTXLatentUpsamplePipeline
@@ -474,6 +474,12 @@ export_to_video(video, "output.mp4", fps=24)
 
 </details>
 
+## LTXI2VLongMultiPromptPipeline
+
+[[autodoc]] LTXI2VLongMultiPromptPipeline
+- all
+- __call__
+
 ## LTXPipeline
 
 [[autodoc]] LTXPipeline

docs/source/en/api/pipelines/skyreels_v2.md

Lines changed: 2 additions & 1 deletion
@@ -37,7 +37,8 @@ The following SkyReels-V2 models are supported in Diffusers:
 - [SkyReels-V2 I2V 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P-Diffusers)
 - [SkyReels-V2 I2V 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P-Diffusers)
 - [SkyReels-V2 I2V 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-720P-Diffusers)
-- [SkyReels-V2 FLF2V 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-FLF2V-1.3B-540P-Diffusers)
+
+This model was contributed by [M. Tolga Cangöz](https://github.com/tolgacangoz).
 
 > [!TIP]
 > Click on the SkyReels-V2 models in the right sidebar for more examples of video generation.

docs/source/en/api/pipelines/wan.md

Lines changed: 0 additions & 3 deletions
@@ -250,9 +250,6 @@ The code snippets available in [this](https://github.com/huggingface/diffusers/p
 
 The general rule of thumb to keep in mind when preparing inputs for the VACE pipeline is that the input images, or frames of a video that you want to use for conditioning, should have a corresponding mask that is black in color. The black mask signifies that the model will not generate new content for that area, and only use those parts for conditioning the generation process. For parts/frames that should be generated by the model, the mask should be white in color.
 
-</hfoption>
-</hfoptions>
-
 ### Wan-Animate: Unified Character Animation and Replacement with Holistic Replication
 
 [Wan-Animate](https://huggingface.co/papers/2509.14055) by the Wan Team.
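
The black/white mask convention described in the context above (black = condition only, white = generate) can be illustrated with a few lines of PIL. The exact structure the VACE pipeline expects for frames and masks is not shown in this hunk, so treat the pairing below as an assumption to adapt, not the pipeline's required input format.

```python
from PIL import Image

width, height = 832, 480

# Frame the model should only condition on (e.g. a reference image), paired with a black mask.
conditioning_frame = Image.open("reference_frame.png").convert("RGB").resize((width, height))
keep_mask = Image.new("L", (width, height), 0)        # black: do not generate here

# Placeholder frame the model should fill in, paired with a white mask.
placeholder_frame = Image.new("RGB", (width, height), (0, 0, 0))
generate_mask = Image.new("L", (width, height), 255)  # white: generate new content here

# Hypothetical pairing of frames and masks to pass to a VACE-style pipeline.
frames = [conditioning_frame, placeholder_frame]
masks = [keep_mask, generate_mask]
```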
