
Commit 0477526

Merge branch 'main' into txt_seq_lens
2 parents: ac5ac24 + 152f7ca

File tree

71 files changed: +10926 −169 lines


docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -349,6 +349,8 @@
     title: DiTTransformer2DModel
   - local: api/models/easyanimate_transformer3d
     title: EasyAnimateTransformer3DModel
+  - local: api/models/flux2_transformer
+    title: Flux2Transformer2DModel
   - local: api/models/flux_transformer
     title: FluxTransformer2DModel
   - local: api/models/hidream_image_transformer
@@ -525,6 +527,8 @@
     title: EasyAnimate
   - local: api/pipelines/flux
     title: Flux
+  - local: api/pipelines/flux2
+    title: Flux2
   - local: api/pipelines/control_flux_inpaint
     title: FluxControlInpaint
   - local: api/pipelines/hidream

docs/source/en/api/loaders/lora.md

Lines changed: 6 additions & 1 deletion
@@ -30,7 +30,8 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 - [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
 - [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
 - [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
-- [`QwenImageLoraLoaderMixin`] provides similar functions for [Qwen Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwen)
+- [`QwenImageLoraLoaderMixin`] provides similar functions for [Qwen Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwen).
+- [`Flux2LoraLoaderMixin`] provides similar functions for [Flux2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux2).
 - [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.

 > [!TIP]
@@ -56,6 +57,10 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi

 [[autodoc]] loaders.lora_pipeline.FluxLoraLoaderMixin

+## Flux2LoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.Flux2LoraLoaderMixin
+
 ## CogVideoXLoraLoaderMixin

 [[autodoc]] loaders.lora_pipeline.CogVideoXLoraLoaderMixin
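
For readers skimming the diff: the new mixin plugs into the standard LoRA-loading flow. Below is a minimal sketch of how it would be used, assuming `Flux2Pipeline` exposes the usual `load_lora_weights` API it inherits from `Flux2LoraLoaderMixin`; the LoRA repo id and adapter name are hypothetical placeholders.

```python
import torch
from diffusers import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev", torch_dtype=torch.bfloat16
).to("cuda")

# load_lora_weights is provided by Flux2LoraLoaderMixin; the repo id below is
# a hypothetical placeholder, not a real checkpoint
pipe.load_lora_weights("your-username/flux2-lora", adapter_name="my_lora")

image = pipe("a watercolor fox in the rain").images[0]
```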
docs/source/en/api/models/flux2_transformer.md

Lines changed: 19 additions & 0 deletions

@@ -0,0 +1,19 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Flux2Transformer2DModel
+
+A Transformer model for image-like data from [Flux2](https://hf.co/black-forest-labs/FLUX.2-dev).
+
+## Flux2Transformer2DModel
+
+[[autodoc]] Flux2Transformer2DModel
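
The new page is only an autodoc stub, so a hedged loading example may help orient readers; this sketch assumes the checkpoint follows the usual diffusers repo layout with the transformer weights under a `transformer` subfolder.

```python
import torch
from diffusers import Flux2Transformer2DModel

# subfolder name assumes the standard diffusers repo layout
transformer = Flux2Transformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
```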
docs/source/en/api/pipelines/flux2.md

Lines changed: 33 additions & 0 deletions

@@ -0,0 +1,33 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Flux2
+
+<div class="flex flex-wrap space-x-1">
+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+  <img alt="MPS" src="https://img.shields.io/badge/MPS-000000?style=flat&logo=apple&logoColor=white"/>
+</div>
+
+Flux.2 is the latest series of image generation models from Black Forest Labs, preceded by the [Flux.1](./flux.md) series. It is an entirely new model with a new architecture and pre-training done from scratch!
+
+Original model checkpoints for Flux.2 can be found [here](https://huggingface.co/black-forest-labs). Original inference code can be found [here](https://github.com/black-forest-labs/flux2).
+
+> [!TIP]
+> Flux2 can be quite expensive to run on consumer hardware. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details. Additionally, Flux2 can benefit from quantization for memory efficiency with a trade-off in inference latency. Refer to [this blog post](https://huggingface.co/blog/quanto-diffusers) to learn more.
+>
+> [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.
+
+## Flux2Pipeline
+
+[[autodoc]] Flux2Pipeline
+  - all
+  - __call__
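
The pipeline page ships no usage snippet, so here is a minimal text-to-image sketch, assuming `Flux2Pipeline` follows the standard diffusers call signature; the prompt, step count, and guidance scale are arbitrary example values.

```python
import torch
from diffusers import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev", torch_dtype=torch.bfloat16
)
# offload idle submodules to CPU to keep peak VRAM down, per the tip above
pipe.enable_model_cpu_offload()

image = pipe(
    "a misty mountain lake at sunrise, photorealistic",
    num_inference_steps=28,  # arbitrary example value
    guidance_scale=4.0,      # arbitrary example value
).images[0]
image.save("flux2_example.png")
```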

docs/source/en/api/pipelines/sana_video.md

Lines changed: 7 additions & 6 deletions
@@ -43,11 +43,13 @@ Note: The recommended dtype mentioned is for the transformer weights. The text e
 <hfoptions id="generation pipelines">
 <hfoption id="Text-to-Video">

-The example below demonstrates how to use the text-to-video pipeline to generate a video using a text descriptio and a starting frame.
+The example below demonstrates how to use the text-to-video pipeline to generate a video using a text description.

 ```python
-model_id =
-pipe = SanaVideoPipeline.from_pretrained("Efficient-Large-Model/SANA-Video_2B_480p_diffusers", torch_dtype=torch.bfloat16)
+pipe = SanaVideoPipeline.from_pretrained(
+    "Efficient-Large-Model/SANA-Video_2B_480p_diffusers",
+    torch_dtype=torch.bfloat16,
+)
 pipe.text_encoder.to(torch.bfloat16)
 pipe.vae.to(torch.float32)
 pipe.to("cuda")
@@ -75,12 +77,11 @@ export_to_video(video, "sana_video.mp4", fps=16)
 </hfoption>
 <hfoption id="Image-to-Video">

-The example below demonstrates how to use the image-to-video pipeline to generate a video using a text descriptio and a starting frame.
+The example below demonstrates how to use the image-to-video pipeline to generate a video using a text description and a starting frame.

 ```python
-model_id = "Efficient-Large-Model/SANA-Video_2B_480p_diffusers"
 pipe = SanaImageToVideoPipeline.from_pretrained(
-    model_id,
+    "Efficient-Large-Model/SANA-Video_2B_480p_diffusers",
     torch_dtype=torch.bfloat16,
 )
 pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(pipe.scheduler.config, flow_shift=8.0)

docs/source/en/modular_diffusers/guiders.md

Lines changed: 1 addition & 1 deletion
@@ -159,7 +159,7 @@ Change the [`~ComponentSpec.default_creation_method`] to `from_pretrained` and u
 ```py
 guider_spec = t2i_pipeline.get_component_spec("guider")
 guider_spec.default_creation_method="from_pretrained"
-guider_spec.repo="YiYiXu/modular-loader-t2i-guider"
+guider_spec.pretrained_model_name_or_path="YiYiXu/modular-loader-t2i-guider"
 guider_spec.subfolder="pag_guider"
 pag_guider = guider_spec.load()
 t2i_pipeline.update_components(guider=pag_guider)

docs/source/en/modular_diffusers/modular_pipeline.md

Lines changed: 2 additions & 2 deletions
@@ -313,14 +313,14 @@ unet_spec
 ComponentSpec(
     name='unet',
     type_hint=<class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'>,
-    repo='RunDiffusion/Juggernaut-XL-v9',
+    pretrained_model_name_or_path='RunDiffusion/Juggernaut-XL-v9',
     subfolder='unet',
     variant='fp16',
     default_creation_method='from_pretrained'
 )

 # modify to load from a different repository
-unet_spec.repo = "stabilityai/stable-diffusion-xl-base-1.0"
+unet_spec.pretrained_model_name_or_path = "stabilityai/stable-diffusion-xl-base-1.0"

 # load component with modified spec
 unet = unet_spec.load(torch_dtype=torch.float16)
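
Both modular-diffusers hunks are the same rename: `ComponentSpec.repo` becomes `ComponentSpec.pretrained_model_name_or_path`. Here is a sketch of the new spelling end to end, with repo ids taken from the diff above; the `ComponentSpec` import path is an assumption.

```python
import torch
from diffusers import UNet2DConditionModel
from diffusers.modular_pipelines import ComponentSpec  # import path is an assumption

# describe where the component should be loaded from, using the renamed field
unet_spec = ComponentSpec(
    name="unet",
    type_hint=UNet2DConditionModel,
    pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
    default_creation_method="from_pretrained",
)

# materialize the component, exactly as in the doc example
unet = unet_spec.load(torch_dtype=torch.float16)
```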

docs/source/en/optimization/attention_backends.md

Lines changed: 2 additions & 0 deletions
@@ -139,12 +139,14 @@ Refer to the table below for a complete list of available attention backends and
 | `_native_npu` | [PyTorch native](https://docs.pytorch.org/docs/stable/generated/torch.nn.attention.SDPBackend.html#torch.nn.attention.SDPBackend) | NPU-optimized attention |
 | `_native_xla` | [PyTorch native](https://docs.pytorch.org/docs/stable/generated/torch.nn.attention.SDPBackend.html#torch.nn.attention.SDPBackend) | XLA-optimized attention |
 | `flash` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | FlashAttention-2 |
+| `flash_hub` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | FlashAttention-2 from kernels |
 | `flash_varlen` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | Variable length FlashAttention |
 | `aiter` | [AI Tensor Engine for ROCm](https://github.com/ROCm/aiter) | FlashAttention for AMD ROCm |
 | `_flash_3` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | FlashAttention-3 |
 | `_flash_varlen_3` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | Variable length FlashAttention-3 |
 | `_flash_3_hub` | [FlashAttention](https://github.com/Dao-AILab/flash-attention) | FlashAttention-3 from kernels |
 | `sage` | [SageAttention](https://github.com/thu-ml/SageAttention) | Quantized attention (INT8 QK) |
+| `sage_hub` | [SageAttention](https://github.com/thu-ml/SageAttention) | Quantized attention (INT8 QK) from kernels |
 | `sage_varlen` | [SageAttention](https://github.com/thu-ml/SageAttention) | Variable length SageAttention |
 | `_sage_qk_int8_pv_fp8_cuda` | [SageAttention](https://github.com/thu-ml/SageAttention) | INT8 QK + FP8 PV (CUDA) |
 | `_sage_qk_int8_pv_fp8_cuda_sm90` | [SageAttention](https://github.com/thu-ml/SageAttention) | INT8 QK + FP8 PV (SM90) |
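
The two added rows are Hub-kernel variants of existing backends. A sketch of opting into one, assuming the `set_attention_backend` helper this page documents and an installed `kernels` package; the model id is just an example.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# use the FlashAttention-2 kernel fetched from the Hub via the `kernels`
# package instead of a locally compiled flash-attn installation
pipe.transformer.set_attention_backend("flash_hub")

image = pipe("a cat wearing sunglasses").images[0]
```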

docs/source/pt/_toctree.yml

Lines changed: 8 additions & 6 deletions
@@ -1,8 +1,10 @@
 - sections:
-  - local: index
-    title: 🧨 Diffusers
-  - local: quicktour
-    title: Tour rápido
-  - local: installation
-    title: Instalação
+  - local: index
+    title: Diffusers
+  - local: installation
+    title: Instalação
+  - local: quicktour
+    title: Tour rápido
+  - local: stable_diffusion
+    title: Desempenho básico
   title: Primeiros passos

docs/source/pt/index.md

Lines changed: 2 additions & 2 deletions
@@ -18,11 +18,11 @@ specific language governing permissions and limitations under the License.

 # Diffusers

-🤗 Diffusers é uma biblioteca de modelos de difusão de última geração para geração de imagens, áudio e até mesmo estruturas 3D de moléculas. Se você está procurando uma solução de geração simples ou queira treinar seu próprio modelo de difusão, 🤗 Diffusers é uma modular caixa de ferramentas que suporta ambos. Nossa biblioteca é desenhada com foco em [usabilidade em vez de desempenho](conceptual/philosophy#usability-over-performance), [simples em vez de fácil](conceptual/philosophy#simple-over-easy) e [customizável em vez de abstrações](conceptual/philosophy#tweakable-contributorfriendly-over-abstraction).
+🤗 Diffusers é uma biblioteca de modelos de difusão de última geração para geração de imagens, áudio e até mesmo estruturas 3D de moléculas. Se você está procurando uma solução de geração simples ou quer treinar seu próprio modelo de difusão, 🤗 Diffusers é uma caixa de ferramentas modular que suporta ambos. Nossa biblioteca é desenhada com foco em [usabilidade em vez de desempenho](conceptual/philosophy#usability-over-performance), [simples em vez de fácil](conceptual/philosophy#simple-over-easy) e [customizável em vez de abstrações](conceptual/philosophy#tweakable-contributorfriendly-over-abstraction).

 A Biblioteca tem três componentes principais:

-- Pipelines de última geração para a geração em poucas linhas de código. Têm muitos pipelines no 🤗 Diffusers, veja a tabela no pipeline [Visão geral](api/pipelines/overview) para uma lista completa de pipelines disponíveis e as tarefas que eles resolvem.
+- Pipelines de última geração para a geração em poucas linhas de código. Há muitos pipelines no 🤗 Diffusers, veja a tabela no pipeline [Visão geral](api/pipelines/overview) para uma lista completa de pipelines disponíveis e as tarefas que eles resolvem.
 - Intercambiáveis [agendadores de ruído](api/schedulers/overview) para balancear as compensações entre velocidade e qualidade de geração.
 - [Modelos](api/models) pré-treinados que podem ser usados como se fossem blocos de construção, e combinados com agendadores, para criar seu próprio sistema de difusão de ponta a ponta.
