Commit ca58ce3

Merge branch 'main' into guidance-warn
2 parents 342c1f9 + 673d435 commit ca58ce3

124 files changed, +10135 −658 lines changed


docs/source/en/_toctree.yml

Lines changed: 5 additions & 1 deletion
```diff
@@ -17,7 +17,7 @@
   - local: tutorials/autopipeline
     title: AutoPipeline
   - local: using-diffusers/custom_pipeline_overview
-    title: Load community pipelines and components
+    title: Community pipelines and components
   - local: using-diffusers/callback
     title: Pipeline callbacks
   - local: using-diffusers/reusing_seeds
@@ -340,6 +340,8 @@
     title: AllegroTransformer3DModel
   - local: api/models/aura_flow_transformer2d
     title: AuraFlowTransformer2DModel
+  - local: api/models/bria_transformer
+    title: BriaTransformer2DModel
   - local: api/models/chroma_transformer
     title: ChromaTransformer2DModel
   - local: api/models/cogvideox_transformer3d
@@ -468,6 +470,8 @@
     title: AutoPipeline
   - local: api/pipelines/blip_diffusion
     title: BLIP-Diffusion
+  - local: api/pipelines/bria_3_2
+    title: Bria 3.2
   - local: api/pipelines/chroma
     title: Chroma
   - local: api/pipelines/cogvideox
```
docs/source/en/api/models/bria_transformer.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@

<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BriaTransformer2DModel

A modified Flux transformer model from [Bria](https://huggingface.co/briaai/BRIA-3.2).
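For orientation (not part of this diff), the transformer can be loaded on its own like any diffusers model. A minimal sketch; the `subfolder` name below is an assumption about the repository layout:

```python
import torch
from diffusers import BriaTransformer2DModel

# Load only the transformer from the Bria 3.2 checkpoint.
# The "transformer" subfolder name is an assumption; check the model repository layout.
transformer = BriaTransformer2DModel.from_pretrained(
    "briaai/BRIA-3.2", subfolder="transformer", torch_dtype=torch.bfloat16
)
print(transformer.config)
```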
## BriaTransformer2DModel

[[autodoc]] BriaTransformer2DModel
docs/source/en/api/pipelines/bria_3_2.md

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@

<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Bria 3.2

Bria 3.2 is a next-generation, commercially ready text-to-image model. With just 4 billion parameters, it delivers exceptional aesthetics and text rendering, and has been evaluated to perform on par with leading open-source models while outperforming other licensed models.
In addition to being built entirely on licensed data, Bria 3.2 offers several advantages for enterprise and commercial use:

- Efficient compute: the model is 3x smaller than comparable models on the market (4B parameters vs. 12B parameters for other open-source models).
- Architecture consistency: same architecture as Bria 3.1, ideal for users looking to upgrade without disruption.
- Fine-tuning speedup: 2x faster fine-tuning on L40S and A100 GPUs.

Original model checkpoints for Bria 3.2 can be found [here](https://huggingface.co/briaai/BRIA-3.2).
The GitHub repository for Bria 3.2 can be found [here](https://github.com/Bria-AI/BRIA-3.2).

To learn more about the Bria platform and get free trial access, visit [bria.ai](https://bria.ai).

## Usage

_The model is gated, so before using it with diffusers you first need to go to the [Bria 3.2 Hugging Face page](https://huggingface.co/briaai/BRIA-3.2), fill in the form, and accept the gate. Once you have access, log in so that your system knows you've accepted the gate._

Use the command below to log in:

```bash
hf auth login
```
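After logging in, generation follows the usual diffusers text-to-image pattern. A minimal sketch (the prompt and settings are illustrative, not taken from this commit):

```python
import torch
from diffusers import BriaPipeline

# Requires having accepted the gate on the Hub and being logged in.
pipe = BriaPipeline.from_pretrained("briaai/BRIA-3.2", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A photo of a red vintage car parked in front of a pastel-colored building"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("bria_3_2_example.png")
```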
## BriaPipeline

[[autodoc]] BriaPipeline
- all
- __call__

docs/source/en/api/pipelines/flux.md

Lines changed: 73 additions & 0 deletions
@@ -316,6 +316,67 @@ if integrity_checker.test_image(image_):
    raise ValueError("Your image has been flagged. Choose another prompt/image or try again.")

### Kontext Inpainting

`FluxKontextInpaintPipeline` enables image modification within a fixed mask region. It currently supports both text-based conditioning and image-reference conditioning.

<hfoptions id="kontext-inpaint">
<hfoption id="text-only">

```python
import torch
from diffusers import FluxKontextInpaintPipeline
from diffusers.utils import load_image

prompt = "Change the yellow dinosaur to green one"
img_url = (
    "https://github.com/ZenAI-Vietnam/Flux-Kontext-pipelines/blob/main/assets/dinosaur_input.jpeg?raw=true"
)
mask_url = (
    "https://github.com/ZenAI-Vietnam/Flux-Kontext-pipelines/blob/main/assets/dinosaur_mask.png?raw=true"
)

source = load_image(img_url)
mask = load_image(mask_url)

pipe = FluxKontextInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = pipe(prompt=prompt, image=source, mask_image=mask, strength=1.0).images[0]
image.save("kontext_inpainting_normal.png")
```

</hfoption>
<hfoption id="image conditioning">

```python
import torch
from diffusers import FluxKontextInpaintPipeline
from diffusers.utils import load_image

pipe = FluxKontextInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

prompt = "Replace this ball"
img_url = "https://images.pexels.com/photos/39362/the-ball-stadion-football-the-pitch-39362.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"
mask_url = "https://github.com/ZenAI-Vietnam/Flux-Kontext-pipelines/blob/main/assets/ball_mask.png?raw=true"
image_reference_url = "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTah3x6OL_ECMBaZ5ZlJJhNsyC-OSMLWAI-xw&s"

source = load_image(img_url)
mask = load_image(mask_url)
image_reference = load_image(image_reference_url)

mask = pipe.mask_processor.blur(mask, blur_factor=12)
image = pipe(
    prompt=prompt, image=source, mask_image=mask, image_reference=image_reference, strength=1.0
).images[0]
image.save("kontext_inpainting_ref.png")
```

</hfoption>
</hfoptions>
## Combining Flux Turbo LoRAs with Flux Control, Fill, and Redux

We can combine Flux Turbo LoRAs with Flux Control and other pipelines like Fill and Redux to enable few-step inference. The example below shows how to do that for the Flux Control LoRA for depth and a turbo LoRA from [`ByteDance/Hyper-SD`](https://hf.co/ByteDance/Hyper-SD).
@@ -646,3 +707,15 @@ image.save("flux-fp8-dev.png")

[[autodoc]] FluxFillPipeline
- all
- __call__

## FluxKontextPipeline

[[autodoc]] FluxKontextPipeline
- all
- __call__

## FluxKontextInpaintPipeline

[[autodoc]] FluxKontextInpaintPipeline
- all
- __call__

docs/source/en/api/pipelines/overview.md

Lines changed: 15 additions & 0 deletions
@@ -37,6 +37,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers and the tasks they support.

| [AudioLDM2](audioldm2) | text2audio |
| [AuraFlow](auraflow) | text2image |
| [BLIP Diffusion](blip_diffusion) | text2image |
| [Bria 3.2](bria_3_2) | text2image |
| [CogVideoX](cogvideox) | text2video |
| [Consistency Models](consistency_models) | unconditional image generation |
| [ControlNet](controlnet) | text2image, image2image, inpainting |
@@ -112,3 +113,17 @@

## PushToHubMixin

[[autodoc]] utils.PushToHubMixin

## Callbacks

[[autodoc]] callbacks.PipelineCallback

[[autodoc]] callbacks.SDCFGCutoffCallback

[[autodoc]] callbacks.SDXLCFGCutoffCallback

[[autodoc]] callbacks.SDXLControlnetCFGCutoffCallback

[[autodoc]] callbacks.IPAdapterScaleCutoffCallback

[[autodoc]] callbacks.SD3CFGCutoffCallback
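For orientation (this example is not part of the diff), a cutoff callback is typically passed to a pipeline through `callback_on_step_end`. A minimal sketch; the checkpoint id and cutoff value are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.callbacks import SDCFGCutoffCallback

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Stop applying classifier-free guidance after 40% of the denoising steps.
callback = SDCFGCutoffCallback(cutoff_step_ratio=0.4)

image = pipe(
    "a photograph of an astronaut riding a horse",
    callback_on_step_end=callback,
    callback_on_step_end_tensor_inputs=callback.tensor_inputs,
).images[0]
```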

docs/source/en/api/pipelines/qwenimage.md

Lines changed: 14 additions & 0 deletions
@@ -14,6 +14,10 @@

# QwenImage

<div class="flex flex-wrap space-x-1">
  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>

Qwen-Image from the Qwen team is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. Experiments show strong general capabilities in both image generation and editing, with exceptional performance in text rendering, especially for Chinese.

Qwen-Image comes in the following variants:
@@ -86,6 +90,12 @@ image.save("qwen_fewsteps.png")

</details>

<Tip>

The `guidance_scale` parameter in the pipeline exists to support future guidance-distilled models; passing it currently has no effect. To enable classifier-free guidance, pass `true_cfg_scale` along with a `negative_prompt` (even an empty negative prompt such as " " is enough to turn on classifier-free guidance computations).

</Tip>
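A rough sketch of what the tip describes (the checkpoint id and values are illustrative, not taken from this commit):

```python
import torch
from diffusers import QwenImagePipeline

pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    prompt="A coffee shop entrance with a chalkboard sign reading 'Qwen Coffee'",
    negative_prompt=" ",   # even an empty negative prompt enables true CFG
    true_cfg_scale=4.0,    # classifier-free guidance strength
    # guidance_scale is currently ignored; it is reserved for guidance-distilled variants
).images[0]
image.save("qwen_true_cfg.png")
```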
## QwenImagePipeline

[[autodoc]] QwenImagePipeline

@@ -110,6 +120,10 @@ image.save("qwen_fewsteps.png")

- all
- __call__

## QwenImageControlNetPipeline

[[autodoc]] QwenImageControlNetPipeline
- all
- __call__

## QwenImagePipelineOutput

[[autodoc]] pipelines.qwenimage.pipeline_output.QwenImagePipelineOutput

docs/source/en/api/pipelines/wan.md

Lines changed: 2 additions & 0 deletions
@@ -333,6 +333,8 @@ The general rule of thumb to keep in mind when preparing inputs for the VACE pipeline

- Wan 2.1 and 2.2 support using [LightX2V LoRAs](https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Lightx2v) to speed up inference. Using them on Wan 2.2 is slightly more involved. Refer to [this code snippet](https://github.com/huggingface/diffusers/pull/12040#issuecomment-3144185272) to learn more.

- Wan 2.2 has two denoisers. By default, LoRAs are only loaded into the first denoiser. Set `load_into_transformer_2=True` to load LoRAs into the second denoiser, as sketched below. Refer to [this example](https://github.com/huggingface/diffusers/pull/12074#issue-3292620048) and [this one](https://github.com/huggingface/diffusers/pull/12074#issuecomment-3155896144) to learn more.
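A minimal sketch of that option, assuming `load_into_transformer_2` is passed through `load_lora_weights`; the checkpoint id and LoRA references below are placeholders, not values from this commit:

```python
import torch
from diffusers import WanPipeline

# The Wan 2.2 checkpoint id is an illustrative assumption.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers", torch_dtype=torch.bfloat16
)

lora_id = "path/or/repo-of-your-lora"          # hypothetical
lora_file = "your_lora_weights.safetensors"    # hypothetical

# By default the LoRA is loaded into the first denoiser only.
pipe.load_lora_weights(lora_id, weight_name=lora_file, adapter_name="lora_t1")

# Load the same LoRA into the second denoiser as well.
pipe.load_lora_weights(
    lora_id,
    weight_name=lora_file,
    adapter_name="lora_t2",
    load_into_transformer_2=True,
)
```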
## WanPipeline

[[autodoc]] WanPipeline

docs/source/en/quicktour.md

Lines changed: 3 additions & 0 deletions
@@ -162,6 +162,9 @@ Take a look at the [Quantization](./quantization/overview) section for more details.

## Optimizations

> [!TIP]
> Optimization depends on hardware specs such as memory. Use this [Space](https://huggingface.co/spaces/diffusers/optimized-diffusers-code) to generate code examples that include all of Diffusers' available memory and speed optimization techniques for any model you're using.

Modern diffusion models are very large and have billions of parameters. The iterative denoising process is also computationally intensive and slow. Diffusers provides techniques for reducing memory usage and boosting inference speed, and these can be combined with quantization to optimize for both memory usage and inference speed.
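For orientation (not part of this commit's diff), a typical memory optimization from this family is model CPU offloading. A minimal sketch; the checkpoint name is illustrative:

```python
import torch
from diffusers import DiffusionPipeline

# Any diffusers pipeline works the same way; SDXL is used here as an example.
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

# Keep components on the CPU and move each one to the GPU only while it runs,
# trading some speed for a much lower peak GPU memory footprint.
pipeline.enable_model_cpu_offload()

image = pipeline("An astronaut riding a green horse").images[0]
```
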
### Memory usage
