Commit fb73873

Merge branch 'main' into tests-disable-mps-flax-onnx
2 parents: b18f98e + 21543de

File tree: 79 files changed, +4195 / −210 lines


docs/source/en/api/pipelines/flux.md

Lines changed: 41 additions & 0 deletions
@@ -39,6 +39,7 @@ Flux comes in the following variants:
| Canny Control (LoRA) | [`black-forest-labs/FLUX.1-Canny-dev-lora`](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev-lora) |
| Depth Control (LoRA) | [`black-forest-labs/FLUX.1-Depth-dev-lora`](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev-lora) |
| Redux (Adapter) | [`black-forest-labs/FLUX.1-Redux-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev) |
+| Kontext | [`black-forest-labs/FLUX.1-Kontext-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) |

All checkpoints have different usage which we detail below.

@@ -273,6 +274,46 @@ images = pipe(
images[0].save("flux-redux.png")
```

+### Kontext
+
+Flux Kontext is a model for in-context control of the image generation process, enabling editing, refinement, relighting, style transfer, character customization, and more.
+
+```python
+import torch
+from diffusers import FluxKontextPipeline
+from diffusers.utils import load_image
+
+pipe = FluxKontextPipeline.from_pretrained(
+    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
+)
+pipe.to("cuda")
+
+image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/yarn-art-pikachu.png").convert("RGB")
+prompt = "Make Pikachu hold a sign that says 'Black Forest Labs is awesome', yarn art style, detailed, vibrant colors"
+image = pipe(
+    image=image,
+    prompt=prompt,
+    guidance_scale=2.5,
+    generator=torch.Generator().manual_seed(42),
+).images[0]
+image.save("flux-kontext.png")
+```
+
+Flux Kontext comes with an integrity safety checker, which should be run after the image generation step. To run the safety checker, install the official repository from [black-forest-labs/flux](https://github.com/black-forest-labs/flux) and add the following code:
+
+```python
+import numpy as np
+from flux.content_filters import PixtralContentFilter
+
+# ... pipeline invocation to generate images
+
+integrity_checker = PixtralContentFilter(torch.device("cuda"))
+image_ = np.array(image) / 255.0
+image_ = 2 * image_ - 1
+image_ = torch.from_numpy(image_).to("cuda", dtype=torch.float32).unsqueeze(0).permute(0, 3, 1, 2)
+if integrity_checker.test_image(image_):
+    raise ValueError("Your image has been flagged. Choose another prompt/image or try again.")
+```
+
## Combining Flux Turbo LoRAs with Flux Control, Fill, and Redux

We can combine Flux Turbo LoRAs with Flux Control and other pipelines like Fill and Redux to enable few-step inference. The example below shows how to do that for the Flux Control LoRA for depth and the turbo LoRA from [`ByteDance/Hyper-SD`](https://hf.co/ByteDance/Hyper-SD).
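
The depth + turbo combination described above is easiest to see end to end. The following is an illustrative sketch and not part of this commit's diff: it assumes [`FluxControlPipeline`] with `load_lora_weights` and `set_adapters`, and the Hyper-SD checkpoint filename and adapter weights shown are assumptions to adjust to the actual repository contents.

```python
# Illustrative sketch (not from this commit): Flux Control depth LoRA + Hyper-SD turbo LoRA
# for few-step, depth-conditioned generation. Filename and adapter weights are assumptions.
import torch
from huggingface_hub import hf_hub_download
from diffusers import FluxControlPipeline

pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Depth Control LoRA from the variants table above
pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora", adapter_name="depth")

# Turbo LoRA for few-step inference (assumed filename in ByteDance/Hyper-SD)
pipe.load_lora_weights(
    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"),
    adapter_name="turbo",
)
pipe.set_adapters(["depth", "turbo"], adapter_weights=[0.85, 0.125])

# control_image should be a depth map of the target scene; with the turbo LoRA active,
# a handful of steps is usually enough, e.g.:
# image = pipe(prompt, control_image=control_image, num_inference_steps=8).images[0]
```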

docs/source/en/optimization/fp16.md

Lines changed: 57 additions & 3 deletions
@@ -150,11 +150,63 @@ pipeline(prompt, num_inference_steps=30).images[0]

Compilation is slow the first time, but once compiled, it is significantly faster. Try to only use the compiled pipeline on the same type of inference operations. Calling the compiled pipeline on a different image size retriggers compilation which is slow and inefficient.

+### Dynamic shape compilation
+
+> [!TIP]
+> Make sure to always use the nightly version of PyTorch for better support.
+
+`torch.compile` keeps track of input shapes and conditions, and recompiles the model when these change. For example, a model compiled for 1024x1024 resolution images is recompiled when it is used on an image with a different resolution.
+
+To avoid this, pass `dynamic=True` to generate a more dynamic kernel that does not recompile when conditions change.
+
+```diff
++ torch.fx.experimental._config.use_duck_shape = False
++ pipeline.unet = torch.compile(
+    pipeline.unet, fullgraph=True, dynamic=True
+)
+```
+
+Setting `use_duck_shape=False` instructs the compiler not to reuse a single symbolic variable for distinct inputs that happen to have the same size. For more details, check out this [comment](https://github.com/huggingface/diffusers/pull/11327#discussion_r2047659790).
+
+Not all models benefit from dynamic compilation out of the box, and some may require changes. Refer to this [PR](https://github.com/huggingface/diffusers/pull/11297/) that improved the [`AuraFlowPipeline`] implementation to benefit from dynamic compilation.
+
+Feel free to open an issue if dynamic compilation doesn't work as expected for a Diffusers model.
+
### Regional compilation

-[Regional compilation](https://docs.pytorch.org/tutorials/recipes/regional_compilation.html) reduces the cold start compilation time by only compiling a specific repeated region (or block) of the model instead of the entire model. The compiler reuses the cached and compiled code for the other blocks.

-[Accelerate](https://huggingface.co/docs/accelerate/index) provides the [compile_regions](https://github.com/huggingface/accelerate/blob/273799c85d849a1954a4f2e65767216eb37fa089/src/accelerate/utils/other.py#L78) method for automatically compiling the repeated blocks of a `nn.Module` sequentially. The rest of the model is compiled separately.
+[Regional compilation](https://docs.pytorch.org/tutorials/recipes/regional_compilation.html) trims cold-start latency by compiling **only the small, frequently repeated block(s)** of a model, typically a Transformer layer, enabling reuse of compiled artifacts for every subsequent occurrence.
+For many diffusion architectures this delivers the *same* runtime speedups as full-graph compilation yet cuts compile time by **8–10x**.
+
+To make this effortless, [`ModelMixin`] exposes the [`ModelMixin.compile_repeated_blocks`] API, a helper that wraps `torch.compile` around any sub-modules you designate as repeatable:
+
+```py
+# pip install -U diffusers
+import torch
+from diffusers import StableDiffusionXLPipeline
+
+pipe = StableDiffusionXLPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-1.0",
+    torch_dtype=torch.float16,
+).to("cuda")
+
+# Compile only the repeated Transformer layers inside the UNet
+pipe.unet.compile_repeated_blocks(fullgraph=True)
+```
+
+To enable a new model with regional compilation, add a `_repeated_blocks` attribute to your model class containing the class names (as strings) of the blocks you want compiled:
+
+```py
+class MyUNet(ModelMixin):
+    _repeated_blocks = ("Transformer2DModel",)  # ← compiled by default
+```
+
+For more examples, see the reference [PR](https://github.com/huggingface/diffusers/pull/11705).
+
+**Relation to Accelerate's compile_regions.** [Accelerate](https://huggingface.co/docs/accelerate/index) offers a separate API, [compile_regions](https://github.com/huggingface/accelerate/blob/273799c85d849a1954a4f2e65767216eb37fa089/src/accelerate/utils/other.py#L78). It takes a fully automatic approach: it walks the module, picks candidate blocks, and compiles the remaining graph separately. That hands-off experience is handy for quick experiments, but it also leaves fewer knobs when you want to fine-tune which blocks are compiled or adjust compilation flags.
+

```py
# pip install -U accelerate
@@ -167,6 +219,8 @@ pipeline = StableDiffusionXLPipeline.from_pretrained(
).to("cuda")
pipeline.unet = compile_regions(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```
+`compile_repeated_blocks`, by contrast, is intentionally explicit. You list the repeated blocks once (via `_repeated_blocks`) and the helper compiles exactly those, nothing more. In practice this small dose of control hits a sweet spot for diffusion models: predictable behavior, easy reasoning about cache reuse, and still a one-liner for users.
+

### Graph breaks

@@ -241,4 +295,4 @@ An input is projected into three subspaces, represented by the projection matric

```py
pipeline.fuse_qkv_projections()
-```
+```
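
To see the dynamic shape compilation recipe above end to end, here is a minimal sketch, assuming the same SDXL pipeline used in the regional compilation example; the resolutions and step count are illustrative.

```python
# Minimal sketch of dynamic shape compilation (assumes an SDXL pipeline; values are illustrative)
import torch
import torch.fx.experimental._config  # explicit import so the submodule attribute is available
from diffusers import StableDiffusionXLPipeline

# Give inputs that happen to share a size their own symbolic variables
torch.fx.experimental._config.use_duck_shape = False

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipeline.unet = torch.compile(pipeline.unet, fullgraph=True, dynamic=True)

prompt = "a photo of an astronaut riding a horse on mars"
# The first call compiles; the second call at a different resolution should reuse the
# dynamic kernel instead of triggering a full recompilation.
pipeline(prompt, height=1024, width=1024, num_inference_steps=30).images[0]
pipeline(prompt, height=768, width=768, num_inference_steps=30).images[0]
```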

examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@
import wandb

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.34.0.dev0")
+check_min_version("0.35.0.dev0")

logger = get_logger(__name__)

examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@


# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.34.0.dev0")
+check_min_version("0.35.0.dev0")

logger = get_logger(__name__)

examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@
import wandb

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.34.0.dev0")
+check_min_version("0.35.0.dev0")

logger = get_logger(__name__)

examples/cogvideo/train_cogvideox_image_to_video_lora.py

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@
import wandb

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.34.0.dev0")
+check_min_version("0.35.0.dev0")

logger = get_logger(__name__)

examples/cogvideo/train_cogvideox_lora.py

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@
import wandb

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.34.0.dev0")
+check_min_version("0.35.0.dev0")

logger = get_logger(__name__)

examples/cogview4-control/train_control_cogview4.py

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@
import wandb

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.34.0.dev0")
+check_min_version("0.35.0.dev0")

logger = get_logger(__name__)

examples/community/marigold_depth_estimation.py

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@


# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.34.0.dev0")
+check_min_version("0.35.0.dev0")


class MarigoldDepthOutput(BaseOutput):

examples/consistency_distillation/train_lcm_distill_lora_sd_wds.py

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@
import wandb

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.34.0.dev0")
+check_min_version("0.35.0.dev0")

logger = get_logger(__name__)
