All checkpoints have different usage, which we detail below.
images[0].save("flux-redux.png")
```

### Kontext
Flux Kontext is a model that enables in-context control of the image generation process, allowing for editing, refinement, relighting, style transfer, character customization, and more.

```py
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Input image to edit; substitute any image of your own
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/yarn-art-pikachu.png")

prompt = "Make Pikachu hold a sign that says 'Black Forest Labs is awesome', yarn art style, detailed, vibrant colors"
image = pipe(
    image=image,
    prompt=prompt,
    guidance_scale=2.5,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("flux-kontext.png")
```
Flux Kontext comes with an integrity safety checker, which should be run after the image generation step. To run the safety checker, install the official repository from [black-forest-labs/flux](https://github.com/black-forest-labs/flux) and add the following code:
```python
from flux.content_filters import PixtralContentFilter
import torch
import numpy as np

integrity_checker = PixtralContentFilter(torch.device("cuda"))
# Normalize the generated PIL image to [-1, 1] and convert it to a CHW tensor
image_ = np.array(image) / 255.0
image_ = 2 * image_ - 1
image_ = torch.from_numpy(image_).to("cuda", dtype=torch.bfloat16).permute(2, 0, 1).unsqueeze(0)
if integrity_checker.test_image(image_):
    raise ValueError("Your image has been flagged. Choose another prompt/image or try again.")
```
## Combining Flux Turbo LoRAs with Flux Control, Fill, and Redux
We can combine Flux Turbo LoRAs with Flux Control and other pipelines like Fill and Redux to enable few-step inference. The example below shows how to do that with the Flux Control LoRA for depth and a turbo LoRA from [`ByteDance/Hyper-SD`](https://hf.co/ByteDance/Hyper-SD).
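A minimal sketch of loading both adapters, assuming the `FluxControlPipeline` API and the `Hyper-FLUX.1-dev-8steps-lora.safetensors` weight name on the Hyper-SD repository (check the repo for the exact filename; the adapter weights shown are illustrative starting points):

```py
import torch
from diffusers import FluxControlPipeline
from huggingface_hub import hf_hub_download

pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Load the depth Control LoRA and the turbo LoRA as separate adapters
pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora", adapter_name="depth")
pipe.load_lora_weights(
    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"),
    adapter_name="hyper-sd",
)
# The turbo LoRA typically needs a much smaller weight than the control LoRA
pipe.set_adapters(["depth", "hyper-sd"], adapter_weights=[0.85, 0.125])
```

With both adapters active, generation takes a depth map as `control_image`, and the number of inference steps can be reduced to match the turbo LoRA (here, 8 steps).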
Compilation is slow the first time, but once compiled, it is significantly faster. Try to use the compiled pipeline only on the same type of inference operations. Calling the compiled pipeline on a different image size retriggers compilation, which is slow and inefficient.
### Dynamic shape compilation
> [!TIP]
> Make sure to always use the nightly version of PyTorch for better support.
`torch.compile` keeps track of input shapes and conditions, and if these are different, it recompiles the model. For example, if a model is compiled on a 1024x1024 resolution image and used on an image with a different resolution, it triggers recompilation.
To avoid this, pass `dynamic=True` so the compiler generates a more dynamic kernel that does not need to be recompiled when input shapes change. Specifying `use_duck_shape=False` instructs the compiler not to reuse the same symbolic variable for distinct inputs that happen to have the same size. For more details, check out this [comment](https://github.com/huggingface/diffusers/pull/11327#discussion_r2047659790).
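For example, a minimal sketch (the [`AuraFlowPipeline`] is used here only for illustration, and `torch.fx.experimental._config.use_duck_shape` is a private PyTorch flag that may change between releases):

```py
import torch
from diffusers import AuraFlowPipeline

# Give equal-sized inputs distinct symbolic variables instead of aliasing them
torch.fx.experimental._config.use_duck_shape = False

pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow", torch_dtype=torch.bfloat16
).to("cuda")

# Compile with dynamic shapes so changing resolutions doesn't trigger recompilation
pipe.transformer = torch.compile(pipe.transformer, fullgraph=True, dynamic=True)
```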
Not all models benefit from dynamic compilation out of the box; some require changes first. Refer to this [PR](https://github.com/huggingface/diffusers/pull/11297/) that improved the [`AuraFlowPipeline`] implementation to benefit from dynamic compilation.
Feel free to open an issue if dynamic compilation doesn't work as expected for a Diffusers model.
### Regional compilation
[Regional compilation](https://docs.pytorch.org/tutorials/recipes/regional_compilation.html) trims cold-start latency by compiling **only the small, frequently-repeated block(s)** of a model, typically a Transformer layer, enabling reuse of compiled artifacts for every subsequent occurrence.
For many diffusion architectures this delivers the *same* runtime speed-ups as full-graph compilation yet cuts compile time by **8–10×**.
To make this effortless, [`ModelMixin`] exposes the [`ModelMixin.compile_repeated_blocks`] API, a helper that wraps `torch.compile` around any sub-modules you designate as repeatable:
```py
# pip install -U diffusers
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Compile only the repeated Transformer layers inside the UNet
pipe.unet.compile_repeated_blocks(fullgraph=True)
```
To enable a new model with regional compilation, add a `_repeated_blocks` attribute to your model class containing the class names (as strings) of the blocks you want compiled:
```py
class MyUNet(ModelMixin):
    _repeated_blocks = ("Transformer2DModel",)  # ← compiled by default
```
For more examples, see the reference [PR](https://github.com/huggingface/diffusers/pull/11705).
**Relation to Accelerate compile_regions.** There is also a separate API in [Accelerate](https://huggingface.co/docs/accelerate/index), [compile_regions](https://github.com/huggingface/accelerate/blob/273799c85d849a1954a4f2e65767216eb37fa089/src/accelerate/utils/other.py#L78). It takes a fully automatic approach: it walks the module, picks candidate blocks, then compiles the remaining graph separately. That hands-off experience is handy for quick experiments, but it also leaves fewer knobs when you want to fine-tune which blocks are compiled or adjust compilation flags.

`compile_repeated_blocks`, by contrast, is intentionally explicit. You list the repeated blocks once (via `_repeated_blocks`) and the helper compiles exactly those, nothing more. In practice this small dose of control hits a sweet spot for diffusion models: predictable behavior, easy reasoning about cache reuse, and still a one-liner for users.
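As a quick illustration of the Accelerate side (a sketch; `compile_regions` forwards its keyword arguments to `torch.compile`, and `pipe` is the pipeline from the example above):

```py
# pip install -U accelerate
from accelerate.utils import compile_regions

# Automatically discover and compile the repeated blocks of the UNet
pipe.unet = compile_regions(pipe.unet, mode="reduce-overhead", fullgraph=True)
```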
### Graph breaks