133 changes: 116 additions & 17 deletions docs/source/en/api/pipelines/flux.md
@@ -18,22 +18,22 @@ Original model checkpoints for Flux can be found [here](https://huggingface.co/b

<Tip>

Flux can be quite expensive to run on consumer hardware devices. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details. Additionally, Flux can benefit from quantization for memory efficiency with a trade-off in inference latency. Refer to [this blog post](https://huggingface.co/blog/quanto-diffusers) to learn more. For an exhaustive list of resources, check out [this gist](https://gist.github.com/sayakpaul/b664605caf0aa3bf8585ab109dd5ac9c).

</Tip>
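
As a quick illustration of the quantization route mentioned in the tip, here is a minimal sketch using `optimum-quanto`, following the linked blog post. The FP8 weight type and the choice to quantize only the transformer and the T5 text encoder are assumptions to adapt to your hardware:

```python
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)

# Assumption: quantizing the two largest components (the transformer and the
# T5 text encoder) to FP8 weights captures most of the memory savings.
# freeze() makes the quantized weights replace the originals.
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
quantize(pipe.text_encoder_2, weights=qfloat8)
freeze(pipe.text_encoder_2)

pipe.enable_model_cpu_offload()
```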

Flux comes in two variants:

- Timestep-distilled (`black-forest-labs/FLUX.1-schnell`)
- Guidance-distilled (`black-forest-labs/FLUX.1-dev`)

Both checkpoints have slightly different usage, which we detail below.

### Timestep-distilled

- `max_sequence_length` cannot be more than 256.
- `guidance_scale` needs to be 0.
- As this is a timestep-distilled model, it benefits from fewer sampling steps.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
# guidance_scale must be 0 and max_sequence_length must stay within 256
# for this timestep-distilled checkpoint.
out = pipe(
    prompt=prompt,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
out.save("image.png")
```

### Guidance-distilled

- The guidance-distilled variant takes about 50 sampling steps for good-quality generation.
- It doesn't have any limitations around the `max_sequence_length`.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = "a tiny astronaut hatching from an egg on the moon"
# The guidance-distilled checkpoint uses a real guidance_scale and benefits
# from more sampling steps.
out = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
out.save("image.png")
```

## Running FP16 inference

Flux can generate high-quality images with FP16 (i.e. to accelerate inference on Turing/Volta GPUs) but produces different outputs compared to FP32/BF16. The issue is that some activations in the text encoders have to be clipped when running in FP16, which affects the overall image. Forcing text encoders to run with FP32 inference thus removes this output difference. See [here](https://github.com/huggingface/diffusers/pull/9097#issuecomment-2272292516) for details.

FP16 inference code:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)

# Cast the pipeline to FP16 for speed on Turing/Volta GPUs, but keep the text
# encoders in FP32 to avoid the activation-clipping issue described above.
pipe.to(torch.float16)
pipe.text_encoder.to(torch.float32)
pipe.text_encoder_2.to(torch.float32)

pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
out = pipe(
    prompt=prompt,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
out.save("image.png")
```

## FluxPipeline

[[autodoc]] FluxPipeline
  - all
  - __call__

## FluxImg2ImgPipeline

[[autodoc]] FluxImg2ImgPipeline
  - all
  - __call__

## FluxInpaintPipeline

[[autodoc]] FluxInpaintPipeline
  - all
  - __call__

## Flux ControlNet Inpaint Pipeline

The Flux ControlNet Inpaint pipeline is designed for controllable image inpainting using the Flux architecture.

### Overview

This pipeline combines Flux's transformer-based architecture with ControlNet conditioning and inpainting capabilities. It allows for guided image generation within specified masked areas of an input image.

### Usage

```python
import torch
from diffusers import FluxControlNetInpaintPipeline
from diffusers.models import FluxControlNetModel
from diffusers.utils import load_image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the Canny-conditioned ControlNet and pair it with the Flux base checkpoint.
controlnet = FluxControlNetModel.from_pretrained(
"InstantX/FLUX.1-dev-Controlnet-Canny-alpha", torch_dtype=torch.bfloat16
)

pipe = FluxControlNetInpaintPipeline.from_pretrained(
"black-forest-labs/FLUX.1-schnell", controlnet=controlnet, torch_dtype=torch.bfloat16
)

# Cast the text encoder and ControlNet to FP16 to reduce memory usage.
pipe.text_encoder.to(torch.float16)
pipe.controlnet.to(torch.float16)
pipe.to(device)

# Conditioning image (Canny edges), the image to inpaint, and its mask.
control_image = load_image(
"https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Canny-alpha/resolve/main/canny.jpg"
)
init_image = load_image(
"https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
)
mask_image = load_image(
"https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
)

prompt = "A girl holding a sign that says InstantX"
# Generate only within the masked region, guided by the Canny edges.
image = pipe(
prompt,
image=init_image,
mask_image=mask_image,
control_image=control_image,
controlnet_conditioning_scale=0.7,
strength=0.7,
num_inference_steps=28,
guidance_scale=3.5,
).images[0]

image.save("flux_controlnet_inpaint.png")
```

## Flux ControlNet Image to Image Pipeline

The Flux ControlNet Img2Img pipeline enables controllable image-to-image translation using the Flux architecture combined with ControlNet conditioning.

### Overview

This pipeline transforms input images based on text prompts and ControlNet conditions, leveraging the Flux transformer-based architecture to generate high-quality outputs while maintaining control over the generation process.

### Usage

```python
import torch
from diffusers import FluxControlNetImg2ImgPipeline, FluxControlNetModel
from diffusers.utils import load_image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the Canny-conditioned ControlNet.
controlnet = FluxControlNetModel.from_pretrained(
"InstantX/FLUX.1-dev-Controlnet-Canny-alpha", torch_dtype=torch.bfloat16
)

pipe = FluxControlNetImg2ImgPipeline.from_pretrained(
"black-forest-labs/FLUX.1-schnell", controlnet=controlnet, torch_dtype=torch.float16
)

# Cast the text encoder and ControlNet to FP16 to match the pipeline dtype.
pipe.text_encoder.to(torch.float16)
pipe.controlnet.to(torch.float16)
pipe.to(device)

control_image = load_image("https://huggingface.co/InstantX/SD3-Controlnet-Canny/resolve/main/canny.jpg")
init_image = load_image(
"https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
)

prompt = "A girl in city, 25 years old, cool, futuristic"
# strength < 1.0 preserves part of the init image's structure.
image = pipe(
prompt,
image=init_image,
control_image=control_image,
controlnet_conditioning_scale=0.6,
strength=0.7,
num_inference_steps=2,
guidance_scale=3.5,
).images[0]

image.save("flux_controlnet_img2img.png")
```
4 changes: 4 additions & 0 deletions src/diffusers/__init__.py
@@ -258,6 +258,8 @@
"CogVideoXPipeline",
"CogVideoXVideoToVideoPipeline",
"CycleDiffusionPipeline",
"FluxControlNetImg2ImgPipeline",
"FluxControlNetInpaintPipeline",
"FluxControlNetPipeline",
"FluxImg2ImgPipeline",
"FluxInpaintPipeline",
@@ -706,6 +708,8 @@
CogVideoXPipeline,
CogVideoXVideoToVideoPipeline,
CycleDiffusionPipeline,
FluxControlNetImg2ImgPipeline,
FluxControlNetInpaintPipeline,
FluxControlNetPipeline,
FluxImg2ImgPipeline,
FluxInpaintPipeline,
11 changes: 10 additions & 1 deletion src/diffusers/pipelines/__init__.py
@@ -127,6 +127,8 @@
]
_import_structure["flux"] = [
"FluxControlNetPipeline",
"FluxControlNetImg2ImgPipeline",
"FluxControlNetInpaintPipeline",
"FluxImg2ImgPipeline",
"FluxInpaintPipeline",
"FluxPipeline",
@@ -501,7 +503,14 @@
VersatileDiffusionTextToImagePipeline,
VQDiffusionPipeline,
)
from .flux import (
FluxControlNetImg2ImgPipeline,
FluxControlNetInpaintPipeline,
FluxControlNetPipeline,
FluxImg2ImgPipeline,
FluxInpaintPipeline,
FluxPipeline,
)
from .hunyuandit import HunyuanDiTPipeline
from .i2vgen_xl import I2VGenXLPipeline
from .kandinsky import (
4 changes: 4 additions & 0 deletions src/diffusers/pipelines/flux/__init__.py
@@ -24,6 +24,8 @@
else:
_import_structure["pipeline_flux"] = ["FluxPipeline"]
_import_structure["pipeline_flux_controlnet"] = ["FluxControlNetPipeline"]
_import_structure["pipeline_flux_controlnet_image_to_image"] = ["FluxControlNetImg2ImgPipeline"]
_import_structure["pipeline_flux_controlnet_inpainting"] = ["FluxControlNetInpaintPipeline"]
_import_structure["pipeline_flux_img2img"] = ["FluxImg2ImgPipeline"]
_import_structure["pipeline_flux_inpaint"] = ["FluxInpaintPipeline"]
if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
@@ -35,6 +37,8 @@
else:
from .pipeline_flux import FluxPipeline
from .pipeline_flux_controlnet import FluxControlNetPipeline
from .pipeline_flux_controlnet_image_to_image import FluxControlNetImg2ImgPipeline
from .pipeline_flux_controlnet_inpainting import FluxControlNetInpaintPipeline
from .pipeline_flux_img2img import FluxImg2ImgPipeline
from .pipeline_flux_inpaint import FluxInpaintPipeline
else:
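
Taken together, these entries register the two new pipelines with the lazy-import machinery at every level of the package. A minimal smoke test, assuming a checkout with this change installed:

```python
# Both classes should now resolve from the package root without importing
# the concrete pipeline modules until first use.
from diffusers import FluxControlNetImg2ImgPipeline, FluxControlNetInpaintPipeline

print(FluxControlNetImg2ImgPipeline.__name__)
print(FluxControlNetInpaintPipeline.__name__)
```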