
Commit b6be0ba

Merge branch 'main' into custom-gradient-checkpointing-fn
2 parents d0c3aae + 07860f9 commit b6be0ba


12 files changed: +2664 / -19 lines


docs/source/en/api/pipelines/flux.md

Lines changed: 47 additions & 0 deletions
@@ -309,6 +309,53 @@ image.save("output.png")

When unloading the Control LoRA weights, call `pipe.unload_lora_weights(reset_to_overwritten_params=True)` to reset the `pipe.transformer` completely back to its original form. The resultant pipeline can then be used with methods like [`DiffusionPipeline.from_pipe`]. More details about this argument are available in [this PR](https://github.com/huggingface/diffusers/pull/10397).
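
A minimal sketch of that reset flow (illustrative; it assumes the `black-forest-labs/FLUX.1-Canny-dev-lora` Control LoRA checkpoint):

```python
import torch
from diffusers import FluxControlPipeline, FluxPipeline

# A Flux pipeline with Control LoRA weights loaded (illustrative checkpoint).
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")

# ... run control-conditioned inference ...

# Reset the transformer back to its original, pre-expansion parameters.
pipe.unload_lora_weights(reset_to_overwritten_params=True)

# The reset pipeline can then be reused, e.g. as a plain text-to-image pipeline.
text2img = FluxPipeline.from_pipe(pipe)
```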
## IP-Adapter

<Tip>

Check out [IP-Adapter](../../../using-diffusers/ip_adapter) to learn more about how IP-Adapters work.

</Tip>

An IP-Adapter lets you prompt Flux with images in addition to the text prompt. This is especially useful for describing complex concepts that are difficult to articulate through text alone and for which you have reference images.

```python
import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flux_ip_adapter_input.jpg").resize((1024, 1024))

pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
)
pipe.set_ip_adapter_scale(1.0)

image = pipe(
    width=1024,
    height=1024,
    prompt="wearing sunglasses",
    negative_prompt="",
    true_cfg=4.0,
    generator=torch.Generator().manual_seed(4444),
    ip_adapter_image=image,
).images[0]

image.save('flux_ip_adapter_output.jpg')
```

<div class="justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flux_ip_adapter_output.jpg"/>
    <figcaption class="mt-2 text-sm text-center text-gray-500">IP-Adapter examples with prompt "wearing sunglasses"</figcaption>
</div>

## Running FP16 inference

Flux can generate high-quality images with FP16 (for example, to accelerate inference on Turing/Volta GPUs) but produces different outputs compared to FP32/BF16. The issue is that some activations in the text encoders have to be clipped when running in FP16, which affects the overall image. Forcing the text encoders to run with FP32 inference removes this output difference. See [here](https://github.com/huggingface/diffusers/pull/9097#issuecomment-2272292516) for details.
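
In practice, the workaround above amounts to keeping the text encoders in FP32 while the rest of the pipeline runs in FP16. A minimal sketch (illustrative; it assumes `FluxPipeline.encode_prompt` for the FP32 text-encoding step, with the embeddings then handed to the FP16 denoiser):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.float16
).to("cuda")

# Upcast only the text encoders so their activations are not clipped in FP16.
pipe.text_encoder.to(torch.float32)
pipe.text_encoder_2.to(torch.float32)

# Encode the prompt in FP32, then pass the embeddings to the FP16 denoiser.
with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds, text_ids = pipe.encode_prompt(
        prompt="a photo of a cat", prompt_2=None, device="cuda"
    )

image = pipe(
    prompt_embeds=prompt_embeds.to(torch.float16),
    pooled_prompt_embeds=pooled_prompt_embeds.to(torch.float16),
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_fp16.png")
```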

docs/source/en/installation.md

Lines changed: 34 additions & 6 deletions
@@ -23,32 +23,60 @@ You should install 🤗 Diffusers in a [virtual environment](https://docs.python
If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
A virtual environment makes it easier to manage different projects and avoid compatibility issues between dependencies.

-Start by creating a virtual environment in your project directory:
+Create a virtual environment with Python or [uv](https://docs.astral.sh/uv/) (refer to [Installation](https://docs.astral.sh/uv/getting-started/installation/) for installation instructions), a fast Rust-based Python package and project manager.
+
+<hfoptions id="install">
+<hfoption id="uv">

```bash
-python -m venv .env
+uv venv my-env
+source my-env/bin/activate
```

-Activate the virtual environment:
+</hfoption>
+<hfoption id="Python">

```bash
-source .env/bin/activate
+python -m venv my-env
+source my-env/bin/activate
```

-You should also install 🤗 Transformers because 🤗 Diffusers relies on its models:
+</hfoption>
+</hfoptions>
+
+You should also install 🤗 Transformers because 🤗 Diffusers relies on its models.


<frameworkcontent>
<pt>
-Note - PyTorch only supports Python 3.8 - 3.11 on Windows.
+
+PyTorch only supports Python 3.8 - 3.11 on Windows. Install Diffusers with uv.
+
+```bash
+uv pip install diffusers["torch"] transformers
+```
+
+You can also install Diffusers with pip.
+
```bash
pip install diffusers["torch"] transformers
```
+
</pt>
<jax>
+
+Install Diffusers with uv.
+
+```bash
+uv pip install diffusers["flax"] transformers
+```
+
+You can also install Diffusers with pip.
+
```bash
pip install diffusers["flax"] transformers
```
+
</jax>
</frameworkcontent>
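
As a quick sanity check after either install path (illustrative; it assumes both packages were installed into the active environment):

```python
# Confirm both libraries import and print their versions.
import diffusers
import transformers

print(diffusers.__version__, transformers.__version__)
```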

docs/source/en/optimization/para_attn.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ However, it is hard to decide when to reuse the cache to ensure quality generate
This achieves a 2x speedup on FLUX.1-dev and HunyuanVideo inference with very good quality.

<figure>
-    <img src="https://huggingface.co/datasets/chengzeyi/documentation-images/resolve/main/diffusers/para-attn/ada-cache.png" alt="Cache in Diffusion Transformer" />
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/para-attn/ada-cache.png" alt="Cache in Diffusion Transformer" />
    <figcaption>How AdaCache works, First Block Cache is a variant of it</figcaption>
</figure>
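
For context, applying First Block Cache to a Flux pipeline looks roughly like the sketch below; the `apply_cache_on_pipe` helper, its import path, and the `residual_diff_threshold` argument are assumptions taken from ParaAttention's diffusers adapters, so check the ParaAttention README for the exact API:

```python
import torch
from diffusers import FluxPipeline
# Assumed ParaAttention entry point for First Block Cache.
from para_attn.first_block_cache.diffusers_adapters import apply_cache_on_pipe

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Reuse cached residuals whenever the first transformer block's output barely changes.
apply_cache_on_pipe(pipe, residual_diff_threshold=0.08)

image = pipe("a cinematic photo of a fox", num_inference_steps=28).images[0]
```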

examples/community/README.md

File mode changed from 100755 to 100644
Lines changed: 90 additions & 2 deletions
@@ -77,6 +77,7 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
| PIXART-α Controlnet pipeline | Implementation of the controlnet model for pixart alpha and its diffusers pipeline | [PIXART-α Controlnet pipeline](#pixart-α-controlnet-pipeline) | - | [Raul Ciotescu](https://github.com/raulc0399/) |
| HunyuanDiT Differential Diffusion Pipeline | Applies [Differential Diffusion](https://github.com/exx8/differential-diffusion) to [HunyuanDiT](https://github.com/huggingface/diffusers/pull/8240). | [HunyuanDiT with Differential Diffusion](#hunyuandit-with-differential-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing) | [Monjoy Choudhury](https://github.com/MnCSSJ4x) |
| [🪆Matryoshka Diffusion Models](https://huggingface.co/papers/2310.15111) | A diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small scale inputs are nested within those of the large scales. See [original codebase](https://github.com/apple/ml-mdm). | [🪆Matryoshka Diffusion Models](#matryoshka-diffusion-models) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/pcuenq/mdm) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/tolgacangoz/1f54875fc7aeaabcf284ebde64820966/matryoshka_hf.ipynb) | [M. Tolga Cangöz](https://github.com/tolgacangoz) |
| Stable Diffusion XL Attentive Eraser Pipeline | [[AAAI2025 Oral] Attentive Eraser](https://github.com/Anonym0u3/AttentiveEraser) is a novel tuning-free method that enhances object removal capabilities in pre-trained diffusion models. | [Stable Diffusion XL Attentive Eraser Pipeline](#stable-diffusion-xl-attentive-eraser-pipeline) | - | [Wenhao Sun](https://github.com/Anonym0u3) and [Benlei Cui](https://github.com/Benny079) |

To load a custom pipeline, pass the `custom_pipeline` argument to `DiffusionPipeline` with the name of one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines; we will merge them quickly.
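
For example, a minimal sketch (illustrative; it assumes the long-standing `lpw_stable_diffusion` community file and the `stable-diffusion-v1-5/stable-diffusion-v1-5` checkpoint):

```python
import torch
from diffusers import DiffusionPipeline

# Pass the community file's name (without .py) as `custom_pipeline`;
# the file is pulled from diffusers/examples/community at load time.
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")
```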

@@ -4585,8 +4586,8 @@ image = pipe(
```

| ![Gradient](https://github.com/user-attachments/assets/e38ce4d5-1ae6-4df0-ab43-adc1b45716b5) | ![Input](https://github.com/user-attachments/assets/9c95679c-e9d7-4f5a-90d6-560203acd6b3) | ![Output](https://github.com/user-attachments/assets/5313ff64-a0c4-418b-8b55-a38f1a5e7532) |
-| ------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
-| Gradient | Input | Output |
+| -------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
+| Gradient | Input | Output |

A colab notebook demonstrating all results can be found [here](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing). Depth Maps have also been added in the same colab.

@@ -4634,6 +4635,93 @@ make_image_grid(image, rows=1, cols=len(image))
# 50+, 100+, and 250+ num_inference_steps are recommended for nesting levels 0, 1, and 2 respectively.
```

### Stable Diffusion XL Attentive Eraser Pipeline
<img src="https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/fenmian.png" width="600" />

**Stable Diffusion XL Attentive Eraser Pipeline** is an advanced object removal pipeline that leverages SDXL for precise content suppression and seamless region completion. This pipeline uses **self-attention redirection guidance** to modify the model’s self-attention mechanism, allowing for effective removal and inpainting across various levels of mask precision, including semantic segmentation masks, bounding boxes, and hand-drawn masks. For more details, refer to the [paper](https://arxiv.org/abs/2412.12974) and the [official implementation](https://github.com/Anonym0u3/AttentiveEraser).

#### Key features

- **Tuning-Free**: No additional training is required, making it easy to integrate and use.
- **Flexible Mask Support**: Works with different types of masks for targeted object removal.
- **High-Quality Results**: Utilizes the inherent generative power of diffusion models for realistic content completion.

#### Usage example

To use the Stable Diffusion XL Attentive Eraser Pipeline, you can initialize it as follows:

```py
import torch
from diffusers import DDIMScheduler, DiffusionPipeline
from diffusers.utils import load_image
import torch.nn.functional as F
from torchvision.transforms.functional import to_tensor, gaussian_blur

dtype = torch.float16
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    custom_pipeline="pipeline_stable_diffusion_xl_attentive_eraser",
    scheduler=scheduler,
    variant="fp16",
    use_safetensors=True,
    torch_dtype=dtype,
).to(device)


def preprocess_image(image_path, device):
    image = to_tensor(load_image(image_path))
    image = image.unsqueeze_(0).float() * 2 - 1  # [0,1] --> [-1,1]
    if image.shape[1] != 3:
        image = image.expand(-1, 3, -1, -1)
    image = F.interpolate(image, (1024, 1024))
    image = image.to(dtype).to(device)
    return image

def preprocess_mask(mask_path, device):
    mask = to_tensor(load_image(mask_path, convert_method=lambda img: img.convert('L')))
    mask = mask.unsqueeze_(0).float()  # 0 or 1
    mask = F.interpolate(mask, (1024, 1024))
    mask = gaussian_blur(mask, kernel_size=(77, 77))
    mask[mask < 0.1] = 0
    mask[mask >= 0.1] = 1
    mask = mask.to(dtype).to(device)
    return mask

prompt = ""  # Set prompt to null
seed = 123
generator = torch.Generator(device=device).manual_seed(seed)
source_image_path = "https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/an1024.png"
mask_path = "https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/an1024_mask.png"
source_image = preprocess_image(source_image_path, device)
mask = preprocess_mask(mask_path, device)

image = pipeline(
    prompt=prompt,
    image=source_image,
    mask_image=mask,
    height=1024,
    width=1024,
    AAS=True,  # enable AAS
    strength=0.8,  # inpainting strength
    rm_guidance_scale=9,  # removal guidance scale
    ss_steps=9,  # similarity suppression steps
    ss_scale=0.3,  # similarity suppression scale
    AAS_start_step=0,  # AAS start step
    AAS_start_layer=34,  # AAS start layer
    AAS_end_layer=70,  # AAS end layer
    num_inference_steps=50,  # number of inference steps; AAS_end_step = int(strength * num_inference_steps)
    generator=generator,
    guidance_scale=1,
).images[0]
image.save('./removed_img.png')
print("Object removal completed")
```

| Source Image | Mask | Output |
| ---------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| ![Source Image](https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/an1024.png) | ![Mask](https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/an1024_mask.png) | ![Output](https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/AE_step40_layer34.png) |

# Perturbed-Attention Guidance

[Project](https://ku-cvlab.github.io/Perturbed-Attention-Guidance/) / [arXiv](https://arxiv.org/abs/2403.17377) / [GitHub](https://github.com/KU-CVLAB/Perturbed-Attention-Guidance)
