Commit 16c2397

use main example
1 parent 804f5cc commit 16c2397

6 files changed, +100 -136 lines changed

examples/community/README.md

File mode changed: 100644 → 100755
Lines changed: 2 additions & 90 deletions
Original file line number | Diff line number | Diff line change
@@ -77,7 +77,6 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
7777
PIXART-α Controlnet pipeline | Implementation of the controlnet model for pixart alpha and its diffusers pipeline | [PIXART-α Controlnet pipeline](#pixart-α-controlnet-pipeline) | - | [Raul Ciotescu](https://github.com/raulc0399/) |
7878
| HunyuanDiT Differential Diffusion Pipeline | Applies [Differential Diffusion](https://github.com/exx8/differential-diffusion) to [HunyuanDiT](https://github.com/huggingface/diffusers/pull/8240). | [HunyuanDiT with Differential Diffusion](#hunyuandit-with-differential-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing) | [Monjoy Choudhury](https://github.com/MnCSSJ4x) |
7979
| [🪆Matryoshka Diffusion Models](https://huggingface.co/papers/2310.15111) | A diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small scale inputs are nested within those of the large scales. See [original codebase](https://github.com/apple/ml-mdm). | [🪆Matryoshka Diffusion Models](#matryoshka-diffusion-models) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/pcuenq/mdm) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/tolgacangoz/1f54875fc7aeaabcf284ebde64820966/matryoshka_hf.ipynb) | [M. Tolga Cangöz](https://github.com/tolgacangoz) |
80-
| Stable Diffusion XL Attentive Eraser Pipeline |[[AAAI2025 Oral] Attentive Eraser](https://github.com/Anonym0u3/AttentiveEraser) is a novel tuning-free method that enhances object removal capabilities in pre-trained diffusion models.|[Stable Diffusion XL Attentive Eraser Pipeline](#stable-diffusion-xl-attentive-eraser-pipeline)|-|[Wenhao Sun](https://github.com/Anonym0u3) and [Benlei Cui](https://github.com/Benny079)|
8180

8281
To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines, we will merge them quickly.
8382

@@ -4586,8 +4585,8 @@ image = pipe(
45864585
```
45874586

45884587
| ![Gradient](https://github.com/user-attachments/assets/e38ce4d5-1ae6-4df0-ab43-adc1b45716b5) | ![Input](https://github.com/user-attachments/assets/9c95679c-e9d7-4f5a-90d6-560203acd6b3) | ![Output](https://github.com/user-attachments/assets/5313ff64-a0c4-418b-8b55-a38f1a5e7532) |
4589-
| -------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
4590-
| Gradient | Input | Output |
4588+
| ------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
4589+
| Gradient | Input | Output |
45914590

45924591
A colab notebook demonstrating all results can be found [here](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing). Depth Maps have also been added in the same colab.
45934592

@@ -4635,93 +4634,6 @@ make_image_grid(image, rows=1, cols=len(image))
46354634
# 50+, 100+, and 250+ num_inference_steps are recommended for nesting levels 0, 1, and 2 respectively.
46364635
```
46374636

4638-
### Stable Diffusion XL Attentive Eraser Pipeline
4639-
<img src="https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/fenmian.png" width="600" />
4640-
4641-
**Stable Diffusion XL Attentive Eraser Pipeline** is an advanced object removal pipeline that leverages SDXL for precise content suppression and seamless region completion. This pipeline uses **self-attention redirection guidance** to modify the model’s self-attention mechanism, allowing for effective removal and inpainting across various levels of mask precision, including semantic segmentation masks, bounding boxes, and hand-drawn masks. If you are interested in more detailed information and have any questions, please refer to the [paper](https://arxiv.org/abs/2412.12974) and [official implementation](https://github.com/Anonym0u3/AttentiveEraser).
4642-
4643-
#### Key features
4644-
4645-
- **Tuning-Free**: No additional training is required, making it easy to integrate and use.
4646-
- **Flexible Mask Support**: Works with different types of masks for targeted object removal.
4647-
- **High-Quality Results**: Utilizes the inherent generative power of diffusion models for realistic content completion.
4648-
4649-
#### Usage example
4650-
To use the Stable Diffusion XL Attentive Eraser Pipeline, you can initialize it as follows:
4651-
```py
4652-
import torch
4653-
from diffusers import DDIMScheduler, DiffusionPipeline
4654-
from diffusers.utils import load_image
4655-
import torch.nn.functional as F
4656-
from torchvision.transforms.functional import to_tensor, gaussian_blur
4657-
4658-
dtype = torch.float16
4659-
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
4660-
4661-
scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
4662-
pipeline = DiffusionPipeline.from_pretrained(
4663-
"stabilityai/stable-diffusion-xl-base-1.0",
4664-
custom_pipeline="pipeline_stable_diffusion_xl_attentive_eraser",
4665-
scheduler=scheduler,
4666-
variant="fp16",
4667-
use_safetensors=True,
4668-
torch_dtype=dtype,
4669-
).to(device)
4670-
4671-
4672-
def preprocess_image(image_path, device):
4673-
image = to_tensor((load_image(image_path)))
4674-
image = image.unsqueeze_(0).float() * 2 - 1 # [0,1] --> [-1,1]
4675-
if image.shape[1] != 3:
4676-
image = image.expand(-1, 3, -1, -1)
4677-
image = F.interpolate(image, (1024, 1024))
4678-
image = image.to(dtype).to(device)
4679-
return image
4680-
4681-
def preprocess_mask(mask_path, device):
4682-
mask = to_tensor((load_image(mask_path, convert_method=lambda img: img.convert('L'))))
4683-
mask = mask.unsqueeze_(0).float() # 0 or 1
4684-
mask = F.interpolate(mask, (1024, 1024))
4685-
mask = gaussian_blur(mask, kernel_size=(77, 77))
4686-
mask[mask < 0.1] = 0
4687-
mask[mask >= 0.1] = 1
4688-
mask = mask.to(dtype).to(device)
4689-
return mask
4690-
4691-
prompt = "" # Set prompt to null
4692-
seed=123
4693-
generator = torch.Generator(device=device).manual_seed(seed)
4694-
source_image_path = "https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/an1024.png"
4695-
mask_path = "https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/an1024_mask.png"
4696-
source_image = preprocess_image(source_image_path, device)
4697-
mask = preprocess_mask(mask_path, device)
4698-
4699-
image = pipeline(
4700-
prompt=prompt,
4701-
image=source_image,
4702-
mask_image=mask,
4703-
height=1024,
4704-
width=1024,
4705-
AAS=True, # enable AAS
4706-
strength=0.8, # inpainting strength
4707-
rm_guidance_scale=9, # removal guidance scale
4708-
ss_steps = 9, # similarity suppression steps
4709-
ss_scale = 0.3, # similarity suppression scale
4710-
AAS_start_step=0, # AAS start step
4711-
AAS_start_layer=34, # AAS start layer
4712-
AAS_end_layer=70, # AAS end layer
4713-
num_inference_steps=50, # number of inference steps # AAS_end_step = int(strength*num_inference_steps)
4714-
generator=generator,
4715-
guidance_scale=1,
4716-
).images[0]
4717-
image.save('./removed_img.png')
4718-
print("Object removal completed")
4719-
```
4720-
4721-
| Source Image | Mask | Output |
4722-
| ---------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
4723-
| ![Source Image](https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/an1024.png) | ![Mask](https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/an1024_mask.png) | ![Output](https://raw.githubusercontent.com/Anonym0u3/Images/refs/heads/main/AE_step40_layer34.png) |
4724-
47254637
# Perturbed-Attention Guidance
47264638

47274639
[Project](https://ku-cvlab.github.io/Perturbed-Attention-Guidance/) / [arXiv](https://arxiv.org/abs/2403.17377) / [GitHub](https://github.com/KU-CVLAB/Perturbed-Attention-Guidance)

examples/community/matryoshka.py

Lines changed: 74 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,7 @@
8080
USE_PEFT_BACKEND,
8181
BaseOutput,
8282
deprecate,
83+
is_torch_version,
8384
is_torch_xla_available,
8485
logging,
8586
replace_example_docstring,
@@ -868,7 +869,23 @@ def forward(
868869

869870
for i, (resnet, attn) in enumerate(blocks):
870871
if torch.is_grad_enabled() and self.gradient_checkpointing:
871-
hidden_states = self._gradient_checkpointing_func(resnet, hidden_states, temb)
872+
873+
def create_custom_forward(module, return_dict=None):
874+
def custom_forward(*inputs):
875+
if return_dict is not None:
876+
return module(*inputs, return_dict=return_dict)
877+
else:
878+
return module(*inputs)
879+
880+
return custom_forward
881+
882+
ckpt_kwargs: Dict[str, Any] = {"use_reentrant": False} if is_torch_version(">=", "1.11.0") else {}
883+
hidden_states = torch.utils.checkpoint.checkpoint(
884+
create_custom_forward(resnet),
885+
hidden_states,
886+
temb,
887+
**ckpt_kwargs,
888+
)
872889
hidden_states = attn(
873890
hidden_states,
874891
encoder_hidden_states=encoder_hidden_states,
@@ -1013,6 +1030,17 @@ def forward(
10131030
hidden_states = self.resnets[0](hidden_states, temb)
10141031
for attn, resnet in zip(self.attentions, self.resnets[1:]):
10151032
if torch.is_grad_enabled() and self.gradient_checkpointing:
1033+
1034+
def create_custom_forward(module, return_dict=None):
1035+
def custom_forward(*inputs):
1036+
if return_dict is not None:
1037+
return module(*inputs, return_dict=return_dict)
1038+
else:
1039+
return module(*inputs)
1040+
1041+
return custom_forward
1042+
1043+
ckpt_kwargs: Dict[str, Any] = {"use_reentrant": False} if is_torch_version(">=", "1.11.0") else {}
10161044
hidden_states = attn(
10171045
hidden_states,
10181046
encoder_hidden_states=encoder_hidden_states,
@@ -1021,7 +1049,12 @@ def forward(
10211049
encoder_attention_mask=encoder_attention_mask,
10221050
return_dict=False,
10231051
)[0]
1024-
hidden_states = self._gradient_checkpointing_func(resnet, hidden_states, temb)
1052+
hidden_states = torch.utils.checkpoint.checkpoint(
1053+
create_custom_forward(resnet),
1054+
hidden_states,
1055+
temb,
1056+
**ckpt_kwargs,
1057+
)
10251058
else:
10261059
hidden_states = attn(
10271060
hidden_states,
@@ -1159,7 +1192,23 @@ def forward(
11591192
hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
11601193

11611194
if torch.is_grad_enabled() and self.gradient_checkpointing:
1162-
hidden_states = self._gradient_checkpointing_func(resnet, hidden_states, temb)
1195+
1196+
def create_custom_forward(module, return_dict=None):
1197+
def custom_forward(*inputs):
1198+
if return_dict is not None:
1199+
return module(*inputs, return_dict=return_dict)
1200+
else:
1201+
return module(*inputs)
1202+
1203+
return custom_forward
1204+
1205+
ckpt_kwargs: Dict[str, Any] = {"use_reentrant": False} if is_torch_version(">=", "1.11.0") else {}
1206+
hidden_states = torch.utils.checkpoint.checkpoint(
1207+
create_custom_forward(resnet),
1208+
hidden_states,
1209+
temb,
1210+
**ckpt_kwargs,
1211+
)
11631212
hidden_states = attn(
11641213
hidden_states,
11651214
encoder_hidden_states=encoder_hidden_states,
@@ -1233,6 +1282,10 @@ def __init__(
12331282
]
12341283
)
12351284

1285+
def _set_gradient_checkpointing(self, module, value=False):
1286+
if hasattr(module, "gradient_checkpointing"):
1287+
module.gradient_checkpointing = value
1288+
12361289
def forward(
12371290
self,
12381291
hidden_states: torch.Tensor,
@@ -1312,15 +1365,27 @@ def forward(
13121365
# Blocks
13131366
for block in self.transformer_blocks:
13141367
if torch.is_grad_enabled() and self.gradient_checkpointing:
1315-
hidden_states = self._gradient_checkpointing_func(
1316-
block,
1368+
1369+
def create_custom_forward(module, return_dict=None):
1370+
def custom_forward(*inputs):
1371+
if return_dict is not None:
1372+
return module(*inputs, return_dict=return_dict)
1373+
else:
1374+
return module(*inputs)
1375+
1376+
return custom_forward
1377+
1378+
ckpt_kwargs: Dict[str, Any] = {"use_reentrant": False} if is_torch_version(">=", "1.11.0") else {}
1379+
hidden_states = torch.utils.checkpoint.checkpoint(
1380+
create_custom_forward(block),
13171381
hidden_states,
13181382
attention_mask,
13191383
encoder_hidden_states,
13201384
encoder_attention_mask,
13211385
timestep,
13221386
cross_attention_kwargs,
13231387
class_labels,
1388+
**ckpt_kwargs,
13241389
)
13251390
else:
13261391
hidden_states = block(
@@ -2659,6 +2724,10 @@ def fn_recursive_set_attention_slice(module: torch.nn.Module, slice_size: List[i
26592724
for module in self.children():
26602725
fn_recursive_set_attention_slice(module, reversed_slice_size)
26612726

2727+
def _set_gradient_checkpointing(self, module, value=False):
2728+
if hasattr(module, "gradient_checkpointing"):
2729+
module.gradient_checkpointing = value
2730+
26622731
def enable_freeu(self, s1: float, s2: float, b1: float, b2: float):
26632732
r"""Enables the FreeU mechanism from https://arxiv.org/abs/2309.11497.
26642733

examples/dreambooth/README.md

Lines changed: 0 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -742,29 +742,3 @@ accelerate launch train_dreambooth.py \
742742
## Stable Diffusion XL
743743

744744
We support fine-tuning of the UNet shipped in [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) with DreamBooth and LoRA via the `train_dreambooth_lora_sdxl.py` script. Please refer to the docs [here](./README_sdxl.md).
745-
746-
## Dataset
747-
748-
We support 🤗 [Datasets](https://huggingface.co/docs/datasets/index), you can find a dataset on the [Hugging Face Hub](https://huggingface.co/datasets) or use your own.
749-
750-
The quickest way to get started with your custom dataset is 🤗 Datasets' [`ImageFolder`](https://huggingface.co/docs/datasets/image_dataset#imagefolder).
751-
752-
We need to create a file `metadata.jsonl` in the directory with our images:
753-
754-
```
755-
{"file_name": "01.jpg", "prompt": "prompt 01"}
756-
{"file_name": "02.jpg", "prompt": "prompt 02"}
757-
```
758-
759-
If we have a directory with image-text pairs e.g. `01.jpg` and `01.txt` then `convert_to_imagefolder.py` can create `metadata.jsonl`.
760-
761-
```sh
762-
python convert_to_imagefolder.py --path my_dataset/
763-
```
764-
765-
We use `--dataset_name` and `--caption_column` with training scripts.
766-
767-
```
768-
--dataset_name=my_dataset/
769-
--caption_column=prompt
770-
```

examples/dreambooth/train_dreambooth_lora_sana.py

Lines changed: 2 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -63,7 +63,6 @@
6363
is_wandb_available,
6464
)
6565
from diffusers.utils.hub_utils import load_or_create_model_card, populate_model_card
66-
from diffusers.utils.import_utils import is_torch_npu_available
6766
from diffusers.utils.torch_utils import is_compiled_module
6867

6968

@@ -75,9 +74,6 @@
7574

7675
logger = get_logger(__name__)
7776

78-
if is_torch_npu_available():
79-
torch.npu.config.allow_internal_format = False
80-
8177

8278
def save_model_card(
8379
repo_id: str,
@@ -605,7 +601,6 @@ def parse_args(input_args=None):
605601
)
606602
parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")
607603
parser.add_argument("--enable_vae_tiling", action="store_true", help="Enabla vae tiling in log validation")
608-
parser.add_argument("--enable_npu_flash_attention", action="store_true", help="Enabla Flash Attention for NPU")
609604

610605
if input_args is not None:
611606
args = parser.parse_args(input_args)
@@ -929,7 +924,8 @@ def main(args):
929924
image.save(image_filename)
930925

931926
del pipeline
932-
free_memory()
927+
if torch.cuda.is_available():
928+
torch.cuda.empty_cache()
933929

934930
# Handle the repository creation
935931
if accelerator.is_main_process:
@@ -992,13 +988,6 @@ def main(args):
992988
# because Gemma2 is particularly suited for bfloat16.
993989
text_encoder.to(dtype=torch.bfloat16)
994990

995-
if args.enable_npu_flash_attention:
996-
if is_torch_npu_available():
997-
logger.info("npu flash attention enabled.")
998-
transformer.enable_npu_flash_attention()
999-
else:
1000-
raise ValueError("npu flash attention requires torch_npu extensions and is supported only on npu device ")
1001-
1002991
# Initialize a text encoding pipeline and keep it to CPU for now.
1003992
text_encoding_pipeline = SanaPipeline.from_pretrained(
1004993
args.pretrained_model_name_or_path,

examples/research_projects/pixart/controlnet_pixart_alpha.py

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@
88
from diffusers.models.attention import BasicTransformerBlock
99
from diffusers.models.modeling_outputs import Transformer2DModelOutput
1010
from diffusers.models.modeling_utils import ModelMixin
11+
from diffusers.utils.torch_utils import is_torch_version
1112

1213

1314
class PixArtControlNetAdapterBlock(nn.Module):
@@ -150,6 +151,10 @@ def __init__(
150151
self.transformer = transformer
151152
self.controlnet = controlnet
152153

154+
def _set_gradient_checkpointing(self, module, value=False):
155+
if hasattr(module, "gradient_checkpointing"):
156+
module.gradient_checkpointing = value
157+
153158
def forward(
154159
self,
155160
hidden_states: torch.Tensor,
@@ -215,15 +220,26 @@ def forward(
215220
print("Gradient checkpointing is not supported for the controlnet transformer model, yet.")
216221
exit(1)
217222

218-
hidden_states = self._gradient_checkpointing_func(
219-
block,
223+
def create_custom_forward(module, return_dict=None):
224+
def custom_forward(*inputs):
225+
if return_dict is not None:
226+
return module(*inputs, return_dict=return_dict)
227+
else:
228+
return module(*inputs)
229+
230+
return custom_forward
231+
232+
ckpt_kwargs: Dict[str, Any] = {"use_reentrant": False} if is_torch_version(">=", "1.11.0") else {}
233+
hidden_states = torch.utils.checkpoint.checkpoint(
234+
create_custom_forward(block),
220235
hidden_states,
221236
attention_mask,
222237
encoder_hidden_states,
223238
encoder_attention_mask,
224239
timestep,
225240
cross_attention_kwargs,
226241
None,
242+
**ckpt_kwargs,
227243
)
228244
else:
229245
# the control nets are only used for the blocks 1 to self.blocks_num
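
Note on the `create_custom_forward` wrapper that recurs in the hunks above: in the diff it is passed to `torch.utils.checkpoint.checkpoint` in place of `self._gradient_checkpointing_func`, with `use_reentrant=False` added when `is_torch_version(">=", "1.11.0")`. The sketch below isolates only the closure, so it runs without torch. `toy_block` is a hypothetical stand-in for a resnet or transformer block, not code from this commit, and the checkpointing call itself is omitted.

```python
from typing import Any, Callable, Optional


def create_custom_forward(
    module: Callable[..., Any], return_dict: Optional[bool] = None
) -> Callable[..., Any]:
    """Wrap `module` so it can be invoked with positional inputs only."""

    def custom_forward(*inputs: Any) -> Any:
        # Forward `return_dict` only when the caller set it explicitly.
        if return_dict is not None:
            return module(*inputs, return_dict=return_dict)
        return module(*inputs)

    return custom_forward


def toy_block(hidden_states: float, temb: float, return_dict: bool = True) -> Any:
    # Hypothetical stand-in for a block's forward pass (assumption, not from the diff).
    out = hidden_states + temb
    return {"sample": out} if return_dict else (out,)


# Without `return_dict`, the block's own default applies.
plain = create_custom_forward(toy_block)
# With `return_dict=False`, the keyword is bound inside the closure.
as_tuple = create_custom_forward(toy_block, return_dict=False)
```

The point of the closure is that the wrapped callable takes only positional inputs, with any `return_dict` keyword captured once at wrap time. Every call site in this commit follows that shape, passing the tensors positionally after the wrapped module.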
