
Conversation

@elismasilva (Contributor) commented on Dec 22, 2024

What does this PR do?

When working with more than one IP Adapter and applying a mask to the images of one of them, we are currently required to pass a mask for every IP Adapter involved.

This PR makes it optional to pass an IP Adapter mask to the attention mechanism when there is no need to apply one to a specific IP Adapter. The Xformers attention processors already support this; this PR simply replicates the same behavior in the standard SDPA attention processors.
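
Conceptually, the change makes each entry of ip_adapter_masks independently optional. Below is a minimal sketch of the idea, using a hypothetical helper rather than the actual diffusers diff; it assumes the masks have already been downsampled to per-token weights of shape (batch, seq_len, 1):

import torch

# Illustrative sketch (hypothetical helper, not the diffusers source):
# a None entry in `masks` now means "apply this adapter's output unmasked"
# instead of raising a ValueError.
def blend_ip_adapter_outputs(hidden_states, ip_hidden_states, scales, masks):
    for ip_states, scale, mask in zip(ip_hidden_states, scales, masks):
        if mask is not None:
            ip_states = ip_states * mask  # restrict this adapter to the masked region
        hidden_states = hidden_states + scale * ip_states
    return hidden_states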

Reproduction code

This code reproduces both the problem and the solution.
If you run it without xformers enabled, you will get the error shown in the logs below.
With SDPA you currently have to provide a mask for the style IP Adapter as well (see the commented-out line in the pipeline call).
With the fix applied, SDPA works the same way it already works with Xformers.

import numpy as np
import torch
from diffusers import AutoPipelineForText2Image
from transformers import CLIPVisionModelWithProjection
from diffusers.utils.loading_utils import load_image

MAX_SEED = np.iinfo(np.int32).max
base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
device = "cuda"
seed = 42

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
).to(device)

# load SDXL pipeline
pipe = AutoPipelineForText2Image.from_pretrained(
    base_model_path,    
    torch_dtype=torch.float16,
    image_encoder=image_encoder
).to(device)

# the default attention processors here are the SDPA-based AttnProcessor2_0 variants
pipe.enable_vae_tiling() 
pipe.enable_model_cpu_offload()

from diffusers.image_processor import IPAdapterMaskProcessor

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
style_image = style_image.resize((512, 512))  # resize returns a new image, so assign it back


style_mask = load_image("https://raw.githubusercontent.com/instantX-research/InstantStyle/main/assets/composition_mask.png")

mask1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask1.png")
mask2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask2.png")

output_height = 1024
output_width = 1024

processor = IPAdapterMaskProcessor()
masks = processor.preprocess([mask1, mask2], height=output_height, width=output_width)
style_mask = processor.preprocess([style_mask], height=output_height, width=output_width)

face_image1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl1.png")
face_image2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl2.png")

ip_images = [face_image1, face_image2]

masks = masks.reshape(1, masks.shape[0], masks.shape[2], masks.shape[3])
style_mask = style_mask.reshape(1, style_mask.shape[0], style_mask.shape[2], style_mask.shape[3])
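# Both tensors should now be in the layout the attention processors expect,
# [1, num_images_for_ip_adapter, height, width]: with 1024x1024 masks this is
# (1, 2, 1024, 1024) for `masks` and (1, 1, 1024, 1024) for `style_mask`.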

# load ip-adapter
pipe.load_ip_adapter(["h94/IP-Adapter", "h94/IP-Adapter"],
    subfolder=["sdxl_models", "sdxl_models"],    
    weight_name=["ip-adapter-plus_sdxl_vit-h.bin", "ip-adapter-plus-face_sdxl_vit-h.safetensors"],
    image_encoder_folder=None,
)

# configure ip-adapter scales.
scale = {
    "up": {"block_0": [0.0, 1.0, 0.0]}, #style
}
scale_2 = [0.7, 0.7]
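# One entry per loaded adapter: the dict scales only specific attention layers
# of the style adapter (up block 0, following InstantStyle), while [0.7, 0.7]
# gives one scale per face image/mask pair for the face adapter.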

#pipe.enable_xformers_memory_efficient_attention() # uncomment to run with xformers, which already supports skipping the style mask

pipe.set_ip_adapter_scale([scale, scale_2])

generator = torch.Generator(device="cpu").manual_seed(seed)
num_images = 1

# generate image
image = pipe(
    prompt="2 girls",
    ip_adapter_image=[style_image, ip_images],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=20,
    num_images_per_prompt=num_images,
    height=output_height,
    width=output_width,    
    generator=generator,
    #cross_attention_kwargs={"ip_adapter_masks": [style_mask, masks] }, # without this fix, SDPA requires a mask for every IP Adapter
    cross_attention_kwargs={"ip_adapter_masks": [None, masks] } # proposed fix: None skips masking for the style adapter
).images[0]


image.save("./data/result_2girls.png")
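
To confirm which attention path is active before running, you can inspect the UNet's processors (a quick sanity check, not part of the original repro):

# Names ending in "2_0" indicate the standard SDPA processors this PR touches;
# "XFormers" in a name indicates the xformers path that already skips None masks.
print({type(p).__name__ for p in pipe.unet.attn_processors.values()})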

Logs

Exception has occurred: ValueError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Each element of the ip_adapter_masks array should be a tensor with shape [1, num_images_for_ip_adapter, height, width]. Please use `IPAdapterMaskProcessor` to preprocess your mask
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/attention_processor.py", line 5060, in __call__
    raise ValueError(
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/attention_processor.py", line 588, in forward
    return self.processor(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/attention.py", line 552, in forward
    attn_output = self.attn2(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/transformers/transformer_2d.py", line 442, in forward
    hidden_states = block(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/unets/unet_2d_blocks.py", line 1334, in forward
    hidden_states = attn(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/unets/unet_2d_condition.py", line 1216, in forward
    sample, res_samples = downsample_block(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 1208, in __call__
    noise_pred = self.unet(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/lang_code_data/test_ip_adapter_mask.py", line 70, in <module>
    image = pipe(
  File "/home/master/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/master/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
ValueError: Each element of the ip_adapter_masks array should be a tensor with shape [1, num_images_for_ip_adapter, height, width]. Please use `IPAdapterMaskProcessor` to preprocess your mask
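
For reference, the fix amounts to guarding the per-adapter validation (and the masking itself) behind a None check. A minimal self-contained sketch of the relaxed validation, as a hypothetical helper rather than the exact diff:

import torch

def validate_ip_adapter_masks(ip_adapter_masks):
    # None entries are now allowed: that adapter simply runs unmasked.
    for mask in ip_adapter_masks:
        if mask is None:
            continue
        if not (isinstance(mask, torch.Tensor) and mask.ndim == 4):
            raise ValueError(
                "Each element of the ip_adapter_masks array should be a tensor with "
                "shape [1, num_images_for_ip_adapter, height, width]. Please use "
                "`IPAdapterMaskProcessor` to preprocess your mask"
            )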

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu

…if there is no need to apply it to a given IP Adapter.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky (Contributor) left a comment


Thank you @elismasilva

@hlky hlky merged commit c0c1168 into huggingface:main Dec 24, 2024
12 checks passed
