
Conversation

@elismasilva (Contributor) commented on Dec 22, 2024

What does this PR do?

When working with more than one IP Adapter and applying a mask to the images of one of them, we are currently required to pass a mask for every IP Adapter involved.

This PR makes it optional to pass an IP Adapter mask to the attention mechanism when there is no need to apply one to a specific IP Adapter. The Xformers attention processors already support this; this PR simply replicates the same behavior in the standard SDPA attention processors.
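
Conceptually, the change makes each entry of ip_adapter_masks independently optional. Below is a minimal sketch of the idea, using a hypothetical helper rather than the actual diffusers diff; it assumes the masks have already been downsampled to per-token weights of shape (batch, seq_len, 1):

import torch

# Illustrative sketch (hypothetical helper, not the diffusers source):
# a None entry in `masks` now means "apply this adapter's output unmasked"
# instead of raising a ValueError.
def blend_ip_adapter_outputs(hidden_states, ip_hidden_states, scales, masks):
    for ip_states, scale, mask in zip(ip_hidden_states, scales, masks):
        if mask is not None:
            ip_states = ip_states * mask  # restrict this adapter to the masked region
        hidden_states = hidden_states + scale * ip_states
    return hidden_states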

Reproduction code

This code reproduces both the problem and the solution.
If you run it without xformers enabled, you will get the error shown in the logs below.
With SDPA you currently have to provide a mask for the style IP Adapter as well (see the commented-out line in the pipeline call).
With the fix applied, SDPA works the same way it already works with Xformers.

import numpy as np
import torch
from diffusers import AutoPipelineForText2Image
from transformers import CLIPVisionModelWithProjection
from diffusers.utils.loading_utils import load_image

MAX_SEED = np.iinfo(np.int32).max
base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
device = "cuda"
seed = 42

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
).to(device)

# load SDXL pipeline
pipe = AutoPipelineForText2Image.from_pretrained(
    base_model_path,    
    torch_dtype=torch.float16,
    image_encoder=image_encoder
).to(device)

# the default attention processors here are the SDPA-based AttnProcessor2_0 variants
pipe.enable_vae_tiling() 
pipe.enable_model_cpu_offload()

from diffusers.image_processor import IPAdapterMaskProcessor

style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
style_image = style_image.resize((512, 512))  # resize returns a new image, so assign it back


style_mask = load_image("https://raw.githubusercontent.com/instantX-research/InstantStyle/main/assets/composition_mask.png")

mask1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask1.png")
mask2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask2.png")

output_height = 1024
output_width = 1024

processor = IPAdapterMaskProcessor()
masks = processor.preprocess([mask1, mask2], height=output_height, width=output_width)
style_mask = processor.preprocess([style_mask], height=output_height, width=output_width)

face_image1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl1.png")
face_image2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl2.png")

ip_images = [face_image1, face_image2]

masks = masks.reshape(1, masks.shape[0], masks.shape[2], masks.shape[3])
style_mask = style_mask.reshape(1, style_mask.shape[0], style_mask.shape[2], style_mask.shape[3])
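# Both tensors should now be in the layout the attention processors expect,
# [1, num_images_for_ip_adapter, height, width]: with 1024x1024 masks this is
# (1, 2, 1024, 1024) for `masks` and (1, 1, 1024, 1024) for `style_mask`.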

# load ip-adapter
pipe.load_ip_adapter(["h94/IP-Adapter", "h94/IP-Adapter"],
    subfolder=["sdxl_models", "sdxl_models"],    
    weight_name=["ip-adapter-plus_sdxl_vit-h.bin", "ip-adapter-plus-face_sdxl_vit-h.safetensors"],
    image_encoder_folder=None,
)

# configure ip-adapter scales.
scale = {
    "up": {"block_0": [0.0, 1.0, 0.0]}, #style
}
scale_2 = [0.7, 0.7]
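# One entry per loaded adapter: the dict scales only specific attention layers
# of the style adapter (up block 0, following InstantStyle), while [0.7, 0.7]
# gives one scale per face image/mask pair for the face adapter.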

#pipe.enable_xformers_memory_efficient_attention() # uncomment to run with xformers, which already supports skipping the style mask

pipe.set_ip_adapter_scale([scale, scale_2])

generator = torch.Generator(device="cpu").manual_seed(seed)
num_images = 1

# generate image
image = pipe(
    prompt="2 girls",
    ip_adapter_image=[style_image, ip_images],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=20,
    num_images_per_prompt=num_images,
    height=output_height,
    width=output_width,    
    generator=generator,
    #cross_attention_kwargs={"ip_adapter_masks": [style_mask, masks] }, # without this fix, SDPA requires a mask for every IP Adapter
    cross_attention_kwargs={"ip_adapter_masks": [None, masks] } # proposed fix: None skips masking for the style adapter
).images[0]


image.save("./data/result_2girls.png")
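
To confirm which attention path is active before running, you can inspect the UNet's processors (a quick sanity check, not part of the original repro):

# Names ending in "2_0" indicate the standard SDPA processors this PR touches;
# "XFormers" in a name indicates the xformers path that already skips None masks.
print({type(p).__name__ for p in pipe.unet.attn_processors.values()})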

Logs

Exception has occurred: ValueError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Each element of the ip_adapter_masks array should be a tensor with shape [1, num_images_for_ip_adapter, height, width]. Please use `IPAdapterMaskProcessor` to preprocess your mask
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/attention_processor.py", line 5060, in __call__
    raise ValueError(
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/attention_processor.py", line 588, in forward
    return self.processor(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/attention.py", line 552, in forward
    attn_output = self.attn2(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/transformers/transformer_2d.py", line 442, in forward
    hidden_states = block(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/unets/unet_2d_blocks.py", line 1334, in forward
    hidden_states = attn(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/src/diffusers/models/unets/unet_2d_condition.py", line 1216, in forward
    sample, res_samples = downsample_block(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 1208, in __call__
    noise_pred = self.unet(
  File "/mnt/f/Projetos/diffusers/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/f/Projetos/diffusers/lang_code_data/test_ip_adapter_mask.py", line 70, in <module>
    image = pipe(
  File "/home/master/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/master/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
ValueError: Each element of the ip_adapter_masks array should be a tensor with shape [1, num_images_for_ip_adapter, height, width]. Please use `IPAdapterMaskProcessor` to preprocess your mask
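
For reference, the fix amounts to guarding the per-adapter validation (and the masking itself) behind a None check. A minimal self-contained sketch of the relaxed validation, as a hypothetical helper rather than the exact diff:

import torch

def validate_ip_adapter_masks(ip_adapter_masks):
    # None entries are now allowed: that adapter simply runs unmasked.
    for mask in ip_adapter_masks:
        if mask is None:
            continue
        if not (isinstance(mask, torch.Tensor) and mask.ndim == 4):
            raise ValueError(
                "Each element of the ip_adapter_masks array should be a tensor with "
                "shape [1, num_images_for_ip_adapter, height, width]. Please use "
                "`IPAdapterMaskProcessor` to preprocess your mask"
            )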

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu

…if there is no need to apply it to a given IP Adapter.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky (Contributor) left a comment


Thank you @elismasilva

@hlky hlky merged commit c0c1168 into huggingface:main Dec 24, 2024
12 checks passed
