Image dimension checking for ControlNet FLUX #9550

christopher-beckham · 2024-09-28T18:27:10Z

What does this PR do?

Issue

This addresses an issue discussed in two PRs, see #9406 (comment) and #9507 (comment).

The FLUX ControlNet pipeline currently lacks any checks on the shape of the control images (for np.ndarray or torch.Tensor inputs) or on their number (for PIL objects).

I will give a simple example. If you were to run the following code:

import torch
from diffusers import FluxControlNetPipeline

# base_model, vae and multi_controlnet are assumed to be defined elsewhere
pipe = FluxControlNetPipeline.from_pretrained(
    base_model, vae=vae, controlnet=multi_controlnet, torch_dtype=torch.bfloat16
).to("cuda")
# image_t is a torch tensor of shape (2,3,h,w)
pipe(
    prompt=["test"],
    control_image=image_t,
    control_mode=0,
    num_images_per_prompt=1,
    num_inference_steps=2,
)

you'd get the following error:

Traceback (most recent call last):
  File "/network/scratch/b/beckhamc/github/diffusion-pr/diffusers-tests/controlnet_pipeline_cleaner_api/flux.py", line 67, in test_torch_batched_ctrl_wrong_1ipp
    self.pipe(
  File "/home/beckhamc/envs/diffusers/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/network/scratch/b/beckhamc/github/diffusion-pr/diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py", line 742, in __call__
    control_image = self._pack_latents(
  File "/network/scratch/b/beckhamc/github/diffusion-pr/diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py", line 458, in _pack_latents
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
RuntimeError: shape '[1, 16, 32, 2, 32, 2]' is invalid for input of size 131072

This happens because the number of control images must match the number of prompts passed -- in this case we passed a control image with batch size 2 but only 1 prompt. Because we don't catch this, it results in a downstream error related to the packing of the latents.
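The numbers in the error message can be checked by hand. The following is a small sketch of that arithmetic, using only the values printed in the traceback above (the variable names are illustrative):

```python
import math

# Target shape from the RuntimeError:
# (batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
target_shape = (1, 16, 32, 2, 32, 2)
expected_elements = math.prod(target_shape)

# Number of elements actually present in the tensor, also from the error message.
actual_elements = 131072

# The ratio is the batch size of the tensor that was really passed in.
implied_batch_size = actual_elements // expected_elements

print(expected_elements)   # 65536
print(implied_batch_size)  # 2
```

The tensor holds exactly twice as many elements as the view expects, which matches a control image batch of 2 being reshaped with batch_size=1.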

It turns out SDXL's controlnet actually checks that the number of control images is consistent with the number of prompts (I do recall that one of the two is also allowed to be a singleton list, which is also fine). I essentially ported over the check_image method from StableDiffusionControlNetPipeline and modified check_inputs to check the control image as well. Now if you run the above code you will get the following error instead, which makes it much clearer what the issue is:

Traceback (most recent call last):
  File "/network/scratch/b/beckhamc/github/diffusion-pr/diffusers-tests/controlnet_pipeline_cleaner_api/flux.py", line 67, in test_torch_batched_ctrl_wrong_1ipp
    self.pipe(
  File "/home/beckhamc/envs/diffusers/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/network/scratch/b/beckhamc/github/diffusion-pr/diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py", line 742, in __call__
    self.check_inputs(
  File "/network/scratch/b/beckhamc/github/diffusion-pr/diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py", line 475, in check_inputs
    self.check_image(image, prompt, prompt_embeds)
  File "/network/scratch/b/beckhamc/github/diffusion-pr/diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py", line 427, in check_image
    raise ValueError(
ValueError: If image batch size is not 1, image batch size must be same as prompt batch size. image batch size: 2, prompt batch size: 1
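For illustration, the batch-size rule behind this message could be sketched as a standalone function. This is a simplified sketch, not the pipeline's actual method: the function name and its integer arguments are hypothetical, and only the error text is taken from the traceback above.

```python
def check_image_and_prompt_batch(image_batch_size: int, prompt_batch_size: int) -> None:
    """Hypothetical standalone version of the batch-size consistency rule.

    A single control image is always accepted (it can be broadcast to every
    prompt); otherwise the two batch sizes must be equal.
    """
    if image_batch_size != 1 and image_batch_size != prompt_batch_size:
        raise ValueError(
            "If image batch size is not 1, image batch size must be same as prompt batch size. "
            f"image batch size: {image_batch_size}, prompt batch size: {prompt_batch_size}"
        )


check_image_and_prompt_batch(1, 3)  # one image shared by three prompts: accepted
check_image_and_prompt_batch(3, 3)  # one image per prompt: accepted

try:
    check_image_and_prompt_batch(2, 1)  # the failing case from the example above
except ValueError as err:
    print(err)
```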

This fix should also work for MultiControlNet, which means you can do something like this:

multi_controlnet = FluxMultiControlNetModel([controlnet] * 2)
pipe = FluxControlNetPipeline.from_pretrained(
    base_model, vae=vae, controlnet=multi_controlnet, torch_dtype=torch.bfloat16
).to("cuda")
images = pipe(
    prompt=["1","2","3"],
    control_image=[images1, images2], 
    controlnet_conditioning_scale=[0.6, 0.6],
    control_mode=0,
    num_images_per_prompt=2
)

i.e. images1 and images2 are both torch.Tensor with a batch size of 3, and their corresponding ControlNet states (which will effectively have double the batch size due to num_images_per_prompt=2) will be summed together.

I have some tests you can copy and paste from here: https://github.com/christopher-beckham/diffusers-tests/blob/4b548f8/controlnet_pipeline_cleaner_api/flux.py

(you can run with python -m unittest flux.py)

Other concerns

I do have some questions, however. Why do we skip the image preprocessing if the image is a torch.Tensor? i.e.

diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py

Lines 526 to 529 in 9cd3755

    
    if isinstance(image, torch.Tensor):
        pass
    else:
        image = self.image_processor.preprocess(image, height=height, width=width)

This also seems inconsistent with what is done in the SDXL ControlNet code:

diffusers/src/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl.py

Line 857 in 7071b74

    
           image = self.control_image_processor.preprocess(image, height=height, width=width).to(dtype=torch.float32)

It may also lead to unexpected behaviour because preprocess explicitly tries to use width and height to preprocess the image (if they are None, then a reasonable default is used instead, depending on what the precise model is). But this logic gets skipped entirely if a torch.Tensor is passed.

Thanks.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests? (Yes, but in my own standalone repo, which I linked to above.)

Who can review?

@yiyixuxu @wangqixun

yiyixuxu · 2024-09-30T19:08:32Z

src/diffusers/pipelines/flux/pipeline_flux_controlnet.py


        return prompt_embeds, pooled_prompt_embeds, text_ids

+    # Copied from diffusers.pipelines.controlnet.pipeline_controlnet.StableDiffusionControlNetPipeline.check_image


So, I think this method was written before we introduced the image processor, which sets a standard image input format that we accept across all our pipelines and checks whether the input is in a valid format there:

diffusers/src/diffusers/image_processor.py

Line 535 in c4a8979

if not is_valid_image_imagelist(image):


christopher-beckham · 2024-10-02T21:25:21Z

@yiyixuxu Thanks, that is good to know. I pushed a change so that check_image now only checks that the prompt and image batch sizes are consistent. It could maybe do with a more descriptive name, though... not sure what to call it, maybe check_image_and_prompt. Let me know if you think it looks good. Thanks.

christopher-beckham · 2024-10-08T02:17:45Z

bump @yiyixuxu thanks!

yiyixuxu · 2024-10-20T00:24:39Z

src/diffusers/pipelines/flux/pipeline_flux_controlnet.py

        elif prompt_2 is not None and (not isinstance(prompt_2, str) and not isinstance(prompt_2, list)):
            raise ValueError(f"`prompt_2` has to be of type `str` or `list` but is {type(prompt_2)}")

+        if (


I think we can remove almost all of the checks and only do the following two things:

1. make sure that, if it is a multi controlnet, image is a list with the same length as the number of controlnets

2. add a check here

diffusers/src/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl.py

Line 862 in 56d6d21

else:

    if image_batch_size == 1:
        repeat_by = batch_size
    elif image_batch_size == batch_size:
        # image batch size is the same as prompt batch size
        repeat_by = num_images_per_prompt
    else:
        raise ValueError("...")

It should be sufficient, no? Would we miss anything here?
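The multi-controlnet length check suggested in this comment could look roughly like the sketch below. It is only an illustration: the function name, its arguments, and the error messages are hypothetical and are not taken from the PR.

```python
from typing import Any


def check_multi_controlnet_images(control_image: Any, num_controlnets: int) -> None:
    """Hypothetical check: one control image entry is required per controlnet."""
    if not isinstance(control_image, (list, tuple)):
        raise TypeError(
            "With multiple controlnets, control_image must be a list "
            f"but is {type(control_image).__name__}"
        )
    if len(control_image) != num_controlnets:
        raise ValueError(
            f"Expected {num_controlnets} control images (one per controlnet) "
            f"but got {len(control_image)}"
        )


# Two controlnets and two entries (placeholder strings stand in for image batches).
check_multi_controlnet_images(["images1", "images2"], num_controlnets=2)
```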

For the first question, I already added this checking in an earlier commit, see:

https://github.com/christopher-beckham/diffusers/blob/flux_controlnet_input_checking/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py#L458-L474

For (2), that if/elif statement would break everything as written, since the batch_size that is passed into prepare_image from outside is actually batch_size * num_images_per_prompt, i.e.:

    control_image = self.prepare_image(
        ...
        batch_size=batch_size * num_images_per_prompt,
        num_images_per_prompt=num_images_per_prompt,
        ...
    )

It's a little confusing to parse (especially since we also pass num_images_per_prompt into that method), so I changed it to the following:

    control_image = self.prepare_image(
        ...
        batch_size=batch_size,
        num_images_per_prompt=num_images_per_prompt,
        ...
    )

and made an adjustment to the code inside that method, so now we have:

    if image_batch_size == 1:
        repeat_by = batch_size * num_images_per_prompt
    elif image_batch_size == batch_size:
        # image batch size is the same as prompt batch size
        repeat_by = num_images_per_prompt
    else:
        raise ValueError(...)

I wrote an informative ValueError as well in the event the else statement gets tripped.
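To make the adjusted logic easier to follow, here is a minimal standalone sketch of it. The function name and the wording of the error message are hypothetical; the branch structure follows the snippet above, where batch_size is the prompt batch size (not yet multiplied by num_images_per_prompt).

```python
def compute_repeat_by(image_batch_size: int, batch_size: int, num_images_per_prompt: int) -> int:
    """Return how many times each control image must be repeated.

    Hypothetical helper mirroring the if/elif/else described above.
    """
    if image_batch_size == 1:
        # A single control image is shared across every generated sample.
        return batch_size * num_images_per_prompt
    if image_batch_size == batch_size:
        # One control image per prompt: repeat it once per image of that prompt.
        return num_images_per_prompt
    raise ValueError(
        f"Control image batch size ({image_batch_size}) must be 1 or equal to "
        f"the prompt batch size ({batch_size})."
    )


# In both valid cases the repeated control batch has the same final size.
print(1 * compute_repeat_by(1, batch_size=3, num_images_per_prompt=2))  # 6
print(3 * compute_repeat_by(3, batch_size=3, num_images_per_prompt=2))  # 6
```

Either way the repeated control images end up with batch_size * num_images_per_prompt entries, which is the batch size the latents are expected to have.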

I'll push these changes momentarily.


christopher-beckham · 2024-11-01T20:24:51Z

Thanks for the comments above @yiyixuxu

Just one last thing, there is this to take care of:

diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py

Lines 526 to 529 in 9cd3755

    
    if isinstance(image, torch.Tensor):
        pass
    else:
        image = self.image_processor.preprocess(image, height=height, width=width)

As I previously said, I'm not sure why all the preprocessing gets skipped for torch.Tensor -- maybe it's an oversight by the original code author -- but this is not what happens in the corresponding SDXL controlnet pipeline, which runs self.control_image_processor.preprocess no matter what.

Fixing this, however, would have side effects for code that already passes a torch.Tensor to this class. Even if the user sets width=None and height=None in pipeline.__call__, those width and height values are internally redefined to be 1024:

diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py

Lines 674 to 675 in 9cd3755

    
    height = height or self.default_sample_size * self.vae_scale_factor
    width = width or self.default_sample_size * self.vae_scale_factor
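The effect of that fallback can be shown in isolation. In the sketch below, the function name is hypothetical, and the values 128 and 8 are illustrative assumptions chosen only so that their product is the 1024 mentioned above; the actual attribute values in the pipeline may differ.

```python
from typing import Optional, Tuple


def resolve_size(
    height: Optional[int],
    width: Optional[int],
    default_sample_size: int = 128,  # assumed value, for illustration only
    vae_scale_factor: int = 8,       # assumed value, for illustration only
) -> Tuple[int, int]:
    """Apply the same `value or default` fallback as the two lines above."""
    height = height or default_sample_size * vae_scale_factor
    width = width or default_sample_size * vae_scale_factor
    return height, width


print(resolve_size(None, None))  # (1024, 1024)
print(resolve_size(512, None))   # (512, 1024)
```

So once preprocess also runs for tensors, a tensor passed without an explicit height and width would be processed against the default size rather than its own size.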

I made the change in the latest commit but maybe it's worth discussing this further. If we go with my commit, then maybe it's worth adding in a warning in the event that torch.Tensor is passed.

github-actions · 2024-11-26T15:03:38Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

christopher-beckham · 2024-11-26T15:12:26Z

re-bump @yiyixuxu

github-actions · 2024-12-21T15:03:51Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

image checking for controlnet flux (07efa23)

yiyixuxu reviewed Sep 30, 2024

remove image type checks since it's redundant, but keep the prompt batch size check (1c8f02d)

yiyixuxu reviewed Oct 20, 2024

add a just-in-case valuerror check inside prepare_image, also remove if statement bypassing preprocess for torch tensor type (1bc52d2)

github-actions bot added the stale Issues that haven't received updates label Nov 26, 2024

github-actions bot removed the stale Issues that haven't received updates label Nov 27, 2024

github-actions bot added the stale Issues that haven't received updates label Dec 21, 2024
