Image dimension checking for ControlNet FLUX #9550
Conversation
Review comment on:

```python
    return prompt_embeds, pooled_prompt_embeds, text_ids

# Copied from diffusers.pipelines.controlnet.pipeline_controlnet.StableDiffusionControlNetPipeline.check_image
```
So, I think this method was made before we introduced the image processor, which sets a standard image input format that we accept across all our pipelines and checks whether the input is valid there:
diffusers/src/diffusers/image_processor.py, line 535 in c4a8979:

```python
if not is_valid_image_imagelist(image):
```
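(For illustration, a rough sketch of the kinds of image inputs such a check accepts; the exact accepted types are defined in `image_processor.py`, so this is an assumption-laden summary, not the authoritative list:)

```python
# Illustrative sketch: inputs that pass an is_valid_image_imagelist-style check
# are a single image or a list of images of these types (PIL, numpy, torch).
import numpy as np
import torch
from PIL import Image

valid_inputs = [
    Image.new("RGB", (512, 512)),               # single PIL image
    np.zeros((512, 512, 3), dtype=np.float32),  # single numpy array
    torch.zeros(3, 512, 512),                   # single torch tensor
    [Image.new("RGB", (512, 512))] * 2,         # list of PIL images
]
```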
@yiyixuxu Thanks, that is good to know. I pushed a change so that …
bump @yiyixuxu thanks!
Review comment on:

```python
elif prompt_2 is not None and (not isinstance(prompt_2, str) and not isinstance(prompt_2, list)):
    raise ValueError(f"`prompt_2` has to be of type `str` or `list` but is {type(prompt_2)}")

if (
```
I think we can remove almost all of the checks and only do the following two things:

- make sure that, if it is a multi controlnet, `image` is a list with the same length as the number of controlnets
- add a check here:

```python
if image_batch_size == 1:
    repeat_by = batch_size
elif image_batch_size == batch_size:
    # image batch size is the same as prompt batch size
    repeat_by = num_images_per_prompt
else:
    raise ValueError("...")
```

It should be sufficient, no? Would we miss anything here?
For the first question, I already added this check in an earlier commit, see:
For (2), that if/elif statement would break things, because the `batch_size` that is actually passed into `prepare_image` from the outside is `batch_size * num_images_per_prompt`, i.e.:
```python
control_image = self.prepare_image(
    ...
    batch_size=batch_size * num_images_per_prompt,
    num_images_per_prompt=num_images_per_prompt,
    ...
)
```
It's a little confusing to parse (especially since we also pass `num_images_per_prompt` into that method), so I changed it to the following:
```python
control_image = self.prepare_image(
    ...
    batch_size=batch_size,
    num_images_per_prompt=num_images_per_prompt,
    ...
)
```
and made an adjustment to the code inside that method, so now we have:
```python
if image_batch_size == 1:
    repeat_by = batch_size * num_images_per_prompt
elif image_batch_size == batch_size:
    # image batch size is the same as prompt batch size
    repeat_by = num_images_per_prompt
else:
    raise ValueError(...)
```
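(For context, and paraphrasing the existing ControlNet pipelines rather than the exact diff, `repeat_by` is then used to tile the control image along the batch dimension, roughly like this:)

```python
import torch

def tile_control_image(image: torch.Tensor, repeat_by: int, device, dtype) -> torch.Tensor:
    # Tile the control image along the batch dimension so it matches the
    # effective batch size, then move it to the requested device/dtype.
    image = image.repeat_interleave(repeat_by, dim=0)
    return image.to(device=device, dtype=dtype)
```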
I wrote an informative `ValueError` message as well, in the event that the `else` branch gets tripped.
I'll push these changes momentarily.
Thanks for the comments above @yiyixuxu. Just one last thing, there is this to take care of:

diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py, lines 526 to 529 in 9cd3755
As I previously said, I'm not sure why all the preprocessing gets skipped for `torch.Tensor` inputs. Fixing this, however, would have side effects on code which already uses this class with `torch.Tensor` control images, see:

diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py, lines 674 to 675 in 9cd3755
I made the change in the latest commit, but maybe it's worth discussing this further. If we go with my commit, then maybe it's worth adding in a warning in the event that …
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
re-bump @yiyixuxu
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
Issue
This addresses an issue discussed in two PRs, see #9406 (comment) and #9507 (comment).
The FLUX ControlNet pipeline is currently lacking any checks on the shape or number of control images passed (for `np.ndarray`/`torch.Tensor` and `PIL` objects, respectively). I will give a simple example. If you were to run the following code:
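(A minimal sketch of the failing case; the model IDs and call arguments here are illustrative assumptions, not the exact snippet from this PR:)

```python
# Illustrative only: a single prompt (batch size 1) with a control image batch
# of size 2, which the pipeline currently does not validate up front.
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline

controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

control_image = torch.rand(2, 3, 512, 512)  # control image batch size 2

out = pipe(
    prompt="a photo of a cat",  # a single prompt -> prompt batch size 1
    control_image=control_image,
    height=512,
    width=512,
    num_inference_steps=2,
)
```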
you'd get an opaque downstream error. This is because the number of control images must match the number of prompts passed: in this case we passed in a control image with batch size 2 but only 1 prompt. Because we don't check for this, it surfaces as a downstream error related to the packing of the latents.
It turns out SDXL's ControlNet pipeline actually checks that the number of control images is consistent with the number of prompts (as I recall, one of the two is also allowed to be a singleton, which is also fine). I essentially ported over the `check_image` method from `StableDiffusionControlNetPipeline`, and modified `check_inputs` to check the control image as well. Now if you run the above code, you get an error up front instead, which makes it much clearer what the issue is.
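(The ported check raises something along these lines; the exact wording in this PR may differ, and the message below is approximated from the SDXL `check_image`:)

```python
# Sketch of the ported batch-size check inside check_image
# (image_batch_size / prompt_batch_size are computed earlier in that method):
if image_batch_size != 1 and image_batch_size != prompt_batch_size:
    raise ValueError(
        "If image batch size is not 1, image batch size must be same as prompt batch size. "
        f"image batch size: {image_batch_size}, prompt batch size: {prompt_batch_size}"
    )
```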
This fix should also work for `MultiControlNet`, which means you can do something like this:
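(Again, a sketch rather than the exact snippet; the repo IDs and conditioning scales are assumptions:)

```python
# Illustrative: two ControlNets, each given a control image batch of 3, with
# num_images_per_prompt=2.
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline, FluxMultiControlNetModel

controlnet1 = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16
)
controlnet2 = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=FluxMultiControlNetModel([controlnet1, controlnet2]),
    torch_dtype=torch.bfloat16,
).to("cuda")

images = torch.rand(3, 3, 512, 512)   # control images for controlnet1, batch size 3
images2 = torch.rand(3, 3, 512, 512)  # control images for controlnet2, batch size 3

out = pipe(
    prompt=["prompt 1", "prompt 2", "prompt 3"],  # 3 prompts
    control_image=[images, images2],              # one batch of control images per controlnet
    controlnet_conditioning_scale=[0.5, 0.5],
    num_images_per_prompt=2,
    height=512,
    width=512,
)
```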
i.e. `images` and `images2` are both `torch.Tensor` with a batch size of 3, and their corresponding ControlNet states (which will effectively have double the batch size due to `num_images_per_prompt=2`) will be summed together.

I have some tests you can copy and paste from here: https://github.com/christopher-beckham/diffusers-tests/blob/4b548f8/controlnet_pipeline_cleaner_api/flux.py
(you can run them with `python -m unittest flux.py`)

Other concerns
There are some questions I have, however. Why is it that we skip the image preprocessing if the image is a `torch.Tensor`? i.e.

diffusers/src/diffusers/pipelines/flux/pipeline_flux_controlnet.py, lines 526 to 529 in 9cd3755
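(Roughly, those lines look like this; paraphrased, so see the permalink above for the exact code:)

```python
# Paraphrased from prepare_image: preprocessing is skipped entirely when the
# control image is already a torch.Tensor.
if isinstance(image, torch.Tensor):
    pass
else:
    image = self.image_processor.preprocess(image, height=height, width=width)
```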
This also seems inconsistent with what is done in the SDXL ControlNet code:
diffusers/src/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl.py, line 857 in 7071b74
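(For comparison, the SDXL `prepare_image` runs the preprocessing unconditionally; paraphrased, with attribute names taken from that pipeline:)

```python
# Paraphrased from the SDXL ControlNet pipeline: preprocessing always runs,
# regardless of whether the input is a PIL image, numpy array, or torch tensor.
image = self.control_image_processor.preprocess(image, height=height, width=width).to(dtype=torch.float32)
```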
It may also lead to unexpected behaviour, because `preprocess` explicitly tries to use `width` and `height` to preprocess the image (if they are `None`, a reasonable default is used instead, depending on the precise model). But this logic gets skipped entirely if a `torch.Tensor` is passed.

Thanks.
Who can review?
@yiyixuxu @wangqixun