Issue with Using Multiple Controls (Depth and Canny) with LoRA on FLUX.1-dev Model #10594

@pramishp

Describe the bug

Passing multiple control images (Depth and Canny) to FluxControlPipeline, with the corresponding control LoRAs loaded on the FLUX.1-dev model, raises a runtime error. The documentation indicates that multiple control images in PIL format can be supplied, so this call was expected to work. The pipeline runs correctly with a single control image.

Expected Behavior

The pipeline should generate the output image without errors when multiple control images (Depth and Canny) are supplied.

Observed Behavior

The pipeline fails with the error RuntimeError: shape '[1, 16, 64, 2, 64, 2]' is invalid for input of size 524288.
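
The numbers in the error message can be checked by hand. The short sketch below uses only the two values reported in the error (the target shape and the input size) and shows that the tensor holds exactly twice as many elements as the target shape allows. One possible reading, which is an assumption and not confirmed here, is that the two control images end up stacked along the batch dimension while the pipeline still reshapes for a batch size of 1.

```python
from math import prod

# Target shape from the error message:
# (batch, channels, height // 2, 2, width // 2, 2)
target_shape = (1, 16, 64, 2, 64, 2)

# "input of size 524288" from the same error message
actual_numel = 524288

# Number of elements the target shape can hold
expected_numel = prod(target_shape)

# How many times larger the actual tensor is than expected
ratio = actual_numel / expected_numel

print(expected_numel)  # 262144
print(ratio)           # 2.0
```

A factor of exactly 2 matches the number of control images passed in the list, which is consistent with the single-image case working.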

Reproduction

1. Load the FLUX.1-dev model in FluxControlPipeline and attach the Depth and Canny control LoRAs.
2. Prepare a Depth control image and a Canny control image from the same source image.
3. Pass both control images as a list to the pipeline, as in the following snippet:
```python
import os

# Set Hugging Face cache directories before importing the Hugging Face
# libraries, which typically read these variables at import time.
os.environ["HF_HOME"] = "/scratch/pramish_paudel/job_108669/hf"
os.environ["HF_DATASETS_CACHE"] = "/scratch/pramish_paudel/job_1086695/hf"

import numpy as np
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from huggingface_hub import login
from image_gen_aux import DepthPreprocessor

login(token="<REDACTED>")

# Load the base model and attach both control LoRAs under separate adapter names.
control_pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora", adapter_name="depth")
control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora", adapter_name="canny")

# Activate both adapters at once with equal weights.
control_pipe.set_adapters(["depth", "canny"], adapter_weights=[0.85, 0.85])
control_pipe.enable_model_cpu_offload()

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

# Depth control image
depth_processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image1 = depth_processor(control_image)[0].convert("RGB")

# Height of the depth map, reused as the Canny detection and output resolution
shape = np.asarray(control_image1).shape[0]

# Canny control image
canny_processor = CannyDetector()
control_image2 = canny_processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=shape, image_resolution=shape)

# Passing both control images as a list raises the RuntimeError shown in the logs.
image = control_pipe(
    prompt=prompt,
    control_image=[control_image1, control_image2],
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")
```

Logs

```shell
/lib/python3.12/site-packages/diffusers/pipelines/flux/pipeline_flux_control.py", line 474, in _pack_latents
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 16, 64, 2, 64, 2]' is invalid for input of size 524288
```

System Info

• diffusers version: 0.32.0
• Python version: 3.12
• System: Debian GNU/Linux
• GPU: A6000
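
For anyone trying to reproduce this on another machine, a small stdlib-only sketch like the one below can gather comparable version information. The helper name `collect_versions` and the default package list (`torch`, `transformers`, `peft` alongside `diffusers`) are illustrative assumptions, not something taken from this report.

```python
import platform
from importlib import metadata


def collect_versions(packages=("diffusers", "torch", "transformers", "peft")):
    """Return interpreter, OS, and package versions as a dict.

    Packages that are not installed are reported as 'not installed'.
    The default package list is an illustrative assumption.
    """
    versions = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

The output can be pasted next to the list above so that version differences are easy to spot.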

Who can help?

@sayakpaul @yiyixuxu @DN6
