Labels: bug (Something isn't working)
Description
Describe the bug
When attempting to use multiple control images (Depth and Canny) with LoRA on the FLUX.1-dev model, an error occurs during execution. The documentation indicates that multiple control images in PIL format can be supplied, but the pipeline throws a runtime error. Notably, the pipeline functions correctly with a single control image.
Expected Behavior
The pipeline should generate the output image without errors when multiple control images (Depth and Canny) are supplied.
Observed Behavior
The pipeline fails with the error RuntimeError: shape '[1, 16, 64, 2, 64, 2]' is invalid for input of size 524288.
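The numbers in the error message can be checked by hand. The sketch below only does the arithmetic on the values reported in the traceback; the interpretation (that the two control images end up stacked along the batch dimension while the pipeline still reshapes with a batch size of 1) is an assumption that fits the numbers, not something confirmed by the maintainers here.

```python
from math import prod

# Target shape taken verbatim from the RuntimeError message.
target_shape = (1, 16, 64, 2, 64, 2)

# Number of elements the view() call expects.
expected_elements = prod(target_shape)

# Number of elements the tensor actually holds, per the error message.
actual_elements = 524288

# How many times larger the real tensor is than the requested view.
ratio = actual_elements // expected_elements

print(f"expected={expected_elements}, actual={actual_elements}, ratio={ratio}")
```

The tensor holds exactly twice as many elements as the requested view, which matches the number of control images passed in the list.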
Reproduction
1. Set up the FLUX.1-dev model with multiple control images using LoRA.
2. Use a Depth control image and a Canny control image.
3. Run the following snippet:
```python
import os

from huggingface_hub import login
from diffusers import FluxControlPipeline
from image_gen_aux import DepthPreprocessor
from diffusers.utils import load_image
from controlnet_aux import CannyDetector
import numpy as np
import torch

# Set Hugging Face directories
os.environ["HF_HOME"] = "/scratch/pramish_paudel/job_108669/hf"
os.environ["HF_DATASETS_CACHE"] = "/scratch/pramish_paudel/job_1086695/hf"
login(token="<REDACTED>")

# Load the base model and both control LoRAs
control_pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora", adapter_name="depth")
control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora", adapter_name="canny")
control_pipe.set_adapters(["depth", "canny"], adapter_weights=[0.85, 0.85])
control_pipe.enable_model_cpu_offload()

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

# Depth control image
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image1 = processor(control_image)[0].convert("RGB")
shape = np.asarray(control_image1).shape[0]

# Canny control image, matched to the depth image resolution
processor = CannyDetector()
control_image2 = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=shape, image_resolution=shape)

# Passing both control images as a list triggers the error
image = control_pipe(
    prompt=prompt,
    control_image=[control_image1, control_image2],
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")
```
Logs
```shell
/lib/python3.12/site-packages/diffusers/pipelines/flux/pipeline_flux_control.py", line 474, in _pack_latents
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 16, 64, 2, 64, 2]' is invalid for input of size 524288
```
System Info
• diffusers version: 0.32.0
• Python version: 3.12
• System: Debian GNU/Linux
• GPU: A6000
Who can help?