Add support for Multiple ControlNetXSAdapters in SDXL pipeline #12100


Open · wants to merge 2 commits into main

Conversation

@naomili0924 commented Aug 8, 2025

What does this PR do?

This PR addresses the feature request from an open good-first issue: #8434. It extends the current ControlNet adapter logic to support multiple ControlNet adapters injected into the diffusion model.

Before this change, StableDiffusionXLControlNetXSPipeline loads the UNet base model and supports injection from only a single ControlNet, as shown below.


pipe = StableDiffusionXLControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=depth_controlnet, torch_dtype=torch.float16
).to("cuda")

With this change, the StableDiffusionXLControlNetXSPipeline can take a new UNet conditioning model, MultiControlUnetConditionModel, which loads weights from multiple ControlNets and injects each ControlNet's output into the base model through zero convolution layers.

pipe = StableDiffusionXLControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=[depth_controlnet, canny_controlnet], torch_dtype=torch.float16
).to("cuda")

Since we did not find a test repo to verify this change, we used the following script (not included in this PR) to verify it:

import torch
from diffusers.models.autoencoders import AutoencoderKL
from diffusers.models.controlnets import ControlNetXSAdapter
from diffusers.pipelines.controlnet_xs import StableDiffusionXLControlNetXSPipeline
from PIL import Image
from diffusers.utils import load_image
import cv2
import numpy as np

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
canny_controlnet = ControlNetXSAdapter.from_pretrained(
    "UmerHA/Testing-ConrolNetXS-SDXL-canny", torch_dtype=torch.float16
).to("cuda")
depth_controlnet = ControlNetXSAdapter.from_pretrained(
    "UmerHA/Testing-ConrolNetXS-SDXL-depth", torch_dtype=torch.float16
).to("cuda")

pipe = StableDiffusionXLControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=[depth_controlnet, canny_controlnet], torch_dtype=torch.float16
).to("cuda")

# Load your conditioning images
raw_image = load_image("/content/drive/MyDrive/diffusers/src/test.jpg").convert("RGB")

# Generate Canny edge
def get_canny(image):
    image_np = np.array(image.resize((512, 512)))
    image_gray = cv2.cvtColor(image_np, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(image_gray, 100, 200)
    edges = np.stack([edges] * 3, axis=-1)  # Make it 3-channel
    return Image.fromarray(edges)

# Generate a fake depth map (in real cases use a depth estimator)
def get_fake_depth(image):
    image_np = np.array(image.resize((512, 512)))
    gray = cv2.cvtColor(image_np, cv2.COLOR_RGB2GRAY)
    depth = np.stack([gray] * 3, axis=-1)
    return Image.fromarray(depth)

canny_image = get_canny(raw_image)
depth_image = get_fake_depth(raw_image)

# Run inference
prompt = "A flower"
output = pipe(
    prompt=prompt,
    image=[canny_image, depth_image],  # order matches controlnet list
    controlnet_conditioning_scale=[0.2, 0.8],
    num_inference_steps=30,
    generator=torch.manual_seed(0),
)

output.images[0].save("output.png")

Design Details:

Stage 1: Prepare embeddings for the base UNet and ControlNets

Each ControlNet has its own controlnet_cond_embedding module and control_to_base_for_conv_in module, used to compute its control embedding and add that embedding onto h_base.

With the new change, after this stage we have one h_base (the input for the base UNet) and a list of h_ctrls (the inputs for the ControlNets) whose length equals the number of ControlNets.
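The per-ControlNet embedding step above can be sketched as follows. This is a minimal toy sketch, not the actual diffusers implementation: the module names mirror the text, but the shapes and layers (1x1 convolutions) are illustrative stand-ins.

```python
import torch
import torch.nn as nn

class ToyControlNet(nn.Module):
    """Toy stand-in for one ControlNetXSAdapter (illustrative only)."""
    def __init__(self, channels=4):
        super().__init__()
        # Each ControlNet owns its own conditioning embedding...
        self.controlnet_cond_embedding = nn.Conv2d(3, channels, 1)
        # ...and a zero-initialized control-to-base projection, so it
        # contributes nothing to h_base before training.
        self.control_to_base_for_conv_in = nn.Conv2d(channels, channels, 1)
        nn.init.zeros_(self.control_to_base_for_conv_in.weight)
        nn.init.zeros_(self.control_to_base_for_conv_in.bias)

def prepare_embeddings(h_base, conds, controlnets):
    """Stage 1: return the updated h_base plus one h_ctrl per ControlNet."""
    h_ctrls = []
    for cnet, cond in zip(controlnets, conds):
        h_ctrl = cnet.controlnet_cond_embedding(cond)
        # Every ControlNet's embedding is injected into the single h_base.
        h_base = h_base + cnet.control_to_base_for_conv_in(h_ctrl)
        h_ctrls.append(h_ctrl)
    return h_base, h_ctrls
```

Because control_to_base_for_conv_in is zero-initialized, the untrained adapters leave h_base unchanged while still producing one h_ctrl per ControlNet.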


Stage 2: Down and Mid UNet and ControlNet blocks

Each ControlNet has its own base_to_control (b2c) and control_to_base (c2b) convolution layers, and the number of b2c and c2b layers matches the number of base UNet layers.

For each layer, we concatenate h_ctrl with b2c(h_base) to form the input for the ControlNet. We probably need to retrain the b2c layers, because h_base is now a linear combination of the original h_base and the h_ctrl outputs from all ControlNets (previously, h_base contained only the base features plus a single ControlNet output).

After each resnet and attention block, we add the weighted linear combination of the c2b(h_ctrl) outputs to h_base.

After this stage, we again have one h_base and a list of h_ctrls, as in stage one.
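One layer of this stage can be sketched as below. Again, this is an illustrative toy sketch under simplifying assumptions: base_to_control, control_to_base, and the single-conv "block" are stand-ins for the real diffusers modules, and the channel sizes are arbitrary.

```python
import torch
import torch.nn as nn

class ToyCtrlLayer(nn.Module):
    """Toy stand-in for one ControlNet layer (illustrative only)."""
    def __init__(self, base_ch=4, ctrl_ch=2):
        super().__init__()
        self.base_to_control = nn.Conv2d(base_ch, ctrl_ch, 1)  # b2c
        self.block = nn.Conv2d(2 * ctrl_ch, ctrl_ch, 1)        # resnet/attn stand-in
        self.control_to_base = nn.Conv2d(ctrl_ch, base_ch, 1)  # c2b (zero conv)
        nn.init.zeros_(self.control_to_base.weight)
        nn.init.zeros_(self.control_to_base.bias)

def encoder_layer(h_base, h_ctrls, base_block, ctrl_layers, scales):
    # 1. Each ControlNet consumes the concatenation [h_ctrl, b2c(h_base)].
    new_ctrls = [
        layer.block(torch.cat([h_ctrl, layer.base_to_control(h_base)], dim=1))
        for layer, h_ctrl in zip(ctrl_layers, h_ctrls)
    ]
    # 2. The base block runs on h_base alone.
    h_base = base_block(h_base)
    # 3. A weighted sum of c2b(h_ctrl) from every ControlNet is added back.
    for layer, h_ctrl, s in zip(ctrl_layers, new_ctrls, scales):
        h_base = h_base + s * layer.control_to_base(h_ctrl)
    return h_base, new_ctrls
```

The per-ControlNet weights correspond to the controlnet_conditioning_scale list (e.g. [0.2, 0.8]) in the verification script above.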


Stage 3: Decoding Stage

In the decoding stage, we only use the control_to_base convolution layers from each ControlNet.

In the following image, the zero convolution layers from each ControlNet are grouped by layer; zero convolution layers with the same dashed color are in the same group, and the residual outputs are connected with lines of the same color.

For each layer, the ControlNet residuals are passed through the zero convolution layers, weighted, and added to h_base.

After adding the weighted ControlNet residuals, h_base is passed through the resnet and attention blocks to decode the image. We do not use (and sometimes do not even have) the resnet+attention blocks from each ControlNet's up blocks here.
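A decoder layer can then be sketched as below. This is a toy sketch, not the real implementation: c2b_layers stands in for each ControlNet's zero convolutions, and up_block for the base UNet's resnet+attention up block; no ControlNet up blocks appear at all, matching the description above.

```python
import torch
import torch.nn as nn

def decode_layer(h_base, ctrl_residuals, c2b_layers, scales, up_block):
    """Stage 3: only zero convolutions from the ControlNets are used."""
    for c2b, res, s in zip(c2b_layers, ctrl_residuals, scales):
        h_base = h_base + s * c2b(res)  # weighted zero-conv residual
    return up_block(h_base)             # base resnet/attention stand-in
```

With the zero convolutions still at their zero initialization, the layer reduces exactly to the base up block, which is the property that makes the adapters safe to attach before training.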


Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@naomili0924 naomili0924 changed the title Add support for list of ControlNetXSAdapters in SDXL pipeline Add support for Multiple ControlNetXSAdapters in SDXL pipeline Aug 8, 2025
@DN6 (Collaborator) commented Aug 12, 2025

Hi @naomili0924, thank you for putting the time into this. Unfortunately, we had to deprecate ControlNetXS due to low usage (you'll notice that the pipeline inherits from DeprecatedPipelineMixin). We are not actively updating or adding features to it at this time.
