Conversation

@yiyixuxu (Collaborator) commented Nov 25, 2024

test canny

import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel
from diffusers.utils import load_image
from diffusers.image_processor import VaeImageProcessor

device = "cuda:3"
dtype = torch.float16

# load pipeline

class SD3CannyImageProcessor(VaeImageProcessor):
    def __init__(self):
        # skip the usual [-1, 1] normalization; this checkpoint takes a
        # differently scaled control image (see preprocess below)
        super().__init__(do_normalize=False)
    def preprocess(self, image, **kwargs):
        image = super().preprocess(image, **kwargs)
        # rescale from [0, 1] to [0.5, 128.0] (i.e. the 0-255 range * 0.5 + 0.5)
        image = image * 255 * 0.5 + 0.5
        return image
    def postprocess(self, image, do_denormalize=True, **kwargs):
        # always denormalize decoded outputs, regardless of the incoming flag
        do_denormalize = [True] * image.shape[0]
        image = super().postprocess(image, **kwargs, do_denormalize=do_denormalize)
        return image


control_image_processor = SD3CannyImageProcessor()
controlnet = SD3ControlNetModel.from_pretrained("diffusers-internal-dev/sd35-controlnet-canny-8b", torch_dtype=dtype)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    controlnet=controlnet,
    torch_dtype=dtype
)
pipe.to(device)
pipe.image_processor = control_image_processor

# config
control_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/canny.png")
prompt =  "A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms"
max_sequence_length = 77

# to reproduce the result in our example
generator = torch.Generator(device="cpu").manual_seed(23)
image = pipe(
    prompt, 
    control_image=control_image, 
    controlnet_conditioning_scale=1.0,
    guidance_scale=3.5,
    num_inference_steps=60,
    generator=generator,
    max_sequence_length=max_sequence_length,
).images[0]
image.save(f'yiyi_test_2_out_{max_sequence_length}.jpg')

test depth

import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel
from diffusers.utils import load_image

device = "cuda:3"
dtype = torch.float16

# load pipeline
controlnet = SD3ControlNetModel.from_pretrained("diffusers-internal-dev/sd35-controlnet-depth-8b", torch_dtype=dtype)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    controlnet=controlnet,
    torch_dtype=dtype
)
pipe.to(device)

# config
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_einstein_lcm_depth.png")
prompt = "a photo of a man"
max_sequence_length = 77

# to reproduce the result in our example
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(
    prompt, 
    control_image=control_image, 
    controlnet_conditioning_scale=1.0,
    guidance_scale=4.5,
    num_inference_steps=40,
    generator=generator,
    max_sequence_length=max_sequence_length,
).images[0]
image.save(f'yiyi_test_1_out_{max_sequence_length}.jpg')

test blur

import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel
from diffusers.utils import load_image

device = "cuda:3"
dtype = torch.float16

# load pipeline
controlnet = SD3ControlNetModel.from_pretrained("diffusers-internal-dev/sd35-controlnet-blur-8b", torch_dtype=dtype)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    controlnet=controlnet,
    torch_dtype=dtype
)
pipe.to(device)

# config
control_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/blur.png")
print(f" control image size: {control_image.size}")
prompt = "generated ai art, a tiny, lost rubber ducky in an action shot close-up, surfing the humongous waves, inside the tube, in the style of Kelly Slater"
max_sequence_length = 77

# to reproduce the result in our example
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(
    prompt, 
    control_image=control_image, 
    controlnet_conditioning_scale=1.0,
    guidance_scale=3.5,
    num_inference_steps=60,
    generator=generator,
    max_sequence_length=max_sequence_length,
).images[0]
image.save(f'yiyi_test_3_out_{max_sequence_length}.jpg')

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul (Member) left a comment


My comments are quite minor and not merge-blockers.

I guess we could add docs and tests after merging.

for i in range(num_layers)
]
)
if joint_attention_dim is not None:

I think this is a good enough condition for now, because based on joint_attention_dim we initialize both the context_embedder and the transformer_blocks (which have the JointTransformerBlock type). I am okay with it.
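For illustration, a minimal sketch of the conditional initialization being discussed (function name, signature, and layer choice are assumptions, not the PR's exact code):

import torch.nn as nn

def build_context_embedder(joint_attention_dim, inner_dim):
    # Sketch only: when joint_attention_dim is provided, the ControlNet creates
    # a context_embedder (and joint transformer blocks); when it is None, as
    # for the SD3.5 8B checkpoint, neither exists and single-stream blocks are
    # used instead.
    if joint_attention_dim is not None:
        return nn.Linear(joint_attention_dim, inner_dim)
    return None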

Comment on lines +345 to +354
# SD3.5 8b controlnet does not have a `pos_embed`,
# it uses the `pos_embed` from the transformer to process the input before passing it to the controlnet
elif self.pos_embed is None and hidden_states.ndim != 3:
raise ValueError("hidden_states must be 3D when pos_embed is not used")

if self.context_embedder is not None and encoder_hidden_states is None:
raise ValueError("encoder_hidden_states must be provided when context_embedder is used")
# SD3.5 8b controlnet does not have a `context_embedder`, it does not use `encoder_hidden_states`
elif self.context_embedder is None and encoder_hidden_states is not None:
raise ValueError("encoder_hidden_states should not be provided when context_embedder is not used")

Very useful!
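For readers unfamiliar with the two checkpoint variants, a hedged sketch of the calling contract these checks enforce (the helper shape is an assumption; the attribute names come from the diff):

import torch

def prepare_controlnet_hidden_states(transformer, controlnet, latent_model_input: torch.Tensor) -> torch.Tensor:
    # The SD3.5 8B ControlNet ships without its own pos_embed, so the
    # transformer's pos_embed must patchify the 4D latents into 3D tokens first.
    if controlnet.pos_embed is None:
        hidden_states = transformer.pos_embed(latent_model_input)
        assert hidden_states.ndim == 3
        return hidden_states
    # Earlier SD3 ControlNets embed the raw latents themselves.
    return latent_model_input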

@maybe_allow_in_graph
class SD3SingleTransformerBlock(nn.Module):
r"""
A Single Transformer block as part of the MMDiT architecture, used in Stable Diffusion 3 ControlNet.

Perhaps we could make this more explicit by:

controlnet_config = (
    self.controlnet.config
    if isinstance(self.controlnet, SD3ControlNetModel)
    else self.controlnet.nets[0].config
)

I guess this is okay for now, but could there be a case where we have incompatible configs when len(self.controlnet.nets) > 1? I guess we will know when we know.
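A hedged sketch of the kind of guard that could catch the case raised above (the helper name is made up; the checked field is the joint_attention_dim discussed earlier):

def check_multi_controlnet_configs(nets):
    # Sketch only: verify every net agrees on the field the pipeline reads
    # from nets[0].config, instead of silently trusting the first entry.
    reference = nets[0].config.joint_attention_dim
    for i, net in enumerate(nets[1:], start=1):
        if net.config.joint_attention_dim != reference:
            raise ValueError(f"nets[{i}] has an incompatible joint_attention_dim.")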

@a-r-r-o-w (Contributor) left a comment


I'm not completely familiar with SD3, so all changes seem correct to me so far, and inference works well 😅

LGTM!

@yiyixuxu yiyixuxu merged commit 75bd1e8 into main Nov 27, 2024
18 checks passed
@yiyixuxu yiyixuxu deleted the sd35-control branch November 27, 2024 20:44
@yiyixuxu (Collaborator, Author)

I will add tests once the weights PR is merged

@vladmandic (Contributor) commented Nov 27, 2024

this PR adds support for the from_pretrained method, but the actual StabilityAI models were released as single-file safetensors, and that does not work:

using from_single_file results in:

ValueError: FromOriginalModelMixin is currently only compatible with StableCascadeUNet, UNet2DConditionModel, AutoencoderKL, ControlNetModel, SD3Transformer2DModel, MotionAdapter, SparseControlNetModel, FluxTransformer2DModel

when adding new support, it should be for the ORIGINAL format published by the authors first, then the internally converted versions.
right now, this PR adds support for something that does not exist - loading the StabilityAI controlnet from a diffusers repo
(or should we really point users to diffusers-internal-dev INSTEAD of the originally published models from StabilityAI?).
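For reference, a minimal reproduction of the failure described above. The filename is a hypothetical local path; at the time of this comment, SD3ControlNetModel was not on the FromOriginalModelMixin compatibility list, so the call raises the quoted ValueError:

import torch
from diffusers.models import SD3ControlNetModel

# hypothetical local filename for the single-file checkpoint published by StabilityAI
controlnet = SD3ControlNetModel.from_single_file(
    "sd3.5_large_controlnet_canny.safetensors",
    torch_dtype=torch.float16,
)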

@yiyixuxu (Collaborator, Author) commented Nov 27, 2024

@vladmandic

we try to support the single-file format as best we can, but we will not prioritize it over the diffusers format.
The checkpoints PR will be merged soon; for now, the script here can be used: #10020 (comment)

> when adding new support, it should be for the ORIGINAL format published by the authors first, then the internally converted versions.

@vladmandic (Contributor) commented Nov 28, 2024

it also fails in its current form for any type of offloading - which is kind of a necessity given the model size.

/home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet.py:1047 in __call__

  1046                     # sd35 (offical) 8b controlnet
❱ 1047                     controlnet_model_input = self.transformer.pos_embed(latent_model_input)

RuntimeError: Input type (CUDABFloat16Type) and weight type (CPUBFloat16Type) should be the same
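A minimal sketch of a setup that reproduces this, assuming the canny script above (pipe, prompt, control_image, and generator already defined):

pipe.enable_model_cpu_offload()  # instead of pipe.to(device)

# During __call__, the transformer is still offloaded to the CPU when the
# pipeline reaches self.transformer.pos_embed(latent_model_input), while the
# latents already live on CUDA - hence the device-mismatch RuntimeError above.
image = pipe(prompt, control_image=control_image, generator=generator).images[0]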

@yiyixuxu (Collaborator, Author)

@vladmandic ahh thanks! indeed

@yiyixuxu (Collaborator, Author)

@vladmandic ohh, actually, the controlnet was never able to be offloaded, because it is used at the same time as the transformer:

model_cpu_offload_seq = "text_encoder->image_encoder->unet->vae"
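An illustrative sketch of the constraint (the function names are placeholders, not diffusers APIs): under model offloading, each component is resident on the GPU only for its own forward pass, but the ControlNet and the transformer must both run inside every denoising step:

def denoise_step(controlnet_forward, transformer_forward, latents, t):
    # Both modules execute within the same step, so the ControlNet cannot be
    # swapped out to the CPU while the transformer runs, or vice versa.
    control_block_samples = controlnet_forward(latents, t)
    return transformer_forward(latents, t, control_block_samples)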

@vladmandic (Contributor)

somehow, I don't have such problems with InstantX/SD3-Controlnet-Canny or the other InstantX controlnets for SD3,
but it could be due to the fact that they are 6x smaller than the SAI models.
anyhow, any controlnet for sd35 or flux is pointless if it doesn't support offloading, given the sheer model size.

@yiyixuxu (Collaborator, Author)

oh, makes sense, we will look into this

@yiyixuxu added the roadmap (Add to current release roadmap) label Dec 4, 2024
sayakpaul added a commit that referenced this pull request Dec 23, 2024
* add model/pipeline

Co-authored-by: Sayak Paul <[email protected]>
