Conversation

@yiyixuxu (Collaborator) commented Nov 25, 2024

test canny

import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel
from diffusers.utils import load_image
from diffusers.image_processor import VaeImageProcessor

device = "cuda:3"
dtype = torch.float16

# load pipeline

class SD3CannyImageProcessor(VaeImageProcessor):
    def __init__(self):
        # skip the usual [-1, 1] normalization; this checkpoint takes a
        # differently scaled control image (see preprocess below)
        super().__init__(do_normalize=False)
    def preprocess(self, image, **kwargs):
        image = super().preprocess(image, **kwargs)
        # rescale from [0, 1] to [0.5, 128.0] (i.e. the 0-255 range * 0.5 + 0.5)
        image = image * 255 * 0.5 + 0.5
        return image
    def postprocess(self, image, do_denormalize=True, **kwargs):
        # always denormalize decoded outputs, regardless of the incoming flag
        do_denormalize = [True] * image.shape[0]
        image = super().postprocess(image, **kwargs, do_denormalize=do_denormalize)
        return image


control_image_processor = SD3CannyImageProcessor()
controlnet = SD3ControlNetModel.from_pretrained("diffusers-internal-dev/sd35-controlnet-canny-8b", torch_dtype=dtype)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    controlnet=controlnet,
    torch_dtype=dtype
)
pipe.to(device)
pipe.image_processor = control_image_processor

# config
control_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/canny.png")
prompt =  "A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms"
max_sequence_length = 77

# to reproduce the result in our example
generator = torch.Generator(device="cpu").manual_seed(23)
image = pipe(
    prompt, 
    control_image=control_image, 
    controlnet_conditioning_scale=1.0,
    guidance_scale=3.5,
    num_inference_steps=60,
    generator=generator,
    max_sequence_length=max_sequence_length,
).images[0]
image.save(f'yiyi_test_2_out_{max_sequence_length}.jpg')

test depth

import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel
from diffusers.utils import load_image

device = "cuda:3"
dtype = torch.float16

# load pipeline
controlnet = SD3ControlNetModel.from_pretrained("diffusers-internal-dev/sd35-controlnet-depth-8b", torch_dtype=dtype)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    controlnet=controlnet,
    torch_dtype=dtype
)
pipe.to(device)

# config
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/marigold/marigold_einstein_lcm_depth.png")
prompt = "a photo of a man"
max_sequence_length = 77

# to reproduce the result in our example
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(
    prompt, 
    control_image=control_image, 
    controlnet_conditioning_scale=1.0,
    guidance_scale=4.5,
    num_inference_steps=40,
    generator=generator,
    max_sequence_length=max_sequence_length,
).images[0]
image.save(f'yiyi_test_1_out_{max_sequence_length}.jpg')

test blur

import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel
from diffusers.utils import load_image

device = "cuda:3"
dtype = torch.float16

# load pipeline
controlnet = SD3ControlNetModel.from_pretrained("diffusers-internal-dev/sd35-controlnet-blur-8b", torch_dtype=dtype)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    controlnet=controlnet,
    torch_dtype=dtype
)
pipe.to(device)

# config
control_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/blur.png")
print(f" control image size: {control_image.size}")
prompt = "generated ai art, a tiny, lost rubber ducky in an action shot close-up, surfing the humongous waves, inside the tube, in the style of Kelly Slater"
max_sequence_length = 77

# to reproduce the result in our example
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(
    prompt, 
    control_image=control_image, 
    controlnet_conditioning_scale=1.0,
    guidance_scale=3.5,
    num_inference_steps=60,
    generator=generator,
    max_sequence_length=max_sequence_length,
).images[0]
image.save(f'yiyi_test_3_out_{max_sequence_length}.jpg')

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul (Member) left a comment


My comments are quite minor and not merge-blockers.

I guess we could add docs and tests after merging.

for i in range(num_layers)
]
)
if joint_attention_dim is not None:

I think this is a good enough condition for now, because based on joint_attention_dim we initialize both the context_embedder and the transformer_blocks (which have the JointTransformerBlock type). I am okay with it.
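For illustration, a minimal sketch of the conditional initialization being discussed (function name, signature, and layer choice are assumptions, not the PR's exact code):

import torch.nn as nn

def build_context_embedder(joint_attention_dim, inner_dim):
    # Sketch only: when joint_attention_dim is provided, the ControlNet creates
    # a context_embedder (and joint transformer blocks); when it is None, as
    # for the SD3.5 8B checkpoint, neither exists and single-stream blocks are
    # used instead.
    if joint_attention_dim is not None:
        return nn.Linear(joint_attention_dim, inner_dim)
    return None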

Comment on lines +345 to +354
# SD3.5 8b controlnet does not have a `pos_embed`,
# it uses the `pos_embed` from the transformer to process the input before passing it to the controlnet
elif self.pos_embed is None and hidden_states.ndim != 3:
raise ValueError("hidden_states must be 3D when pos_embed is not used")

if self.context_embedder is not None and encoder_hidden_states is None:
raise ValueError("encoder_hidden_states must be provided when context_embedder is used")
# SD3.5 8b controlnet does not have a `context_embedder`, it does not use `encoder_hidden_states`
elif self.context_embedder is None and encoder_hidden_states is not None:
raise ValueError("encoder_hidden_states should not be provided when context_embedder is not used")

Very useful!
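For readers unfamiliar with the two checkpoint variants, a hedged sketch of the calling contract these checks enforce (the helper shape is an assumption; the attribute names come from the diff):

import torch

def prepare_controlnet_hidden_states(transformer, controlnet, latent_model_input: torch.Tensor) -> torch.Tensor:
    # The SD3.5 8B ControlNet ships without its own pos_embed, so the
    # transformer's pos_embed must patchify the 4D latents into 3D tokens first.
    if controlnet.pos_embed is None:
        hidden_states = transformer.pos_embed(latent_model_input)
        assert hidden_states.ndim == 3
        return hidden_states
    # Earlier SD3 ControlNets embed the raw latents themselves.
    return latent_model_input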

@maybe_allow_in_graph
class SD3SingleTransformerBlock(nn.Module):
r"""
A Single Transformer block as part of the MMDiT architecture, used in Stable Diffusion 3 ControlNet.

Perhaps we could make this more explicit by:

controlnet_config = (
    self.controlnet.config
    if isinstance(self.controlnet, SD3ControlNetModel)
    else self.controlnet.nets[0].config
)

I guess this is okay for now, but could there be a case where we have incompatible configs when len(self.controlnet.nets) > 1? I guess we will know when we know.
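A hedged sketch of the kind of guard that could catch the case raised above (the helper name is made up; the checked field is the joint_attention_dim discussed earlier):

def check_multi_controlnet_configs(nets):
    # Sketch only: verify every net agrees on the field the pipeline reads
    # from nets[0].config, instead of silently trusting the first entry.
    reference = nets[0].config.joint_attention_dim
    for i, net in enumerate(nets[1:], start=1):
        if net.config.joint_attention_dim != reference:
            raise ValueError(f"nets[{i}] has an incompatible joint_attention_dim.")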

@a-r-r-o-w (Contributor) left a comment


I'm not completely familiar with SD3, so all changes seem correct to me so far, and inference works well 😅

LGTM!

@yiyixuxu yiyixuxu merged commit 75bd1e8 into main Nov 27, 2024
18 checks passed
@yiyixuxu yiyixuxu deleted the sd35-control branch November 27, 2024 20:44
@yiyixuxu (Collaborator, Author)

I will add tests once the weights PR is merged

@vladmandic (Contributor) commented Nov 27, 2024

this PR adds support for the from_pretrained method, but the actual StabilityAI models were released as single-file safetensors, and that does not work:

using from_single_file results in:

ValueError: FromOriginalModelMixin is currently only compatible with StableCascadeUNet, UNet2DConditionModel, AutoencoderKL, ControlNetModel, SD3Transformer2DModel, MotionAdapter, SparseControlNetModel, FluxTransformer2DModel

when adding new support, it should be for the ORIGINAL format published by the authors first, then the internally converted versions.
right now, this PR adds support for something that does not exist - loading the StabilityAI controlnet from a diffusers repo
(or should we really point users to diffusers-internal-dev INSTEAD of the originally published models from StabilityAI?).
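For reference, a minimal reproduction of the failure described above. The filename is a hypothetical local path; at the time of this comment, SD3ControlNetModel was not on the FromOriginalModelMixin compatibility list, so the call raises the quoted ValueError:

import torch
from diffusers.models import SD3ControlNetModel

# hypothetical local filename for the single-file checkpoint published by StabilityAI
controlnet = SD3ControlNetModel.from_single_file(
    "sd3.5_large_controlnet_canny.safetensors",
    torch_dtype=torch.float16,
)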

@yiyixuxu (Collaborator, Author) commented Nov 27, 2024

@vladmandic

we try to support the single-file format as best we can, but we will not prioritize it over the diffusers format.
The checkpoints PR will be merged soon; for now, the script here can be used: #10020 (comment)

> when adding new support, it should be for the ORIGINAL format published by the authors first, then the internally converted versions.

@vladmandic (Contributor) commented Nov 28, 2024

it also fails in its current form for any type of offloading - which is kind of a necessity given the model size.

/home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet.py:1047 in __call__

  1046                     # sd35 (offical) 8b controlnet
❱ 1047                     controlnet_model_input = self.transformer.pos_embed(latent_model_input)

RuntimeError: Input type (CUDABFloat16Type) and weight type (CPUBFloat16Type) should be the same
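A minimal sketch of a setup that reproduces this, assuming the canny script above (pipe, prompt, control_image, and generator already defined):

pipe.enable_model_cpu_offload()  # instead of pipe.to(device)

# During __call__, the transformer is still offloaded to the CPU when the
# pipeline reaches self.transformer.pos_embed(latent_model_input), while the
# latents already live on CUDA - hence the device-mismatch RuntimeError above.
image = pipe(prompt, control_image=control_image, generator=generator).images[0]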

@yiyixuxu (Collaborator, Author)

@vladmandic ahh thanks! indeed

@yiyixuxu (Collaborator, Author)

@vladmandic ohh, actually, the controlnet was never able to be offloaded, because it is used at the same time as the transformer:

model_cpu_offload_seq = "text_encoder->image_encoder->unet->vae"
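An illustrative sketch of the constraint (the function names are placeholders, not diffusers APIs): under model offloading, each component is resident on the GPU only for its own forward pass, but the ControlNet and the transformer must both run inside every denoising step:

def denoise_step(controlnet_forward, transformer_forward, latents, t):
    # Both modules execute within the same step, so the ControlNet cannot be
    # swapped out to the CPU while the transformer runs, or vice versa.
    control_block_samples = controlnet_forward(latents, t)
    return transformer_forward(latents, t, control_block_samples)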

@vladmandic (Contributor)

somehow, I don't have such problems with InstantX/SD3-Controlnet-Canny or the other InstantX controlnets for SD3,
but it could be due to the fact that they are 6x smaller than the SAI models.
anyhow, any controlnet for sd35 or flux is pointless if it doesn't support offloading, given the sheer model size.

@yiyixuxu (Collaborator, Author)

oh, makes sense, we will look into this

@yiyixuxu added the roadmap (Add to current release roadmap) label Dec 4, 2024
sayakpaul added a commit that referenced this pull request Dec 23, 2024
* add model/pipeline

Co-authored-by: Sayak Paul <[email protected]>
