How exactly does Conditioning (Concat) work on a lower level? #1436
-
I'd love to know more about this as well.
-
Last I checked, it just does this: https://pytorch.org/docs/stable/generated/torch.cat.html

Normally, Stable Diffusion works by turning your entire prompt into a vector embedding for it to understand, but AI is stupid and doesn't understand things very well: it smooshes everything together and will sometimes bleed words/concepts into places where they were not specified. Concat lets you break the prompt into "chunks" by making them separate entries. It's very useful for things like colors or character composition.

Example: if you combine 2 and 3, you get 5. But if you give someone 5, they won't know that you started with 2 and 3, so they'll have a tendency to only make 5.

If you want some advice on how to learn these things, I made a node setup that allows you to test prompts for a better visual understanding of how it functions: https://civitai.com/models/230634?modelVersionId=261739

In your example with the dog, you essentially told the AI that unconditional conditioning vectors were to be taken into consideration alongside the "Dog" prompt. I think this is why the composition improved overall, but it's just speculation on my part (limbs were in the correct place and paws were anatomically correct; I don't think it's a coincidence that giving the AI more freedom allowed it to clean up the image).
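To illustrate, here's a minimal sketch of that concatenation with placeholder tensors (the shapes assume SD1.5's 77-token, 768-dimensional CLIP embeddings; the random tensors just stand in for real encodings):

```python
import torch

# Placeholder tensors standing in for real CLIP encodings: for SD1.5,
# each CLIP Text Encode output has shape (batch, 77 tokens, 768 dims).
cond_dog = torch.randn(1, 77, 768)    # "a dog"
cond_empty = torch.randn(1, 77, 768)  # empty prompt (still 77 tokens)

# Concatenating along the token axis: the model's cross-attention now
# sees one sequence of 154 tokens instead of 77.
combined = torch.cat([cond_dog, cond_empty], dim=1)
print(combined.shape)  # torch.Size([1, 154, 768])
```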
-
There isn't much documentation about the Conditioning (Concat) node. With it, you can bypass the 77-token limit by passing in multiple prompts (replicating the behavior of the BREAK keyword in Automatic1111), but how do these prompts actually interact with each other inside Stable Diffusion?
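As a concrete sketch of what "multiple prompts" means at the tensor level (this assumes the SD1.5 text encoder loaded through Hugging Face transformers; the `encode` helper is just for illustration):

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Model name is illustrative; SD1.5 uses this CLIP text encoder.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

def encode(prompt: str) -> torch.Tensor:
    """Encode a prompt to a fixed 77-token embedding, as CLIP Text Encode does."""
    tokens = tokenizer(prompt, padding="max_length", max_length=77,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        return encoder(**tokens).last_hidden_state  # (1, 77, 768)

# Two separate "chunks", like two prompts on either side of BREAK.
chunk_a = encode("a dog")
chunk_b = encode("a red collar")

# Concatenating along the token axis yields a 154-token conditioning,
# sidestepping the 77-token limit of a single encode.
combined = torch.cat([chunk_a, chunk_b], dim=1)
print(combined.shape)  # torch.Size([1, 154, 768])
```

Each chunk keeps its own 77-token block, so in principle concepts from one chunk should bleed less into another, but how the U-Net's cross-attention actually weighs the blocks against each other is exactly what the tests below probe.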
I ran a few A/B tests to get a better idea of what is happening under the hood, but I don't have a good answer so far.
For example, this is the result of a simple prompt of "a dog":

This is the same seed and hyperparameters, but with "a dog" concatted with an empty CLIP Text Encode:

Doing the same thing, but with "a dog" concatted with an empty CLIP Text Encode 4x:

What is interesting is that not only does the image change with the inclusion of an empty-string condition, but it also changes, to a much more minute degree, when the empty string is passed multiple times. How exactly can this be explained? A related question: does an empty string restrict the possible outputs of the model and exert its own bias on the distribution of possible images?
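One hedged guess at the first observation: an "empty" CLIP Text Encode is not a tensor of zeros. The tokenizer pads an empty prompt out to the full 77 positions (a start token, an end token, then padding), and those positions still carry non-trivial embeddings for the U-Net's cross-attention to attend to. A quick way to check, again assuming the SD1.5 encoder via Hugging Face transformers:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# An empty prompt still tokenizes to 77 tokens (start, end, then padding).
tokens = tokenizer("", padding="max_length", max_length=77,
                   return_tensors="pt")
with torch.no_grad():
    empty_embedding = encoder(**tokens).last_hidden_state  # (1, 77, 768)

# The embedding is far from zero, so concatenating it onto "a dog" gives
# the cross-attention 77 extra non-trivial keys/values to attend to.
print(empty_embedding.abs().mean())  # clearly > 0
```

If that's what's happening, the first empty chunk adds 77 new keys/values and shifts the attention distribution noticeably, while each further copy adds near-duplicates of keys that are already present, which would change the result less and less each time. That's speculation, but it matches the diminishing differences in the images above.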