[SD-XL] Passing prompt_embeds/negative_prompt_embeds requires also passing pooled_prompt_embeds/negative_pooled_prompt_embeds #4043

@m-a-r-v-i-n

Description

Describe the bug

When calling the StableDiffusionXLPipeline with the prompts passed as embeddings via the prompt_embeds/negative_prompt_embeds parameters, an error is raised requiring that the pooled_prompt_embeds/negative_pooled_prompt_embeds parameters also be passed.
There seems to be no documentation on what these parameters should contain (i.e. how the embeddings are supposed to be pooled). The docstring of the encode_prompt function states the following:

pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
                Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting.
                If not provided, pooled text embeddings will be generated from `prompt` input argument. 

negative_pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
                Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
                weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt`
                input argument.

However, the check_inputs function raises an error when the pooled embeddings are not passed, and I couldn't find any code that generates them automatically:

if prompt_embeds is not None and pooled_prompt_embeds is None:
    raise ValueError(
        "If `prompt_embeds` are provided, `pooled_prompt_embeds` also have to be passed. Make sure to generate `pooled_prompt_embeds` from the same text encoder that was used to generate `prompt_embeds`."
    )

if negative_prompt_embeds is not None and negative_pooled_prompt_embeds is None:
    raise ValueError(
        "If `negative_prompt_embeds` are provided, `negative_pooled_prompt_embeds` also have to be passed. Make sure to generate `negative_pooled_prompt_embeds` from the same text encoder that was used to generate `negative_prompt_embeds`."
    )

Reproduction

import torch
from diffusers import DiffusionPipeline
from compel import Compel

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
pipe.to("cuda")
compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder, truncate_long_prompts=False)

prompt = "Testing - this is just some dummy text to emulate a given prompt input which exceeds the token limit of CLIP, which is a fixed 77. For this reason I am trying to use the Compel library to circumvent the prompt truncation."  # very long prompt
negative_prompt = ""  # a negative prompt is required, even if empty

conditioning = compel.build_conditioning_tensor(prompt)
negative_conditioning = compel.build_conditioning_tensor(negative_prompt)
[conditioning, negative_conditioning] = compel.pad_conditioning_tensors_to_same_length([conditioning, negative_conditioning])

# Raises: pooled_prompt_embeds/negative_pooled_prompt_embeds are required by check_inputs
unrefined_img = pipe(prompt_embeds=conditioning, negative_prompt_embeds=negative_conditioning, output_type="latent").images
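For reference, here is my understanding of the tensor shapes the pipeline expects; this is an assumption based on the base model's two text encoders (hidden sizes 768 and 1280, concatenated per token for prompt_embeds, with the pooled embeddings coming from the second encoder alone), not something I found documented. Dummy tensors just to illustrate the shapes:

```python
import numpy as np

# Hypothetical shapes (assumption): per-token hidden states of both text
# encoders concatenated (768 + 1280 = 2048), pooled vector from the second
# encoder only (1280).
batch, seq_len = 1, 77
prompt_embeds = np.zeros((batch, seq_len, 768 + 1280), dtype=np.float32)
pooled_prompt_embeds = np.zeros((batch, 1280), dtype=np.float32)
negative_prompt_embeds = np.zeros_like(prompt_embeds)
negative_pooled_prompt_embeds = np.zeros_like(pooled_prompt_embeds)

print(prompt_embeds.shape, pooled_prompt_embeds.shape)  # (1, 77, 2048) (1, 1280)
```

If that is right, a Compel instance built only from pipe.tokenizer/pipe.text_encoder (as above) cannot produce these tensors at all, and the only way I see to obtain matching pooled embeddings would be through the pipeline's own encode_prompt.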

Logs

No response

System Info

  • diffusers version: 0.18.1
  • Platform: Linux-5.15.0-69-generic-x86_64-with-glibc2.31
  • Python version: 3.10.12
  • PyTorch version (GPU?): 2.0.1+cu118 (True)
  • Huggingface_hub version: 0.16.4
  • Transformers version: 4.30.2
  • Accelerate version: 0.20.3
  • xFormers version: not installed
  • Using GPU in script?: Yes (2xGPUs)

Who can help?

@patrickvonplaten
@sayakpaul
@williamberman
