Describe the bug
When calling StableDiffusionXLPipeline and passing the prompts as embeddings via the prompt_embeds/negative_prompt_embeds parameters, an error is raised requiring the pooled_prompt_embeds/negative_pooled_prompt_embeds parameters to be passed as well.
There seems to be no documentation of what these parameters should contain, i.e. how the embeddings should be pooled. The docstring of the encode_prompt function states the following:
pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting.
If not provided, pooled text embeddings will be generated from `prompt` input argument.
negative_pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt`
input argument.
However, the check_inputs function raises an error when the pooled embeddings are not passed, and I couldn't find any code that generates them automatically:
if prompt_embeds is not None and pooled_prompt_embeds is None:
raise ValueError(
"If `prompt_embeds` are provided, `pooled_prompt_embeds` also have to be passed. Make sure to generate `pooled_prompt_embeds` from the same text encoder that was used to generate `prompt_embeds`."
)
if negative_prompt_embeds is not None and negative_pooled_prompt_embeds is None:
raise ValueError(
"If `negative_prompt_embeds` are provided, `negative_pooled_prompt_embeds` also have to be passed. Make sure to generate `negative_pooled_prompt_embeds` from the same text encoder that was used to generate `negative_prompt_embeds`."
)
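For reference, here is my best guess at how the pooled embeddings could be produced by hand, based on reading the encode_prompt source rather than any documented behavior. In SDXL the second text encoder is a CLIPTextModelWithProjection, and its projected text_embeds output appears to be what pooled_prompt_embeds expects:
import torch

# Sketch (assumption from reading the pipeline source, not documented API):
# tokenize with the second tokenizer and take the projected pooled output
# of text_encoder_2.
text_inputs = pipe.tokenizer_2(
    prompt,
    padding="max_length",
    max_length=pipe.tokenizer_2.model_max_length,  # truncates at 77 tokens
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    out = pipe.text_encoder_2(text_inputs.input_ids.to(pipe.device), output_hidden_states=True)

pooled_prompt_embeds = out.text_embeds  # shape (batch, 1280) for the SDXL base model
token_embeds = out.hidden_states[-2]    # penultimate hidden states, which SDXL uses per token
Note that this truncates at 77 tokens, which is exactly what the long-prompt workflow below is trying to avoid, so it is unclear how the pooled embeddings are meant to be produced for prompts longer than the CLIP limit.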
Reproduction
import torch
from diffusers import DiffusionPipeline
from compel import Compel

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")

compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder, truncate_long_prompts=False)

# Very long prompt, exceeding CLIP's fixed 77-token limit
prompt = "Testing - this is just some dummy text to emulate a given prompt input which exceeds the token limit of CLIP, which is a fixed 77. For this reason I am trying to use the Compel library to circumvent the prompt truncation."
negative_prompt = ""  # a negative prompt is required, even if empty

conditioning = compel.build_conditioning_tensor(prompt)
negative_conditioning = compel.build_conditioning_tensor(negative_prompt)
[conditioning, negative_conditioning] = compel.pad_conditioning_tensors_to_same_length([conditioning, negative_conditioning])

# Raises: ValueError: If `prompt_embeds` are provided, `pooled_prompt_embeds` also have to be passed.
unrefined_img = pipe(prompt_embeds=conditioning, negative_prompt_embeds=negative_conditioning, output_type="latent").images
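As a possible workaround, newer Compel releases appear to support SDXL's dual encoders and pooled outputs directly. Here is a sketch assuming the Compel 2.x API (requires_pooled and ReturnedEmbeddingsType), which I haven't verified against this setup:
from compel import Compel, ReturnedEmbeddingsType

# Assumption: Compel >= 2.0 with SDXL support. Both tokenizers/encoders are
# passed, and requires_pooled=[False, True] makes compel() also return the
# pooled embedding from the second encoder.
compel = Compel(
    tokenizer=[pipe.tokenizer, pipe.tokenizer_2],
    text_encoder=[pipe.text_encoder, pipe.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
)
conditioning, pooled = compel(prompt)
negative_conditioning, negative_pooled = compel(negative_prompt)

unrefined_img = pipe(
    prompt_embeds=conditioning,
    pooled_prompt_embeds=pooled,
    negative_prompt_embeds=negative_conditioning,
    negative_pooled_prompt_embeds=negative_pooled,
    output_type="latent",
).images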
Logs
No response
System Info
- diffusers version: 0.18.1
- Platform: Linux-5.15.0-69-generic-x86_64-with-glibc2.31
- Python version: 3.10.12
- PyTorch version (GPU?): 2.0.1+cu118 (True)
- Huggingface_hub version: 0.16.4
- Transformers version: 4.30.2
- Accelerate version: 0.20.3
- xFormers version: not installed
- Using GPU in script?: Yes (2xGPUs)
Who can help?