Commit 78b2a5d

committed
sd_embed
1 parent 680a8ed commit 78b2a5d

File tree

1 file changed

+132
-174
lines changed

docs/source/en/using-diffusers/weighted_prompts.md

Lines changed: 132 additions & 174 deletions
@@ -215,144 +215,100 @@ image
215215

216216
Prompt weighting provides a way to emphasize or de-emphasize certain parts of a prompt, allowing for more control over the generated image. A prompt can include several concepts, which get turned into contextualized text embeddings. The embeddings are used by the model to condition its cross-attention layers to generate an image (read the Stable Diffusion [blog post](https://huggingface.co/blog/stable_diffusion) to learn more about how it works).
217217

218-
Prompt weighting works by increasing or decreasing the scale of the text embedding vector that corresponds to its concept in the prompt because you may not necessarily want the model to focus on all concepts equally. The easiest way to prepare the prompt-weighted embeddings is to use [Compel](https://github.com/damian0815/compel), a text prompt-weighting and blending library. Once you have the prompt-weighted embeddings, you can pass them to any pipeline that has a [`prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.prompt_embeds) (and optionally [`negative_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.negative_prompt_embeds)) parameter, such as [`StableDiffusionPipeline`], [`StableDiffusionControlNetPipeline`], and [`StableDiffusionXLPipeline`].
218+
Prompt weighting works by increasing or decreasing the scale of the text embedding vector that corresponds to its concept in the prompt because you may not necessarily want the model to focus on all concepts equally. The easiest way to prepare the prompt embeddings is to use [Stable Diffusion Long Prompt Weighted Embedding](https://github.com/xhinker/sd_embed) (sd_embed). Once you have the prompt-weighted embeddings, you can pass them to any pipeline that has a [prompt_embeds](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.prompt_embeds) (and optionally [negative_prompt_embeds](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.negative_prompt_embeds)) parameter, such as [`StableDiffusionPipeline`], [`StableDiffusionControlNetPipeline`], and [`StableDiffusionXLPipeline`].
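
To make the idea of "scaling the embedding vector" concrete, here is a minimal sketch in plain Python. It is a toy illustration under stated assumptions, not how sd_embed is implemented: the function name `scale_token_embeddings`, the 2-dimensional vectors, and the weights are all hypothetical, and a real library works on text encoder tensors and may apply extra steps such as renormalization.

```py
# Toy illustration only: real pipelines operate on tensors produced by the text encoder.
def scale_token_embeddings(embeddings, weights):
    """Multiply each token's embedding vector by that token's weight."""
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per token embedding")
    return [
        [value * weight for value in vector]
        for vector, weight in zip(embeddings, weights)
    ]

# Hypothetical 3-token prompt with 2-dimensional embeddings.
token_embeddings = [[1.0, 2.0], [0.5, -1.0], [4.0, 0.0]]
# Leave the first token alone, upweight the second, downweight the third.
token_weights = [1.0, 1.5, 0.5]

weighted = scale_token_embeddings(token_embeddings, token_weights)
print(weighted)
```

A weight above 1 stretches a token's vector so the concept contributes more to the conditioning, and a weight below 1 shrinks it.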
219219

220220
<Tip>
221221

222222
If your favorite pipeline doesn't have a `prompt_embeds` parameter, please open an [issue](https://github.com/huggingface/diffusers/issues/new/choose) so we can add it!
223223

224224
</Tip>
225225

226-
This guide will show you how to weight and blend your prompts with Compel in 🤗 Diffusers.
226+
This guide will show you how to weight your prompts with sd_embed.
227227

228-
Before you begin, make sure you have the latest version of Compel installed:
228+
Before you begin, make sure you have the latest version of sd_embed installed:
229229

230-
```py
231-
# uncomment to install in Colab
232-
#!pip install compel --upgrade
230+
```bash
231+
pip install git+https://github.com/xhinker/sd_embed.git@main
233232
```
234233

235-
For this guide, let's generate an image with the prompt `"a red cat playing with a ball"` using the [`StableDiffusionPipeline`]:
234+
For this example, let's use [`StableDiffusionXLPipeline`].
236235

237236
```py
238-
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
237+
from diffusers import StableDiffusionXLPipeline, UniPCMultistepScheduler
239238
import torch
240239

241-
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_safetensors=True)
240+
pipe = StableDiffusionXLPipeline.from_pretrained("Lykon/dreamshaper-xl-1-0", torch_dtype=torch.float16)
242241
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
243242
pipe.to("cuda")
244-
245-
prompt = "a red cat playing with a ball"
246-
247-
generator = torch.Generator(device="cpu").manual_seed(33)
248-
249-
image = pipe(prompt, generator=generator, num_inference_steps=20).images[0]
250-
image
251-
```
252-
253-
<div class="flex justify-center">
254-
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/forest_0.png"/>
255-
</div>
256-
257-
### Weighting
258-
259-
You'll notice there is no "ball" in the image! Let's use compel to upweight the concept of "ball" in the prompt. Create a [`Compel`](https://github.com/damian0815/compel/blob/main/doc/compel.md#compel-objects) object, and pass it a tokenizer and text encoder:
260-
261-
```py
262-
from compel import Compel
263-
264-
compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
265243
```
266244

267-
compel uses `+` or `-` to increase or decrease the weight of a word in the prompt. To increase the weight of "ball":
245+
To upweight or downweight a concept, surround the text with parentheses. Each additional pair of parentheses applies a heavier weight to the text. You can also append a numerical multiplier to the text to specify how much to increase or decrease its weight.
268246

269-
<Tip>
270-
271-
`+` corresponds to the value `1.1`, `++` corresponds to `1.1^2`, and so on. Similarly, `-` corresponds to `0.9` and `--` corresponds to `0.9^2`. Feel free to experiment with adding more `+` or `-` in your prompt!
247+
| format | multiplier |
248+
|---|---|
249+
| `(hippo)` | increase by 1.1x |
250+
| `((hippo))` | increase by 1.21x |
251+
| `(hippo:1.5)` | increase by 1.5x |
252+
| `(hippo:0.5)` | decrease by 2x (multiplies the weight by 0.5) |
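
The arithmetic behind the table can be sketched as a small helper. This is an illustrative approximation, not sd_embed's parser: `effective_multiplier` and `BASE_STEP` are hypothetical names, and the helper only handles the simple single-fragment shapes shown in the table. It assumes each pair of parentheses multiplies the weight by 1.1 and that an explicit `:number` sets the multiplier directly.

```py
import re

BASE_STEP = 1.1  # assumed multiplier contributed by each pair of parentheses

def effective_multiplier(fragment):
    """Return the multiplier implied by one weighted fragment.

    Handles only the simple shapes from the table, for example
    "hippo", "((hippo))", and "(hippo:1.5)".
    """
    explicit = re.fullmatch(r"\([^():]+:(\d+(?:\.\d+)?)\)", fragment)
    if explicit:
        # An explicit number is used as the multiplier itself.
        return float(explicit.group(1))
    depth = 0
    while fragment.startswith("(") and fragment.endswith(")"):
        fragment = fragment[1:-1]
        depth += 1
    # Nested parentheses compound: 1.1, then 1.1 * 1.1, and so on.
    return BASE_STEP ** depth

for example in ["hippo", "(hippo)", "((hippo))", "(hippo:1.5)", "(hippo:0.5)"]:
    print(example, round(effective_multiplier(example), 2))
```

Under these assumptions, nesting compounds multiplicatively, while an explicit value below 1 such as `0.5` shrinks the weight instead of growing it.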
272253

273-
</Tip>
254+
Create a prompt and use a combination of parentheses and numerical multipliers to upweight various text.
274255

275256
```py
276-
prompt = "a red cat playing with a ball++"
277-
```
278-
279-
Pass the prompt to `compel_proc` to create the new prompt embeddings which are passed to the pipeline:
280-
281-
```py
282-
prompt_embeds = compel_proc(prompt)
283-
generator = torch.manual_seed(33)
284-
285-
image = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20).images[0]
286-
image
287-
```
288-
289-
<div class="flex justify-center">
290-
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/forest_1.png"/>
291-
</div>
292-
293-
To downweight parts of the prompt, use the `-` suffix:
294-
295-
```py
296-
prompt = "a red------- cat playing with a ball"
297-
prompt_embeds = compel_proc(prompt)
298-
299-
generator = torch.manual_seed(33)
300-
301-
image = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20).images[0]
302-
image
303-
```
304-
305-
<div class="flex justify-center">
306-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-neg.png"/>
307-
</div>
308-
309-
You can even up or downweight multiple concepts in the same prompt:
310-
311-
```py
312-
prompt = "a red cat++ playing with a ball----"
313-
prompt_embeds = compel_proc(prompt)
314-
315-
generator = torch.manual_seed(33)
316-
317-
image = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20).images[0]
318-
image
257+
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sdxl
258+
259+
prompt = """A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus.
260+
This imaginative creature features the distinctive, bulky body of a hippo,
261+
but with a texture and appearance resembling a golden-brown, crispy waffle.
262+
The creature might have elements like waffle squares across its skin and a syrup-like sheen.
263+
It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting,
264+
possibly including oversized utensils or plates in the background.
265+
The image should evoke a sense of playful absurdity and culinary fantasy.
266+
"""
267+
268+
neg_prompt = """\
269+
skin spots,acnes,skin blemishes,age spot,(ugly:1.2),(duplicate:1.2),(morbid:1.21),(mutilated:1.2),\
270+
(tranny:1.2),mutated hands,(poorly drawn hands:1.5),blurry,(bad anatomy:1.2),(bad proportions:1.3),\
271+
extra limbs,(disfigured:1.2),(missing arms:1.2),(extra legs:1.2),(fused fingers:1.5),\
272+
(too many fingers:1.5),(unclear eyes:1.2),lowers,bad hands,missing fingers,extra digit,\
273+
bad hands,missing fingers,(extra arms and legs),(worst quality:2),(low quality:2),\
274+
(normal quality:2),lowres,((monochrome)),((grayscale))
275+
"""
319276
```
320277

321-
<div class="flex justify-center">
322-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-pos-neg.png"/>
323-
</div>
324-
325-
### Blending
326-
327-
You can also create a weighted *blend* of prompts by adding `.blend()` to a list of prompts and passing it some weights. Your blend may not always produce the result you expect because it breaks some assumptions about how the text encoder functions, so just have fun and experiment with it!
278+
Use the `get_weighted_text_embeddings_sdxl` function to generate the prompt embeddings and the negative prompt embeddings. Because you're using an SDXL model, it also generates the pooled and negative pooled prompt embeddings.
328279

329280
```py
330-
prompt_embeds = compel_proc('("a red cat playing with a ball", "jungle").blend(0.7, 0.8)')
331-
generator = torch.Generator(device="cuda").manual_seed(33)
281+
(
282+
prompt_embeds,
283+
prompt_neg_embeds,
284+
pooled_prompt_embeds,
285+
negative_pooled_prompt_embeds
286+
) = get_weighted_text_embeddings_sdxl(
287+
pipe,
288+
prompt=prompt,
289+
neg_prompt=neg_prompt
290+
)
332291

333-
image = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20).images[0]
292+
image = pipe(
293+
prompt_embeds=prompt_embeds,
294+
negative_prompt_embeds=prompt_neg_embeds,
295+
pooled_prompt_embeds=pooled_prompt_embeds,
296+
negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
297+
num_inference_steps=30,
298+
height=1024,
299+
width=1024 + 512,
300+
guidance_scale=4.0,
301+
generator=torch.Generator("cuda").manual_seed(2)
302+
).images[0]
334303
image
335304
```
336305

337-
<div class="flex justify-center">
338-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-blend.png"/>
306+
<div class="flex justify-center">
307+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sd_embed_sdxl.png"/>
339308
</div>
340309

341-
### Conjunction
342-
343-
A conjunction diffuses each prompt independently and concatenates their results by their weighted sum. Add `.and()` to the end of a list of prompts to create a conjunction:
344-
345-
```py
346-
prompt_embeds = compel_proc('["a red cat", "playing with a", "ball"].and()')
347-
generator = torch.Generator(device="cuda").manual_seed(55)
348-
349-
image = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=20).images[0]
350-
image
351-
```
352-
353-
<div class="flex justify-center">
354-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-conj.png"/>
355-
</div>
310+
> [!TIP]
311+
> Refer to the [sd_embed](https://github.com/xhinker/sd_embed) repository for additional details about long prompt weighting for FLUX.1, Stable Cascade, and Stable Diffusion 1.5.
356312
357313
### Textual inversion
358314

@@ -363,35 +319,63 @@ Create a pipeline and use the [`~loaders.TextualInversionLoaderMixin.load_textua
363319
```py
364320
import torch
365321
from diffusers import StableDiffusionPipeline
366-
from compel import Compel, DiffusersTextualInversionManager
367322

368323
pipe = StableDiffusionPipeline.from_pretrained(
369-
"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16,
370-
use_safetensors=True, variant="fp16").to("cuda")
324+
"stable-diffusion-v1-5/stable-diffusion-v1-5",
325+
torch_dtype=torch.float16,
326+
).to("cuda")
371327
pipe.load_textual_inversion("sd-concepts-library/midjourney-style")
372328
```
373329

374-
Compel provides a `DiffusersTextualInversionManager` class to simplify prompt weighting with textual inversion. Instantiate `DiffusersTextualInversionManager` and pass it to the `Compel` class:
330+
Add the `<midjourney-style>` text to the prompt to trigger the textual inversion.
375331

376332
```py
377-
textual_inversion_manager = DiffusersTextualInversionManager(pipe)
378-
compel_proc = Compel(
379-
tokenizer=pipe.tokenizer,
380-
text_encoder=pipe.text_encoder,
381-
textual_inversion_manager=textual_inversion_manager)
333+
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sd15
334+
335+
prompt = """<midjourney-style> A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus.
336+
This imaginative creature features the distinctive, bulky body of a hippo,
337+
but with a texture and appearance resembling a golden-brown, crispy waffle.
338+
The creature might have elements like waffle squares across its skin and a syrup-like sheen.
339+
It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting,
340+
possibly including oversized utensils or plates in the background.
341+
The image should evoke a sense of playful absurdity and culinary fantasy.
342+
"""
343+
344+
neg_prompt = """\
345+
skin spots,acnes,skin blemishes,age spot,(ugly:1.2),(duplicate:1.2),(morbid:1.21),(mutilated:1.2),\
346+
(tranny:1.2),mutated hands,(poorly drawn hands:1.5),blurry,(bad anatomy:1.2),(bad proportions:1.3),\
347+
extra limbs,(disfigured:1.2),(missing arms:1.2),(extra legs:1.2),(fused fingers:1.5),\
348+
(too many fingers:1.5),(unclear eyes:1.2),lowers,bad hands,missing fingers,extra digit,\
349+
bad hands,missing fingers,(extra arms and legs),(worst quality:2),(low quality:2),\
350+
(normal quality:2),lowres,((monochrome)),((grayscale))
351+
"""
382352
```
383353

384-
Incorporate the concept to condition a prompt with using the `<concept>` syntax:
354+
Use the `get_weighted_text_embeddings_sd15` function to generate the prompt embeddings and the negative prompt embeddings.
385355

386356
```py
387-
prompt_embeds = compel_proc('("A red cat++ playing with a ball <midjourney-style>")')
357+
(
358+
prompt_embeds,
359+
prompt_neg_embeds,
360+
) = get_weighted_text_embeddings_sd15(
361+
pipe,
362+
prompt=prompt,
363+
neg_prompt=neg_prompt
364+
)
388365

389-
image = pipe(prompt_embeds=prompt_embeds).images[0]
366+
image = pipe(
367+
prompt_embeds=prompt_embeds,
368+
negative_prompt_embeds=prompt_neg_embeds,
369+
height=768,
370+
width=896,
371+
guidance_scale=4.0,
372+
generator=torch.Generator("cuda").manual_seed(2)
373+
).images[0]
390374
image
391375
```
392376

393377
<div class="flex justify-center">
394-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-text-inversion.png"/>
378+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sd_embed_textual_inversion.png"/>
395379
</div>
396380

397381
### DreamBooth
@@ -401,70 +385,44 @@ image
401385
```py
402386
import torch
403387
from diffusers import DiffusionPipeline, UniPCMultistepScheduler
404-
from compel import Compel
405388

406389
pipe = DiffusionPipeline.from_pretrained("sd-dreambooth-library/dndcoverart-v1", torch_dtype=torch.float16).to("cuda")
407390
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
408391
```
409392

410-
Create a `Compel` class with a tokenizer and text encoder, and pass your prompt to it. Depending on the model you use, you'll need to incorporate the model's unique identifier into your prompt. For example, the `dndcoverart-v1` model uses the identifier `dndcoverart`:
393+
Depending on the model you use, you'll need to incorporate the model's unique identifier into your prompt. For example, the `dndcoverart-v1` model uses the identifier `dndcoverart`:
411394

412395
```py
413-
compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
414-
prompt_embeds = compel_proc('("magazine cover of a dndcoverart dragon, high quality, intricate details, larry elmore art style").and()')
415-
image = pipe(prompt_embeds=prompt_embeds).images[0]
416-
image
417-
```
418-
419-
<div class="flex justify-center">
420-
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/compel-dreambooth.png"/>
421-
</div>
422-
423-
### Stable Diffusion XL
424-
425-
Stable Diffusion XL (SDXL) has two tokenizers and text encoders so it's usage is a bit different. To address this, you should pass both tokenizers and encoders to the `Compel` class:
426-
427-
```py
428-
from compel import Compel, ReturnedEmbeddingsType
429-
from diffusers import DiffusionPipeline
430-
from diffusers.utils import make_image_grid
431-
import torch
432-
433-
pipeline = DiffusionPipeline.from_pretrained(
434-
"stabilityai/stable-diffusion-xl-base-1.0",
435-
variant="fp16",
436-
use_safetensors=True,
437-
torch_dtype=torch.float16
438-
).to("cuda")
439-
440-
compel = Compel(
441-
tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2] ,
442-
text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
443-
returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
444-
requires_pooled=[False, True]
396+
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sd15
397+
398+
prompt = """dndcoverart of A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus.
399+
This imaginative creature features the distinctive, bulky body of a hippo,
400+
but with a texture and appearance resembling a golden-brown, crispy waffle.
401+
The creature might have elements like waffle squares across its skin and a syrup-like sheen.
402+
It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting,
403+
possibly including oversized utensils or plates in the background.
404+
The image should evoke a sense of playful absurdity and culinary fantasy.
405+
"""
406+
407+
neg_prompt = """\
408+
skin spots,acnes,skin blemishes,age spot,(ugly:1.2),(duplicate:1.2),(morbid:1.21),(mutilated:1.2),\
409+
(tranny:1.2),mutated hands,(poorly drawn hands:1.5),blurry,(bad anatomy:1.2),(bad proportions:1.3),\
410+
extra limbs,(disfigured:1.2),(missing arms:1.2),(extra legs:1.2),(fused fingers:1.5),\
411+
(too many fingers:1.5),(unclear eyes:1.2),lowers,bad hands,missing fingers,extra digit,\
412+
bad hands,missing fingers,(extra arms and legs),(worst quality:2),(low quality:2),\
413+
(normal quality:2),lowres,((monochrome)),((grayscale))
414+
"""
415+
416+
(
417+
prompt_embeds,
418+
prompt_neg_embeds,
419+
) = get_weighted_text_embeddings_sd15(
420+
pipe,
421+
prompt=prompt,
422+
neg_prompt=neg_prompt
445423
)
446424
```
447425

448-
This time, let's upweight "ball" by a factor of 1.5 for the first prompt, and downweight "ball" by 0.6 for the second prompt. The [`StableDiffusionXLPipeline`] also requires [`pooled_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLInpaintPipeline.__call__.pooled_prompt_embeds) (and optionally [`negative_pooled_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLInpaintPipeline.__call__.negative_pooled_prompt_embeds)) so you should pass those to the pipeline along with the conditioning tensors:
449-
450-
```py
451-
# apply weights
452-
prompt = ["a red cat playing with a (ball)1.5", "a red cat playing with a (ball)0.6"]
453-
conditioning, pooled = compel(prompt)
454-
455-
# generate image
456-
generator = [torch.Generator().manual_seed(33) for _ in range(len(prompt))]
457-
images = pipeline(prompt_embeds=conditioning, pooled_prompt_embeds=pooled, generator=generator, num_inference_steps=30).images
458-
make_image_grid(images, rows=1, cols=2)
459-
```
460-
461-
<div class="flex gap-4">
462-
<div>
463-
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/sdxl_ball1.png"/>
464-
<figcaption class="mt-2 text-center text-sm text-gray-500">"a red cat playing with a (ball)1.5"</figcaption>
465-
</div>
466-
<div>
467-
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/sdxl_ball2.png"/>
468-
<figcaption class="mt-2 text-center text-sm text-gray-500">"a red cat playing with a (ball)0.6"</figcaption>
469-
</div>
426+
<div class="flex justify-center">
427+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sd_embed_dreambooth.png"/>
470428
</div>
