
Commit d3d500b

batch inf
1 parent b362a4f commit d3d500b


2 files changed: +44 -135 lines changed


docs/source/en/using-diffusers/batched_inference.md

Lines changed: 40 additions & 131 deletions
@@ -16,43 +16,7 @@ Batch inference processes multiple prompts at a time to increase throughput. It

 The downside is increased latency because you must wait for the entire batch to complete, and more GPU memory is required for large batches.

-<hfoptions id="usage">
-<hfoption id="text-to-image">
-
-For text-to-image, pass a list of prompts to the pipeline.
-
-```py
-import torch
-from diffusers import DiffusionPipeline
-
-pipeline = DiffusionPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16
-).to("cuda")
-
-prompts = [
-    "cinematic photo of A beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
-    "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
-    "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
-]
-
-images = pipeline(
-    prompt=prompts,
-).images
-
-fig, axes = plt.subplots(2, 2, figsize=(12, 12))
-axes = axes.flatten()
-
-for i, image in enumerate(images):
-    axes[i].imshow(image)
-    axes[i].set_title(f"Image {i+1}")
-    axes[i].axis('off')
-
-plt.tight_layout()
-plt.show()
-```
-
-To generate multiple variations of one prompt, use the `num_images_per_prompt` argument.
+For text-to-image, pass a list of prompts to the pipeline and for image-to-image, pass a list of images and prompts to the pipeline. The example below demonstrates batched text-to-image inference.

 ```py
 import torch
@@ -61,78 +25,19 @@ from diffusers import DiffusionPipeline

 pipeline = DiffusionPipeline.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16
-).to("cuda")
-
-images = pipeline(
-    prompt="pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics",
-    num_images_per_prompt=4
-).images
-
-fig, axes = plt.subplots(2, 2, figsize=(12, 12))
-axes = axes.flatten()
-
-for i, image in enumerate(images):
-    axes[i].imshow(image)
-    axes[i].set_title(f"Image {i+1}")
-    axes[i].axis('off')
-
-plt.tight_layout()
-plt.show()
-```
-
-Combine both approaches to generate different variations of different prompts.
-
-```py
-images = pipeline(
-    prompt=prompts,
-    num_images_per_prompt=2,
-).images
-
-fig, axes = plt.subplots(2, 2, figsize=(12, 12))
-axes = axes.flatten()
-
-for i, image in enumerate(images):
-    axes[i].imshow(image)
-    axes[i].set_title(f"Image {i+1}")
-    axes[i].axis('off')
-
-plt.tight_layout()
-plt.show()
-```
-
-</hfoption>
-<hfoption id="image-to-image">
-
-For image-to-image, pass a list of input images and prompts to the pipeline.
-
-```py
-import torch
-from diffusers.utils import load_image
-from diffusers import DiffusionPipeline
-
-pipeline = DiffusionPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16
-).to("cuda")
-
-input_images = [
-    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png"),
-    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"),
-    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
-]
+    torch_dtype=torch.float16,
+    device_map="cuda"
+)

 prompts = [
-    "cinematic photo of a beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
-    "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
-    "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
+    "Cinematic shot of a cozy coffee shop interior, warm pastel light streaming through a window where a cat rests. Shallow depth of field, glowing cups in soft focus, dreamy lofi-inspired mood, nostalgic tones, framed like a quiet film scene.",
+    "Polaroid-style photograph of a cozy coffee shop interior, bathed in warm pastel light. A cat sits on the windowsill near steaming mugs. Soft, slightly faded tones and dreamy blur evoke nostalgia, a lofi mood, and the intimate, imperfect charm of instant film.",
+    "Soft watercolor illustration of a cozy coffee shop interior, pastel washes of color filling the space. A cat rests peacefully on the windowsill as warm light glows through. Gentle brushstrokes create a dreamy, lofi-inspired atmosphere with whimsical textures and nostalgic calm.",
+    "Isometric pixel-art illustration of a cozy coffee shop interior in detailed 8-bit style. Warm pastel light fills the space as a cat rests on the windowsill. Blocky furniture and tiny mugs add charm, low-res retro graphics enhance the nostalgic, lofi-inspired game aesthetic."
 ]

 images = pipeline(
     prompt=prompts,
-    image=input_images,
-    guidance_scale=8.0,
-    strength=0.5
 ).images

 fig, axes = plt.subplots(2, 2, figsize=(12, 12))
@@ -147,24 +52,31 @@ plt.tight_layout()
 plt.show()
 ```

+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/batch-inference.png"/>
+</div>
+
 To generate multiple variations of one prompt, use the `num_images_per_prompt` argument.

 ```py
 import torch
 import matplotlib.pyplot as plt
-from diffusers.utils import load_image
 from diffusers import DiffusionPipeline

 pipeline = DiffusionPipeline.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16
-).to("cuda")
+    torch_dtype=torch.float16,
+    device_map="cuda"
+)

-input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
+prompt="""
+Isometric pixel-art illustration of a cozy coffee shop interior in detailed 8-bit style. Warm pastel light fills the
+space as a cat rests on the windowsill. Blocky furniture and tiny mugs add charm, low-res retro graphics enhance the
+nostalgic, lofi-inspired game aesthetic.
+"""

 images = pipeline(
-    prompt="pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics",
-    image=input_image,
+    prompt=prompt,
     num_images_per_prompt=4
 ).images

@@ -180,26 +92,19 @@ plt.tight_layout()
 plt.show()
 ```

+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/batch-inference-2.png"/>
+</div>
+
 Combine both approaches to generate different variations of different prompts.

 ```py
-input_images = [
-    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"),
-    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
-]
-
-prompts = [
-    "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
-    "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
-]
-
 images = pipeline(
     prompt=prompts,
-    image=input_images,
     num_images_per_prompt=2,
 ).images

-fig, axes = plt.subplots(2, 2, figsize=(12, 12))
+fig, axes = plt.subplots(2, 4, figsize=(12, 12))
 axes = axes.flatten()

 for i, image in enumerate(images):
@@ -211,16 +116,18 @@ plt.tight_layout()
 plt.show()
 ```

-</hfoption>
-</hfoptions>
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/batch-inference-3.png"/>
+</div>

 ## Deterministic generation

 Enable reproducible batch generation by passing a list of [Generator’s](https://pytorch.org/docs/stable/generated/torch.Generator.html) to the pipeline and tie each `Generator` to a seed to reuse it.

-Use a list comprehension to iterate over the batch size specified in `range()` to create a unique `Generator` object for each image in the batch.
+> [!TIP]
+> Refer to the [Reproducibility](./reusing_seeds) docs to learn more about deterministic algorithms and the `Generator` object.

-Don't multiply the `Generator` by the batch size because that only creates one `Generator` object that is used sequentially for each image in the batch.
+Use a list comprehension to iterate over the batch size specified in `range()` to create a unique `Generator` object for each image in the batch. Don't multiply the `Generator` by the batch size because that only creates one `Generator` object that is used sequentially for each image in the batch.

 ```py
 generator = [torch.Generator(device="cuda").manual_seed(0)] * 3
@@ -234,14 +141,16 @@ from diffusers import DiffusionPipeline

 pipeline = DiffusionPipeline.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16
-).to("cuda")
+    torch_dtype=torch.float16,
+    device_map="cuda"
+)

 generator = [torch.Generator(device="cuda").manual_seed(i) for i in range(3)]
 prompts = [
-    "cinematic photo of A beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
-    "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
-    "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
+    "Cinematic shot of a cozy coffee shop interior, warm pastel light streaming through a window where a cat rests. Shallow depth of field, glowing cups in soft focus, dreamy lofi-inspired mood, nostalgic tones, framed like a quiet film scene.",
+    "Polaroid-style photograph of a cozy coffee shop interior, bathed in warm pastel light. A cat sits on the windowsill near steaming mugs. Soft, slightly faded tones and dreamy blur evoke nostalgia, a lofi mood, and the intimate, imperfect charm of instant film.",
+    "Soft watercolor illustration of a cozy coffee shop interior, pastel washes of color filling the space. A cat rests peacefully on the windowsill as warm light glows through. Gentle brushstrokes create a dreamy, lofi-inspired atmosphere with whimsical textures and nostalgic calm.",
+    "Isometric pixel-art illustration of a cozy coffee shop interior in detailed 8-bit style. Warm pastel light fills the space as a cat rests on the windowsill. Blocky furniture and tiny mugs add charm, low-res retro graphics enhance the nostalgic, lofi-inspired game aesthetic."
 ]

 images = pipeline(
@@ -261,4 +170,4 @@ plt.tight_layout()
 plt.show()
 ```

-You can use this to iteratively select an image associated with a seed and then improve on it by crafting a more detailed prompt.
+You can use this to select an image associated with a seed and iteratively improve on it by crafting a more detailed prompt.
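The deterministic-generation advice in this diff rests on plain Python list semantics, and the generator list has to cover every image in the batch, not just every prompt. The sketch below illustrates both points under one stated assumption: the standard library's `random.Random` stands in for `torch.Generator` so it runs without PyTorch or a GPU. The helpers `effective_batch_size`, `make_generators`, and `grid_shape` are hypothetical names for illustration, not diffusers APIs. With four prompts and `num_images_per_prompt=2` the pipeline returns eight images, which is what the updated 2-by-4 `plt.subplots` grid accommodates.

```python
import math
import random


def effective_batch_size(prompts, num_images_per_prompt=1):
    # Every prompt is expanded num_images_per_prompt times.
    return len(prompts) * num_images_per_prompt


def make_generators(n, base_seed=0):
    # One independent generator per image, each tied to its own seed.
    # Assumption: random.Random stands in for
    # torch.Generator(device="cuda").manual_seed(seed).
    return [random.Random(base_seed + i) for i in range(n)]


def grid_shape(n_images, ncols):
    # (rows, cols) needed to show n_images in a grid with ncols columns.
    return math.ceil(n_images / ncols), ncols


prompts = [
    "cozy coffee shop, cinematic shot",
    "cozy coffee shop, polaroid photograph",
    "cozy coffee shop, watercolor illustration",
    "cozy coffee shop, isometric pixel art",
]
n = effective_batch_size(prompts, num_images_per_prompt=2)
print(n, grid_shape(n, ncols=4))

# Multiplying a list repeats a reference: all n entries are the SAME object.
shared = [random.Random(0)] * n
# A list comprehension builds n distinct objects.
unique = make_generators(n)
print(shared[0] is shared[1], unique[0] is unique[1])

# With the shared list, each draw advances one state, so entry 1 does not
# restart from seed 0 and yields a different value than entry 0 did.
first, second = shared[0].random(), shared[1].random()
print(first == second)

# Re-creating the per-image generators reproduces exactly the same draws.
run_a = [g.random() for g in make_generators(n)]
run_b = [g.random() for g in make_generators(n)]
print(run_a == run_b)
```

Diffusers pipelines such as SDXL check that a list of generators matches the effective batch size (prompts times `num_images_per_prompt`) and raise a `ValueError` otherwise, so deriving the list length from the prompts, rather than hard-coding a value like `range(3)` next to a four-prompt list, keeps the two in sync.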

docs/source/en/using-diffusers/weighted_prompts.md

Lines changed: 4 additions & 4 deletions
@@ -22,9 +22,9 @@ This guide covers general best practices for writing prompts and introduce a few

 A good prompt foundation should include the following elements.

-1. <span class="underline decoration-wavy decoration-blue-500 decoration-2 underline-offset-4">Subject</span> is what you want to generate an image or video of. It is the main focus and you should generally begin your prompt with the subject.
-2. <span class="underline decoration-wavy decoration-purple-500 decoration-2 underline-offset-4">Style</span> describes the medium or aesthetic of the image or video. What do you want it to look like?
-3. <span class="underline decoration-wavy decoration-green-500 decoration-2 underline-offset-4">Context</span> adds details to the image or video. For example, what is the subject doing and what is the setting and mood?
+1. <span class="underline decoration-sky-500 decoration-2 underline-offset-4">Subject</span> is what you want to generate an image or video of. It is the main focus and you should generally begin your prompt with the subject.
+2. <span class="underline decoration-pink-500 decoration-2 underline-offset-4">Style</span> describes the medium or aesthetic of the image or video. What do you want it to look like?
+3. <span class="underline decoration-green-500 decoration-2 underline-offset-4">Context</span> adds details to the image or video. For example, what is the subject doing and what is the setting and mood?

 Combine these elements into a structured narrative instead of a list of keywords. Modern models have powerful text encoders that have better language understanding. Start with a short prompt, and then iterate on it.

@@ -33,7 +33,7 @@ To generate an even better image, enhance the prompt with additional details suc
 <div class="flex gap-4">
 <div class="flex-1 text-center">
 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ok-prompt.png" class="w-full h-auto object-cover rounded-lg">
-<figcaption class="mt-2 text-sm text-gray-500">A <span class="underline decoration-wavy decoration-blue-500 decoration-2 underline-offset-1">cute cat</span> <span class="underline decoration-wavy decoration-green-500 decoration-2 underline-offset-1">lounges on a leaf in a pool during a peaceful summer afternoon</span>, in <span class="underline decoration-wavy decoration-purple-500 decoration-2 underline-offset-1">lofi art style, illustration</span>.</figcaption>
+<figcaption class="mt-2 text-sm text-gray-500">A <span class="underline decoration-sky-500 decoration-2 underline-offset-1">cute cat</span> <span class="underline decoration-pink-500 decoration-2 underline-offset-1">lounges on a leaf in a pool during a peaceful summer afternoon</span>, in <span class="underline decoration-green-500 decoration-2 underline-offset-1">lofi art style, illustration</span>.</figcaption>
 </div>
 <div class="flex-1 text-center">
 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/better-prompt.png" class="w-full h-auto object-cover rounded-lg"/>
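The prompt-writing guidance in this diff (open with the subject, then add context and style as one sentence rather than a keyword list) can be sketched as a tiny string helper. `compose_prompt` is a hypothetical function for illustration, not part of diffusers; it only mirrors the ordering of the example caption (subject, then context, then style) and returns a plain string that could be passed as `prompt=` to a pipeline.

```python
def compose_prompt(subject, context, style):
    # Hypothetical helper: subject first, then what it is doing and where,
    # then the medium or aesthetic, phrased as a sentence, not a keyword list.
    return f"{subject} {context}, in {style}."


prompt = compose_prompt(
    subject="A cute cat",
    context="lounges on a leaf in a pool during a peaceful summer afternoon",
    style="lofi art style, illustration",
)
print(prompt)
```

Keeping the three elements as separate arguments makes it easy to iterate on one of them (for example, swapping the style) while holding the subject and context fixed.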
