batch inf

stevhliu · stevhliu · commit d3d500b9d16f · 2025-09-10T14:54:37.000-07:00
diff --git a/docs/source/en/using-diffusers/batched_inference.md b/docs/source/en/using-diffusers/batched_inference.md
@@ -16,43 +16,7 @@ Batch inference processes multiple prompts at a time to increase throughput. It
 
 The downside is increased latency because you must wait for the entire batch to complete, and more GPU memory is required for large batches.
 
-<hfoptions id="usage">
-<hfoption id="text-to-image">
-
-For text-to-image, pass a list of prompts to the pipeline.
-
-```py
-import torch
-from diffusers import DiffusionPipeline
-
-pipeline = DiffusionPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16
-).to("cuda")
-
-prompts = [
-    "cinematic photo of A beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
-    "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
-    "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
-]
-
-images = pipeline(
-    prompt=prompts,
-).images
-
-fig, axes = plt.subplots(2, 2, figsize=(12, 12))
-axes = axes.flatten()
-
-for i, image in enumerate(images):
-    axes[i].imshow(image)
-    axes[i].set_title(f"Image {i+1}")
-    axes[i].axis('off')
-
-plt.tight_layout()
-plt.show()
-```
-
-To generate multiple variations of one prompt, use the `num_images_per_prompt` argument.
+For text-to-image, pass a list of prompts to the pipeline. For image-to-image, pass a list of images and a list of prompts. The example below demonstrates batched text-to-image inference.
 
 ```py
 import torch
@@ -61,78 +25,19 @@ from diffusers import DiffusionPipeline
 
 pipeline = DiffusionPipeline.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16
-).to("cuda")
-
-images = pipeline(
-    prompt="pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics",
-    num_images_per_prompt=4
-).images
-
-fig, axes = plt.subplots(2, 2, figsize=(12, 12))
-axes = axes.flatten()
-
-for i, image in enumerate(images):
-    axes[i].imshow(image)
-    axes[i].set_title(f"Image {i+1}")
-    axes[i].axis('off')
-
-plt.tight_layout()
-plt.show()
-```
-
-Combine both approaches to generate different variations of different prompts.
-
-```py
-images = pipeline(
-    prompt=prompts,
-    num_images_per_prompt=2,
-).images
-
-fig, axes = plt.subplots(2, 2, figsize=(12, 12))
-axes = axes.flatten()
-
-for i, image in enumerate(images):
-    axes[i].imshow(image)
-    axes[i].set_title(f"Image {i+1}")
-    axes[i].axis('off')
-
-plt.tight_layout()
-plt.show()
-```
-
-</hfoption>
-<hfoption id="image-to-image">
-
-For image-to-image, pass a list of input images and prompts to the pipeline.
-
-```py
-import torch
-from diffusers.utils import load_image
-from diffusers import DiffusionPipeline
-
-pipeline = DiffusionPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16
-).to("cuda")
-
-input_images = [
-    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/inpaint.png"),
-    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"),
-    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
-]
+    torch_dtype=torch.float16,
+    device_map="cuda"
+)
 
 prompts = [
-    "cinematic photo of a beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
-    "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
-    "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
+    "Cinematic shot of a cozy coffee shop interior, warm pastel light streaming through a window where a cat rests. Shallow depth of field, glowing cups in soft focus, dreamy lofi-inspired mood, nostalgic tones, framed like a quiet film scene.",
+    "Polaroid-style photograph of a cozy coffee shop interior, bathed in warm pastel light. A cat sits on the windowsill near steaming mugs. Soft, slightly faded tones and dreamy blur evoke nostalgia, a lofi mood, and the intimate, imperfect charm of instant film.",
+    "Soft watercolor illustration of a cozy coffee shop interior, pastel washes of color filling the space. A cat rests peacefully on the windowsill as warm light glows through. Gentle brushstrokes create a dreamy, lofi-inspired atmosphere with whimsical textures and nostalgic calm.",
+    "Isometric pixel-art illustration of a cozy coffee shop interior in detailed 8-bit style. Warm pastel light fills the space as a cat rests on the windowsill. Blocky furniture and tiny mugs add charm, low-res retro graphics enhance the nostalgic, lofi-inspired game aesthetic."
 ]
 
 images = pipeline(
     prompt=prompts,
-    image=input_images,
-    guidance_scale=8.0,
-    strength=0.5
 ).images
 
 fig, axes = plt.subplots(2, 2, figsize=(12, 12))
@@ -147,24 +52,31 @@ plt.tight_layout()
 plt.show()
 ```
 
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/batch-inference.png"/>
+</div>
+
 To generate multiple variations of one prompt, use the `num_images_per_prompt` argument.
 
 ```py
 import torch
 import matplotlib.pyplot as plt
-from diffusers.utils import load_image
 from diffusers import DiffusionPipeline
 
 pipeline = DiffusionPipeline.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16
-).to("cuda")
+    torch_dtype=torch.float16,
+    device_map="cuda"
+)
 
-input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
+prompt = """
+Isometric pixel-art illustration of a cozy coffee shop interior in detailed 8-bit style. Warm pastel light fills the
+space as a cat rests on the windowsill. Blocky furniture and tiny mugs add charm, low-res retro graphics enhance the
+nostalgic, lofi-inspired game aesthetic.
+"""
 
 images = pipeline(
-    prompt="pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics",
-    image=input_image,
+    prompt=prompt,
     num_images_per_prompt=4
 ).images
 
@@ -180,26 +92,19 @@ plt.tight_layout()
 plt.show()
 ```
 
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/batch-inference-2.png"/>
+</div>
+
 Combine both approaches to generate different variations of different prompts.
 
 ```py
-input_images = [
-    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"),
-    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/detail-prompt.png")
-]
-
-prompts = [
-    "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
-    "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
-]
-
 images = pipeline(
     prompt=prompts,
-    image=input_images,
     num_images_per_prompt=2,
 ).images
 
-fig, axes = plt.subplots(2, 2, figsize=(12, 12))
+fig, axes = plt.subplots(2, 4, figsize=(12, 12))
 axes = axes.flatten()
 
 for i, image in enumerate(images):
@@ -211,16 +116,18 @@ plt.tight_layout()
 plt.show()
 ```
 
-</hfoption>
-</hfoptions>
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/batch-inference-3.png"/>
+</div>
 
 ## Deterministic generation
 
 Enable reproducible batch generation by passing a list of [`Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) objects to the pipeline, and tie each `Generator` to a seed so you can reuse it.
 
-Use a list comprehension to iterate over the batch size specified in `range()` to create a unique `Generator` object for each image in the batch.
+> [!TIP]
+> Refer to the [Reproducibility](./reusing_seeds) docs to learn more about deterministic algorithms and the `Generator` object.
 
-Don't multiply the `Generator` by the batch size because that only creates one `Generator` object that is used sequentially for each image in the batch.
+Use a list comprehension to iterate over the batch size specified in `range()` to create a unique `Generator` object for each image in the batch. Don't multiply the `Generator` by the batch size because that only creates one `Generator` object that is used sequentially for each image in the batch.
 
 ```py
 generator = [torch.Generator(device="cuda").manual_seed(0)] * 3
@@ -234,14 +141,16 @@ from diffusers import DiffusionPipeline
 
 pipeline = DiffusionPipeline.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16
-).to("cuda")
+    torch_dtype=torch.float16,
+    device_map="cuda"
+)
 
 generator = [torch.Generator(device="cuda").manual_seed(i) for i in range(4)]
 prompts = [
-    "cinematic photo of A beautiful sunset over mountains, 35mm photograph, film, professional, 4k, highly detailed",
-    "cinematic film still of a cat basking in the sun on a roof in Turkey, highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain",
-    "pixel-art a cozy coffee shop interior, low-res, blocky, pixel art style, 8-bit graphics"
+    "Cinematic shot of a cozy coffee shop interior, warm pastel light streaming through a window where a cat rests. Shallow depth of field, glowing cups in soft focus, dreamy lofi-inspired mood, nostalgic tones, framed like a quiet film scene.",
+    "Polaroid-style photograph of a cozy coffee shop interior, bathed in warm pastel light. A cat sits on the windowsill near steaming mugs. Soft, slightly faded tones and dreamy blur evoke nostalgia, a lofi mood, and the intimate, imperfect charm of instant film.",
+    "Soft watercolor illustration of a cozy coffee shop interior, pastel washes of color filling the space. A cat rests peacefully on the windowsill as warm light glows through. Gentle brushstrokes create a dreamy, lofi-inspired atmosphere with whimsical textures and nostalgic calm.",
+    "Isometric pixel-art illustration of a cozy coffee shop interior in detailed 8-bit style. Warm pastel light fills the space as a cat rests on the windowsill. Blocky furniture and tiny mugs add charm, low-res retro graphics enhance the nostalgic, lofi-inspired game aesthetic."
 ]
 
 images = pipeline(
@@ -261,4 +170,4 @@ plt.tight_layout()
 plt.show()
 ```
 
-You can use this to iteratively select an image associated with a seed and then improve on it by crafting a more detailed prompt.
+You can use this to select an image associated with a seed and iteratively improve on it by crafting a more detailed prompt.
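
Note on the `Generator` guidance above: the difference between multiplying a one-element list and using a list comprehension can be checked without a GPU. The sketch below is only an analogy. It assumes `random.Random` from the standard library as a stand-in for `torch.Generator` (both hold internal state that advances on every draw), and `make_generator` is a hypothetical helper that is not part of diffusers or PyTorch.

```python
import random


def make_generator(seed):
    # Hypothetical stand-in for torch.Generator(device="cuda").manual_seed(seed).
    # random.Random is used only so the sketch runs without PyTorch.
    return random.Random(seed)


batch_size = 3

# Pitfall: list multiplication repeats the reference, so every slot
# points at the same stateful object.
shared = [make_generator(0)] * batch_size

# Recommended: a list comprehension builds one object per batch element.
independent = [make_generator(seed) for seed in range(batch_size)]

shared_ids = {id(g) for g in shared}
independent_ids = {id(g) for g in independent}
print(len(shared_ids))       # 1
print(len(independent_ids))  # 3

# Re-creating the generators from the same seeds reproduces the same draws.
first_run = [g.random() for g in independent]
second_run = [make_generator(seed).random() for seed in range(batch_size)]
print(first_run == second_run)  # True
```

With the multiplied list, the draw for the second image depends on how much state the first image consumed, so a single image cannot be regenerated from its seed alone. With one seeded object per image, each image can be reproduced independently.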
diff --git a/docs/source/en/using-diffusers/weighted_prompts.md b/docs/source/en/using-diffusers/weighted_prompts.md
@@ -22,9 +22,9 @@ This guide covers general best practices for writing prompts and introduce a few
 
 A good prompt foundation should include the following elements.
 
-1. <span class="underline decoration-wavy decoration-blue-500 decoration-2 underline-offset-4">Subject</span> is what you want to generate an image or video of. It is the main focus and you should generally begin your prompt with the subject.
-2. <span class="underline decoration-wavy decoration-purple-500 decoration-2 underline-offset-4">Style</span> describes the medium or aesthetic of the image or video. What do you want it to look like?
-3. <span class="underline decoration-wavy decoration-green-500 decoration-2 underline-offset-4">Context</span> adds details to the image or video. For example, what is the subject doing and what is the setting and mood?
+1. <span class="underline decoration-sky-500 decoration-2 underline-offset-4">Subject</span> is what you want to generate an image or video of. It is the main focus and you should generally begin your prompt with the subject.
+2. <span class="underline decoration-pink-500 decoration-2 underline-offset-4">Style</span> describes the medium or aesthetic of the image or video. What do you want it to look like?
+3. <span class="underline decoration-green-500 decoration-2 underline-offset-4">Context</span> adds details to the image or video. For example, what is the subject doing and what is the setting and mood?
 
 Combine these elements into a structured narrative instead of a list of keywords. Modern models have powerful text encoders that have better language understanding. Start with a short prompt, and then iterate on it.
 
@@ -33,7 +33,7 @@ To generate an even better image, enhance the prompt with additional details suc
 <div class="flex gap-4">
   <div class="flex-1 text-center">
     <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ok-prompt.png" class="w-full h-auto object-cover rounded-lg">
-    <figcaption class="mt-2 text-sm text-gray-500">A <span class="underline decoration-wavy decoration-blue-500 decoration-2 underline-offset-1">cute cat</span> <span class="underline decoration-wavy decoration-green-500 decoration-2 underline-offset-1">lounges on a leaf in a pool during a peaceful summer afternoon</span>, in <span class="underline decoration-wavy decoration-purple-500 decoration-2 underline-offset-1">lofi art style, illustration</span>.</figcaption>
+    <figcaption class="mt-2 text-sm text-gray-500">A <span class="underline decoration-sky-500 decoration-2 underline-offset-1">cute cat</span> <span class="underline decoration-green-500 decoration-2 underline-offset-1">lounges on a leaf in a pool during a peaceful summer afternoon</span>, in <span class="underline decoration-pink-500 decoration-2 underline-offset-1">lofi art style, illustration</span>.</figcaption>
   </div>
   <div class="flex-1 text-center">
     <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/better-prompt.png" class="w-full h-auto object-cover rounded-lg"/>