diff --git a/docs/source/en/api/pipelines/overview.md b/docs/source/en/api/pipelines/overview.md
index f34262d37ce0..b5e3825fef6d 100644
--- a/docs/source/en/api/pipelines/overview.md
+++ b/docs/source/en/api/pipelines/overview.md
@@ -113,3 +113,17 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
 ## PushToHubMixin

 [[autodoc]] utils.PushToHubMixin
+
+## Callbacks
+
+[[autodoc]] callbacks.PipelineCallback
+
+[[autodoc]] callbacks.SDCFGCutoffCallback
+
+[[autodoc]] callbacks.SDXLCFGCutoffCallback
+
+[[autodoc]] callbacks.SDXLControlnetCFGCutoffCallback
+
+[[autodoc]] callbacks.IPAdapterScaleCutoffCallback
+
+[[autodoc]] callbacks.SD3CFGCutoffCallback
diff --git a/docs/source/en/using-diffusers/callback.md b/docs/source/en/using-diffusers/callback.md
index e0fa88578425..60b839805ff2 100644
--- a/docs/source/en/using-diffusers/callback.md
+++ b/docs/source/en/using-diffusers/callback.md
@@ -12,52 +12,37 @@ specific language governing permissions and limitations under the License.

 # Pipeline callbacks

-The denoising loop of a pipeline can be modified with custom defined functions using the `callback_on_step_end` parameter. The callback function is executed at the end of each step, and modifies the pipeline attributes and variables for the next step. This is really useful for *dynamically* adjusting certain pipeline attributes or modifying tensor variables. This versatility allows for interesting use cases such as changing the prompt embeddings at each timestep, assigning different weights to the prompt embeddings, and editing the guidance scale. With callbacks, you can implement new features without modifying the underlying code!
+A callback is a function that is executed at the end of a denoising step and modifies [`DiffusionPipeline`] behavior. The changes are propagated to subsequent steps in the denoising process. Callbacks are useful for adjusting pipeline attributes or tensor variables to support new features without rewriting the underlying pipeline code.

-> [!TIP]
-> 🤗 Diffusers currently only supports `callback_on_step_end`, but feel free to open a [feature request](https://github.com/huggingface/diffusers/issues/new/choose) if you have a cool use-case and require a callback function with a different execution point!
+Diffusers provides several official callbacks, listed in the pipeline [overview](../api/pipelines/overview#callbacks).

-This guide will demonstrate how callbacks work by a few features you can implement with them.
+To enable a callback, configure when it is activated during denoising with one of the following arguments.

-## Official callbacks
+- `cutoff_step_ratio` specifies the step at which a callback is activated, as a fraction of the total denoising steps.
+- `cutoff_step_index` specifies the exact step at which a callback is activated.

-We provide a list of callbacks you can plug into an existing pipeline and modify the denoising loop. This is the current list of official callbacks:
+The example below uses `cutoff_step_ratio=0.4`, which means the callback is activated once denoising reaches 40% of the total inference steps. [`~callbacks.SDXLCFGCutoffCallback`] disables classifier-free guidance (CFG) after a certain number of steps, which can help save compute without significantly affecting performance.

-- `SDCFGCutoffCallback`: Disables the CFG after a certain number of steps for all SD 1.5 pipelines, including text-to-image, image-to-image, inpaint, and controlnet.
-- `SDXLCFGCutoffCallback`: Disables the CFG after a certain number of steps for all SDXL pipelines, including text-to-image, image-to-image, inpaint, and controlnet.
-- `IPAdapterScaleCutoffCallback`: Disables the IP Adapter after a certain number of steps for all pipelines supporting IP-Adapter.
+Define a callback with either of the `cutoff` arguments and pass it to the `callback_on_step_end` parameter in the pipeline.

-> [!TIP]
-> If you want to add a new official callback, feel free to open a [feature request](https://github.com/huggingface/diffusers/issues/new/choose) or [submit a PR](https://huggingface.co/docs/diffusers/main/en/conceptual/contribution#how-to-open-a-pr).
-
-To set up a callback, you need to specify the number of denoising steps after which the callback comes into effect. You can do so by using either one of these two arguments
-
-- `cutoff_step_ratio`: Float number with the ratio of the steps.
-- `cutoff_step_index`: Integer number with the exact number of the step.
-
-```python
+```py
 import torch
-
 from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline
 from diffusers.callbacks import SDXLCFGCutoffCallback
-
 callback = SDXLCFGCutoffCallback(cutoff_step_ratio=0.4)
-# can also be used with cutoff_step_index
+# if using cutoff_step_index
 # callback = SDXLCFGCutoffCallback(cutoff_step_ratio=None, cutoff_step_index=10)

 pipeline = StableDiffusionXLPipeline.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
     torch_dtype=torch.float16,
-    variant="fp16",
-).to("cuda")
+    device_map="cuda"
+)

 pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)

 prompt = "a sports car at the road, best quality, high quality, high detail, 8k resolution"
-
 generator = torch.Generator(device="cpu").manual_seed(2628670641)
-
-out = pipeline(
+output = pipeline(
     prompt=prompt,
     negative_prompt="",
     guidance_scale=6.5,
@@ -65,83 +50,16 @@ out = pipeline(
     generator=generator,
     callback_on_step_end=callback,
 )
-
-out.images[0].save("official_callback.png")
 ```
-
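+
+To see where the ratio lands, resolve it against the total number of steps. A quick sketch (mirrors how the built-in callbacks compute the cutoff; the step count is an illustrative value):
+
+```py
+num_inference_steps = 25  # illustrative value
+cutoff_step_ratio = 0.4
+
+# the callback activates at int(25 * 0.4) = step 10, so for a 25-step run
+# SDXLCFGCutoffCallback(cutoff_step_ratio=None, cutoff_step_index=10) is equivalent
+cutoff_step = int(num_inference_steps * cutoff_step_ratio)
+```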
-<!-- figure: side-by-side comparison of the generated image of a sports car at the road, "without SDXLCFGCutoffCallback" vs. "with SDXLCFGCutoffCallback" -->
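+
+Under the hood, the official callbacks subclass [`~callbacks.PipelineCallback`] and implement `callback_fn`. A minimal sketch of that pattern with a hypothetical callback that halves the guidance scale at the cutoff step (attribute names follow the current implementation and may differ across versions):
+
+```py
+from diffusers.callbacks import PipelineCallback
+
+class CFGHalvingCallback(PipelineCallback):
+    # no tensor variables are modified, so none are requested
+    tensor_inputs = []
+
+    def callback_fn(self, pipeline, step_index, timestep, callback_kwargs):
+        # resolve the cutoff step from either constructor argument
+        cutoff_step = (
+            self.config.cutoff_step_index
+            if self.config.cutoff_step_index is not None
+            else int(pipeline.num_timesteps * self.config.cutoff_step_ratio)
+        )
+        if step_index == cutoff_step:
+            pipeline._guidance_scale = pipeline.guidance_scale / 2
+        return callback_kwargs
+```
+
+An instance, for example `CFGHalvingCallback(cutoff_step_ratio=0.5)`, is passed to `callback_on_step_end` exactly like the official callbacks.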
-
-## Dynamic classifier-free guidance
-
-Dynamic classifier-free guidance (CFG) is a feature that allows you to disable CFG after a certain number of inference steps which can help you save compute with minimal cost to performance. The callback function for this should have the following arguments:
-
-- `pipeline` (or the pipeline instance) provides access to important properties such as `num_timesteps` and `guidance_scale`. You can modify these properties by updating the underlying attributes. For this example, you'll disable CFG by setting `pipeline._guidance_scale=0.0`.
-
-- `step_index` and `timestep` tell you where you are in the denoising loop. Use `step_index` to turn off CFG after reaching 40% of `num_timesteps`.
-
-- `callback_kwargs` is a dict that contains tensor variables you can modify during the denoising loop. It only includes variables specified in the `callback_on_step_end_tensor_inputs` argument, which is passed to the pipeline's `__call__` method. Different pipelines may use different sets of variables, so please check a pipeline's `_callback_tensor_inputs` attribute for the list of variables you can modify. Some common variables include `latents` and `prompt_embeds`. For this function, change the batch size of `prompt_embeds` after setting `guidance_scale=0.0` in order for it to work properly.
-
-Your callback function should look something like this:
-
-```python
-def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs):
-    # adjust the batch_size of prompt_embeds according to guidance_scale
-    if step_index == int(pipeline.num_timesteps * 0.4):
-        prompt_embeds = callback_kwargs["prompt_embeds"]
-        prompt_embeds = prompt_embeds.chunk(2)[-1]
-
-        # update guidance_scale and prompt_embeds
-        pipeline._guidance_scale = 0.0
-        callback_kwargs["prompt_embeds"] = prompt_embeds
-    return callback_kwargs
-```
-
-Now, you can pass the callback function to the `callback_on_step_end` parameter and the `prompt_embeds` to `callback_on_step_end_tensor_inputs`.
-
-```py
-import torch
-from diffusers import StableDiffusionPipeline
-
-pipeline = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16)
-pipeline = pipeline.to("cuda")
-
-prompt = "a photo of an astronaut riding a horse on mars"
-
-generator = torch.Generator(device="cuda").manual_seed(1)
-out = pipeline(
-    prompt,
-    generator=generator,
-    callback_on_step_end=callback_dynamic_cfg,
-    callback_on_step_end_tensor_inputs=['prompt_embeds']
-)
-
-out.images[0].save("out_custom_cfg.png")
-```

-## Interrupt the diffusion process
+If you want to add a new official callback, feel free to open a [feature request](https://github.com/huggingface/diffusers/issues/new/choose) or [submit a PR](https://huggingface.co/docs/diffusers/main/en/conceptual/contribution#how-to-open-a-pr). Otherwise, you can also create your own callback as shown below.

-> [!TIP]
-> The interruption callback is supported for text-to-image, image-to-image, and inpainting for the [StableDiffusionPipeline](../api/pipelines/stable_diffusion/overview) and [StableDiffusionXLPipeline](../api/pipelines/stable_diffusion/stable_diffusion_xl).
+## Early stopping

-Stopping the diffusion process early is useful when building UIs that work with Diffusers because it allows users to stop the generation process if they're unhappy with the intermediate results. You can incorporate this into your pipeline with a callback.
+Early stopping is useful if you aren't happy with the intermediate results during generation. This callback sets a hardcoded stop point after which the pipeline terminates by setting the `_interrupt` attribute to `True`.
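+
+Because the pipeline simply checks the `_interrupt` flag on every step, the stopping condition can be anything. The example below uses a fixed step index; a wall-clock budget would look like this rough sketch (hypothetical `make_timeout_callback` helper, not a library API):
+
+```py
+import time
+
+def make_timeout_callback(max_seconds):
+    start = time.time()
+
+    def timeout_callback(pipeline, i, t, callback_kwargs):
+        # interrupt once the time budget is exhausted
+        if time.time() - start > max_seconds:
+            pipeline._interrupt = True
+        return callback_kwargs
+
+    return timeout_callback
+```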
-This callback function should take the following arguments: `pipeline`, `i`, `t`, and `callback_kwargs` (this must be returned). Set the pipeline's `_interrupt` attribute to `True` to stop the diffusion process after a certain number of steps. You are also free to implement your own custom stopping logic inside the callback.
-
-In this example, the diffusion process is stopped after 10 steps even though `num_inference_steps` is set to 50.
-
-```python
-from diffusers import StableDiffusionPipeline
-
-pipeline = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
-pipeline.enable_model_cpu_offload()
-num_inference_steps = 50
+```py
+from diffusers import StableDiffusionXLPipeline

 def interrupt_callback(pipeline, i, t, callback_kwargs):
     stop_idx = 10
@@ -150,6 +68,11 @@ def interrupt_callback(pipeline, i, t, callback_kwargs):
     return callback_kwargs

+pipeline = StableDiffusionXLPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-1.0"
+)
+num_inference_steps = 50
+
 pipeline(
     "A photo of a cat",
     num_inference_steps=num_inference_steps,
@@ -157,92 +80,11 @@ pipeline(
 )
 ```

-## IP Adapter Cutoff
+## Display intermediate images

-IP Adapter is an image prompt adapter that can be used for diffusion models without any changes to the underlying model. We can use the IP Adapter Cutoff Callback to disable the IP Adapter after a certain number of steps. To set up the callback, you need to specify the number of denoising steps after which the callback comes into effect. You can do so by using either one of these two arguments:
+Visualizing the intermediate images is useful for progress monitoring and assessing the quality of the generated content. This callback decodes the latent tensors at each step and converts them to images.

-- `cutoff_step_ratio`: Float number with the ratio of the steps.
-- `cutoff_step_index`: Integer number with the exact number of the step.
-
-We need to download the diffusion model and load the ip_adapter for it as follows:
-
-```py
-from diffusers import AutoPipelineForText2Image
-from diffusers.utils import load_image
-import torch
-
-pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
-pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
-pipeline.set_ip_adapter_scale(0.6)
-```
-The setup for the callback should look something like this:
-
-```py
-
-from diffusers import AutoPipelineForText2Image
-from diffusers.callbacks import IPAdapterScaleCutoffCallback
-from diffusers.utils import load_image
-import torch
-
-
-pipeline = AutoPipelineForText2Image.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16
-).to("cuda")
-
-
-pipeline.load_ip_adapter(
-    "h94/IP-Adapter",
-    subfolder="sdxl_models",
-    weight_name="ip-adapter_sdxl.bin"
-)
-
-pipeline.set_ip_adapter_scale(0.6)
-
-
-callback = IPAdapterScaleCutoffCallback(
-    cutoff_step_ratio=None,
-    cutoff_step_index=5
-)
-
-image = load_image(
-    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png"
-)
-
-generator = torch.Generator(device="cuda").manual_seed(2628670641)
-
-images = pipeline(
-    prompt="a tiger sitting in a chair drinking orange juice",
-    ip_adapter_image=image,
-    negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
-    generator=generator,
-    num_inference_steps=50,
-    callback_on_step_end=callback,
-).images
-
-images[0].save("custom_callback_img.png")
-```
-
-<!-- figure: side-by-side comparison of the generated image of a tiger sitting in a chair drinking orange juice, "without IPAdapterScaleCutoffCallback" vs. "with IPAdapterScaleCutoffCallback" -->
-
-
-## Display image after each generation step
-
-> [!TIP]
-> This tip was contributed by [asomoza](https://github.com/asomoza).
-
-Display an image after each generation step by accessing and converting the latents after each step into an image. The latent space is compressed to 128x128, so the images are also 128x128 which is useful for a quick preview.
-
-1. Use the function below to convert the SDXL latents (4 channels) to RGB tensors (3 channels) as explained in the [Explaining the SDXL latent space](https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space) blog post.
+[Convert](https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space) the Stable Diffusion XL latents (4 channels) to RGB tensors (3 channels).

 ```py
 def latents_to_rgb(latents):
@@ -260,7 +102,7 @@ def latents_to_rgb(latents):
     return Image.fromarray(image_array)
 ```

-2. Create a function to decode and save the latents into an image.
+Extract the latents and convert the first image in the batch to RGB. Save the image as a PNG file named with the step number.

 ```py
 def decode_tensors(pipe, step, timestep, callback_kwargs):
@@ -272,19 +114,18 @@ def decode_tensors(pipe, step, timestep, callback_kwargs):
     return callback_kwargs
 ```

-3. Pass the `decode_tensors` function to the `callback_on_step_end` parameter to decode the tensors after each step. You also need to specify what you want to modify in the `callback_on_step_end_tensor_inputs` parameter, which in this case are the latents.
+Pass the `decode_tensors` function to the `callback_on_step_end` parameter, and use the `callback_on_step_end_tensor_inputs` parameter to specify which inputs to modify, in this case the latents.

 ```py
-from diffusers import AutoPipelineForText2Image
 import torch
 from PIL import Image
+from diffusers import AutoPipelineForText2Image

 pipeline = AutoPipelineForText2Image.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0",
     torch_dtype=torch.float16,
-    variant="fp16",
-    use_safetensors=True
-).to("cuda")
+    device_map="cuda"
+)

 image = pipeline(
     prompt="A croissant shaped like a cute bear.",
@@ -293,27 +134,3 @@ image = pipeline(
     callback_on_step_end_tensor_inputs=["latents"],
 ).images[0]
 ```
-
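+
+Once the per-step images are on disk, they can be stitched into a quick preview animation. A sketch with Pillow (assumes frames were saved as `{step}.png` by `decode_tensors` and that the run used 50 steps):
+
+```py
+from PIL import Image
+
+frames = [Image.open(f"{i}.png") for i in range(50)]
+frames[0].save(
+    "denoising.gif",
+    save_all=True,
+    append_images=frames[1:],
+    duration=100,  # milliseconds per frame
+    loop=0,
+)
+```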
-<!-- figure: intermediate outputs of the croissant example at step 0, step 19, step 29, step 39, and step 49 -->
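+
+## Combining callbacks
+
+Several callbacks can also be chained and passed to a pipeline together. A rough sketch, assuming the `MultiPipelineCallbacks` wrapper from `diffusers.callbacks` (check your installed version for availability):
+
+```py
+from diffusers.callbacks import (
+    IPAdapterScaleCutoffCallback,
+    MultiPipelineCallbacks,
+    SDXLCFGCutoffCallback,
+)
+
+# disable CFG at 40% of the steps and the IP-Adapter at step 25
+callbacks = MultiPipelineCallbacks([
+    SDXLCFGCutoffCallback(cutoff_step_ratio=0.4),
+    IPAdapterScaleCutoffCallback(cutoff_step_ratio=None, cutoff_step_index=25),
+])
+
+# then: pipeline(..., callback_on_step_end=callbacks)
+```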