diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index 14dbfe3ea1d3..856874d51961 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -23,11 +23,7 @@ - local: using-diffusers/reusing_seeds title: Reproducibility - local: using-diffusers/schedulers - title: Load schedulers and models - - local: using-diffusers/models - title: Models - - local: using-diffusers/scheduler_features - title: Scheduler features + title: Schedulers - local: using-diffusers/other-formats title: Model files and layouts - local: using-diffusers/push_to_hub diff --git a/docs/source/en/using-diffusers/models.md b/docs/source/en/using-diffusers/models.md deleted file mode 100644 index 22c78d490ae4..000000000000 --- a/docs/source/en/using-diffusers/models.md +++ /dev/null @@ -1,120 +0,0 @@ - - -[[open-in-colab]] - -# Models - -A diffusion model relies on a few individual models working together to generate an output. These models are responsible for denoising, encoding inputs, and decoding latents into the actual outputs. - -This guide will show you how to load models. - -## Loading a model - -All models are loaded with the [`~ModelMixin.from_pretrained`] method, which downloads and caches the latest model version. If the latest files are available in the local cache, [`~ModelMixin.from_pretrained`] reuses files in the cache. - -Pass the `subfolder` argument to [`~ModelMixin.from_pretrained`] to specify where to load the model weights from. Omit the `subfolder` argument if the repository doesn't have a subfolder structure or if you're loading a standalone model. - -```py -from diffusers import QwenImageTransformer2DModel - -model = QwenImageTransformer2DModel.from_pretrained("Qwen/Qwen-Image", subfolder="transformer") -``` - -## AutoModel - -[`AutoModel`] detects the model class from a `model_index.json` file or a model's `config.json` file. It fetches the correct model class from these files and delegates the actual loading to the model class. [`AutoModel`] is useful for automatic model type detection without needing to know the exact model class beforehand. - -```py -from diffusers import AutoModel - -model = AutoModel.from_pretrained( - "Qwen/Qwen-Image", subfolder="transformer" -) -``` - -## Model data types - -Use the `torch_dtype` argument in [`~ModelMixin.from_pretrained`] to load a model with a specific data type. This allows you to load a model in a lower precision to reduce memory usage. - -```py -import torch -from diffusers import QwenImageTransformer2DModel - -model = QwenImageTransformer2DModel.from_pretrained( - "Qwen/Qwen-Image", - subfolder="transformer", - torch_dtype=torch.bfloat16 -) -``` - -[nn.Module.to](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to) can also convert to a specific data type on the fly. However, it converts *all* weights to the requested data type unlike `torch_dtype` which respects `_keep_in_fp32_modules`. 
This argument preserves layers in `torch.float32` for numerical stability and best generation quality (see example [_keep_in_fp32_modules](https://github.com/huggingface/diffusers/blob/f864a9a352fa4a220d860bfdd1782e3e5af96382/src/diffusers/models/transformers/transformer_wan.py#L374)) - -```py -from diffusers import QwenImageTransformer2DModel - -model = QwenImageTransformer2DModel.from_pretrained( - "Qwen/Qwen-Image", subfolder="transformer" -) -model = model.to(dtype=torch.float16) -``` - -## Device placement - -Use the `device_map` argument in [`~ModelMixin.from_pretrained`] to place a model on an accelerator like a GPU. It is especially helpful where there are multiple GPUs. - -Diffusers currently provides three options to `device_map` for individual models, `"cuda"`, `"balanced"` and `"auto"`. Refer to the table below to compare the three placement strategies. - -| parameter | description | -|---|---| -| `"cuda"` | places pipeline on a supported accelerator (CUDA) | -| `"balanced"` | evenly distributes pipeline on all GPUs | -| `"auto"` | distribute model from fastest device first to slowest | - -Use the `max_memory` argument in [`~ModelMixin.from_pretrained`] to allocate a maximum amount of memory to use on each device. By default, Diffusers uses the maximum amount available. - -```py -import torch -from diffusers import QwenImagePipeline - -max_memory = {0: "16GB", 1: "16GB"} -pipeline = QwenImagePipeline.from_pretrained( - "Qwen/Qwen-Image", - torch_dtype=torch.bfloat16, - device_map="cuda", - max_memory=max_memory -) -``` - -The `hf_device_map` attribute allows you to access and view the `device_map`. - -```py -print(transformer.hf_device_map) -# {'': device(type='cuda')} -``` - -## Saving models - -Save a model with the [`~ModelMixin.save_pretrained`] method. - -```py -from diffusers import QwenImageTransformer2DModel - -model = QwenImageTransformer2DModel.from_pretrained("Qwen/Qwen-Image", subfolder="transformer") -model.save_pretrained("./local/model") -``` - -For large models, it is helpful to use `max_shard_size` to save a model as multiple shards. A shard can be loaded faster and save memory (refer to the [parallel loading](./loading#parallel-loading) docs for more details), especially if there is more than one GPU. - -```py -model.save_pretrained("./local/model", max_shard_size="5GB") -``` diff --git a/docs/source/en/using-diffusers/scheduler_features.md b/docs/source/en/using-diffusers/scheduler_features.md deleted file mode 100644 index f7977d53d5d6..000000000000 --- a/docs/source/en/using-diffusers/scheduler_features.md +++ /dev/null @@ -1,235 +0,0 @@ - - -# Scheduler features - -The scheduler is an important component of any diffusion model because it controls the entire denoising (or sampling) process. There are many types of schedulers, some are optimized for speed and some for quality. With Diffusers, you can modify the scheduler configuration to use custom noise schedules, sigmas, and rescale the noise schedule. Changing these parameters can have profound effects on inference quality and speed. - -This guide will demonstrate how to use these features to improve inference quality. - -> [!TIP] -> Diffusers currently only supports the `timesteps` and `sigmas` parameters for a select list of schedulers and pipelines. Feel free to open a [feature request](https://github.com/huggingface/diffusers/issues/new/choose) if you want to extend these parameters to a scheduler and pipeline that does not currently support it! 
- -## Timestep schedules - -The timestep or noise schedule determines the amount of noise at each sampling step. The scheduler uses this to generate an image with the corresponding amount of noise at each step. The timestep schedule is generated from the scheduler's default configuration, but you can customize the scheduler to use new and optimized sampling schedules that aren't in Diffusers yet. - -For example, [Align Your Steps (AYS)](https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/) is a method for optimizing a sampling schedule to generate a high-quality image in as little as 10 steps. The optimal [10-step schedule](https://github.com/huggingface/diffusers/blob/a7bf77fc284810483f1e60afe34d1d27ad91ce2e/src/diffusers/schedulers/scheduling_utils.py#L51) for Stable Diffusion XL is: - -```py -from diffusers.schedulers import AysSchedules - -sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"] -print(sampling_schedule) -"[999, 845, 730, 587, 443, 310, 193, 116, 53, 13]" -``` - -You can use the AYS sampling schedule in a pipeline by passing it to the `timesteps` parameter. - -```py -pipeline = StableDiffusionXLPipeline.from_pretrained( - "SG161222/RealVisXL_V4.0", - torch_dtype=torch.float16, - variant="fp16", -).to("cuda") -pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, algorithm_type="sde-dpmsolver++") - -prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up" -generator = torch.Generator(device="cpu").manual_seed(2487854446) -image = pipeline( - prompt=prompt, - negative_prompt="", - generator=generator, - timesteps=sampling_schedule, -).images[0] -``` - -
-<!-- figures: AYS timestep schedule 10 steps | Linearly-spaced timestep schedule 10 steps | Linearly-spaced timestep schedule 25 steps -->
- -## Timestep spacing - -The way sample steps are selected in the schedule can affect the quality of the generated image, especially with respect to [rescaling the noise schedule](#rescale-noise-schedule), which can enable a model to generate much brighter or darker images. Diffusers provides three timestep spacing methods: - -- `leading` creates evenly spaced steps -- `linspace` includes the first and last steps and evenly selects the remaining intermediate steps -- `trailing` only includes the last step and evenly selects the remaining intermediate steps starting from the end - -It is recommended to use the `trailing` spacing method because it generates higher quality images with more details when there are fewer sample steps. But the difference in quality is not as obvious for more standard sample step values. - -```py -import torch -from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler - -pipeline = StableDiffusionXLPipeline.from_pretrained( - "SG161222/RealVisXL_V4.0", - torch_dtype=torch.float16, - variant="fp16", -).to("cuda") -pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, timestep_spacing="trailing") - -prompt = "A cinematic shot of a cute little black cat sitting on a pumpkin at night" -generator = torch.Generator(device="cpu").manual_seed(2487854446) -image = pipeline( - prompt=prompt, - negative_prompt="", - generator=generator, - num_inference_steps=5, -).images[0] -image -``` - -
-<!-- figures: trailing spacing after 5 steps | leading spacing after 5 steps -->
- -## Sigmas - -The `sigmas` parameter is the amount of noise added at each timestep according to the timestep schedule. Like the `timesteps` parameter, you can customize the `sigmas` parameter to control how much noise is added at each step. When you use a custom `sigmas` value, the `timesteps` are calculated from the custom `sigmas` value and the default scheduler configuration is ignored. - -For example, you can manually pass the [sigmas](https://github.com/huggingface/diffusers/blob/6529ee67ec02fcf58d2fd9242164ea002b351d75/src/diffusers/schedulers/scheduling_utils.py#L55) for something like the 10-step AYS schedule from before to the pipeline. - -```py -import torch - -from diffusers import DiffusionPipeline, EulerDiscreteScheduler - -model_id = "stabilityai/stable-diffusion-xl-base-1.0" -pipeline = DiffusionPipeline.from_pretrained( - "stabilityai/stable-diffusion-xl-base-1.0", - torch_dtype=torch.float16, - variant="fp16", -).to("cuda") -pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config) - -sigmas = [14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113, 0.0] -prompt = "anthropomorphic capybara wearing a suit and working with a computer" -generator = torch.Generator(device='cuda').manual_seed(123) -image = pipeline( - prompt=prompt, - num_inference_steps=10, - sigmas=sigmas, - generator=generator -).images[0] -``` - -When you take a look at the scheduler's `timesteps` parameter, you'll see that it is the same as the AYS timestep schedule because the `timestep` schedule is calculated from the `sigmas`. - -```py -print(f" timesteps: {pipe.scheduler.timesteps}") -"timesteps: tensor([999., 845., 730., 587., 443., 310., 193., 116., 53., 13.], device='cuda:0')" -``` - -### Karras sigmas - -> [!TIP] -> Refer to the scheduler API [overview](../api/schedulers/overview) for a list of schedulers that support Karras sigmas. -> -> Karras sigmas should not be used for models that weren't trained with them. For example, the base Stable Diffusion XL model shouldn't use Karras sigmas but the [DreamShaperXL](https://hf.co/Lykon/dreamshaper-xl-1-0) model can since they are trained with Karras sigmas. - -Karras scheduler's use the timestep schedule and sigmas from the [Elucidating the Design Space of Diffusion-Based Generative Models](https://hf.co/papers/2206.00364) paper. This scheduler variant applies a smaller amount of noise per step as it approaches the end of the sampling process compared to other schedulers, and can increase the level of details in the generated image. - -Enable Karras sigmas by setting `use_karras_sigmas=True` in the scheduler. - -```py -import torch -from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler - -pipeline = StableDiffusionXLPipeline.from_pretrained( - "SG161222/RealVisXL_V4.0", - torch_dtype=torch.float16, - variant="fp16", -).to("cuda") -pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, algorithm_type="sde-dpmsolver++", use_karras_sigmas=True) - -prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up" -generator = torch.Generator(device="cpu").manual_seed(2487854446) -image = pipeline( - prompt=prompt, - negative_prompt="", - generator=generator, -).images[0] -``` - -
-<!-- figures: Karras sigmas enabled | Karras sigmas disabled -->
- -## Rescale noise schedule - -In the [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://hf.co/papers/2305.08891) paper, the authors discovered that common noise schedules allowed some signal to leak into the last timestep. This signal leakage at inference can cause models to only generate images with medium brightness. By enforcing a zero signal-to-noise ratio (SNR) for the timstep schedule and sampling from the last timestep, the model can be improved to generate very bright or dark images. - -> [!TIP] -> For inference, you need a model that has been trained with *v_prediction*. To train your own model with *v_prediction*, add the following flag to the [train_text_to_image.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) or [train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py) scripts. -> -> ```bash -> --prediction_type="v_prediction" -> ``` - -For example, load the [ptx0/pseudo-journey-v2](https://hf.co/ptx0/pseudo-journey-v2) checkpoint which was trained with `v_prediction` and the [`DDIMScheduler`]. Configure the following parameters in the [`DDIMScheduler`]: - -* `rescale_betas_zero_snr=True` to rescale the noise schedule to zero SNR -* `timestep_spacing="trailing"` to start sampling from the last timestep - -Set `guidance_rescale` in the pipeline to prevent over-exposure. A lower value increases brightness but some of the details may appear washed out. - -```py -from diffusers import DiffusionPipeline, DDIMScheduler - -pipeline = DiffusionPipeline.from_pretrained("ptx0/pseudo-journey-v2", use_safetensors=True) - -pipeline.scheduler = DDIMScheduler.from_config( - pipeline.scheduler.config, rescale_betas_zero_snr=True, timestep_spacing="trailing" -) -pipeline.to("cuda") -prompt = "cinematic photo of a snowy mountain at night with the northern lights aurora borealis overhead, 35mm photograph, film, professional, 4k, highly detailed" -generator = torch.Generator(device="cpu").manual_seed(23) -image = pipeline(prompt, guidance_rescale=0.7, generator=generator).images[0] -image -``` - -
-<!-- figures: default Stable Diffusion v2-1 image | image with zero SNR and trailing timestep spacing enabled -->
diff --git a/docs/source/en/using-diffusers/schedulers.md b/docs/source/en/using-diffusers/schedulers.md index 6d928f8037c4..0e236e4e3e1d 100644 --- a/docs/source/en/using-diffusers/schedulers.md +++ b/docs/source/en/using-diffusers/schedulers.md @@ -10,200 +10,273 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o specific language governing permissions and limitations under the License. --> -# Load schedulers and models - [[open-in-colab]] -Diffusion pipelines are a collection of interchangeable schedulers and models that can be mixed and matched to tailor a pipeline to a specific use case. The scheduler encapsulates the entire denoising process such as the number of denoising steps and the algorithm for finding the denoised sample. A scheduler is not parameterized or trained so they don't take very much memory. The model is usually only concerned with the forward pass of going from a noisy input to a less noisy sample. +# Schedulers + +A scheduler is an algorithm that provides instructions to the denoising process such as how much noise to remove at a certain step. It takes the model prediction from step *t* and applies an update for how to compute the next sample at step *t-1*. Different schedulers produce different results; some are faster while others are more accurate. + +Diffusers supports many schedulers and allows you to modify their timestep schedules, timestep spacing, and more, to generate high-quality images in fewer steps. -This guide will show you how to load schedulers and models to customize a pipeline. You'll use the [stable-diffusion-v1-5/stable-diffusion-v1-5](https://hf.co/stable-diffusion-v1-5/stable-diffusion-v1-5) checkpoint throughout this guide, so let's load it first. +This guide will show you how to load and customize schedulers. + +## Loading schedulers + +Schedulers don't have any parameters and are defined in a configuration file. Access the `.scheduler` attribute of a pipeline to view the configuration. ```py import torch from diffusers import DiffusionPipeline pipeline = DiffusionPipeline.from_pretrained( - "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True -).to("cuda") -``` - -You can see what scheduler this pipeline uses with the `pipeline.scheduler` attribute. - -```py + "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, device_map="cuda" +) pipeline.scheduler -PNDMScheduler { - "_class_name": "PNDMScheduler", - "_diffusers_version": "0.21.4", - "beta_end": 0.012, - "beta_schedule": "scaled_linear", - "beta_start": 0.00085, - "clip_sample": false, - "num_train_timesteps": 1000, - "set_alpha_to_one": false, - "skip_prk_steps": true, - "steps_offset": 1, - "timestep_spacing": "leading", - "trained_betas": null -} ``` -## Load a scheduler - -Schedulers are defined by a configuration file that can be used by a variety of schedulers. Load a scheduler with the [`SchedulerMixin.from_pretrained`] method, and specify the `subfolder` parameter to load the configuration file into the correct subfolder of the pipeline repository. - -For example, to load the [`DDIMScheduler`]: +Load a different scheduler with [`~SchedulerMixin.from_pretrained`] and specify the `subfolder` argument to load the configuration file into the correct subfolder of the pipeline repository. Pass the new scheduler to the existing pipeline. 
```py -from diffusers import DDIMScheduler, DiffusionPipeline +from diffusers import DPMSolverMultistepScheduler -ddim = DDIMScheduler.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="scheduler") +dpm = DPMSolverMultistepScheduler.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler" +) +pipeline = DiffusionPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", + scheduler=dpm, + torch_dtype=torch.float16, + device_map="cuda" +) +pipeline.scheduler ``` -Then you can pass the newly loaded scheduler to the pipeline. +## Timestep schedules -```python -pipeline = DiffusionPipeline.from_pretrained( - "stable-diffusion-v1-5/stable-diffusion-v1-5", scheduler=ddim, torch_dtype=torch.float16, use_safetensors=True -).to("cuda") -``` +Timestep or noise schedule decides how noise is distributed over the denoising process. The schedule can be linear or more concentrated toward the beginning or end. It is a precomputed sequence of noise levels generated from the scheduler's default configuration, but it can be customized to use other schedules. -## Compare schedulers +> [!TIP] +> The `timesteps` argument is only supported for a select list of schedulers and pipelines. Feel free to open a feature request if you want to extend these parameters to a scheduler and pipeline that does not currently support it! -Schedulers have their own unique strengths and weaknesses, making it difficult to quantitatively compare which scheduler works best for a pipeline. You typically have to make a trade-off between denoising speed and denoising quality. We recommend trying out different schedulers to find one that works best for your use case. Call the `pipeline.scheduler.compatibles` attribute to see what schedulers are compatible with a pipeline. +The example below uses the [Align Your Steps (AYS)](https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/) schedule which can generate a high-quality image in 10 steps, significantly speeding up generation and reducing computation time. -Let's compare the [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], [`EulerAncestralDiscreteScheduler`], and the [`DPMSolverMultistepScheduler`] on the following prompt and seed. +Import the schedule and pass it to the `timesteps` argument in the pipeline. ```py import torch -from diffusers import DiffusionPipeline +from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler +from diffusers.schedulers import AysSchedules + +sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"] +print(sampling_schedule) +"[999, 845, 730, 587, 443, 310, 193, 116, 53, 13]" pipeline = DiffusionPipeline.from_pretrained( - "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True -).to("cuda") + "SG161222/RealVisXL_V4.0", + torch_dtype=torch.float16, + device_map="cuda" +) +pipeline.scheduler = DPMSolverMultistepScheduler.from_config( + pipeline.scheduler.config, algorithm_type="sde-dpmsolver++" +) -prompt = "A photograph of an astronaut riding a horse on Mars, high resolution, high definition." -generator = torch.Generator(device="cuda").manual_seed(8) +prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up" +image = pipeline( + prompt=prompt, + negative_prompt="", + timesteps=sampling_schedule, +).images[0] ``` -To change the pipelines scheduler, use the [`~ConfigMixin.from_config`] method to load a different scheduler's `pipeline.scheduler.config` into the pipeline. +
+<!-- figures: AYS timestep schedule 10 steps | Linearly-spaced timestep schedule 10 steps | Linearly-spaced timestep schedule 25 steps -->
+### Rescaling schedules
+
+Denoising should begin with pure noise, where the signal-to-noise ratio (SNR) is zero. However, some models don't actually start from pure noise, which makes it difficult to generate images at brightness extremes.
+
+> [!TIP]
+> Train your own model with `v_prediction` by adding the `--prediction_type="v_prediction"` flag to your training script. You can also [search](https://huggingface.co/search/full-text?q=v_prediction&type=model) for existing models trained with `v_prediction`.
+
+To fix this, the model must be trained with `v_prediction`. For a model trained with `v_prediction`, enable the following arguments in the scheduler.
+
+- Set `rescale_betas_zero_snr=True` to rescale the noise schedule to the very last timestep with exactly zero SNR
+- Set `timestep_spacing="trailing"` to force sampling from the last timestep with pure noise
-
-[`LMSDiscreteScheduler`] typically generates higher quality images than the default scheduler.

 ```py
-from diffusers import LMSDiscreteScheduler
+from diffusers import DiffusionPipeline, DDIMScheduler

-pipeline.scheduler = LMSDiscreteScheduler.from_config(pipeline.scheduler.config)
-image = pipeline(prompt, generator=generator).images[0]
-image
-```
+pipeline = DiffusionPipeline.from_pretrained("ptx0/pseudo-journey-v2", device_map="cuda")
+
+pipeline.scheduler = DDIMScheduler.from_config(
+    pipeline.scheduler.config, rescale_betas_zero_snr=True, timestep_spacing="trailing"
+)
+```
-
-[`EulerDiscreteScheduler`] can generate higher quality images in just 30 steps.
+Set `guidance_rescale` in the pipeline to avoid overexposed images. A lower value increases brightness, but some details may appear washed out.

 ```py
-from diffusers import EulerDiscreteScheduler
-
-pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
-image = pipeline(prompt, generator=generator).images[0]
-image
+prompt = """
+cinematic photo of a snowy mountain at night with the northern lights aurora borealis
+overhead, 35mm photograph, film, professional, 4k, highly detailed
+"""
+image = pipeline(prompt, guidance_rescale=0.7).images[0]
 ```
+<!-- figures: default Stable Diffusion v2-1 image | image with zero SNR and trailing timestep spacing enabled -->
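+You can verify both options took effect by inspecting the scheduler's config. A quick check, assuming the `pipeline` from the example above:
+
+```py
+print(pipeline.scheduler.config.rescale_betas_zero_snr)
+# True
+print(pipeline.scheduler.config.timestep_spacing)
+# trailing
+```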
-
-[`EulerAncestralDiscreteScheduler`] can generate higher quality images in just 30 steps.
-
-```py
-from diffusers import EulerAncestralDiscreteScheduler
-
-pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)
-image = pipeline(prompt, generator=generator).images[0]
-image
-```
+## Timestep spacing
+
+Timestep spacing refers to the specific steps *t* that are sampled from the schedule. Diffusers provides three spacing strategies, as shown below.
+
+| spacing strategy | spacing calculation | example timesteps |
+|---|---|---|
+| `leading` | evenly spaced steps | `[900, 800, 700, ..., 100, 0]` |
+| `linspace` | include first and last steps and evenly divide remaining intermediate steps | `[1000, 888.89, 777.78, ..., 111.11, 0]` |
+| `trailing` | include last step and evenly divide remaining intermediate steps beginning from the end | `[999, 899, 799, 699, 599, 499, 399, 299, 199, 99]` |
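+The example timesteps above can be reproduced with a few lines of NumPy. This is a simplified sketch for `num_train_timesteps=1000` and 10 inference steps; the exact offset and rounding behavior lives in each scheduler's `set_timesteps` method.
+
+```py
+import numpy as np
+
+num_train_timesteps = 1000
+num_inference_steps = 10
+step = num_train_timesteps // num_inference_steps
+
+# leading: evenly spaced steps counted up from 0
+leading = (np.arange(num_inference_steps) * step)[::-1]  # [900, 800, ..., 100, 0]
+
+# linspace: include both the first and last timesteps
+linspace = np.linspace(0, num_train_timesteps, num_inference_steps)[::-1]  # [1000., 888.89, ..., 0.]
+
+# trailing: count down from the last timestep
+trailing = np.arange(num_train_timesteps, 0, -step) - 1  # [999, 899, ..., 199, 99]
+```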
- +Pass the spacing strategy to the `timestep_spacing` argument in the scheduler. -[`DPMSolverMultistepScheduler`] provides a balance between speed and quality and can generate higher quality images in just 20 steps. +> [!TIP] +> The `trailing` strategy typically produces higher quality images with more details with fewer steps, but the difference in quality is not as obvious for more standard step values. ```py -from diffusers import DPMSolverMultistepScheduler +import torch +from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler + +pipeline = DiffusionPipeline.from_pretrained( + "SG161222/RealVisXL_V4.0", + torch_dtype=torch.float16, + device_map="cuda" +) +pipeline.scheduler = DPMSolverMultistepScheduler.from_config( + pipeline.scheduler.config, timestep_spacing="trailing" +) -pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config) -image = pipeline(prompt, generator=generator).images[0] +prompt = "A cinematic shot of a cute little black cat sitting on a pumpkin at night" +image = pipeline( + prompt=prompt, + negative_prompt="", + num_inference_steps=5, +).images[0] image ``` - -
-<!-- figures: LMSDiscreteScheduler | EulerDiscreteScheduler | EulerAncestralDiscreteScheduler | DPMSolverMultistepScheduler -->
+<!-- figures: trailing spacing after 5 steps | leading spacing after 5 steps -->
-Most images look very similar and are comparable in quality. Again, it often comes down to your specific use case so a good approach is to run multiple different schedulers and compare the results. +## Sigmas -## Models +Sigmas is a measure of how noisy a sample is at a certain step as defined by the schedule. When using custom `sigmas`, the `timesteps` are calculated from these values instead of the default scheduler configuration. -Models are loaded from the [`ModelMixin.from_pretrained`] method, which downloads and caches the latest version of the model weights and configurations. If the latest files are available in the local cache, [`~ModelMixin.from_pretrained`] reuses files in the cache instead of re-downloading them. +> [!TIP] +> The `sigmas` argument is only supported for a select list of schedulers and pipelines. Feel free to open a feature request if you want to extend these parameters to a scheduler and pipeline that does not currently support it! -Models can be loaded from a subfolder with the `subfolder` argument. For example, the model weights for [stable-diffusion-v1-5/stable-diffusion-v1-5](https://hf.co/stable-diffusion-v1-5/stable-diffusion-v1-5) are stored in the [unet](https://hf.co/stable-diffusion-v1-5/stable-diffusion-v1-5/tree/main/unet) subfolder. +Pass the custom sigmas to the `sigmas` argument in the pipeline. The example below uses the [sigmas](https://github.com/huggingface/diffusers/blob/6529ee67ec02fcf58d2fd9242164ea002b351d75/src/diffusers/schedulers/scheduling_utils.py#L55) from the 10-step AYS schedule. -```python -from diffusers import UNet2DConditionModel +```py +import torch +from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler -unet = UNet2DConditionModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet", use_safetensors=True) -``` +pipeline = DiffusionPipeline.from_pretrained( + "SG161222/RealVisXL_V4.0", + torch_dtype=torch.float16, + device_map="cuda" +) +pipeline.scheduler = DPMSolverMultistepScheduler.from_config( + pipeline.scheduler.config, algorithm_type="sde-dpmsolver++" +) -They can also be directly loaded from a [repository](https://huggingface.co/google/ddpm-cifar10-32/tree/main). +sigmas = [14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113, 0.0] +prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up" +image = pipeline( + prompt=prompt, + negative_prompt="", + sigmas=sigmas, +).images[0] +``` -```python -from diffusers import UNet2DModel +### Karras sigmas -unet = UNet2DModel.from_pretrained("google/ddpm-cifar10-32", use_safetensors=True) -``` +[Karras sigmas](https://huggingface.co/papers/2206.00364) resamples the noise schedule for more efficient sampling by clustering sigmas more densely in the middle of the sequence where structure reconstruction is critical, while using fewer sigmas at the beginning and end where noise changes have less impact. This can increase the level of details in a generated image. -To load and save model variants, specify the `variant` argument in [`ModelMixin.from_pretrained`] and [`ModelMixin.save_pretrained`]. +Set `use_karras_sigmas=True` in the scheduler to enable it. 
-```python
-from diffusers import UNet2DConditionModel
+```py
+import torch
+from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

-unet = UNet2DConditionModel.from_pretrained(
-    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet", variant="non_ema", use_safetensors=True
+pipeline = DiffusionPipeline.from_pretrained(
+    "SG161222/RealVisXL_V4.0",
+    torch_dtype=torch.float16,
+    device_map="cuda"
)
-unet.save_pretrained("./local-unet", variant="non_ema")
+pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
+    pipeline.scheduler.config,
+    algorithm_type="sde-dpmsolver++",
+    use_karras_sigmas=True,
+)
+
+prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up"
+image = pipeline(
+    prompt=prompt,
+    negative_prompt="",
+).images[0]
 ```
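+To see what the flag changes, you can inspect the sigma sequence the scheduler produces. An illustrative check, assuming the `pipeline` configured above; with `use_karras_sigmas=True` the values cluster more densely in the middle of the sequence:
+
+```py
+pipeline.scheduler.set_timesteps(num_inference_steps=10)
+print(pipeline.scheduler.sigmas)
+```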
+<!-- figures: Karras sigmas enabled | Karras sigmas disabled -->
-```py -from diffusers import AutoModel +Refer to the scheduler API [overview](../api/schedulers/overview) for a list of schedulers that support Karras sigmas. It should only be used for models trained with Karras sigmas. -unet = AutoModel.from_pretrained( - "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", torch_dtype=torch.float16 -) -``` +## Choosing a scheduler + +It's important to try different schedulers to find the best one for your use case. Here are a few recommendations to help you get started. + +- DPM++ 2M SDE Karras is generally a good all-purpose option. +- [`TCDScheduler`] works well for distilled models. +- [`FlowMatchEulerDiscreteScheduler`] and [`FlowMatchHeunDiscreteScheduler`] for FlowMatch models. +- [`EulerDiscreteScheduler`] or [`EulerAncestralDiscreteScheduler`] for generating anime style images. +- DPM++ 2M paired with [`LCMScheduler`] on SDXL for generating realistic images. + +## Resources -You can also use the [torch.Tensor.to](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.to.html) method to convert to the specified dtype on the fly. It converts *all* weights unlike the `torch_dtype` argument that respects the `_keep_in_fp32_modules`. This is important for models whose layers must remain in fp32 for numerical stability and best generation quality (see example [here](https://github.com/huggingface/diffusers/blob/f864a9a352fa4a220d860bfdd1782e3e5af96382/src/diffusers/models/transformers/transformer_wan.py#L374)). +- Read the [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper for more details about rescaling the noise schedule to enforce zero SNR. \ No newline at end of file