2 changes: 1 addition & 1 deletion docs/source/en/api/loaders/ip_adapter.md
@@ -15,7 +15,7 @@ specific language governing permissions and limitations under the License.
[IP-Adapter](https://hf.co/papers/2308.06721) is a lightweight adapter that enables prompting a diffusion model with an image. This method decouples the cross-attention layers of the image and text features. The image features are generated from an image encoder.

> [!TIP]
> Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter [loading](../../using-diffusers/loading_adapters#ip-adapter) guide, and you can see how to use it in the [usage](../../using-diffusers/ip_adapter) guide.
> Learn how to load and use an IP-Adapter checkpoint and image in the [IP-Adapter](../../using-diffusers/ip_adapter) guide.

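As a quick orientation, here is a minimal sketch of the workflow this mixin enables, assuming the community `h94/IP-Adapter` checkpoint; the reference image path is a placeholder, and the linked guide has complete, tested examples.

```py
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# load IP-Adapter weights into the UNet's attention layers and set their influence
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipeline.set_ip_adapter_scale(0.6)

# the reference image is passed alongside the text prompt
reference = load_image("reference.png")  # placeholder: use your own reference image
image = pipeline(prompt="a dog sitting in a garden, best quality", ip_adapter_image=reference).images[0]
```
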
## IPAdapterMixin

2 changes: 1 addition & 1 deletion docs/source/en/api/loaders/lora.md
@@ -34,7 +34,7 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload LoRAs, and more.

> [!TIP]
> To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
> To learn more about how to load LoRA weights, see the [LoRA](../../tutorials/using_peft_for_inference) loading guide.

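A rough sketch of what these utility methods look like in practice, assuming a standard SDXL base checkpoint (the LoRA repository name below is only an example):

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# load a LoRA checkpoint and register it under an adapter name
pipeline.load_lora_weights("ostris/ikea-instructions-lora-sdxl", adapter_name="ikea")

# fuse the LoRA into the base weights for faster inference...
pipeline.fuse_lora()

# ...or undo the fusion and remove the LoRA entirely
pipeline.unfuse_lora()
pipeline.unload_lora_weights()
```
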
## LoraBaseMixin

2 changes: 1 addition & 1 deletion docs/source/en/api/loaders/peft.md
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

# PEFT

Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`], [`SD3Transformer2DModel`] to operate with an adapter.
Diffusers supports loading adapters such as [LoRA](../../tutorials/using_peft_for_inference) with the [PEFT](https://huggingface.co/docs/peft/index) library through the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`] and [`SD3Transformer2DModel`] to operate with an adapter.

> [!TIP]
> Refer to the [Inference with PEFT](../../tutorials/using_peft_for_inference.md) tutorial for an overview of how to use PEFT in Diffusers for inference.
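
A minimal sketch of what this looks like at the model level; the LoRA rank, target modules, and adapter name below are illustrative choices rather than fixed defaults:

```py
from diffusers import UNet2DConditionModel
from peft import LoraConfig

unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet"
)

# attach a new LoRA adapter to the attention projections
lora_config = LoraConfig(r=4, lora_alpha=4, target_modules=["to_q", "to_k", "to_v", "to_out.0"])
unet.add_adapter(lora_config, adapter_name="my_adapter")

# adapters added through PeftAdapterMixin can be switched off and on again
unet.disable_adapters()
unet.enable_adapters()
```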
2 changes: 1 addition & 1 deletion docs/source/en/api/loaders/textual_inversion.md
@@ -17,7 +17,7 @@ Textual Inversion is a training method for personalizing models by learning new
[`TextualInversionLoaderMixin`] provides a function for loading Textual Inversion embeddings from Diffusers and Automatic1111 into the text encoder and loading a special token to activate the embeddings.

> [!TIP]
> To learn more about how to load Textual Inversion embeddings, see the [Textual Inversion](../../using-diffusers/loading_adapters#textual-inversion) loading guide.
> To learn more about how to load Textual Inversion embeddings, see the [Textual Inversion](../../using-diffusers/textual_inversion_inference) loading guide.

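A quick sketch, assuming the community `sd-concepts-library/cat-toy` embedding whose activation token is `<cat-toy>`; see the linked guide for complete examples:

```py
import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# load the embedding into the text encoder and register its special token
pipeline.load_textual_inversion("sd-concepts-library/cat-toy")

# use the special token in the prompt to activate the learned concept
image = pipeline("A <cat-toy> sitting on a park bench").images[0]
```
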
## TextualInversionLoaderMixin

2 changes: 1 addition & 1 deletion docs/source/en/api/loaders/transformer_sd3.md
@@ -17,7 +17,7 @@ This class is useful when *only* loading weights into a [`SD3Transformer2DModel`
The [`SD3Transformer2DLoadersMixin`] class currently only loads IP-Adapter weights, but will be used in the future to save weights and load LoRAs.

> [!TIP]
> To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
> To learn more about how to load LoRA weights, see the [LoRA](../../tutorials/using_peft_for_inference) loading guide.

## SD3Transformer2DLoadersMixin

2 changes: 1 addition & 1 deletion docs/source/en/api/loaders/unet.md
@@ -17,7 +17,7 @@ Some training methods - like LoRA and Custom Diffusion - typically target the UN
The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters.

> [!TIP]
> To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
> To learn more about how to load LoRA weights, see the [LoRA](../../tutorials/using_peft_for_inference) guide.

## UNet2DConditionLoadersMixin

2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/flux.md
@@ -418,7 +418,7 @@ When unloading the Control LoRA weights, call `pipe.unload_lora_weights(reset_to
## IP-Adapter

> [!TIP]
> Check out [IP-Adapter](../../../using-diffusers/ip_adapter) to learn more about how IP-Adapters work.
> Check out [IP-Adapter](../../using-diffusers/ip_adapter) to learn more about how IP-Adapters work.

An IP-Adapter lets you prompt Flux with images, in addition to the text prompt. This is especially useful when describing complex concepts that are difficult to articulate through text alone and you have reference images.

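A rough sketch of the idea; the checkpoint names and the image-encoder argument below are assumptions drawn from the XLabs-AI release, so double-check them against the linked guide:

```py
import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# load the Flux IP-Adapter together with its CLIP image encoder
pipeline.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14",
)
pipeline.set_ip_adapter_scale(1.0)

reference = load_image("reference.png")  # placeholder: use your own reference image
image = pipeline(
    prompt="wearing sunglasses",
    ip_adapter_image=reference,
    guidance_scale=3.5,
    num_inference_steps=28,
).images[0]
```
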
2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/hidream.md
@@ -21,7 +21,7 @@

## Available models

The following models are available for the [`HiDreamImagePipeline`](text-to-image) pipeline:
The following models are available for the [`HiDreamImagePipeline`]:

| Model name | Description |
|:---|:---|
2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/overview.md
@@ -32,7 +32,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
| [Attend-and-Excite](attend_and_excite) | text2image |
| [AudioLDM](audioldm) | text2audio |
| [AudioLDM2](audioldm2) | text2audio |
| [AuraFlow](auraflow) | text2image |
| [AuraFlow](aura_flow) | text2image |
| [BLIP Diffusion](blip_diffusion) | text2image |
| [Bria 3.2](bria_3_2) | text2image |
| [CogVideoX](cogvideox) | text2video |
@@ -271,7 +271,7 @@ Check out the full script [here](https://gist.github.com/sayakpaul/508d89d7aad4f

Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.

Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`StableDiffusion3Pipeline`] for inference with bitsandbytes.
Refer to the [Quantization](../../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`StableDiffusion3Pipeline`] for inference with bitsandbytes.

```py
import torch
# ...
```
2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/stable_diffusion/svd.md
@@ -29,7 +29,7 @@ The abstract from the paper is:

Video generation is memory-intensive and one way to reduce your memory usage is to call `enable_forward_chunking` on the pipeline's UNet so you don't run the entire feedforward layer at once. Breaking it up into chunks in a loop is more memory-efficient.

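A minimal sketch of that memory-saving setup, assuming the `stabilityai/stable-video-diffusion-img2vid-xt` checkpoint and a local conditioning frame as a placeholder:

```py
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image

pipeline = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# run the UNet's feedforward layers in chunks instead of all at once
pipeline.unet.enable_forward_chunking()

image = load_image("input_frame.png")  # placeholder: use your own conditioning image
frames = pipeline(image, decode_chunk_size=2, num_frames=25).frames[0]
```
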
Check out the [Text or image-to-video](text-img2vid) guide for more details about how certain parameters can affect video generation and how to optimize inference by reducing memory usage.
Check out the [Text or image-to-video](../../../using-diffusers/text-img2vid) guide for more details about how certain parameters can affect video generation and how to optimize inference by reducing memory usage.

## StableVideoDiffusionPipeline

2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/text_to_video.md
@@ -172,7 +172,7 @@ Here are some sample outputs:

Video generation is memory-intensive and one way to reduce your memory usage is to call `enable_forward_chunking` on the pipeline's UNet so you don't run the entire feedforward layer at once. Breaking it up into chunks in a loop is more memory-efficient.

Check out the [Text or image-to-video](text-img2vid) guide for more details about how certain parameters can affect video generation and how to optimize inference by reducing memory usage.
Check out the [Text or image-to-video](../../using-diffusers/text-img2vid) guide for more details about how certain parameters can affect video generation and how to optimize inference by reducing memory usage.

> [!TIP]
> Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
2 changes: 1 addition & 1 deletion docs/source/en/training/dreambooth.md
@@ -548,4 +548,4 @@ Training the DeepFloyd IF model can be challenging, but here are some tips that

Congratulations on training your DreamBooth model! To learn more about how to use your new model, the following guide may be helpful:

- Learn how to [load a DreamBooth](../using-diffusers/loading_adapters) model for inference if you trained your model with LoRA.
- Learn how to [load a DreamBooth](../using-diffusers/dreambooth) model for inference if you trained your model with LoRA.
4 changes: 2 additions & 2 deletions docs/source/en/training/lcm_distill.md
@@ -75,7 +75,7 @@ accelerate launch train_lcm_distill_sd_wds.py \
Most of the parameters are identical to the parameters in the [Text-to-image](text2image#script-parameters) training guide, so you'll focus on the parameters that are relevant to latent consistency distillation in this guide.

- `--pretrained_teacher_model`: the path to a pretrained latent diffusion model to use as the teacher model
- `--pretrained_vae_model_name_or_path`: path to a pretrained VAE; the SDXL VAE is known to suffer from numerical instability, so this parameter allows you to specify an alternative VAE (like this [VAE]((https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)) by madebyollin which works in fp16)
- `--pretrained_vae_model_name_or_path`: path to a pretrained VAE; the SDXL VAE is known to suffer from numerical instability, so this parameter allows you to specify an alternative VAE (like this [VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) by madebyollin which works in fp16)
- `--w_min` and `--w_max`: the minimum and maximum guidance scale values for guidance scale sampling
- `--num_ddim_timesteps`: the number of timesteps for DDIM sampling
- `--loss_type`: the type of loss (L2 or Huber) to calculate for latent consistency distillation; Huber loss is generally preferred because it's more robust to outliers
@@ -245,5 +245,5 @@ The SDXL training script is discussed in more detail in the [SDXL training](sdxl

Congratulations on distilling a LCM model! To learn more about LCM, the following may be helpful:

- Learn how to use [LCMs for inference](../using-diffusers/lcm) for text-to-image, image-to-image, and with LoRA checkpoints.
- Learn how to use [LCMs for inference](../using-diffusers/inference_with_lcm) for text-to-image, image-to-image, and with LoRA checkpoints.
- Read the [SDXL in 4 steps with Latent Consistency LoRAs](https://huggingface.co/blog/lcm_lora) blog post to learn more about SDXL LCM-LoRA's for super fast inference, quality comparisons, benchmarks, and more.
2 changes: 1 addition & 1 deletion docs/source/en/training/lora.md
@@ -198,5 +198,5 @@ image = pipeline("A naruto with blue eyes").images[0]

Congratulations on training a new model with LoRA! To learn more about how to use your new model, the following guides may be helpful:

- Learn how to [load different LoRA formats](../using-diffusers/loading_adapters#LoRA) trained using community trainers like Kohya and TheLastBen.
- Learn how to [load different LoRA formats](../tutorials/using_peft_for_inference) trained using community trainers like Kohya and TheLastBen.
- Learn how to use and [combine multiple LoRAs](../tutorials/using_peft_for_inference) with PEFT for inference.
2 changes: 1 addition & 1 deletion docs/source/en/training/text2image.md
@@ -178,5 +178,5 @@ image.save("yoda-naruto.png")

Congratulations on training your own text-to-image model! To learn more about how to use your new model, the following guides may be helpful:

- Learn how to [load LoRA weights](../using-diffusers/loading_adapters#LoRA) for inference if you trained your model with LoRA.
- Learn how to [load LoRA weights](../tutorials/using_peft_for_inference) for inference if you trained your model with LoRA.
- Learn more about how certain parameters like guidance scale or techniques such as prompt weighting can help you control inference in the [Text-to-image](../using-diffusers/conditional_image_generation) task guide.
3 changes: 1 addition & 2 deletions docs/source/en/training/text_inversion.md
@@ -203,5 +203,4 @@ image.save("cat-train.png")

Congratulations on training your own Textual Inversion model! 🎉 To learn more about how to use your new model, the following guides may be helpful:

- Learn how to [load Textual Inversion embeddings](../using-diffusers/loading_adapters) and also use them as negative embeddings.
- Learn how to use [Textual Inversion](textual_inversion_inference) for inference with Stable Diffusion 1/2 and Stable Diffusion XL.
- Learn how to [load Textual Inversion embeddings](../using-diffusers/textual_inversion_inference) and also use them as negative embeddings.
34 changes: 0 additions & 34 deletions docs/source/en/using-diffusers/controlling_generation.md
@@ -70,32 +70,6 @@ For convenience, we provide a table to denote which methods are inference-only a
[InstructPix2Pix](../api/pipelines/pix2pix) is fine-tuned from Stable Diffusion to support editing input images. It takes as inputs an image and a prompt describing an edit, and it outputs the edited image.
InstructPix2Pix has been explicitly trained to work well with [InstructGPT](https://openai.com/blog/instruction-following/)-like prompts.

## Pix2Pix Zero

[Paper](https://huggingface.co/papers/2302.03027)

[Pix2Pix Zero](../api/pipelines/pix2pix_zero) allows modifying an image so that one concept or subject is translated to another one while preserving general image semantics.

The denoising process is guided from one conceptual embedding towards another conceptual embedding. The intermediate latents are optimized during the denoising process to push the attention maps towards reference attention maps. The reference attention maps are from the denoising process of the input image and are used to encourage semantic preservation.

Pix2Pix Zero can be used both to edit synthetic images as well as real images.

- To edit synthetic images, one first generates an image given a caption.
Next, we generate image captions for the concept that shall be edited and for the new target concept. We can use a model like [Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) for this purpose. Then, "mean" prompt embeddings for both the source and target concepts are created via the text encoder. Finally, the pix2pix-zero algorithm is used to edit the synthetic image.
- To edit a real image, one first generates an image caption using a model like [BLIP](https://huggingface.co/docs/transformers/model_doc/blip). Then one applies DDIM inversion on the prompt and image to generate "inverse" latents. Similar to before, "mean" prompt embeddings for both source and target concepts are created and finally the pix2pix-zero algorithm in combination with the "inverse" latents is used to edit the image.

> [!TIP]
> Pix2Pix Zero is the first model that allows "zero-shot" image editing. This means that the model
> can edit an image in less than a minute on a consumer GPU as shown [here](../api/pipelines/pix2pix_zero#usage-example).

As mentioned above, Pix2Pix Zero includes optimizing the latents (and not any of the UNet, VAE, or the text encoder) to steer the generation toward a specific concept. This means that the overall
pipeline might require more memory than a standard [StableDiffusionPipeline](../api/pipelines/stable_diffusion/text2img).

> [!TIP]
> An important distinction between methods like InstructPix2Pix and Pix2Pix Zero is that the former
> involves fine-tuning the pre-trained weights while the latter does not. This means that you can
> apply Pix2Pix Zero to any of the available Stable Diffusion models.

## Attend and Excite

[Paper](https://huggingface.co/papers/2301.13826)
@@ -178,14 +152,6 @@ multi-concept training by design. Like DreamBooth and Textual Inversion, Custom
teach a pre-trained text-to-image diffusion model about new concepts to generate outputs involving the
concept(s) of interest.

## Model Editing

[Paper](https://huggingface.co/papers/2303.08084)

The [text-to-image model editing pipeline](../api/pipelines/model_editing) helps you mitigate some of the incorrect implicit assumptions a pre-trained text-to-image
diffusion model might make about the subjects present in the input prompt. For example, if you prompt Stable Diffusion to generate images for "A pack of roses", the roses in the generated images
are more likely to be red. This pipeline helps you change that assumption.

## DiffEdit

[Paper](https://huggingface.co/papers/2210.11427)
2 changes: 1 addition & 1 deletion docs/source/en/using-diffusers/inference_with_lcm.md
@@ -257,7 +257,7 @@ LCMs are compatible with adapters like LoRA, ControlNet, T2I-Adapter, and Animat

### LoRA

[LoRA](../using-diffusers/loading_adapters#lora) adapters can be rapidly finetuned to learn a new style from just a few images and plugged into a pretrained model to generate images in that style.
[LoRA](../tutorials/using_peft_for_inference) adapters can be rapidly finetuned to learn a new style from just a few images and plugged into a pretrained model to generate images in that style.

<hfoptions id="lcm-lora">
<hfoption id="LCM">
4 changes: 2 additions & 2 deletions docs/source/en/using-diffusers/inference_with_tcd_lora.md
@@ -18,7 +18,7 @@ Trajectory Consistency Distillation (TCD) enables a model to generate higher qua

The major advantages of TCD are:

- Better than Teacher: TCD demonstrates superior generative quality at both small and large inference steps and exceeds the performance of [DPM-Solver++(2S)](../../api/schedulers/multistep_dpm_solver) with Stable Diffusion XL (SDXL). There is no additional discriminator or LPIPS supervision included during TCD training.
- Better than Teacher: TCD demonstrates superior generative quality at both small and large inference steps and exceeds the performance of [DPM-Solver++(2S)](../api/schedulers/multistep_dpm_solver) with Stable Diffusion XL (SDXL). There is no additional discriminator or LPIPS supervision included during TCD training.

- Flexible Inference Steps: The inference steps for TCD sampling can be freely adjusted without adversely affecting the image quality.

@@ -166,7 +166,7 @@ image = pipe(
TCD-LoRA also supports other LoRAs trained on different styles. For example, let's load the [TheLastBen/Papercut_SDXL](https://huggingface.co/TheLastBen/Papercut_SDXL) LoRA and fuse it with the TCD-LoRA with the [`~loaders.UNet2DConditionLoadersMixin.set_adapters`] method.

> [!TIP]
> Check out the [Merge LoRAs](merge_loras) guide to learn more about efficient merging methods.
> Check out the [Merge LoRAs](../tutorials/using_peft_for_inference#merge) guide to learn more about efficient merging methods.
```python
import torch
# ...
```
2 changes: 1 addition & 1 deletion docs/source/en/using-diffusers/sdxl.md
@@ -280,7 +280,7 @@ refiner = DiffusionPipeline.from_pretrained(
```

> [!WARNING]
> You can use SDXL refiner with a different base model. For example, you can use the [Hunyuan-DiT](../../api/pipelines/hunyuandit) or [PixArt-Sigma](../../api/pipelines/pixart_sigma) pipelines to generate images with better prompt adherence. Once you have generated an image, you can pass it to the SDXL refiner model to enhance final generation quality.
> You can use SDXL refiner with a different base model. For example, you can use the [Hunyuan-DiT](../api/pipelines/hunyuandit) or [PixArt-Sigma](../api/pipelines/pixart_sigma) pipelines to generate images with better prompt adherence. Once you have generated an image, you can pass it to the SDXL refiner model to enhance final generation quality.

Generate an image from the base model, and set the model output to **latent** space:

2 changes: 1 addition & 1 deletion docs/source/en/using-diffusers/write_own_pipeline.md
@@ -280,5 +280,5 @@ This is really what 🧨 Diffusers is designed for: to make it intuitive and eas

For your next steps, feel free to:

* Learn how to [build and contribute a pipeline](../using-diffusers/contribute_pipeline) to 🧨 Diffusers. We can't wait and see what you'll come up with!
* Learn how to [build and contribute a pipeline](../conceptual/contribution) to 🧨 Diffusers. We can't wait to see what you'll come up with!
* Explore [existing pipelines](../api/pipelines/overview) in the library, and see if you can deconstruct and build a pipeline from scratch using the models and schedulers separately.