diff --git a/docs/source/en/api/loaders/ip_adapter.md b/docs/source/en/api/loaders/ip_adapter.md index c2c45bee1022..508e6d4ee6ae 100644 --- a/docs/source/en/api/loaders/ip_adapter.md +++ b/docs/source/en/api/loaders/ip_adapter.md @@ -15,7 +15,7 @@ specific language governing permissions and limitations under the License. [IP-Adapter](https://hf.co/papers/2308.06721) is a lightweight adapter that enables prompting a diffusion model with an image. This method decouples the cross-attention layers of the image and text features. The image features are generated from an image encoder. > [!TIP] -> Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter [loading](../../using-diffusers/loading_adapters#ip-adapter) guide, and you can see how to use it in the [usage](../../using-diffusers/ip_adapter) guide. +> Learn how to load and use an IP-Adapter checkpoint and image in the [IP-Adapter](../../using-diffusers/ip_adapter) guide. ## IPAdapterMixin diff --git a/docs/source/en/api/loaders/lora.md b/docs/source/en/api/loaders/lora.md index bf22a32d74f3..b1d1ffb63423 100644 --- a/docs/source/en/api/loaders/lora.md +++ b/docs/source/en/api/loaders/lora.md @@ -34,7 +34,7 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi - [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more. > [!TIP] -> To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide. +> To learn more about how to load LoRA weights, see the [LoRA](../../tutorials/using_peft_for_inference) loading guide. ## LoraBaseMixin diff --git a/docs/source/en/api/loaders/peft.md b/docs/source/en/api/loaders/peft.md index 5508509c8823..c514766dd87f 100644 --- a/docs/source/en/api/loaders/peft.md +++ b/docs/source/en/api/loaders/peft.md @@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License. # PEFT -Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`], [`SD3Transformer2DModel`] to operate with an adapter. +Diffusers supports loading adapters such as [LoRA](../../tutorials/using_peft_for_inference) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`], [`SD3Transformer2DModel`] to operate with an adapter. > [!TIP] > Refer to the [Inference with PEFT](../../tutorials/using_peft_for_inference.md) tutorial for an overview of how to use PEFT in Diffusers for inference. diff --git a/docs/source/en/api/loaders/textual_inversion.md b/docs/source/en/api/loaders/textual_inversion.md index 2cb54ce4ea3a..5e8bfac255d0 100644 --- a/docs/source/en/api/loaders/textual_inversion.md +++ b/docs/source/en/api/loaders/textual_inversion.md @@ -17,7 +17,7 @@ Textual Inversion is a training method for personalizing models by learning new [`TextualInversionLoaderMixin`] provides a function for loading Textual Inversion embeddings from Diffusers and Automatic1111 into the text encoder and loading a special token to activate the embeddings. > [!TIP] -> To learn more about how to load Textual Inversion embeddings, see the [Textual Inversion](../../using-diffusers/loading_adapters#textual-inversion) loading guide. 
+> To learn more about how to load Textual Inversion embeddings, see the [Textual Inversion](../../using-diffusers/textual_inversion_inference) loading guide. ## TextualInversionLoaderMixin diff --git a/docs/source/en/api/loaders/transformer_sd3.md b/docs/source/en/api/loaders/transformer_sd3.md index cc3ec0da145e..2c8b81b59cf9 100644 --- a/docs/source/en/api/loaders/transformer_sd3.md +++ b/docs/source/en/api/loaders/transformer_sd3.md @@ -17,7 +17,7 @@ This class is useful when *only* loading weights into a [`SD3Transformer2DModel` The [`SD3Transformer2DLoadersMixin`] class currently only loads IP-Adapter weights, but will be used in the future to save weights and load LoRAs. > [!TIP] -> To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide. +> To learn more about how to load LoRA weights, see the [LoRA](../../tutorials/using_peft_for_inference) loading guide. ## SD3Transformer2DLoadersMixin diff --git a/docs/source/en/api/loaders/unet.md b/docs/source/en/api/loaders/unet.md index 7450e03e580b..50d210bbf53f 100644 --- a/docs/source/en/api/loaders/unet.md +++ b/docs/source/en/api/loaders/unet.md @@ -17,7 +17,7 @@ Some training methods - like LoRA and Custom Diffusion - typically target the UN The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters. > [!TIP] -> To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide. +> To learn more about how to load LoRA weights, see the [LoRA](../../tutorials/using_peft_for_inference) guide. ## UNet2DConditionLoadersMixin diff --git a/docs/source/en/api/pipelines/flux.md b/docs/source/en/api/pipelines/flux.md index 1a89de98e48f..358b8139c73a 100644 --- a/docs/source/en/api/pipelines/flux.md +++ b/docs/source/en/api/pipelines/flux.md @@ -418,7 +418,7 @@ When unloading the Control LoRA weights, call `pipe.unload_lora_weights(reset_to ## IP-Adapter > [!TIP] -> Check out [IP-Adapter](../../../using-diffusers/ip_adapter) to learn more about how IP-Adapters work. +> Check out [IP-Adapter](../../using-diffusers/ip_adapter) to learn more about how IP-Adapters work. An IP-Adapter lets you prompt Flux with images, in addition to the text prompt. This is especially useful when describing complex concepts that are difficult to articulate through text alone and you have reference images. 
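For context on the API the Flux IP-Adapter tip above points to, here is a minimal sketch of loading an IP-Adapter into [`FluxPipeline`]; the checkpoint, weight file, and image encoder names below are assumptions for illustration and may differ from the ones used in the guide:

```py
import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")

# assumed checkpoint and image encoder names, shown for illustration only
pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14",
)
pipe.set_ip_adapter_scale(1.0)

reference = load_image("reference.png")  # your reference image
image = pipe(
    prompt="a cat wearing a wizard hat",
    ip_adapter_image=reference,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_ip_adapter.png")
```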
diff --git a/docs/source/en/api/pipelines/hidream.md b/docs/source/en/api/pipelines/hidream.md index acfcef93e0ad..add4ad313231 100644 --- a/docs/source/en/api/pipelines/hidream.md +++ b/docs/source/en/api/pipelines/hidream.md @@ -21,7 +21,7 @@ ## Available models -The following models are available for the [`HiDreamImagePipeline`](text-to-image) pipeline: +The following models are available for the [`HiDreamImagePipeline`] pipeline: | Model name | Description | |:---|:---| diff --git a/docs/source/en/api/pipelines/overview.md b/docs/source/en/api/pipelines/overview.md index ce883931df21..22fcf560eaca 100644 --- a/docs/source/en/api/pipelines/overview.md +++ b/docs/source/en/api/pipelines/overview.md @@ -32,7 +32,7 @@ The table below lists all the pipelines currently available in ๐Ÿค— Diffusers an | [Attend-and-Excite](attend_and_excite) | text2image | | [AudioLDM](audioldm) | text2audio | | [AudioLDM2](audioldm2) | text2audio | -| [AuraFlow](auraflow) | text2image | +| [AuraFlow](aura_flow) | text2image | | [BLIP Diffusion](blip_diffusion) | text2image | | [Bria 3.2](bria_3_2) | text2image | | [CogVideoX](cogvideox) | text2video | diff --git a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md index 3c49df101c1e..4c7e5b107316 100644 --- a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md +++ b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md @@ -271,7 +271,7 @@ Check out the full script [here](https://gist.github.com/sayakpaul/508d89d7aad4f Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model. -Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`StableDiffusion3Pipeline`] for inference with bitsandbytes. +Refer to the [Quantization](../../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`StableDiffusion3Pipeline`] for inference with bitsandbytes. ```py import torch diff --git a/docs/source/en/api/pipelines/stable_diffusion/svd.md b/docs/source/en/api/pipelines/stable_diffusion/svd.md index 0c33c0600728..a00dd3ef6d85 100644 --- a/docs/source/en/api/pipelines/stable_diffusion/svd.md +++ b/docs/source/en/api/pipelines/stable_diffusion/svd.md @@ -29,7 +29,7 @@ The abstract from the paper is: Video generation is memory-intensive and one way to reduce your memory usage is to set `enable_forward_chunking` on the pipeline's UNet so you don't run the entire feedforward layer at once. Breaking it up into chunks in a loop is more efficient. -Check out the [Text or image-to-video](text-img2vid) guide for more details about how certain parameters can affect video generation and how to optimize inference by reducing memory usage. +Check out the [Text or image-to-video](../../../using-diffusers/text-img2vid) guide for more details about how certain parameters can affect video generation and how to optimize inference by reducing memory usage. 
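To make the memory tip above concrete, here is a minimal sketch of enabling forward chunking on the SVD UNet; the checkpoint id and conditioning image path are assumptions:

```py
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()
# run the UNet feedforward layers in smaller chunks instead of all at once to reduce peak memory
pipe.unet.enable_forward_chunking()

image = load_image("conditioning_frame.png")  # your conditioning image
frames = pipe(image, decode_chunk_size=2, generator=torch.manual_seed(0)).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```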
## StableVideoDiffusionPipeline diff --git a/docs/source/en/api/pipelines/text_to_video.md b/docs/source/en/api/pipelines/text_to_video.md index d7c37be6371e..d9f6d8e722ac 100644 --- a/docs/source/en/api/pipelines/text_to_video.md +++ b/docs/source/en/api/pipelines/text_to_video.md @@ -172,7 +172,7 @@ Here are some sample outputs: Video generation is memory-intensive and one way to reduce your memory usage is to set `enable_forward_chunking` on the pipeline's UNet so you don't run the entire feedforward layer at once. Breaking it up into chunks in a loop is more efficient. -Check out the [Text or image-to-video](text-img2vid) guide for more details about how certain parameters can affect video generation and how to optimize inference by reducing memory usage. +Check out the [Text or image-to-video](../../using-diffusers/text-img2vid) guide for more details about how certain parameters can affect video generation and how to optimize inference by reducing memory usage. > [!TIP] > Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines. diff --git a/docs/source/en/training/dreambooth.md b/docs/source/en/training/dreambooth.md index 81ed09c9d002..2302739a0e34 100644 --- a/docs/source/en/training/dreambooth.md +++ b/docs/source/en/training/dreambooth.md @@ -548,4 +548,4 @@ Training the DeepFloyd IF model can be challenging, but here are some tips that Congratulations on training your DreamBooth model! To learn more about how to use your new model, the following guide may be helpful: -- Learn how to [load a DreamBooth](../using-diffusers/loading_adapters) model for inference if you trained your model with LoRA. \ No newline at end of file +- Learn how to [load a DreamBooth](../using-diffusers/dreambooth) model for inference if you trained your model with LoRA. \ No newline at end of file diff --git a/docs/source/en/training/lcm_distill.md b/docs/source/en/training/lcm_distill.md index 232f2eceed5d..4750f150367e 100644 --- a/docs/source/en/training/lcm_distill.md +++ b/docs/source/en/training/lcm_distill.md @@ -75,7 +75,7 @@ accelerate launch train_lcm_distill_sd_wds.py \ Most of the parameters are identical to the parameters in the [Text-to-image](text2image#script-parameters) training guide, so you'll focus on the parameters that are relevant to latent consistency distillation in this guide. 
- `--pretrained_teacher_model`: the path to a pretrained latent diffusion model to use as the teacher model -- `--pretrained_vae_model_name_or_path`: path to a pretrained VAE; the SDXL VAE is known to suffer from numerical instability, so this parameter allows you to specify an alternative VAE (like this [VAE]((https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)) by madebyollin which works in fp16) +- `--pretrained_vae_model_name_or_path`: path to a pretrained VAE; the SDXL VAE is known to suffer from numerical instability, so this parameter allows you to specify an alternative VAE (like this [VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) by madebyollin which works in fp16) - `--w_min` and `--w_max`: the minimum and maximum guidance scale values for guidance scale sampling - `--num_ddim_timesteps`: the number of timesteps for DDIM sampling - `--loss_type`: the type of loss (L2 or Huber) to calculate for latent consistency distillation; Huber loss is generally preferred because it's more robust to outliers @@ -245,5 +245,5 @@ The SDXL training script is discussed in more detail in the [SDXL training](sdxl Congratulations on distilling a LCM model! To learn more about LCM, the following may be helpful: -- Learn how to use [LCMs for inference](../using-diffusers/lcm) for text-to-image, image-to-image, and with LoRA checkpoints. +- Learn how to use [LCMs for inference](../using-diffusers/inference_with_lcm) for text-to-image, image-to-image, and with LoRA checkpoints. - Read the [SDXL in 4 steps with Latent Consistency LoRAs](https://huggingface.co/blog/lcm_lora) blog post to learn more about SDXL LCM-LoRA's for super fast inference, quality comparisons, benchmarks, and more. diff --git a/docs/source/en/training/lora.md b/docs/source/en/training/lora.md index 45a234b76a61..efb170e329d4 100644 --- a/docs/source/en/training/lora.md +++ b/docs/source/en/training/lora.md @@ -198,5 +198,5 @@ image = pipeline("A naruto with blue eyes").images[0] Congratulations on training a new model with LoRA! To learn more about how to use your new model, the following guides may be helpful: -- Learn how to [load different LoRA formats](../using-diffusers/loading_adapters#LoRA) trained using community trainers like Kohya and TheLastBen. +- Learn how to [load different LoRA formats](../tutorials/using_peft_for_inference) trained using community trainers like Kohya and TheLastBen. - Learn how to use and [combine multiple LoRA's](../tutorials/using_peft_for_inference) with PEFT for inference. diff --git a/docs/source/en/training/text2image.md b/docs/source/en/training/text2image.md index a9327457c783..d11e55e91018 100644 --- a/docs/source/en/training/text2image.md +++ b/docs/source/en/training/text2image.md @@ -178,5 +178,5 @@ image.save("yoda-naruto.png") Congratulations on training your own text-to-image model! To learn more about how to use your new model, the following guides may be helpful: -- Learn how to [load LoRA weights](../using-diffusers/loading_adapters#LoRA) for inference if you trained your model with LoRA. +- Learn how to [load LoRA weights](../tutorials/using_peft_for_inference) for inference if you trained your model with LoRA. - Learn more about how certain parameters like guidance scale or techniques such as prompt weighting can help you control inference in the [Text-to-image](../using-diffusers/conditional_image_generation) task guide. 
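For the LoRA next-steps bullets above, a minimal sketch of loading the trained weights for inference might look like the following; the base model id and output directory are placeholders:

```py
import torch
from diffusers import AutoPipelineForText2Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# "path/to/lora-output" is a placeholder for your training output directory
pipeline.load_lora_weights("path/to/lora-output", weight_name="pytorch_lora_weights.safetensors")

image = pipeline("A naruto with blue eyes").images[0]
image.save("naruto.png")
```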
diff --git a/docs/source/en/training/text_inversion.md b/docs/source/en/training/text_inversion.md index 0b540107e9b2..4912b6730a61 100644 --- a/docs/source/en/training/text_inversion.md +++ b/docs/source/en/training/text_inversion.md @@ -203,5 +203,4 @@ image.save("cat-train.png") Congratulations on training your own Textual Inversion model! ๐ŸŽ‰ To learn more about how to use your new model, the following guides may be helpful: -- Learn how to [load Textual Inversion embeddings](../using-diffusers/loading_adapters) and also use them as negative embeddings. -- Learn how to use [Textual Inversion](textual_inversion_inference) for inference with Stable Diffusion 1/2 and Stable Diffusion XL. \ No newline at end of file +- Learn how to [load Textual Inversion embeddings](../using-diffusers/textual_inversion_inference) and also use them as negative embeddings. \ No newline at end of file diff --git a/docs/source/en/using-diffusers/controlling_generation.md b/docs/source/en/using-diffusers/controlling_generation.md index aed3a8b729ac..b7b0ea491949 100644 --- a/docs/source/en/using-diffusers/controlling_generation.md +++ b/docs/source/en/using-diffusers/controlling_generation.md @@ -70,32 +70,6 @@ For convenience, we provide a table to denote which methods are inference-only a [InstructPix2Pix](../api/pipelines/pix2pix) is fine-tuned from Stable Diffusion to support editing input images. It takes as inputs an image and a prompt describing an edit, and it outputs the edited image. InstructPix2Pix has been explicitly trained to work well with [InstructGPT](https://openai.com/blog/instruction-following/)-like prompts. -## Pix2Pix Zero - -[Paper](https://huggingface.co/papers/2302.03027) - -[Pix2Pix Zero](../api/pipelines/pix2pix_zero) allows modifying an image so that one concept or subject is translated to another one while preserving general image semantics. - -The denoising process is guided from one conceptual embedding towards another conceptual embedding. The intermediate latents are optimized during the denoising process to push the attention maps towards reference attention maps. The reference attention maps are from the denoising process of the input image and are used to encourage semantic preservation. - -Pix2Pix Zero can be used both to edit synthetic images as well as real images. - -- To edit synthetic images, one first generates an image given a caption. - Next, we generate image captions for the concept that shall be edited and for the new target concept. We can use a model like [Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) for this purpose. Then, "mean" prompt embeddings for both the source and target concepts are created via the text encoder. Finally, the pix2pix-zero algorithm is used to edit the synthetic image. -- To edit a real image, one first generates an image caption using a model like [BLIP](https://huggingface.co/docs/transformers/model_doc/blip). Then one applies DDIM inversion on the prompt and image to generate "inverse" latents. Similar to before, "mean" prompt embeddings for both source and target concepts are created and finally the pix2pix-zero algorithm in combination with the "inverse" latents is used to edit the image. - -> [!TIP] -> Pix2Pix Zero is the first model that allows "zero-shot" image editing. This means that the model -> can edit an image in less than a minute on a consumer GPU as shown [here](../api/pipelines/pix2pix_zero#usage-example). 
- -As mentioned above, Pix2Pix Zero includes optimizing the latents (and not any of the UNet, VAE, or the text encoder) to steer the generation toward a specific concept. This means that the overall -pipeline might require more memory than a standard [StableDiffusionPipeline](../api/pipelines/stable_diffusion/text2img). - -> [!TIP] -> An important distinction between methods like InstructPix2Pix and Pix2Pix Zero is that the former -> involves fine-tuning the pre-trained weights while the latter does not. This means that you can -> apply Pix2Pix Zero to any of the available Stable Diffusion models. - ## Attend and Excite [Paper](https://huggingface.co/papers/2301.13826) @@ -178,14 +152,6 @@ multi-concept training by design. Like DreamBooth and Textual Inversion, Custom teach a pre-trained text-to-image diffusion model about new concepts to generate outputs involving the concept(s) of interest. -## Model Editing - -[Paper](https://huggingface.co/papers/2303.08084) - -The [text-to-image model editing pipeline](../api/pipelines/model_editing) helps you mitigate some of the incorrect implicit assumptions a pre-trained text-to-image -diffusion model might make about the subjects present in the input prompt. For example, if you prompt Stable Diffusion to generate images for "A pack of roses", the roses in the generated images -are more likely to be red. This pipeline helps you change that assumption. - ## DiffEdit [Paper](https://huggingface.co/papers/2210.11427) diff --git a/docs/source/en/using-diffusers/inference_with_lcm.md b/docs/source/en/using-diffusers/inference_with_lcm.md index d0a47449ad88..cde4168d38e0 100644 --- a/docs/source/en/using-diffusers/inference_with_lcm.md +++ b/docs/source/en/using-diffusers/inference_with_lcm.md @@ -257,7 +257,7 @@ LCMs are compatible with adapters like LoRA, ControlNet, T2I-Adapter, and Animat ### LoRA -[LoRA](../using-diffusers/loading_adapters#lora) adapters can be rapidly finetuned to learn a new style from just a few images and plugged into a pretrained model to generate images in that style. +[LoRA](../tutorials/using_peft_for_inference) adapters can be rapidly finetuned to learn a new style from just a few images and plugged into a pretrained model to generate images in that style. diff --git a/docs/source/en/using-diffusers/inference_with_tcd_lora.md b/docs/source/en/using-diffusers/inference_with_tcd_lora.md index a4de12b5e722..2aaf9c8aa8e9 100644 --- a/docs/source/en/using-diffusers/inference_with_tcd_lora.md +++ b/docs/source/en/using-diffusers/inference_with_tcd_lora.md @@ -18,7 +18,7 @@ Trajectory Consistency Distillation (TCD) enables a model to generate higher qua The major advantages of TCD are: -- Better than Teacher: TCD demonstrates superior generative quality at both small and large inference steps and exceeds the performance of [DPM-Solver++(2S)](../../api/schedulers/multistep_dpm_solver) with Stable Diffusion XL (SDXL). There is no additional discriminator or LPIPS supervision included during TCD training. +- Better than Teacher: TCD demonstrates superior generative quality at both small and large inference steps and exceeds the performance of [DPM-Solver++(2S)](../api/schedulers/multistep_dpm_solver) with Stable Diffusion XL (SDXL). There is no additional discriminator or LPIPS supervision included during TCD training. - Flexible Inference Steps: The inference steps for TCD sampling can be freely adjusted without adversely affecting the image quality. 
@@ -166,7 +166,7 @@ image = pipe( TCD-LoRA also supports other LoRAs trained on different styles. For example, let's load the [TheLastBen/Papercut_SDXL](https://huggingface.co/TheLastBen/Papercut_SDXL) LoRA and fuse it with the TCD-LoRA with the [`~loaders.UNet2DConditionLoadersMixin.set_adapters`] method. > [!TIP] -> Check out the [Merge LoRAs](merge_loras) guide to learn more about efficient merging methods. +> Check out the [Merge LoRAs](../tutorials/using_peft_for_inference#merge) guide to learn more about efficient merging methods. ```python import torch diff --git a/docs/source/en/using-diffusers/sdxl.md b/docs/source/en/using-diffusers/sdxl.md index 79625e0c4a81..275394a03ca9 100644 --- a/docs/source/en/using-diffusers/sdxl.md +++ b/docs/source/en/using-diffusers/sdxl.md @@ -280,7 +280,7 @@ refiner = DiffusionPipeline.from_pretrained( ``` > [!WARNING] -> You can use SDXL refiner with a different base model. For example, you can use the [Hunyuan-DiT](../../api/pipelines/hunyuandit) or [PixArt-Sigma](../../api/pipelines/pixart_sigma) pipelines to generate images with better prompt adherence. Once you have generated an image, you can pass it to the SDXL refiner model to enhance final generation quality. +> You can use SDXL refiner with a different base model. For example, you can use the [Hunyuan-DiT](../api/pipelines/hunyuandit) or [PixArt-Sigma](../api/pipelines/pixart_sigma) pipelines to generate images with better prompt adherence. Once you have generated an image, you can pass it to the SDXL refiner model to enhance final generation quality. Generate an image from the base model, and set the model output to **latent** space: diff --git a/docs/source/en/using-diffusers/write_own_pipeline.md b/docs/source/en/using-diffusers/write_own_pipeline.md index 930b0fe21fc4..e34727b5da25 100644 --- a/docs/source/en/using-diffusers/write_own_pipeline.md +++ b/docs/source/en/using-diffusers/write_own_pipeline.md @@ -280,5 +280,5 @@ This is really what ๐Ÿงจ Diffusers is designed for: to make it intuitive and eas For your next steps, feel free to: -* Learn how to [build and contribute a pipeline](../using-diffusers/contribute_pipeline) to ๐Ÿงจ Diffusers. We can't wait and see what you'll come up with! +* Learn how to [build and contribute a pipeline](../conceptual/contribution) to ๐Ÿงจ Diffusers. We can't wait and see what you'll come up with! * Explore [existing pipelines](../api/pipelines/overview) in the library, and see if you can deconstruct and build a pipeline from scratch using the models and schedulers separately.