Commit 846d949

feedback
1 parent d5fee82 commit 846d949

File tree

1 file changed (+19 -17 lines changed)

docs/source/en/quicktour.md

Lines changed: 19 additions & 17 deletions
@@ -21,21 +21,23 @@ This Quickstart will give you an overview of Diffusers and get you up and genera
 > [!TIP]
 > Before you begin, make sure you have a Hugging Face [account](https://huggingface.co/join) in order to use models like [Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev).

-Follow the [Installation](./installation) guide to install Diffusers if you haven't already installed it.
+Follow the [Installation](./installation) guide to install Diffusers if it's not already installed.

 ## DiffusionPipeline

-A diffusion model is actually comprised of several components that work together to generate an image or video that matches your prompt.
+A diffusion model combines multiple components to generate outputs in any modality based on an input, such as a text description or image.

-1. A text encoder turns your prompt into embeddings which steers the denoising process.
-2. A scheduler contains the *denoising* specifics such as how much noise to remove at each step and the step size. Schedulers affect denoising speed and generation quality.
-3. A UNet or diffusion transformer is the workhorse of a diffusion model.
+For a standard text-to-image model:

-   At each step, it takes the noisy input and predicts the noise based on the step and embeddings. The scheduler uses this prediction to subtract the appropriate amount of noise to produce a slightly cleaner image.
-
-   The UNet or diffusion transformer repeats this loop for a set amount of steps to produce the final latent output.
+1. A text encoder turns a prompt into embeddings that guide the denoising process.
+2. A scheduler contains the algorithmic specifics for gradually denoising initial random noise into clean outputs. Different schedulers affect generation speed and quality.
+3. A UNet or diffusion transformer (DiT) is the workhorse of a diffusion model.
+
+   At each step, it performs the denoising predictions, such as how much noise to remove or the general direction in which to steer the noise to generate better quality outputs.

-4. A decoder (variational autoencoder) converts the *latents* into the actual image or video. Latents are a compressed representation of the input, making it more efficient to work with than the actual pixels.
+   The UNet or DiT repeats this loop for a set amount of steps to generate the final output.
+
+4. A variational autoencoder (VAE) encodes and decodes pixels to a spatially compressed latent-space. *Latents* are compressed representations of an image and are more efficient to work with. The UNet or DiT operates on latents, and the clean latents at the end are decoded back into images.

 The [`DiffusionPipeline`] packages all these components into a single class for inference. There are several arguments in [`DiffusionPipeline`] you can change, such as `num_inference_steps`, that affect the diffusion process. Try different values and arguments to see how they change generation quality or speed.

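For reference, a minimal sketch of the pipeline usage this section describes: loading a [`DiffusionPipeline`] and adjusting `num_inference_steps`. It assumes the [Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev) checkpoint mentioned in the tip above and a CUDA GPU; the prompt and step count are illustrative only.

```py
import torch
from diffusers import DiffusionPipeline

# One class bundles the text encoder(s), scheduler, transformer/UNet, and VAE.
pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipeline.to("cuda")

# num_inference_steps controls how many denoising steps the scheduler runs;
# more steps generally trades speed for quality.
image = pipeline(
    "a photo of an astronaut riding a horse on the moon",
    num_inference_steps=28,
).images[0]
image.save("flux_output.png")
```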
@@ -115,7 +117,7 @@ export_to_video(video, "output.mp4", fps=15)

 Adapters insert a small number of trainable parameters to the original base model. Only the inserted parameters are fine-tuned while the rest of the model weights remain frozen. This makes it fast and cheap to fine-tune a model on a new style. Among adapters, [LoRA's](./tutorials/using_peft_for_inference) are the most popular.

-Add a LoRA to your pipeline with the [`~loaders.FluxLoraLoaderMixin.load_lora_weights`] method. Some LoRA's require a special word to trigger it, such as `GHIBSKY style`, in the example below. Check a LoRA's model card to see if it requires a trigger word.
+Add a LoRA to a pipeline with the [`~loaders.FluxLoraLoaderMixin.load_lora_weights`] method. Some LoRA's require a special word to trigger it, such as `GHIBSKY style`, in the example below. Check a LoRA's model card to see if it requires a trigger word.

 ```py
 import torch
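The hunk above cuts off before the commit's own code sample. For reference, a minimal sketch of the `load_lora_weights` pattern the paragraph describes; the LoRA repo id and prompt below are placeholders, not from the commit, and `GHIBSKY style` is the trigger word named in the text.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder repo id -- point this at an actual Flux LoRA checkpoint.
pipeline.load_lora_weights("your-username/your-flux-lora")

# If the LoRA's model card lists a trigger word, include it in the prompt.
image = pipeline("GHIBSKY style, a cozy seaside village at dusk").images[0]
image.save("lora_output.png")
```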
@@ -145,7 +147,7 @@ Check out the [LoRA](./tutorials/using_peft_for_inference) docs or Adapters sect

 [Quantization](./quantization/overview) stores data in fewer bits to reduce memory usage. It may also speed up inference because it takes less time to perform calculations with fewer bits.

-Diffusers provides several quantization backends, and picking one depends on your use case. For example, [bitsandbytes](./quantization/bitsandbytes) and [torchao](./quantization/torchao) are both simple and easy to use for inference, but torchao supports more [quantization types](./quantization/torchao#supported-quantization-types) like fp8.
+Diffusers provides several quantization backends and picking one depends on your use case. For example, [bitsandbytes](./quantization/bitsandbytes) and [torchao](./quantization/torchao) are both simple and easy to use for inference, but torchao supports more [quantization types](./quantization/torchao#supported-quantization-types) like fp8.

 Configure [`PipelineQuantizationConfig`] with the backend to use, the specific arguments (refer to the [API](./api/quantization) reference for available arguments) for that backend, and which components to quantize. The example below quantizes the model to 4-bits and only uses 14.93GB of memory.

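For reference, a sketch of configuring [`PipelineQuantizationConfig`] for 4-bit bitsandbytes quantization as the paragraph above describes. The import path, backend string, kwargs, and component names are assumptions that may need adjusting to your Diffusers version; check the [API](./api/quantization) reference.

```py
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Backend, kwargs, and component names here are illustrative; confirm them
# against the quantization API reference for your installed version.
quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")
```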
@@ -177,15 +179,15 @@ Take a look at the [Quantization](./quantization/overview) section for more deta

 ## Optimizations

-Modern diffusion models are very large and have billions of parameters. The iterative denoising process is also computationally intensive and slow. Diffusers provides techniques for reducing memory usage and boosting inference speed. These techniques can be used together and with quantization to optimize for both memory and inference speed.
+Modern diffusion models are very large and have billions of parameters. The iterative denoising process is also computationally intensive and slow. Diffusers provides techniques for reducing memory usage and boosting inference speed. These techniques can be combined with quantization to optimize for both memory usage and inference speed.

-### Memory
+### Memory usage

-The text encoders and UNet or diffusion transformer can use up as much as ~30GB of memory, exceeding the amount available on many free-tier or consumer GPUs.
+The text encoders and UNet or DiT can use up as much as ~30GB of memory, exceeding the amount available on many free-tier or consumer GPUs.

-Offloading stores weights that aren't currently used on the CPU and only moves them to the GPU when they're needed. There are a few offloading types and the example below uses [model offloading](./optimization/memory#model-offloading) to move an entire model, like a text encoder or transformer, to the CPU when it isn't being used.
+Offloading stores weights that aren't currently used on the CPU and only moves them to the GPU when they're needed. There are a few offloading types and the example below uses [model offloading](./optimization/memory#model-offloading). This moves an entire model, like a text encoder or transformer, to the CPU when it isn't actively being used.

-Call [`~DiffusionPipeline.enable_model_cpu_offload`] to activate it on your pipeline. By combining quantization and offloading, the following example only requires ~12.54GB of memory.
+Call [`~DiffusionPipeline.enable_model_cpu_offload`] to activate it. By combining quantization and offloading, the following example only requires ~12.54GB of memory.

 ```py
 import torch
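The hunk again stops at the start of the commit's code block. A minimal sketch of model offloading as described: call `enable_model_cpu_offload` instead of moving the pipeline to the GPU yourself. It assumes the same Flux checkpoint; the quantization config from the previous section could also be passed to `from_pretrained` to combine both techniques.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Keep whole models (text encoders, transformer, VAE) on the CPU and move each
# one to the GPU only while it is needed. Don't call .to("cuda") beforehand.
pipeline.enable_model_cpu_offload()

image = pipeline(
    "a photo of an astronaut riding a horse on the moon",
    num_inference_steps=28,
).images[0]
```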
@@ -216,7 +218,7 @@ Refer to the [Reduce memory usage](./optimization/memory) docs to learn more abo

 ### Inference speed

-The denoising loop performs a lot of computations and can be slow. Methods like [torch.compile](./optimization/fp16#torchcompile) increases inference speed by compiling the computations into an optimized kernel. Compilation is slow at first but it should be much faster the next time as long as the input remains the same.
+The denoising loop performs a lot of computations and can be slow. Methods like [torch.compile](./optimization/fp16#torchcompile) increases inference speed by compiling the computations into an optimized kernel. Compilation is slow for the first generation but successive generations should be much faster.

 The example below uses [regional compilation](./optimization/fp16#regional-compilation) to only compile small regions of a model. It reduces cold-start latency while also providing a runtime speed up.

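For reference, a sketch of speeding up the denoising loop with `torch.compile`. This compiles the whole transformer rather than the regional compilation the context line mentions; the compile settings are illustrative, and the first generation is slower while kernels are compiled.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Compile the denoising transformer. The first call pays the compilation cost;
# later calls with the same input shapes reuse the optimized kernels.
pipeline.transformer = torch.compile(
    pipeline.transformer, fullgraph=True, mode="max-autotune"
)

image = pipeline(
    "a photo of an astronaut riding a horse on the moon",
    num_inference_steps=28,
).images[0]
```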