docs/source/en/quicktour.md

# Quickstart
Diffusers is a library for developers and researchers that provides an easy inference API for generating images, videos, and audio, as well as the building blocks for implementing new workflows.
Diffusers provides many optimizations out of the box that make it possible to load and run large models on setups with limited memory, or to accelerate inference.
This Quickstart will give you an overview of Diffusers and get you up and generating quickly.
> [!TIP]
> Before you begin, make sure you have a Hugging Face [account](https://huggingface.co/join) in order to use gated models like [Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev).
Follow the [Installation](./installation) guide to install Diffusers if it's not already installed.
## DiffusionPipeline
A diffusion model combines multiple components to generate outputs in any modality based on an input, such as a text description, an image, or both.
For a standard text-to-image model:
1. A text encoder turns a prompt into embeddings that guide the denoising process. Some models have more than one text encoder.
2. A scheduler contains the algorithmic specifics for gradually denoising initial random noise into clean outputs. Different schedulers affect generation speed and quality.
3. A UNet or diffusion transformer (DiT) is the workhorse of a diffusion model.
4. A variational autoencoder (VAE) encodes and decodes pixels to a spatially compressed latent space. *Latents* are compressed representations of an image and are more efficient to work with. The UNet or DiT operates on latents, and the clean latents at the end are decoded back into images.
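Taken together, the scheduler and the denoiser cooperate in a loop. The sketch below is a toy illustration of that loop only; `toy_denoiser` and `scheduler_step` are hypothetical stand-ins, not the Diffusers API.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(latents, t):
    # Stand-in for the UNet/DiT: predicts the noise present in the latents.
    # Here we simply pretend a fixed fraction of the signal is noise.
    return 0.1 * latents

def scheduler_step(latents, noise_pred, num_steps):
    # Stand-in for a scheduler update: removes a portion of the predicted noise.
    return latents - noise_pred / num_steps

num_steps = 50
latents = rng.standard_normal((4, 8, 8))  # initial random noise in latent space
initial = latents.copy()

for t in reversed(range(num_steps)):
    noise_pred = toy_denoiser(latents, t)
    latents = scheduler_step(latents, noise_pred, num_steps)
```

In a real pipeline the denoised latents would then be handed to the VAE decoder to produce pixels; this loop only shows the shape of the iteration.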
The [`DiffusionPipeline`] packages all these components into a single class for inference. There are several arguments in [`~DiffusionPipeline.__call__`] you can change, such as `num_inference_steps`, that affect the diffusion process. Try different values and arguments to see how they change generation quality or speed.
Load a model with [`~DiffusionPipeline.from_pretrained`] and describe what you'd like to generate. The example below uses the default argument values.
```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype=torch.bfloat16
).to("cuda")
```
Adapters insert a small number of trainable parameters into the original base model. Only the inserted parameters are fine-tuned while the rest of the model weights remain frozen. This makes it fast and cheap to fine-tune a model on a new style. Among adapters, [LoRAs](./tutorials/using_peft_for_inference) are the most popular.
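To see why this is cheap, compare the parameter counts. The sketch below is a toy illustration of the low-rank idea with made-up dimensions (`d`, `r`, and the matrices are hypothetical), not the actual PEFT/Diffusers implementation.

```python
import numpy as np

d, r = 1024, 8  # hypothetical layer width and LoRA rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen base weight, never updated
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # starts at zero, so the adapter is a no-op initially

W_eff = W + B @ A  # effective weight used at inference

base_params = W.size           # 1,048,576
lora_params = A.size + B.size  # 16,384
```

Only `A` and `B` would be trained, which here is roughly 1.5% of the base layer's parameters; the same ratio argument applies per adapted layer in a real model.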
Add a LoRA to a pipeline with the [`~loaders.QwenImageLoraLoaderMixin.load_lora_weights`] method. Some LoRAs require a special word, such as `Realism` in the example below, to trigger them. Check a LoRA's model card to see if it requires a trigger word.