
Commit c6f4016

more pipelines
1 parent b76aba1 commit c6f4016

12 files changed (+267, -13 lines)

docs/source/en/api/pipelines/allegro.md

Lines changed: 45 additions & 0 deletions
````diff
@@ -23,6 +23,51 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.m
 
 </Tip>
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AllegroPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AllegroTransformer3DModel, AllegroPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "rhymes-ai/Allegro",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = AllegroTransformer3DModel.from_pretrained(
+    "rhymes-ai/Allegro",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = AllegroPipeline.from_pretrained(
+    "rhymes-ai/Allegro",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = (
+    "A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, "
+    "the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this "
+    "location might be a popular spot for docking fishing boats."
+)
+video = pipeline(prompt, guidance_scale=7.5, max_sequence_length=512).frames[0]
+export_to_video(video, "harbor.mp4", fps=15)
+```
+
 ## AllegroPipeline
 
 [[autodoc]] AllegroPipeline
````

docs/source/en/api/pipelines/aura_flow.md

Lines changed: 2 additions & 3 deletions
````diff
@@ -26,12 +26,11 @@ AuraFlow can be quite expensive to run on consumer hardware devices. However, yo
 
 Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
 
-Refer to the [Quantization](../../quantization/overview) to learn more about supported quantization backends (bitsandbytes, torchao, gguf) and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AuraFlowPipeline`] for inference with bitsandbytes.
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AuraFlowPipeline`] for inference with bitsandbytes.
 
 ```py
 import torch
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AuraFlowTransformer2DModel, AuraFlowPipeline
-from diffusers.utils import export_to_video
 from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
 
 quant_config = BitsAndBytesConfig(load_in_8bit=True)
@@ -58,7 +57,7 @@ pipeline = AuraFlowPipeline.from_pretrained(
     device_map="balanced",
 )
 
-prompt = "A refreshing scene where a glass of freshly squeezed orange juice stands prominently at the center, bathed in warm, golden sunlight that highlights the vibrant, citrus hues of the juice. The glass is intricately detailed, showing condensation droplets that glisten like tiny jewels. Surrounding the base of the glass, scattered orange slices and lush green leaves add a touch of natural beauty and freshness. Above the glass, a dynamic splash of orange juice is captured mid-air, forming the word 'Orange' in a fluid, playful script. The splash is so vivid and realistic that each droplet seems to dance in the air, creating a sense of movement and energy. In the background, a serene orchard with rows of orange trees stretches out under a clear blue sky, their branches heavy with ripe oranges ready for harvest. Rays of sunlight filter through the leaves, casting dappled shadows on the ground. A gentle breeze rustles the leaves, adding a sense of calm and tranquility to the scene. The entire scene evokes a sense of purity, freshness, and vitality, inviting viewers to experience the simple joy of a glass of fresh orange juice."
+prompt = "a tiny astronaut hatching from an egg on the moon"
 image = pipeline(prompt).images[0]
 image.save("auraflow.png")
 ```
````

docs/source/en/api/pipelines/cogvideox.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -116,7 +116,7 @@ CogVideoX-2b requires about 19 GB of GPU memory to decode 49 frames (6 seconds o
 
 Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
 
-Refer to the [Quantization](../../quantization/overview) to learn more about supported quantization backends (bitsandbytes, torchao, gguf) and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`CogVideoXPipeline`] for inference with bitsandbytes.
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`CogVideoXPipeline`] for inference with bitsandbytes.
 
 ```py
 import torch
````
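Note: this hunk is truncated at the top of the code block, so the rest of the example doesn't appear in the diff. For reference, here is a minimal sketch of how the complete example plausibly continues, following the same template this commit adds to allegro.md and latte.md. The `THUDM/CogVideoX-2b` checkpoint id, the `CogVideoXTransformer3DModel` class, and the prompt/export settings are illustrative assumptions, not part of this commit.

```py
# Sketch only: checkpoint id, transformer class, prompt, and export settings
# below are assumptions; the diff shows just the first lines of the real example.
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, CogVideoXTransformer3DModel, CogVideoXPipeline
from diffusers.utils import export_to_video
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

# The text encoder is a transformers model, so it takes the transformers config...
quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "THUDM/CogVideoX-2b",
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

# ...while the transformer is a diffusers model, so it takes the diffusers config.
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-2b",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A panda playing a guitar in a bamboo forest."  # hypothetical prompt
video = pipeline(prompt=prompt).frames[0]
export_to_video(video, "panda.mp4", fps=8)
```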

docs/source/en/api/pipelines/flux.md

Lines changed: 2 additions & 4 deletions
````diff
@@ -338,12 +338,11 @@ out.save("image.png")
 
 Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
 
-Refer to the [Quantization](../../quantization/overview) to learn more about supported quantization backends (bitsandbytes, torchao, gguf) and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`FluxPipeline`] for inference with bitsandbytes.
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`FluxPipeline`] for inference with bitsandbytes.
 
 ```py
 import torch
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, FluxTransformer2DModel, FluxPipeline
-from diffusers.utils import export_to_video
 from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
 
 quant_config = BitsAndBytesConfig(load_in_8bit=True)
@@ -370,8 +369,7 @@ pipeline = FluxPipeline.from_pretrained(
     device_map="balanced",
 )
 
-prompt = "A refreshing scene where a glass of freshly squeezed orange juice stands prominently at the center, bathed in warm, golden sunlight that highlights the vibrant, citrus hues of the juice. The glass is intricately detailed, showing condensation droplets that glisten like tiny jewels. Surrounding the base of the glass, scattered orange slices and lush green leaves add a touch of natural beauty and freshness. Above the glass, a dynamic splash of orange juice is captured mid-air, forming the word 'Orange' in a fluid, playful script. The splash is so vivid and realistic that each droplet seems to dance in the air, creating a sense of movement and energy. In the background, a serene orchard with rows of orange trees stretches out under a clear blue sky, their branches heavy with ripe oranges ready for harvest. Rays of sunlight filter through the leaves, casting dappled shadows on the ground. A gentle breeze rustles the leaves, adding a sense of calm and tranquility to the scene. The entire scene evokes a sense of purity, freshness, and vitality, inviting viewers to experience the simple joy of a glass of fresh orange juice."
-
+prompt = "a tiny astronaut hatching from an egg on the moon"
 image = pipeline(prompt, guidance_scale=3.5, height=768, width=1360, num_inference_steps=50).images[0]
 image.save("flux.png")
 ```
````

docs/source/en/api/pipelines/hunyuan_video.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -36,7 +36,7 @@ Recommendations for inference:
 
 Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
 
-Refer to the [Quantization](../../quantization/overview) to learn more about supported quantization backends (bitsandbytes, torchao, gguf) and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`HunyuanVideoPipeline`] for inference with bitsandbytes.
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`HunyuanVideoPipeline`] for inference with bitsandbytes.
 
 ```py
 import torch
````
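Note: this hunk is also truncated at the top of the code block. A minimal sketch of the complete example, following the same template as the other pipelines in this commit; the `hunyuanvideo-community/HunyuanVideo` checkpoint id, the `LlamaModel` text encoder class, and the prompt/export settings are assumptions, not taken from this diff.

```py
# Sketch only: checkpoint id, text encoder class, prompt, and export settings
# are assumptions based on the template used elsewhere in this commit.
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
from diffusers.utils import export_to_video
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, LlamaModel

quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = LlamaModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",  # assumed checkpoint id
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A cat walks on the grass, realistic style."  # hypothetical prompt
video = pipeline(prompt=prompt, num_frames=61).frames[0]
export_to_video(video, "cat.mp4", fps=15)
```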

docs/source/en/api/pipelines/latte.md

Lines changed: 41 additions & 0 deletions
````diff
@@ -70,6 +70,47 @@ Without torch.compile(): Average inference time: 16.246 seconds.
 With torch.compile(): Average inference time: 14.573 seconds.
 ```
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`LattePipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LatteTransformer3DModel, LattePipeline
+from diffusers.utils import export_to_gif
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "maxin-cn/Latte-1",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = LatteTransformer3DModel.from_pretrained(
+    "maxin-cn/Latte-1",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = LattePipeline.from_pretrained(
+    "maxin-cn/Latte-1",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "A small cactus with a happy face in the Sahara desert."
+video = pipeline(prompt).frames[0]
+export_to_gif(video, "latte.gif")
+```
+
 ## LattePipeline
 
 [[autodoc]] LattePipeline
````

docs/source/en/api/pipelines/ltx_video.md

Lines changed: 41 additions & 0 deletions
````diff
@@ -101,6 +101,47 @@ Make sure to read the [documentation on GGUF](../../quantization/gguf) to learn
 
 Refer to [this section](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox#memory-optimization) to learn more about optimizing memory consumption.
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`LTXPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LTXVideoTransformer3DModel, LTXPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "Lightricks/LTX-Video",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = LTXVideoTransformer3DModel.from_pretrained(
+    "Lightricks/LTX-Video",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = LTXPipeline.from_pretrained(
+    "Lightricks/LTX-Video",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting."
+video = pipeline(prompt=prompt, num_frames=161, num_inference_steps=50).frames[0]
+export_to_video(video, "ship.mp4", fps=24)
+```
+
 ## LTXPipeline
 
 [[autodoc]] LTXPipeline
````

docs/source/en/api/pipelines/lumina.md

Lines changed: 40 additions & 0 deletions
````diff
@@ -82,6 +82,46 @@ pipeline.vae.decode = torch.compile(pipeline.vae.decode, mode="max-autotune", fu
 image = pipeline(prompt="Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. Background shows an industrial revolution cityscape with smoky skies and tall, metal structures").images[0]
 ```
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on image quality depending on the model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`LuminaText2ImgPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LuminaNextDiT2DModel, LuminaText2ImgPipeline
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "Alpha-VLLM/Lumina-Next-SFT-diffusers",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = LuminaNextDiT2DModel.from_pretrained(
+    "Alpha-VLLM/Lumina-Next-SFT-diffusers",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = LuminaText2ImgPipeline.from_pretrained(
+    "Alpha-VLLM/Lumina-Next-SFT-diffusers",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "a tiny astronaut hatching from an egg on the moon"
+image = pipeline(prompt).images[0]
+image.save("lumina.png")
+```
+
 ## LuminaText2ImgPipeline
 
 [[autodoc]] LuminaText2ImgPipeline
````

docs/source/en/api/pipelines/mochi.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -29,7 +29,7 @@
 
 Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
 
-Refer to the [Quantization](../../quantization/overview) to learn more about supported quantization backends (bitsandbytes, torchao, gguf) and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`MochiPipeline`] for inference with bitsandbytes.
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`MochiPipeline`] for inference with bitsandbytes.
 
 ```py
 import torch
````
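Note: as with the cogvideox.md and hunyuan_video.md hunks above, the example is cut off at the top of the code block. A minimal sketch of the complete example under the same template; the `genmo/mochi-1-preview` checkpoint id, the prompt, and the frame/export settings are assumptions, not part of this diff.

```py
# Sketch only: checkpoint id, prompt, and frame/export settings are assumptions.
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, MochiTransformer3DModel, MochiPipeline
from diffusers.utils import export_to_video
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "genmo/mochi-1-preview",  # assumed checkpoint id
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = MochiTransformer3DModel.from_pretrained(
    "genmo/mochi-1-preview",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = MochiPipeline.from_pretrained(
    "genmo/mochi-1-preview",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "Close-up of a cat sleeping in warm sunlight."  # hypothetical prompt
video = pipeline(prompt=prompt, num_frames=84).frames[0]
export_to_video(video, "mochi.mp4", fps=30)
```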

docs/source/en/api/pipelines/sana.md

Lines changed: 40 additions & 0 deletions
````diff
@@ -50,6 +50,46 @@ Make sure to pass the `variant` argument for downloaded checkpoints to use lower
 
 </Tip>
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on image quality depending on the model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`SanaPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, SanaTransformer2DModel, SanaPipeline
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, AutoModelForCausalLM
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = AutoModelForCausalLM.from_pretrained(
+    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = SanaTransformer2DModel.from_pretrained(
+    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+pipeline = SanaPipeline.from_pretrained(
+    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)
+
+prompt = "a tiny astronaut hatching from an egg on the moon"
+image = pipeline(prompt).images[0]
+image.save("sana.png")
+```
+
 ## SanaPipeline
 
 [[autodoc]] SanaPipeline
````
