docs/source/en/api/pipelines/allegro.md (45 additions & 0 deletions)
@@ -23,6 +23,51 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.m
 </Tip>
 
+
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AllegroPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AllegroTransformer3DModel, AllegroPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
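The diff extract cuts the new code block off after the imports. As a rough guide, a complete bitsandbytes example for this pipeline plausibly continues along these lines; the `rhymes-ai/Allegro` checkpoint id, the 8-bit settings, the prompt, and the `device_map` choice are assumptions, not part of the diff:

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AllegroTransformer3DModel, AllegroPipeline
from diffusers.utils import export_to_video
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

# Quantize the T5 text encoder with the transformers-side config
quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "rhymes-ai/Allegro",  # assumed checkpoint id
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

# Quantize the video transformer with the diffusers-side config
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = AllegroTransformer3DModel.from_pretrained(
    "rhymes-ai/Allegro",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

# Assemble the pipeline from the pre-quantized components
pipeline = AllegroPipeline.from_pretrained(
    "rhymes-ai/Allegro",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A small cactus with a happy face in the Sahara desert."  # assumed prompt
video = pipeline(prompt).frames[0]
export_to_video(video, "allegro.mp4", fps=15)
```

The two aliased `BitsAndBytesConfig` imports are deliberate: the text encoder is a transformers model and takes the transformers config, while the transformer is a diffusers model and takes the diffusers one.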
docs/source/en/api/pipelines/aura_flow.md (2 additions & 3 deletions)
@@ -26,12 +26,11 @@ AuraFlow can be quite expensive to run on consumer hardware devices. However, yo
 
 Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
 
-Refer to the [Quantization](../../quantization/overview) to learn more about supported quantization backends (bitsandbytes, torchao, gguf) and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AuraFlowPipeline`] for inference with bitsandbytes.
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AuraFlowPipeline`] for inference with bitsandbytes.
 
 ```py
 import torch
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AuraFlowTransformer2DModel, AuraFlowPipeline
-from diffusers.utils import export_to_video
 from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

-prompt = "A refreshing scene where a glass of freshly squeezed orange juice stands prominently at the center, bathed in warm, golden sunlight that highlights the vibrant, citrus hues of the juice. The glass is intricately detailed, showing condensation droplets that glisten like tiny jewels. Surrounding the base of the glass, scattered orange slices and lush green leaves add a touch of natural beauty and freshness. Above the glass, a dynamic splash of orange juice is captured mid-air, forming the word 'Orange' in a fluid, playful script. The splash is so vivid and realistic that each droplet seems to dance in the air, creating a sense of movement and energy. In the background, a serene orchard with rows of orange trees stretches out under a clear blue sky, their branches heavy with ripe oranges ready for harvest. Rays of sunlight filter through the leaves, casting dappled shadows on the ground. A gentle breeze rustles the leaves, adding a sense of calm and tranquility to the scene. The entire scene evokes a sense of purity, freshness, and vitality, inviting viewers to experience the simple joy of a glass of fresh orange juice."
+prompt = "a tiny astronaut hatching from an egg on the moon"
docs/source/en/api/pipelines/cogvideox.md (1 addition & 1 deletion)
@@ -116,7 +116,7 @@ CogVideoX-2b requires about 19 GB of GPU memory to decode 49 frames (6 seconds o
 
 Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
 
-Refer to the [Quantization](../../quantization/overview) to learn more about supported quantization backends (bitsandbytes, torchao, gguf) and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`CogVideoXPipeline`] for inference with bitsandbytes.
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`CogVideoXPipeline`] for inference with bitsandbytes.
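The example the corrected sentence points to is not shown in this diff. For orientation, a hedged sketch of what loading a quantized [`CogVideoXPipeline`] with bitsandbytes typically looks like; checkpoint id, settings, and prompt are assumptions:

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, CogVideoXTransformer3DModel, CogVideoXPipeline
from diffusers.utils import export_to_video

# Quantize the video transformer to 8-bit
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-2b",  # assumed checkpoint id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

video = pipeline("A panda playing a guitar in a bamboo forest.").frames[0]  # assumed prompt
export_to_video(video, "cogvideox.mp4", fps=8)
```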
docs/source/en/api/pipelines/flux.md (2 additions & 4 deletions)
@@ -338,12 +338,11 @@ out.save("image.png")
 
 Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
 
-Refer to the [Quantization](../../quantization/overview) to learn more about supported quantization backends (bitsandbytes, torchao, gguf) and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`FluxPipeline`] for inference with bitsandbytes.
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`FluxPipeline`] for inference with bitsandbytes.
 
 ```py
 import torch
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, FluxTransformer2DModel, FluxPipeline
-from diffusers.utils import export_to_video
 from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

-prompt = "A refreshing scene where a glass of freshly squeezed orange juice stands prominently at the center, bathed in warm, golden sunlight that highlights the vibrant, citrus hues of the juice. The glass is intricately detailed, showing condensation droplets that glisten like tiny jewels. Surrounding the base of the glass, scattered orange slices and lush green leaves add a touch of natural beauty and freshness. Above the glass, a dynamic splash of orange juice is captured mid-air, forming the word 'Orange' in a fluid, playful script. The splash is so vivid and realistic that each droplet seems to dance in the air, creating a sense of movement and energy. In the background, a serene orchard with rows of orange trees stretches out under a clear blue sky, their branches heavy with ripe oranges ready for harvest. Rays of sunlight filter through the leaves, casting dappled shadows on the ground. A gentle breeze rustles the leaves, adding a sense of calm and tranquility to the scene. The entire scene evokes a sense of purity, freshness, and vitality, inviting viewers to experience the simple joy of a glass of fresh orange juice."
+prompt = "a tiny astronaut hatching from an egg on the moon"
docs/source/en/api/pipelines/hunyuan_video.md (1 addition & 1 deletion)
@@ -36,7 +36,7 @@ Recommendations for inference:
 
 Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
 
-Refer to the [Quantization](../../quantization/overview) to learn more about supported quantization backends (bitsandbytes, torchao, gguf) and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`HunyuanVideoPipeline`] for inference with bitsandbytes.
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`HunyuanVideoPipeline`] for inference with bitsandbytes.
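No code is shown in this diff either; a hedged sketch of the same pattern for [`HunyuanVideoPipeline`] (checkpoint id, dtypes, and prompt are assumptions):

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
from diffusers.utils import export_to_video

# Quantize the video transformer to 8-bit
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",  # assumed checkpoint id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipeline = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

video = pipeline("A cat walks on the grass, realistic style.").frames[0]  # assumed prompt
export_to_video(video, "hunyuan.mp4", fps=15)
```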
docs/source/en/api/pipelines/latte.md (41 additions & 0 deletions)
@@ -70,6 +70,47 @@ Without torch.compile(): Average inference time: 16.246 seconds.
 With torch.compile(): Average inference time: 14.573 seconds.
 ```
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`LattePipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LatteTransformer3DModel, LattePipeline
+from diffusers.utils import export_to_gif
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
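The truncated block plausibly continues as below; note this page exports a GIF rather than a video. Checkpoint id, settings, and prompt are assumptions:

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LatteTransformer3DModel, LattePipeline
from diffusers.utils import export_to_gif
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

# Quantize the T5 text encoder (transformers-side config)
quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "maxin-cn/Latte-1",  # assumed checkpoint id
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

# Quantize the video transformer (diffusers-side config)
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = LatteTransformer3DModel.from_pretrained(
    "maxin-cn/Latte-1",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = LattePipeline.from_pretrained(
    "maxin-cn/Latte-1",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

video = pipeline("A small cactus with a happy face in the Sahara desert.").frames[0]  # assumed prompt
export_to_gif(video, "latte.gif")
```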
docs/source/en/api/pipelines/ltx_video.md (41 additions & 0 deletions)
@@ -101,6 +101,47 @@ Make sure to read the [documentation on GGUF](../../quantization/gguf) to learn
 
 Refer to [this section](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox#memory-optimization) to learn more about optimizing memory consumption.
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`LTXPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LTXVideoTransformer3DModel, LTXPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

+prompt = "A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting."
+video = pipeline(prompt=prompt, num_frames=161, num_inference_steps=50).frames[0]
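The diff shows the imports and, further down, the prompt and pipeline call; a sketch of the plausible middle that quantizes and assembles the components (the checkpoint id and settings are assumptions):

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, LTXVideoTransformer3DModel, LTXPipeline
from diffusers.utils import export_to_video
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

# Quantize the T5 text encoder (transformers-side config)
quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "Lightricks/LTX-Video",  # assumed checkpoint id
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

# Quantize the video transformer (diffusers-side config)
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = LTXVideoTransformer3DModel.from_pretrained(
    "Lightricks/LTX-Video",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "A detailed wooden toy ship with intricately carved masts and sails..."  # shortened; full prompt in the diff
video = pipeline(prompt=prompt, num_frames=161, num_inference_steps=50).frames[0]
export_to_video(video, "ltx.mp4", fps=24)
```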
docs/source/en/api/pipelines/lumina.md (40 additions & 0 deletions)
@@ -82,6 +82,46 @@ pipeline.vae.decode = torch.compile(pipeline.vae.decode, mode="max-autotune", fu
 image = pipeline(prompt="Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. Background shows an industrial revolution cityscape with smoky skies and tall, metal structures").images[0]
 ```
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`LuminaText2ImgPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, Transformer2DModel, LuminaText2ImgPipeline
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
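The diff imports the generic `Transformer2DModel` rather than a Lumina-specific class; assuming that class does load this checkpoint's transformer, a minimal sketch (the checkpoint id and settings are also assumptions; the text encoder could be quantized analogously with the transformers-side config):

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, Transformer2DModel, LuminaText2ImgPipeline

# Quantize the transformer to 8-bit, assuming the class from the diff applies
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = Transformer2DModel.from_pretrained(
    "Alpha-VLLM/Lumina-Next-SFT-diffusers",  # assumed checkpoint id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = LuminaText2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Next-SFT-diffusers",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

image = pipeline("Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps.").images[0]
image.save("lumina.png")
```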
docs/source/en/api/pipelines/mochi.md (1 addition & 1 deletion)
@@ -29,7 +29,7 @@
 
 Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
 
-Refer to the [Quantization](../../quantization/overview) to learn more about supported quantization backends (bitsandbytes, torchao, gguf) and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`MochiPipeline`] for inference with bitsandbytes.
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`MochiPipeline`] for inference with bitsandbytes.
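As with the other one-line fixes, the referenced example is outside this diff; a hedged sketch for [`MochiPipeline`] (checkpoint id, settings, and prompt are assumptions):

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, MochiTransformer3DModel, MochiPipeline
from diffusers.utils import export_to_video

# Quantize the video transformer to 8-bit
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = MochiTransformer3DModel.from_pretrained(
    "genmo/mochi-1-preview",  # assumed checkpoint id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = MochiPipeline.from_pretrained(
    "genmo/mochi-1-preview",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

video = pipeline("Close-up of a chameleon's eye, with its scaly skin changing color.").frames[0]  # assumed prompt
export_to_video(video, "mochi.mp4", fps=30)
```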
docs/source/en/api/pipelines/sana.md (40 additions & 0 deletions)
@@ -50,6 +50,46 @@ Make sure to pass the `variant` argument for downloaded checkpoints to use lower
 
 </Tip>
 
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`SanaPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, SanaTransformer2DModel, SanaPipeline
+from transformers import BitsAndBytesConfig as BitsAndBytesConfig, AutoModelForCausalLM
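Sana is the one page here whose text encoder loads through `AutoModelForCausalLM` (it is Gemma-based rather than T5-based), which is why the transformers-side import differs. A sketch under assumed checkpoint id and settings:

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, SanaTransformer2DModel, SanaPipeline
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, AutoModelForCausalLM

# Quantize the Gemma-based text encoder via AutoModelForCausalLM
quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = AutoModelForCausalLM.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # assumed checkpoint id
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

# Quantize the transformer (diffusers-side config)
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = SanaTransformer2DModel.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

image = pipeline("a cyberpunk cat with a neon sign that says 'Sana'").images[0]  # assumed prompt
image.save("sana.png")
```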