
Commit afc04c0

quantization

1 parent 5ed761a

File tree

1 file changed: +48 −4 lines


docs/source/en/api/pipelines/mochi.md

Lines changed: 48 additions & 4 deletions
@@ -15,15 +15,59 @@

# Mochi

-[Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) from Genmo.
+> [!TIP]
+> Only a research preview of the model weights is available at the moment.
+
+[Mochi 1](https://huggingface.co/genmo/mochi-1-preview) is a video generation model by Genmo with a strong focus on prompt adherence and motion quality. The model features a 10B parameter Asymmetric Diffusion Transformer (AsymmDiT) architecture and uses non-square QKV and output projection layers to reduce inference memory requirements. A single T5-XXL model is used to encode prompts.

*Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. This model dramatically closes the gap between closed and open video generation systems. The model is released under a permissive Apache 2.0 license.*

-<Tip>
+> [!TIP]
+> Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and how to select one that fits your use case.
+
+The example below demonstrates how to load a quantized [`MochiPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, MochiTransformer3DModel, MochiPipeline
+from diffusers.utils import export_to_video
+from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig, T5EncoderModel
+
+quant_config = TransformersBitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = T5EncoderModel.from_pretrained(
+    "genmo/mochi-1-preview",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = MochiTransformer3DModel.from_pretrained(
+    "genmo/mochi-1-preview",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.float16,
+)

-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+pipeline = MochiPipeline.from_pretrained(
+    "genmo/mochi-1-preview",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.float16,
+    device_map="balanced",
+)

-</Tip>
+frames = pipeline(
+    "Close-up of a cat's eye, with the galaxy reflected in the cat's eye. Ultra high resolution 4k.",
+    num_inference_steps=28,
+    guidance_scale=3.5,
+).frames[0]
+export_to_video(frames, "cat.mp4")
+```

## MochiPipeline

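The rewritten intro pins down two sizes: a 10B parameter AsymmDiT transformer and a single T5-XXL text encoder. For a quick sanity check, here is a minimal sketch (not part of this commit) that loads the two components from the `genmo/mochi-1-preview` checkpoint, using the same `transformer` and `text_encoder` subfolders as the example above, and counts their parameters:

```py
import torch
from diffusers import MochiTransformer3DModel
from transformers import T5EncoderModel

# Load the AsymmDiT transformer and the T5-XXL text encoder separately
# and count their parameters; the docs claim ~10B for the transformer.
transformer = MochiTransformer3DModel.from_pretrained(
    "genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16
)
text_encoder = T5EncoderModel.from_pretrained(
    "genmo/mochi-1-preview", subfolder="text_encoder", torch_dtype=torch.float16
)

print(f"transformer params:  {sum(p.numel() for p in transformer.parameters()) / 1e9:.1f}B")
print(f"text encoder params: {sum(p.numel() for p in text_encoder.parameters()) / 1e9:.1f}B")
```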
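The new example loads both large components in 8-bit. The same two config classes also accept the standard bitsandbytes 4-bit NF4 options, which roughly halve the weight memory again at some quality cost. A sketch of that variant (plain `BitsAndBytesConfig` options, nothing introduced by this commit):

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, MochiTransformer3DModel
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig, T5EncoderModel

# NF4 4-bit weights with fp16 compute; mirrors the 8-bit example above.
text_encoder_4bit = T5EncoderModel.from_pretrained(
    "genmo/mochi-1-preview",
    subfolder="text_encoder",
    quantization_config=TransformersBitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    ),
    torch_dtype=torch.float16,
)
transformer_4bit = MochiTransformer3DModel.from_pretrained(
    "genmo/mochi-1-preview",
    subfolder="transformer",
    quantization_config=DiffusersBitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    ),
    torch_dtype=torch.float16,
)
```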
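Weight quantization does not address the memory spike from decoding video latents. `MochiPipeline` already exposes `enable_vae_tiling` for that, and it composes with the quantized components. A short sketch continuing from the `pipeline` object built in the example above:

```py
from diffusers.utils import export_to_video

# Continues from the quantized `pipeline` built in the 8-bit example above.
pipeline.enable_vae_tiling()  # decode video latents in tiles to cap VAE memory use

frames = pipeline(
    "Close-up of a cat's eye, with the galaxy reflected in the cat's eye. Ultra high resolution 4k.",
    num_inference_steps=28,
    guidance_scale=3.5,
).frames[0]
export_to_video(frames, "cat.mp4", fps=30)
```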