15 | 15 |
16 | 16 | # Mochi 1 Preview |
17 | 17 |
18 | | -[Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) from Genmo. |
| 18 | +> [!TIP] |
| 19 | +> Only a research preview of the model weights is available at the moment. |
| 20 | +
| 21 | +[Mochi 1](https://huggingface.co/genmo/mochi-1-preview) is a video generation model by Genmo with a strong focus on prompt adherence and motion quality. The model features a 10B parameter Asymmetric Diffusion Transformer (AsymmDiT) architecture, and uses non-square QKV and output projection layers to reduce inference memory requirements. A single T5-XXL model is used to encode prompts.
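To see these components on their own, the minimal sketch below loads the AsymmDiT transformer and the T5-XXL text encoder separately, assuming the standard `transformer` and `text_encoder` subfolders of the `genmo/mochi-1-preview` repository.

```py
import torch
from diffusers import MochiTransformer3DModel
from transformers import T5EncoderModel

# Load the 10B parameter AsymmDiT transformer.
transformer = MochiTransformer3DModel.from_pretrained(
    "genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Load the single T5-XXL text encoder used to encode prompts.
text_encoder = T5EncoderModel.from_pretrained(
    "genmo/mochi-1-preview", subfolder="text_encoder", torch_dtype=torch.bfloat16
)

# Rough parameter count of the transformer (on the order of 10B).
print(f"{sum(p.numel() for p in transformer.parameters()) / 1e9:.1f}B parameters")
```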
19 | 22 |
20 | 23 | *Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. This model dramatically closes the gap between closed and open video generation systems. The model is released under a permissive Apache 2.0 license.* |
21 | 24 |
22 | | -<Tip> |
| 25 | +> [!TIP] |
| 26 | +> Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines. |
23 | 27 |
24 | | -Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines. |
| 28 | +## Quantization |
25 | 29 |
26 | | -</Tip> |
| 30 | +Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case.
| 31 | + |
| 32 | +The example below demonstrates how to load a quantized [`MochiPipeline`] for inference with bitsandbytes. |
| 33 | + |
| 34 | +```py |
| 35 | +import torch |
| 36 | +from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, MochiTransformer3DModel, MochiPipeline |
| 37 | +from diffusers.utils import export_to_video |
| 38 | +from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel |
| 39 | + |
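# 8-bit config for the T5-XXL text encoder, which is a transformers model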
| 40 | +quant_config = BitsAndBytesConfig(load_in_8bit=True) |
| 41 | +text_encoder_8bit = T5EncoderModel.from_pretrained( |
| 42 | + "genmo/mochi-1-preview", |
| 43 | +    subfolder="text_encoder",
| 44 | + quantization_config=quant_config, |
| 45 | + torch_dtype=torch.float16, |
| 46 | +) |
| 47 | + |
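# 8-bit config for the AsymmDiT transformer, which is a diffusers model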
| 48 | +quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True) |
| 49 | +transformer_8bit = MochiTransformer3DModel.from_pretrained( |
| 50 | + "genmo/mochi-1-preview", |
| 51 | + subfolder="transformer", |
| 52 | + quantization_config=quant_config, |
| 53 | + torch_dtype=torch.float16, |
| 54 | +) |
| 55 | + |
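# Assemble the pipeline from the quantized components; device_map="balanced" spreads them across available GPUs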
| 56 | +pipeline = MochiPipeline.from_pretrained( |
| 57 | + "genmo/mochi-1-preview", |
| 58 | + text_encoder=text_encoder_8bit, |
| 59 | + transformer=transformer_8bit, |
| 60 | + torch_dtype=torch.float16, |
| 61 | + device_map="balanced", |
| 62 | +) |
| 63 | + |
| 64 | +frames = pipeline( |
| 65 | +    "Close-up of a cat's eye, with the galaxy reflected in the cat's eye. Ultra high resolution 4k.",
| 66 | + num_inference_steps=28, |
| 67 | + guidance_scale=3.5 |
| 68 | +).frames[0] |
| 69 | +export_to_video(frames, "cat.mp4") |
| 70 | +``` |
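The example uses two different `BitsAndBytesConfig` classes because the two models come from different libraries: the T5-XXL text encoder is a `transformers` model and is quantized with the `transformers` config, while the Mochi transformer is a `diffusers` model and uses the `diffusers` config (imported as `DiffusersBitsAndBytesConfig` to keep the names distinct).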
27 | 71 |
28 | 72 | ## Generating videos with Mochi-1 Preview |
29 | 73 |