Commit dd40011

update

1 parent 25aea7d
1 file changed: +2 −3 lines

docs/source/en/api/pipelines/mochi.md

Lines changed: 2 additions & 3 deletions
````diff
@@ -74,14 +74,13 @@ export_to_video(frames, "mochi.mp4", fps=30)
 The [Genmo Mochi implementation](https://github.com/genmoai/mochi/tree/main) uses different precision values for each stage in the inference process. The text encoder and VAE use `torch.float32`, while the DiT uses `torch.bfloat16` with the [attention kernel](https://pytorch.org/docs/stable/generated/torch.nn.attention.sdpa_kernel.html#torch.nn.attention.sdpa_kernel) set to `EFFICIENT_ATTENTION`. Diffusers pipelines currently do not support setting different `dtypes` for different stages of the pipeline. In order to run inference in the same way as the original implementation, please refer to the following example.
 
 <Tip>
-THe original Mochi implementation zeros out empty prompts. However, enabling this option and placing the entire pipeline under autocast can lead to numerical overflows with the T5 text encoder.
+The original Mochi implementation zeros out empty prompts. However, enabling this option and placing the entire pipeline under autocast can lead to numerical overflows with the T5 text encoder.
 
 When enabling `force_zeros_for_empty_prompt`, it is recommended to run the text encoding step outside the autocast context in full precision.
 </Tip>
 
 <Tip>
-Decoding the latents in full precision is very memory intensive. You will need at least 70GB VRAM to generate the 163 frames
-in this example. To reduce memory, either reduce the number of frames or run the decoding step in `torch.bfloat16`
+Decoding the latents in full precision is very memory intensive. You will need at least 70GB VRAM to generate the 163 frames in this example. To reduce memory, either reduce the number of frames or run the decoding step in `torch.bfloat16`.
 </Tip>
 
 ```python
````

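The Python example that follows this hunk in `mochi.md` is truncated in this view. As a rough, self-contained sketch of the precision pattern the two tips describe — text encoding in full precision outside autocast, denoising in `torch.bfloat16` under autocast, decoding in `torch.bfloat16` to save memory — the following uses stand-in `nn.Linear` modules (not the real T5 encoder, DiT, or VAE) and the CPU autocast backend so it runs anywhere; the real docs additionally pin the SDPA backend to `EFFICIENT_ATTENTION` via `torch.nn.attention.sdpa_kernel`:

```python
import torch

# Hypothetical stand-ins for the pipeline stages; the real pipeline uses
# the T5 text encoder, the Mochi DiT, and the VAE decoder.
text_encoder = torch.nn.Linear(8, 8)                    # kept in torch.float32
transformer = torch.nn.Linear(8, 8)
vae_decoder = torch.nn.Linear(8, 8).to(torch.bfloat16)  # decode in bf16 to save memory

tokens = torch.randn(1, 8)

# 1) Text encoding OUTSIDE the autocast context, in full precision,
#    to avoid T5 overflows when force_zeros_for_empty_prompt is enabled.
with torch.no_grad():
    prompt_embeds = text_encoder(tokens)
assert prompt_embeds.dtype == torch.float32

# 2) Denoising under autocast in bfloat16. The docs use the "cuda" device
#    string; "cpu" is used here only so this sketch runs without a GPU.
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    latents = transformer(prompt_embeds)
assert latents.dtype == torch.bfloat16

# 3) Decoding in bfloat16 instead of float32: full-precision decoding of
#    163 frames needs roughly 70GB of VRAM per the tip above.
with torch.no_grad():
    frames = vae_decoder(latents)
assert frames.dtype == torch.bfloat16
```

In the real pipeline the stage boundaries are the same: compute prompt embeddings first, pass them to the pipeline call inside the autocast context, then decode.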