
Commit edc0624

chengzeyi and stevhliu authored
Update docs/source/en/optimization/para_attn.md
Co-authored-by: Steven Liu <[email protected]>
1 parent 3c04cb8 commit edc0624

File tree

1 file changed: +2 −6 lines changed


docs/source/en/optimization/para_attn.md

Lines changed: 2 additions & 6 deletions
@@ -274,13 +274,9 @@ print("Saving video to hunyuan_video.mp4")
 export_to_video(output, "hunyuan_video.mp4", fps=15)
 ```
 
-The NVIDIA L20 GPU only has 48GB memory and could face OOM errors after compiling the model and not calling `enable_model_cpu_offload`,
-because the HunyuanVideo has very large activation tensors when running with high resolution and large number of frames.
-So here we skip measuring the speedup with quantization and compilation on one single NVIDIA L20 GPU and choose to use context parallelism to release the memory pressure.
-If you want to run HunyuanVideo with `torch.compile` on GPUs with less than 80GB memory, you can try reducing the resolution and the number of frames to avoid OOM errors.
+A NVIDIA L20 GPU only has 48GB memory and could face out-of-memory (OOM) errors after compilation and if `enable_model_cpu_offload` isn't called because HunyuanVideo has very large activation tensors when running with high resolution and large number of frames. For GPUs with less than 80GB of memory, you can try reducing the resolution and number of frames to avoid OOM errors.
 
-Due to the fact that large video generation models usually have performance bottleneck on the attention computation rather than the fully connected layers, we don't observe a significant speedup with quantization and compilation.
-However, models like `FLUX.1-dev` can benefit a lot from quantization and compilation, it is suggested to try it for these models.
+Large video generation models are usually bottlenecked by the attention computations rather than the fully connected layers. These models don't significantly benefit from quantization and torch.compile.
 
 </hfoption>
 </hfoptions>
