
Commit 6d30ba1

chengzeyi and stevhliu authored
Update docs/source/en/optimization/para_attn.md
Co-authored-by: Steven Liu <[email protected]>
1 parent a525f05 commit 6d30ba1


docs/source/en/optimization/para_attn.md

Lines changed: 3 additions & 6 deletions
@@ -154,12 +154,9 @@ If you are not familiar with `torchao` quantization, you can refer to this [docu
 pip3 install -U torch torchao
 ```
 
-We also need to pass the model to `torch.compile` to gain actual speedup.
-`torch.compile` with `mode="max-autotune-no-cudagraphs"` or `mode="max-autotune"` can help us to achieve the best performance by generating and selecting the best kernel for the model inference.
-The compilation process could take a long time, but it is worth it.
-If you are not familiar with `torch.compile`, you can refer to the [official tutorial](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html).
-In this example, we only quantize the transformer model, but you can also quantize the text encoder to reduce more memory usage.
-We also need to notice that the actually compilation process is done on the first time the model is called, so we need to warm up the model to measure the speedup correctly.
+[torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) with `mode="max-autotune-no-cudagraphs"` or `mode="max-autotune"` selects the best kernel for performance. Compilation can take a long time if it's the first time the model is called, but it is worth it once the model has been compiled.
+
+This example only quantizes the transformer model, but you can also quantize the text encoder to reduce memory usage even more.
 
 > [!TIP]
 > Dynamic quantization can significantly change the distribution of the model output, so you need to change the `residual_diff_threshold` to a larger value for it to take effect.
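For reference, here is a minimal sketch of the workflow the updated text describes: quantize the transformer with torchao, compile it, and warm it up before timing. The pipeline class, checkpoint name, quantization config, and prompt below are illustrative assumptions, not taken from the doc itself.

```python
import torch
from diffusers import FluxPipeline
from torchao.quantization import quantize_, float8_dynamic_activation_float8_weight

# Illustrative pipeline/checkpoint; the doc's example may target a different model.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Quantize only the transformer, as the updated text suggests; the text
# encoder could be quantized the same way to save even more memory.
quantize_(pipe.transformer, float8_dynamic_activation_float8_weight())

# Compile so the best kernels are generated and selected for inference.
pipe.transformer = torch.compile(
    pipe.transformer, mode="max-autotune-no-cudagraphs"
)

# Warmup: compilation happens on the first call, so run the pipeline once
# before measuring speedup.
pipe("a photo of a cat", num_inference_steps=4)
```

After the warmup call, subsequent runs use the already-compiled kernels, so timing them reflects the actual speedup.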
