
Commit 8e9ba97

chengzeyi and stevhliu authored

Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <[email protected]>

1 parent 873426d commit 8e9ba97

File tree

1 file changed (+1, -1)


docs/source/en/optimization/para_attn.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -140,7 +140,7 @@ First Block Cache reduced the inference speed to 2271.06 seconds compared to the
 </hfoption>
 </hfoptions>
 
-### FP8 Quantization
+## fp8 quantization
 
 fp8 with dynamic quantization further speeds up inference and reduces memory usage. Both the activations and weights must be quantized in order to use the 8-bit [NVIDIA Tensor Cores](https://www.nvidia.com/en-us/data-center/tensor-cores/).
 
```
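The doc line touched by this diff refers to fp8 *dynamic* quantization, where the scale is computed from the runtime value range of each tensor rather than calibrated ahead of time. The sketch below illustrates only that per-tensor scaling arithmetic in pure Python, using the standard fp8 e4m3 finite maximum of 448.0; it is not the torchao/Tensor Core implementation the documentation describes, and the function names are hypothetical.

```python
# Illustrative sketch of per-tensor dynamic quantization scaling, as used by
# fp8 (e4m3) schemes. Real fp8 also rounds mantissas and runs on hardware;
# this only shows how the dynamic scale maps values into the fp8 range.

FP8_E4M3_MAX = 448.0  # largest finite value representable in fp8 e4m3


def dynamic_quantize(values):
    """Compute a runtime per-tensor scale and map values into [-448, 448]."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    quantized = [v / scale for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate original values from scaled representation."""
    return [q * scale for q in quantized]


acts = [0.5, -2.0, 3.75]          # stand-in for runtime activations
q, s = dynamic_quantize(acts)
assert max(abs(v) for v in q) <= FP8_E4M3_MAX  # fits the fp8 range
restored = dequantize(q, s)
```

Because the scale is derived from each tensor's own `amax` at runtime, both activations and weights can be kept inside the narrow fp8 range, which is what allows the 8-bit Tensor Core paths mentioned in the diff to be used.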