
Commit bc57989

tjtanaa authored and tanpinsiang committed
shorten the title and fix the image scaling
Signed-off-by: tjtanaa <[email protected]>
Signed-off-by: tanpinsiang <[email protected]>
1 parent 711a2d6 commit bc57989

File tree

1 file changed: +4, -3 lines changed


_posts/2025-02-24-ptpc-fp8-rocm.md

Lines changed: 4 additions & 3 deletions
@@ -1,6 +1,6 @@
 ---
 layout: post
-title: "Boosting vLLM Performance on AMD ROCm: PTPC-FP8 Quantization Unleashes Speed and Accuracy"
+title: "PTPC-FP8: Boosting vLLM Performance on AMD ROCm"
 author: "AMD and Embedded LLM"
 image: /assets/figures/ptpc/PTPC-tumbnail.png
 thumbnail-img: /assets/figures/ptpc/PTPC-tumbnail.png
@@ -36,7 +36,6 @@ LLMs develop activation outliers as they scale beyond certain sizes. These unusu
 - Most values receive few effective bits of precision when using per-tensor quantization
 - Outliers appear persistently in specific channels across different tokens
 - While weights are relatively uniform and easy to quantize, activations are not
-
 #### PTPC: A Precision-Targeted Approach
 
 PTPC-FP8 (Per-Token-Activation, Per-Channel-Weight FP8) addresses this challenge by using tailored scaling factors based on three key observations:
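
As an aside to the hunk above, the claim that most values receive few effective bits under per-tensor quantization is easy to check numerically. The toy values and the 256-level grid below are illustrative assumptions, not taken from the post or from vLLM:

```python
import numpy as np

# Toy activations: many small values plus one persistent outlier.
rng = np.random.default_rng(0)
acts = np.concatenate([rng.uniform(-0.05, 0.05, 1023), [60.0]])

num_levels = 256                                   # stand-in for an 8-bit format
scale = np.abs(acts).max() / (num_levels / 2 - 1)  # one scale for the whole tensor
quantized = np.round(acts / scale)

# The outlier sets the scale, so almost every other value collapses
# onto the same integer level near zero.
print(len(np.unique(quantized[:-1])))  # ~1 distinct level for the small values
print(len(np.unique(quantized)))       # +1 for the outlier itself
```

Finer-grained scales avoid this collapse by letting an outlier dominate only the rows or columns it actually appears in, which is the motivation for the PTPC approach introduced next.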
@@ -49,7 +48,9 @@ This insight led to a dual-granularity approach:
 * **Per-Token Activation Quantization**: Each input token receives its own scaling factor
 * **Per-Channel Weight Quantization**: Each weight column gets a unique scaling factor
 
-<img align="right" src="/assets/figures/ptpc/PTPC-Diagram.png" alt="Per-Token Activation + Per-Channel Weight Quantization" width="50%" height="50%">
+<div align="center">
+<img src="/assets/figures/ptpc/PTPC-Diagram.png" alt="Per-Token Activation + Per-Channel Weight Quantization" width="80%">
+</div>
 
 #### Understanding the Diagram
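
To make the dual-granularity idea in the hunk above concrete, here is a minimal NumPy sketch of per-token activation scales combined with per-channel weight scales. The function name `ptpc_matmul`, the FP8-E4M3 limit of 448, and the omission of the actual FP8 cast are assumptions for illustration; this is not the post's or vLLM's implementation:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # assumed FP8-E4M3 dynamic-range limit (illustrative)

def ptpc_matmul(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Per-token-activation, per-channel-weight scaling sketch.

    activations: (num_tokens, hidden_dim)
    weights:     (hidden_dim, out_channels)
    """
    # One scale per token (row), so each token uses the full FP8 range.
    act_scale = np.abs(activations).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    act_scale = np.maximum(act_scale, 1e-12)   # guard against all-zero rows
    act_q = activations / act_scale            # a real kernel would cast this to FP8

    # One scale per weight column (output channel).
    w_scale = np.abs(weights).max(axis=0, keepdims=True) / FP8_E4M3_MAX
    w_scale = np.maximum(w_scale, 1e-12)
    w_q = weights / w_scale                    # a real kernel would cast this to FP8

    # Low-precision matmul, then rescale each (token, channel) entry by
    # the product of its token scale and its channel scale.
    return (act_q @ w_q) * (act_scale * w_scale)

# Without the FP8 cast the round trip is exact, which makes a quick sanity check:
a = np.random.randn(4, 16)
w = np.random.randn(16, 8)
assert np.allclose(ptpc_matmul(a, w), a @ w)
```

The rescale at the end is a cheap elementwise multiply, which is why the extra granularity costs little over per-tensor scaling.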

Comments (0)