1 file changed: +4 −3 lines changed
 ---
 layout: post
-title: "Boosting vLLM Performance on AMD ROCm: PTPC-FP8 Quantization Unleashes Speed and Accuracy"
+title: "PTPC-FP8: Boosting vLLM Performance on AMD ROCm"
 author: "AMD and Embedded LLM"
 image: /assets/figures/ptpc/PTPC-tumbnail.png
 thumbnail-img: /assets/figures/ptpc/PTPC-tumbnail.png
@@ -36,7 +36,6 @@ LLMs develop activation outliers as they scale beyond certain sizes. These unusu
 - Most values receive few effective bits of precision when using per-tensor quantization
 - Outliers appear persistently in specific channels across different tokens
 - While weights are relatively uniform and easy to quantize, activations are not
-
 #### PTPC: A Precision-Targeted Approach

 PTPC-FP8 (Per-Token-Activation, Per-Channel-Weight FP8) addresses this challenge by using tailored scaling factors based on three key observations:
@@ -49,7 +48,9 @@ This insight led to a dual-granularity approach:
 * **Per-Token Activation Quantization**: Each input token receives its own scaling factor
 * **Per-Channel Weight Quantization**: Each weight column gets a unique scaling factor

-<img align="right" src="/assets/figures/ptpc/PTPC-Diagram.png" alt="Per-Token Activation + Per-Channel Weight Quantization" width="50%" height="50%">
+<div align="center">
+  <img src="/assets/figures/ptpc/PTPC-Diagram.png" alt="Per-Token Activation + Per-Channel Weight Quantization" width="80%">
+</div>

 #### Understanding the Diagram
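The dual-granularity scheme the diff describes — one scale per activation row (token), one scale per weight column (channel) — can be sketched in plain NumPy. This is a hypothetical illustration, not vLLM's actual kernel: `np.round` stands in for a real FP8 cast (E4M3 has mantissa bits, not an integer grid), and `quantize_ptpc` is a name invented here. The key point it shows is that the two scales factor cleanly out of the matmul because each is constant along the reduction axis.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest magnitude representable in FP8 E4M3


def quantize_ptpc(x, w):
    """Simulate PTPC-FP8 scaling: per-token (row) scales for activations,
    per-channel (column) scales for weights."""
    # Per-token: one scale per row of the activation matrix
    x_scale = np.abs(x).max(axis=1, keepdims=True) / FP8_E4M3_MAX  # (tokens, 1)
    # Per-channel: one scale per output column of the weight matrix
    w_scale = np.abs(w).max(axis=0, keepdims=True) / FP8_E4M3_MAX  # (1, out_features)
    x_q = np.round(x / x_scale)  # every row now fits the FP8 dynamic range
    w_q = np.round(w / w_scale)  # every column now fits the FP8 dynamic range
    # Matmul in the quantized domain, then rescale; the scales factor out
    # because x_scale is constant per row and w_scale constant per column.
    return (x_q @ w_q) * x_scale * w_scale


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
x[:, 3] *= 50.0  # a persistent outlier channel, as described in the post
w = rng.normal(size=(8, 16))

y_ref = x @ w
y_ptpc = quantize_ptpc(x, w)
print(np.max(np.abs(y_ref - y_ptpc)))  # small quantization error
```

Because each token row gets its own scale, the outlier channel inflates only the scales of the rows it touches, instead of flattening the precision of every value in the tensor as a single per-tensor scale would.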