Commit 009dc55

Add comment for BaseQuantizeConfig
1 parent 7554cbc commit 009dc55

File tree

1 file changed: +17 −0 lines changed


auto_fp8/config.py

Lines changed: 17 additions & 0 deletions
@@ -2,6 +2,23 @@
 
 
 class BaseQuantizeConfig:
+    """Configuration for model quantization.
+
+    Args:
+        quant_method: Type/precision of quantization method to use.
+            At the moment, this is just "fp8" which specifically means
+            the fp8_e4m3 format in pytorch.
+        activation_scheme: Choice of either "dynamic" or "static" quantization
+            of activations. If "static", then calibration samples are required
+            during quantization to produce accurate per-tensor scales for
+            activations of Linear modules.
+        ignore_patterns: List of patterns used to ignore layers. If a string
+            starts with "re:", then everything afterwards is used as python
+            regex style matching i.e. re.search(), for each Linear layer.
+            By default, "re:.*lm_head" is included to ignore the embedding
+            Linear layer usually at the end of decoder LLMs.
+    """
+
     def __init__(
         self,
         quant_method: str = "fp8",
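The `ignore_patterns` matching described in the docstring can be sketched as follows. This is a hypothetical helper (not taken from the commit), assuming patterns prefixed with "re:" are applied via `re.search()` against each layer name and all other patterns are compared literally:

```python
import re

def matches_ignore_pattern(layer_name: str, patterns: list[str]) -> bool:
    """Return True if layer_name matches any ignore pattern.

    Patterns starting with "re:" use python regex matching via
    re.search(); all other patterns are compared for exact equality.
    (Hypothetical sketch of the behavior described in the docstring.)
    """
    for pattern in patterns:
        if pattern.startswith("re:"):
            # Everything after the "re:" prefix is treated as a regex.
            if re.search(pattern[len("re:"):], layer_name):
                return True
        elif pattern == layer_name:
            return True
    return False

# The default pattern ignores the lm_head Linear at the end of decoder LLMs.
patterns = ["re:.*lm_head"]
print(matches_ignore_pattern("model.lm_head", patterns))       # True
print(matches_ignore_pattern("model.layers.0.mlp", patterns))  # False
```

With this scheme, a plain string only ignores a layer whose name matches exactly, while a "re:" pattern can skip whole families of layers in one entry.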
