1 file changed: 17 additions, 0 deletions
 class BaseQuantizeConfig:
+    """Configuration for model quantization.
+
+    Args:
+        quant_method: Type/precision of quantization method to use.
+            At the moment, this is just "fp8", which specifically means
+            the fp8_e4m3 format in PyTorch.
+        activation_scheme: Choice of either "dynamic" or "static" quantization
+            of activations. If "static", calibration samples are required
+            during quantization to produce accurate per-tensor scales for
+            the activations of Linear modules.
+        ignore_patterns: List of patterns used to ignore layers. If a string
+            starts with "re:", everything after it is used as Python
+            regex-style matching, i.e. re.search(), for each Linear layer.
+            By default, "re:.*lm_head" is included to ignore the embedding
+            Linear layer usually found at the end of decoder LLMs.
+    """
+
     def __init__(
         self,
         quant_method: str = "fp8",
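
For context, a config documented this way would typically be constructed with the three arguments described in the Args section. Below is a minimal usage sketch; the `auto_fp8` import path and the `activation_scheme`/`ignore_patterns` keyword names are assumptions taken from the docstring rather than from the visible `__init__` signature, and only the `quant_method="fp8"` default is confirmed by this diff.

# Usage sketch (assumed import path; keyword names inferred from the docstring).
from auto_fp8 import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    quant_method="fp8",               # currently the only supported method (fp8_e4m3 in PyTorch)
    activation_scheme="static",       # "static" needs calibration samples; "dynamic" does not
    ignore_patterns=["re:.*lm_head"], # regex-style pattern to skip the final lm_head Linear layer
)

Choosing "static" here means per-tensor activation scales are computed ahead of time from calibration data, while "dynamic" defers scale computation to runtime, as the docstring describes.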