# AutoFP8

Example model with static scales for activations and weights: https://huggingface.co/nm-testing/Meta-Llama-3-8B-Instruct-FP8

Command to produce:
```bash
python quantize.py --model-id meta-llama/Meta-Llama-3-8B-Instruct --save-dir Meta-Llama-3-8B-Instruct-FP8
```
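
To sanity-check the exported checkpoint, it can be loaded in an fp8-capable backend. A minimal sketch, assuming vLLM (not part of this repo) as the consumer and the `--save-dir` from the command above:

```python
# Minimal sketch: load the exported fp8 checkpoint for a quick generation test.
# Assumes vLLM as the consuming backend; adjust the path to your --save-dir.
from vllm import LLM, SamplingParams

llm = LLM(model="Meta-Llama-3-8B-Instruct-FP8")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```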

## Checkpoint structure

Here we detail the experimental structure of the FP8 checkpoints.

The following is added to `config.json`:
```python
"quantization_config": {
  "quant_method": "fp8",
  "activation_scheme": "static" or "dynamic"
},
```
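
Downstream loaders can branch on `activation_scheme` to decide whether activation scales come from the checkpoint or are computed at runtime. A minimal sketch, assuming the example `--save-dir` above:

```python
import json

# Minimal sketch: read the quantization_config from a saved checkpoint.
with open("Meta-Llama-3-8B-Instruct-FP8/config.json") as f:
    quant_cfg = json.load(f)["quantization_config"]

assert quant_cfg["quant_method"] == "fp8"
# "static"  -> activation scales are stored in the checkpoint (act_scale)
# "dynamic" -> activation scales are computed on the fly at inference
print(quant_cfg["activation_scheme"])
```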

Each quantized layer in the state_dict will have:

If the config has `"activation_scheme": "static"`:
```
model.layers.0.mlp.down_proj.weight         < F8_E4M3
model.layers.0.mlp.down_proj.act_scale      < F32
model.layers.0.mlp.down_proj.weight_scale   < F32
```
If the config has `"activation_scheme": "dynamic"`:
```
model.layers.0.mlp.down_proj.weight         < F8_E4M3
model.layers.0.mlp.down_proj.weight_scale   < F32
```
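
To make the difference between the two schemes concrete, here is a hedged PyTorch sketch of how these tensors could be consumed in a linear layer. `fp8_linear` is illustrative only (real kernels multiply in fp8 and fold the scales into the output); 448 is the maximum magnitude representable in `torch.float8_e4m3fn`:

```python
import torch

FP8_E4M3_MAX = 448.0  # largest magnitude representable in torch.float8_e4m3fn

def fp8_linear(x, weight_fp8, weight_scale, act_scale=None):
    # weight_fp8:   ...weight        (F8_E4M3)
    # weight_scale: ...weight_scale  (F32)
    # act_scale:    ...act_scale     (F32); present only for the static scheme
    if act_scale is None:
        # Dynamic scheme: derive the activation scale from the current batch.
        act_scale = x.abs().max() / FP8_E4M3_MAX
    # Quantize activations to fp8 with the chosen scale.
    x_fp8 = (x / act_scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)
    # Dequantize-then-matmul for clarity; not how production kernels do it.
    w = weight_fp8.to(x.dtype) * weight_scale
    return (x_fp8.to(x.dtype) * act_scale) @ w.t()
```

With `"activation_scheme": "static"`, the stored `act_scale` is passed in; with `"dynamic"`, it is left as `None` and recomputed per batch.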