> - In this example we do not evaluate the perplexity of the quantized model; if desired, the user can add the `--eval_ppl` flag.
> - We set a single calibration example because the quantizers in use do not need calibration: weights remain static during DQ, so a single example is enough to initialize the quantizer correctly, and the activation quantizer `pertokenmax` dynamically recomputes the quantization range at inference time, when running on the AIU (see the illustrative sketch below).
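
For intuition, here is a minimal, self-contained sketch (not fms_mo code) of what a symmetric per-token max activation quantizer does: the scale is recomputed from each token's absolute maximum at every forward pass, so no calibration statistics are needed.

```python
import torch

def pertoken_max_int8(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Illustrative sketch of symmetric INT8 quantization with a per-token dynamic scale."""
    # One scale per token, recomputed from the token's absolute maximum on every call.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    x_int8 = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return x_int8, scale

# Example: activations of shape (batch, seq_len, hidden)
x = torch.randn(2, 4, 16)
x_q, scale = pertoken_max_int8(x)
x_dq = x_q.to(torch.float32) * scale  # dequantize to inspect the rounding error
```
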
**3. Reload checkpoint for testing** and validate its content (optional).
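
The lines that load the checkpoint are not repeated here; below is a minimal sketch of how `sd` (the saved state dict) and `roberta_qlayers` (substrings identifying the quantized linear layers) could be set up. The file name and layer-name substrings are hypothetical and should be adjusted to your own checkpoint.

```python
import torch

# Hypothetical setup: load the saved state dict and list the quantized linear layers.
sd = torch.load("qmodel_int8.pt", map_location="cpu")
roberta_qlayers = [
    "attention.self.query",
    "attention.self.key",
    "attention.self.value",
    "attention.output.dense",
    "intermediate.dense",
    "output.dense",
]
```
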
```python
# assert all quantized-layer weights are stored as int8
assert all(v.dtype == torch.int8 for k, v in sd.items() if any(n in k for n in roberta_qlayers) and k.endswith(".weight"))
# assert all other parameters are fp16
assert all(v.dtype == torch.float16 for k, v in sd.items() if all(n not in k for n in roberta_qlayers) or not k.endswith(".weight"))
```
> [!TIP]
> - We have trained the model with a symmetric quantizer for activations (`qa_mode`). If an asymmetric quantizer is used, the checkpoint will also carry `zero_shift` parameters, which are stored as torch.float32, so this validation step should be modified accordingly (a possible adaptation is sketched below).
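
A possible adaptation of the dtype check for the asymmetric case, assuming the `zero_shift` entries can be recognized by their key suffix (a sketch, not the exact fms_mo naming):

```python
# Sketch: allow float32 for zero_shift entries, keep the fp16 check for everything else.
assert all(v.dtype == torch.float32 for k, v in sd.items() if k.endswith("zero_shift"))
assert all(
    v.dtype == torch.float16
    for k, v in sd.items()
    if not k.endswith("zero_shift")
    and (all(n not in k for n in roberta_qlayers) or not k.endswith(".weight"))
)
```
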
Because we have used the `narrow_weight_recomputation` option along with a `maxperCh` (max per-channel) quantizer for weights, the distributions of the INT weight matrices have been widened. Most of the per-channel standard deviation values should surpass the empirical threshold of 20.
```python
[f"{v.to(torch.float32).std(dim=-1).mean():.4f}"for k,v in sd.items() if k.endswith(".weight") andany(n in k for n in roberta_qlayers)]
```
> [!TIP]
> - We cast the torch.int8 weights to torch.float32 so that the torch.std function can be applied.
> - For per-channel weights, the recomputation is applied per channel. Here we print the mean across channels to aid visualization.
> - It is not guaranteed that the recomputed weights will exceed the empirical threshold, but they do for several common models of the BERT, RoBERTa, Llama, and Granite families; a small check that flags low-std layers is sketched below.
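
If a programmatic check is preferred over reading the printout, here is a small sketch that flags quantized weight matrices whose mean per-channel standard deviation stays at or below the empirical threshold (20 is the empirical figure quoted above, not a hard requirement):

```python
# Flag quantized weights whose mean per-channel std does not exceed the empirical threshold.
THRESHOLD = 20.0
for k, v in sd.items():
    if k.endswith(".weight") and any(n in k for n in roberta_qlayers):
        mean_std = v.to(torch.float32).std(dim=-1).mean().item()
        if mean_std <= THRESHOLD:
            print(f"{k}: mean per-channel std = {mean_std:.4f} (at or below threshold)")
```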