
Commit 6ee7e4c

Add AIU conversion example
Signed-off-by: Andrea Fasoli <[email protected]>
1 parent 6328fbd commit 6ee7e4c


examples/AIU_CONVERSION/README.md

Lines changed: 30 additions & 1 deletion
@@ -34,4 +34,33 @@ python -m fms_mo.run_quant \
> - In this example, we are not evaluating the perplexity of the quantized model, but the user can add the `--eval_ppl` flag if desired.
> - We use a single calibration example because the quantizers in use do not need calibration: weights remain static during DQ, so one example is enough to initialize the weight quantizer, and the activation quantizer `pertokenmax` recomputes the quantization range dynamically at inference time, when running on the AIU (a conceptual sketch of this behavior is shown below).

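The snippet below is only a conceptual sketch of per-token dynamic quantization, not the fms_mo implementation of `pertokenmax` (whose internals may differ): each token gets its own symmetric INT8 scale, recomputed from the live activations, so no calibration statistics need to be stored.

```python
import torch

def pertoken_dynamic_int8(x: torch.Tensor) -> torch.Tensor:
    """Conceptual per-token dynamic INT8 quantization (symmetric)."""
    # one scale per token: max absolute value along the feature dimension,
    # recomputed from the current activations at inference time
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    x_int8 = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    # dequantize for illustration; on the AIU the INT8 values are consumed directly
    return x_int8.to(x.dtype) * scale
```
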
**3. Reload checkpoint for testing** and validate its content (optional).

```python
import torch

sd = torch.load("dq_test/qmodel_for_aiu.pt", weights_only=True)
```
Check that the weights of all quantized layers have been converted to `torch.int8`, while all other parameters are `torch.float16`.

```python
# select quantized layers by name
roberta_qlayers = [
    "attention.self.query", "attention.self.key", "attention.self.value",
    "attention.output.dense", "intermediate.dense", "output.dense",
]
# assert all quantized weights are int8
assert all(v.dtype == torch.int8 for k, v in sd.items() if any(n in k for n in roberta_qlayers) and k.endswith(".weight"))
# assert all other parameters are fp16
assert all(v.dtype == torch.float16 for k, v in sd.items() if all(n not in k for n in roberta_qlayers) or not k.endswith(".weight"))
```
> [!TIP]
> - We trained the model with a symmetric quantizer for activations (`qa_mode`). If an asymmetric quantizer is used, the checkpoint will also carry `zero_shift` parameters, which are `torch.float32`, so this validation step should be modified accordingly (see the sketch below).

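For instance, the second assertion above could be relaxed along these lines when an asymmetric activation quantizer is used (a sketch only; the exact key naming of the `zero_shift` entries in the checkpoint may differ):

```python
# allow fp32 zero_shift entries alongside fp16 parameters (illustrative; key name assumed)
assert all(
    v.dtype == (torch.float32 if "zero_shift" in k else torch.float16)
    for k, v in sd.items()
    if all(n not in k for n in roberta_qlayers) or not k.endswith(".weight")
)
```
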
Because we have used the `narrow_weight_recomputation` option along with a `maxperCh` (max per-channel) quantizer for weights, the distributions of the INT weight matrices have been widened. Most per-channel standard deviation values should exceed the empirical threshold of 20.

```python
# mean (across channels) of the per-channel std of each quantized INT8 weight matrix
[f"{v.to(torch.float32).std(dim=-1).mean():.4f}" for k, v in sd.items() if k.endswith(".weight") and any(n in k for n in roberta_qlayers)]
```
> [!TIP]
> - We cast the `torch.int8` weights to `torch.float32` so that `torch.std` can be applied (it does not support integer tensors).
> - For per-channel quantized weights, the recomputation is applied per channel. Here we print the mean across channels to aid visualization.
> - It is not guaranteed that the recomputed weights will exceed the empirical threshold, but this is the case for several common models of the BERT, RoBERTa, Llama, and Granite families.
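
To look at this per channel rather than through the mean, one can also report, for each quantized weight, the fraction of channels whose standard deviation exceeds the threshold (a small sketch reusing the `sd` and `roberta_qlayers` names defined above; the value of 20 is the empirical threshold quoted earlier, not a hard requirement):

```python
# per layer: fraction of channels whose per-channel std exceeds the empirical threshold of 20
for k, v in sd.items():
    if k.endswith(".weight") and any(n in k for n in roberta_qlayers):
        ch_std = v.to(torch.float32).std(dim=-1)
        print(f"{k}: {(ch_std > 20).float().mean().item():.1%} of channels above threshold")
```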
