Commit f5f49d6

First draft of AIU conversion example

Signed-off-by: Andrea Fasoli <[email protected]>

1 parent 389bbf5

File tree

1 file changed: +37 -0 lines changed
examples/AIU_CONVERSION/README.md

# Train and prepare an INT8 checkpoint for the AIU using Direct Quantization
This example builds on the [Direct Quantization (DQ) example](../DQ_SQ/README.md). We assume the user is already familiar with the DQ quantization process and wants to generate an INT8-quantized checkpoint that complies with the requirements of the AIU.
Once created, this checkpoint can be run on the AIU using an inference script from [aiu-fms-testing-utils](https://github.com/foundation-model-stack/aiu-fms-testing-utils).

## Requirements

- [FMS Model Optimizer requirements](../../README.md#requirements)

## QuickStart

**1. Prepare Data** as per the DQ quantization process ([link](../DQ_SQ/README.md)). In this example, we assume the user wants to quantize the RoBERTa-base model and has thus prepared the DQ data for it, stored under the folders `data_train` and `data_test`, by adapting the DQ example accordingly.
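
For reference, here is a minimal sketch of what that preparation could look like, assuming the usual Hugging Face `datasets` + tokenizer flow; the dataset choice, column name, and sequence length below are illustrative assumptions, not the DQ example's exact settings.

```python
# Illustrative sketch only -- adapt the DQ example's own data preparation.
# The dataset, column name, and sequence length here are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    # Fixed-length tokenization keeps calibration batches a uniform shape.
    return tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=512
    )

raw = load_dataset("wikitext", "wikitext-2-raw-v1")
train = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
test = raw["test"].map(tokenize, batched=True, remove_columns=["text"])

# Save to the folders referenced by the quantization command in step 2.
train.save_to_disk("data_train")
test.save_to_disk("data_test")
```
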
**2. Apply DQ with conversion** by providing the desired quantization parameters, as well as the flags `--save_ckpt_for_aiu` and `--recompute_narrow_weights`.

```bash
python -m fms_mo.run_quant \
    --model_name_or_path "roberta-base" \
    --training_data_path data_train \
    --test_data_path data_test \
    --torch_dtype "float16" \
    --quant_method dq \
    --nbits_w 8 \
    --nbits_a 8 \
    --nbits_kvcache 32 \
    --qa_mode "pertokenmax" \
    --qw_mode "maxperCh" \
    --qmodel_calibration_new 1 \
    --output_dir "dq_test" \
    --save_ckpt_for_aiu \
    --recompute_narrow_weights
```

> [!TIP]
> - In this example we do not evaluate the perplexity of the quantized model; if desired, the user can add the `--eval_ppl` flag.
> - We set a single calibration example because the quantizers in use do not need calibration: weights remain static during DQ, so a single example is enough to initialize the quantizer correctly, and the `pertokenmax` activation quantizer dynamically recomputes the quantization range at inference time, when running on the AIU.
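
To make the second point concrete, here is a minimal PyTorch sketch of what a dynamic per-token max INT8 activation quantizer does; this illustrates the general technique, not `fms_mo`'s actual `pertokenmax` implementation.

```python
import torch

def pertoken_max_int8(x: torch.Tensor):
    """Symmetric INT8 quantization with a per-token dynamic scale.

    Illustrative only: the scale is recomputed from each token's max
    absolute value at inference time, so no calibration set is needed.
    """
    # One scale per token; the last dimension holds the hidden features.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

x = torch.randn(2, 4, 8)                 # (batch, tokens, hidden)
q, scale = pertoken_max_int8(x)
x_dequant = q.float() * scale            # approximate reconstruction
```
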
**3. Reload checkpoint for testing**
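
A minimal sketch of a reload-and-inspect step, assuming the saved checkpoint is a standard PyTorch state dict; the file name used below is a hypothetical placeholder, not necessarily what `--save_ckpt_for_aiu` writes.

```python
import os
import torch

# Hypothetical file name -- check the contents of --output_dir for the
# checkpoint that --save_ckpt_for_aiu actually writes.
ckpt_path = os.path.join("dq_test", "qmodel_for_aiu.pt")
state_dict = torch.load(ckpt_path, map_location="cpu")

# Inspect the quantized tensors before handing the checkpoint to the
# aiu-fms-testing-utils inference script.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape), tensor.dtype)
```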
