
Commit 2ebc20f

bingyanghuang authored and luotao1 committed

Cherry-pick #16515 INT8v2 readme to Release 1.4 (#16686)

1 parent 2b80092 commit 2ebc20f

1 file changed: 70 additions, 0 deletions
# INT8 MKL-DNN quantization

This document describes how to use the Paddle inference engine to convert an FP32 model to an INT8 model, using ResNet-50 and MobileNet-V1 as examples. We provide instructions for enabling INT8 MKL-DNN quantization in Paddle inference and report the resulting accuracy and performance of ResNet-50 and MobileNet-V1.

## 0. Install PaddlePaddle

Follow the PaddlePaddle [installation instructions](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#installation) to install PaddlePaddle. If you build PaddlePaddle yourself, please use the following cmake arguments:

```bash
cmake .. -DWITH_TESTING=ON -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_MKL=ON -DWITH_SWIG_PY=OFF -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON
```

Note: MKL-DNN and MKL are required.

## 1. Enable INT8 MKL-DNN quantization

For reference, please examine the code of the unit test enclosed in [analyzer_int8_image_classification_tester.cc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/api/analyzer_int8_image_classification_tester.cc).

* ### Create Analysis config

INT8 quantization is one of the optimizations available in the analysis config. More information about the analysis config can be found [here](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md#upgrade-performance-based-on-contribanalysisconfig-prerelease). A minimal way to set one up is sketched below.
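
For illustration only, here is a minimal sketch of building an analysis config for CPU inference. The model path, function name, and thread count are placeholder assumptions, not values prescribed by this document.

```cpp
#include "paddle/fluid/inference/api/paddle_inference_api.h"

// A minimal sketch, not the exact setup of the unit test.
// "path/to/model" is a placeholder for a real model directory.
paddle::AnalysisConfig MakeCpuConfig() {
  paddle::AnalysisConfig cfg;
  cfg.SetModel("path/to/model");
  cfg.DisableGpu();                    // INT8 MKL-DNN quantization targets CPU
  cfg.EnableMKLDNN();                  // use MKL-DNN kernels where available
  cfg.SetCpuMathLibraryNumThreads(1);  // single core, as in the benchmarks below
  return cfg;
}
```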

* ### Create quantize config by analysis config

We enable the MKL-DNN quantization procedure by calling an appropriate method from the analysis config. Afterwards, all the required quantization parameters (quantization op names, quantization strategies, etc.) can be set through the quantizer config, which is present in the analysis config. It is also necessary to specify a pre-processed warmup dataset and the desired batch size.

```cpp
// Enable MKL-DNN quantization
cfg.EnableMkldnnQuantizer();

// Use the analysis config to access the MKL-DNN quantizer config
cfg.mkldnn_quantizer_config()->SetWarmupData(warmup_data);
cfg.mkldnn_quantizer_config()->SetWarmupBatchSize(100);
```
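
The warmup data is passed as a batch of `PaddleTensor`s. As an illustration, here is a minimal sketch of assembling such a batch; the tensor name `image`, the NCHW shape, and the helper function are assumptions for this example, and in practice the buffer must be filled with real pre-processed images.

```cpp
#include <memory>
#include <vector>

#include "paddle/fluid/inference/api/paddle_inference_api.h"

// A minimal sketch: one warmup batch of 100 pre-processed ImageNet images.
// The tensor name "image" and the shape are illustrative assumptions.
std::shared_ptr<std::vector<paddle::PaddleTensor>> MakeWarmupData() {
  paddle::PaddleTensor images;
  images.name = "image";
  images.shape = {100, 3, 224, 224};  // batch size 100, NCHW layout
  images.dtype = paddle::PaddleDType::FLOAT32;
  images.data.Resize(100 * 3 * 224 * 224 * sizeof(float));
  // Fill images.data.data() with real pre-processed image values here.

  auto warmup = std::make_shared<std::vector<paddle::PaddleTensor>>();
  warmup->push_back(std::move(images));
  return warmup;
}
```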
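
Once the config is complete, creating a predictor from it triggers the quantization procedure: the warmup batch is run through the FP32 model to collect the tensor statistics from which the INT8 scales are derived. A minimal sketch, where `inputs` stands for a prepared batch like the warmup data:

```cpp
// A minimal sketch, assuming cfg was configured as above and "inputs"
// is a std::vector<paddle::PaddleTensor> prepared like the warmup data.
auto predictor = paddle::CreatePaddlePredictor(cfg);

std::vector<paddle::PaddleTensor> outputs;
predictor->Run(inputs, &outputs);  // inference runs on the quantized model
```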

## 2. Accuracy and Performance benchmark

We provide accuracy and performance results measured on Intel(R) Xeon(R) Gold 6271 on a single core.

>**I. Top-1 Accuracy on Intel(R) Xeon(R) Gold 6271**

| Model | Dataset | FP32 Accuracy | INT8 Accuracy | Accuracy Diff |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| ResNet-50 | Full ImageNet Val | 76.63% | 76.48% | 0.15% |
| MobileNet-V1 | Full ImageNet Val | 70.78% | 70.36% | 0.42% |

>**II. Throughput on Intel(R) Xeon(R) Gold 6271 (batch size 1 on a single core)**

| Model | Dataset | FP32 Throughput | INT8 Throughput | Ratio (INT8/FP32) |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| ResNet-50 | Full ImageNet Val | 13.17 images/s | 49.84 images/s | 3.78 |
| MobileNet-V1 | Full ImageNet Val | 75.49 images/s | 232.38 images/s | 3.07 |

Notes:

* Measurement of accuracy requires a model which accepts two inputs: data and labels.
* Different sampling of the warmup batch data may cause slight differences in the INT8 top-1 accuracy.
* C-API performance is better than Python API performance because of the Python overhead; for small computational models in particular, the Python overhead is more noticeable.

## 3. Commands to reproduce the above accuracy and performance benchmark

* #### Full dataset (Single core)

* ##### Download full ImageNet Validation Dataset

```bash
cd /PATH/TO/PADDLE/build
python ../paddle/fluid/inference/tests/api/full_ILSVRC2012_val_preprocess.py
```

The converted data binary file is saved by default in `~/.cache/paddle/dataset/int8/download/int8_full_val.bin`.

* ##### ResNet-50 Full dataset benchmark

```bash
./paddle/fluid/inference/tests/api/test_analyzer_int8_resnet50 --infer_model=third_party/inference_demo/int8v2/resnet50/model --infer_data=/path/to/converted/int8_full_val.bin --batch_size=1 --paddle_num_threads=1
```

* ##### MobileNet-V1 Full dataset benchmark

```bash
./paddle/fluid/inference/tests/api/test_analyzer_int8_mobilenet --infer_model=third_party/inference_demo/int8v2/mobilenet/model --infer_data=/path/to/converted/int8_full_val.bin --batch_size=1 --paddle_num_threads=1
```
