You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md
+6-8Lines changed: 6 additions & 8 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -8,7 +8,7 @@ Notes:
8
8
* INT8 accuracy is best on CPU servers supporting AVX512 VNNI extension.
9
9
10
10
## 0. Prerequisite
11
-
You need to install at least PaddlePaddle-1.7 python package `pip install paddlepaddle==1.7`.
11
+
You need to install at least PaddlePaddle-1.7.1 python package `pip install paddlepaddle==1.7.1`.
12
12
13
13
## 1. How to generate INT8 MKL-DNN QAT model
14
14
You can refer to the unit test in [test_quantization_mkldnn_pass.py](test_quantization_mkldnn_pass.py). Users firstly use PaddleSlim quantization strategy to get a saved fake QAT model by [QuantizationFreezePass](https://github.com/PaddlePaddle/models/tree/develop/PaddleSlim/quant_low_level_api), then use the `QatInt8MkldnnPass` (from QAT1.0 MKL-DNN) to get a graph which can be run with MKL-DNN INT8 kernel. In Paddle Release 1.6, this pass supports `conv2d` and `depthwise_conv2d` ops with channel-wise quantization for weights. Apart from it, another pass called `Qat2Int8MkldnnPass` (from QAT2.0 MKL-DNN) is available for use. In Release 1.6, this pass additionally supports `pool2d` op and allows users to transform their QAT model into a highly performance-optimized INT8 model that is ran using INT8 MKL-DNN kernels. In Release 1.7, a support for `fc`, `reshape2` and `transpose2` ops was added to the pass.
Three steps are needed to reproduce the above-mentioned accuracy and performance results. Below we explain the steps taking ResNet50 as an example of image classification models. In order to reproduce NLP results, please follow [this guide](https://github.com/PaddlePaddle/benchmark/tree/master/Inference/c%2B%2B/ernie/mkldnn/README.md).
98
-
### Prepare dataset
97
+
To reproduce the above-mentioned Image Classification models accuracy and performance, follow steps as below (taking ResNet50 as an example).
98
+
To reproduce NLP models results (Ernie), please follow [How to reproduce Ernie QAT results on MKL-DNN](https://github.com/PaddlePaddle/benchmark/tree/master/Inference/c%2B%2B/ernie/mkldnn/README.md).
99
99
100
-
#### Image classification
100
+
###Prepare dataset
101
101
102
102
In order to download the dataset for image classification models benchmarking, execute:
103
103
@@ -109,7 +109,6 @@ The converted data binary file is saved by default in `$HOME/.cache/paddle/datas
109
109
110
110
### Prepare model
111
111
112
-
#### Image classification
113
112
You can run the following commands to download ResNet50 model. The exemplary code snippet provided below downloads a ResNet50 QAT model. The reason for having two different versions of the same model originates from having two different QAT training strategies: One for an non-optimized and second for an optimized graph transform which correspond to QAT1.0 and QAT2.0 respectively.
You can use the `qat_int8_image_classification_comparison.py` script to reproduce the accuracy result on ResNet50. The difference between commands usedin the QAT1.0 MKL-DNN and QAT2.0 MKL-DNN is that for QAT2.0 MKL-DNN two additional options are required: the `--qat2` option to enable QAT2.0 MKL-DNN, and the `--quantized_ops` option with a comma-separated list of operators to be quantized. To perform the QAT2.0 MKL-DNN performance test, the environment variable `OMP_NUM_THREADS=1` and `--batch_size=1` option should be set.
0 commit comments