Commit 36dae24
[cherry-pick] Changes QAT MKL-DNN documents (#22875)
1 parent 674aa06 commit 36dae24

1 file changed (+6 -8 lines)


python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md

Lines changed: 6 additions & 8 deletions
@@ -8,7 +8,7 @@ Notes:
 * INT8 accuracy is best on CPU servers supporting the AVX512 VNNI extension.

 ## 0. Prerequisite
-You need to install at least PaddlePaddle-1.7 python package `pip install paddlepaddle==1.7`.
+You need to install at least the PaddlePaddle 1.7.1 Python package: `pip install paddlepaddle==1.7.1`.

 ## 1. How to generate INT8 MKL-DNN QAT model
 You can refer to the unit test in [test_quantization_mkldnn_pass.py](test_quantization_mkldnn_pass.py). Users first use a PaddleSlim quantization strategy to get a saved fake QAT model with [QuantizationFreezePass](https://github.com/PaddlePaddle/models/tree/develop/PaddleSlim/quant_low_level_api), then use `QatInt8MkldnnPass` (from QAT1.0 MKL-DNN) to get a graph that can be run with MKL-DNN INT8 kernels. In Paddle Release 1.6, this pass supports `conv2d` and `depthwise_conv2d` ops with channel-wise quantization for weights. Apart from that, another pass called `Qat2Int8MkldnnPass` (from QAT2.0 MKL-DNN) is available. In Release 1.6, this pass additionally supports the `pool2d` op and lets users transform their QAT model into a highly performance-optimized INT8 model that is run using INT8 MKL-DNN kernels. In Release 1.7, support for the `fc`, `reshape2` and `transpose2` ops was added to the pass.
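
For orientation, below is a minimal sketch of applying the QAT2.0 pass to a saved fake QAT model, modeled on the pattern in the unit test linked above. The import path, the constructor arguments and the `qat_model_path` directory are assumptions to be verified against `test_quantization_mkldnn_pass.py`.

```python
# Minimal sketch (not the authoritative recipe): transform a saved fake QAT
# model into an MKL-DNN INT8 graph. Import path and constructor arguments are
# assumptions -- check test_quantization_mkldnn_pass.py for the exact usage.
import paddle.fluid as fluid
from paddle.fluid import core
from paddle.fluid.framework import IrGraph
from paddle.fluid.contrib.slim.quantization import Qat2Int8MkldnnPass

place = fluid.CPUPlace()
exe = fluid.Executor(place)
scope = fluid.Scope()

with fluid.scope_guard(scope):
    # 'qat_model_path' is a placeholder for the saved fake QAT model directory.
    [program, feed_names, fetch_targets] = fluid.io.load_inference_model(
        'qat_model_path', exe)
    graph = IrGraph(core.Graph(program.desc), for_test=True)
    # The set of ops to quantize mirrors the --quantized_ops option used by
    # the benchmarking scripts in section 3.
    mkldnn_pass = Qat2Int8MkldnnPass(
        {'conv2d', 'pool2d'}, _scope=scope, _place=place, _core=core)
    int8_graph = mkldnn_pass.apply(graph)
    int8_program = int8_graph.to_program()
```

The resulting `int8_program` can then be saved with `fluid.io.save_inference_model` and executed with MKL-DNN INT8 kernels on CPU.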
@@ -83,21 +83,21 @@ Notes:

 | Model | FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
 |:------------:|:----------------------:|:----------------------:|:---------:|
-| Ernie | 79.76% | 79.28% | -0.48% |
+| Ernie | 80.20% | 79.96% | -0.24% |


 >**V. Ernie QAT2.0 MKL-DNN Performance on Intel(R) Xeon(R) Gold 6271**

-| Threads | FP32 Latency (ms) | QAT INT8 Latency (ms) | Latency Diff |
+| Threads | FP32 Latency (ms) | QAT INT8 Latency (ms) | Ratio (FP32/INT8) |
 |:------------:|:----------------------:|:-------------------:|:---------:|
 | 1 thread | 252.131 | 93.8023 | 2.687x |
 | 20 threads | 29.1853 | 17.3765 | 1.680x |

 ## 3. How to reproduce the results
-Three steps are needed to reproduce the above-mentioned accuracy and performance results. Below we explain the steps taking ResNet50 as an example of image classification models. In order to reproduce NLP results, please follow [this guide](https://github.com/PaddlePaddle/benchmark/tree/master/Inference/c%2B%2B/ernie/mkldnn/README.md).
-### Prepare dataset
+To reproduce the accuracy and performance of the above-mentioned image classification models, follow the steps below (taking ResNet50 as an example).
+To reproduce the NLP model (Ernie) results, please follow [How to reproduce Ernie QAT results on MKL-DNN](https://github.com/PaddlePaddle/benchmark/tree/master/Inference/c%2B%2B/ernie/mkldnn/README.md).

-#### Image classification
+### Prepare dataset

 In order to download the dataset for benchmarking the image classification models, execute:

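The exact download command sits outside this hunk; as a hedged sketch, dataset preparation in Paddle's INT8 tests is typically done with a preprocessing script that converts the ImageNet validation set into a single binary file (the script path below is an assumption, not taken from this diff):

```bash
# Hedged sketch: the script location is an assumption based on Paddle's INT8
# test utilities; the authoritative command is in the full README.
cd /PATH/TO/PADDLE
python paddle/fluid/inference/tests/api/full_ILSVRC2012_val_preprocess.py
```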
@@ -109,7 +109,6 @@ The converted data binary file is saved by default in `$HOME/.cache/paddle/datas

 ### Prepare model

-#### Image classification
 You can run the following commands to download the ResNet50 models. The exemplary code snippet provided below downloads ResNet50 QAT models. The reason for having two different versions of the same model is that there are two different QAT training strategies: one for the non-optimized graph transform (QAT1.0) and one for the optimized graph transform (QAT2.0).

 ```bash
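
The download commands themselves fall outside this hunk. A hedged sketch of the pattern, with the archive URLs left as placeholders to be copied from the full README:

```bash
# QAT1_MODEL_URL and QAT2_MODEL_URL are placeholders for the archive links
# given in the full README, not real URLs.
mkdir -p /PATH/TO/DOWNLOADED/MODELS
cd /PATH/TO/DOWNLOADED/MODELS
wget "$QAT1_MODEL_URL" && tar -xzf "$(basename "$QAT1_MODEL_URL")"   # QAT1.0 model
wget "$QAT2_MODEL_URL" && tar -xzf "$(basename "$QAT2_MODEL_URL")"   # QAT2.0 model
```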
@@ -134,7 +133,6 @@ MODEL_NAME=resnet50, resnet101, mobilenetv1, mobilenetv2, vgg16, vgg19
 ```
 ### Commands to reproduce benchmark

-#### Image classification
 You can use the `qat_int8_image_classification_comparison.py` script to reproduce the accuracy result on ResNet50. The difference between the commands used for QAT1.0 MKL-DNN and QAT2.0 MKL-DNN is that QAT2.0 MKL-DNN requires two additional options: the `--qat2` option to enable QAT2.0 MKL-DNN, and the `--quantized_ops` option with a comma-separated list of operators to be quantized. To perform the QAT2.0 MKL-DNN performance test, set the environment variable `OMP_NUM_THREADS=1` and use the `--batch_size=1` option.
 >*QAT1.0*

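The full commands follow in the README; as a hedged sketch, a QAT2.0 run combining the options named above could look like this (the path-style flags `--qat_model`, `--fp32_model` and `--infer_data` are assumptions to be checked with the script's `--help`):

```bash
# Hedged sketch of a QAT2.0 MKL-DNN run; --qat2 and --quantized_ops come from
# the text above, and OMP_NUM_THREADS=1 with --batch_size=1 applies to the
# performance test. All paths are placeholders.
cd /PATH/TO/PADDLE/build
OMP_NUM_THREADS=1 python ../python/paddle/fluid/contrib/slim/tests/qat_int8_image_classification_comparison.py \
    --qat2 \
    --quantized_ops="conv2d,pool2d" \
    --qat_model=/PATH/TO/QAT2/MODEL \
    --fp32_model=/PATH/TO/FP32/MODEL \
    --infer_data=/PATH/TO/CONVERTED/DATA.bin \
    --batch_size=1
```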