Commit 8ef3c02

Update DNNL QAT document 2.0-alpha (#24494)
Update DNNL QAT document 2.0-alpha
1 parent db2b6b6 commit 8ef3c02

1 file changed: +10 -45 lines changed

python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md

Lines changed: 10 additions & 45 deletions
@@ -109,10 +109,9 @@ The code snipped shows how the `Qat2Int8MkldnnPass` can be applied to a model gr

 ## 5. Accuracy and Performance benchmark

-This section contain QAT2 MKL-DNN accuracy and performance benchmark results measured on two servers:
+This section contain QAT2 MKL-DNN accuracy and performance benchmark results measured on the following server:

 * Intel(R) Xeon(R) Gold 6271 (with AVX512 VNNI support),
-* Intel(R) Xeon(R) Gold 6148.

 Performance benchmarks were run with the following environment settings:

@@ -144,17 +143,6 @@ Performance benchmarks were run with the following environment settings:
 | VGG16 | 72.08% | 71.73% | -0.35% | 90.63% | 89.71% | -0.92% |
 | VGG19 | 72.57% | 72.12% | -0.45% | 90.84% | 90.15% | -0.69% |

->**Intel(R) Xeon(R) Gold 6148**
-
-| Model | FP32 Top1 Accuracy | INT8 QAT Top1 Accuracy | Top1 Diff | FP32 Top5 Accuracy | INT8 QAT Top5 Accuracy | Top5 Diff |
-| :----------: | :----------------: | :--------------------: | :-------: | :----------------: | :--------------------: | :-------: |
-| MobileNet-V1 | 70.78% | 70.85% | 0.07% | 89.69% | 89.41% | -0.28% |
-| MobileNet-V2 | 71.90% | 72.08% | 0.18% | 90.56% | 90.66% | +0.10% |
-| ResNet101 | 77.50% | 77.51% | 0.01% | 93.58% | 93.50% | -0.08% |
-| ResNet50 | 76.63% | 76.55% | -0.08% | 93.10% | 92.96% | -0.14% |
-| VGG16 | 72.08% | 71.72% | -0.36% | 90.63% | 89.75% | -0.88% |
-| VGG19 | 72.57% | 72.08% | -0.49% | 90.84% | 90.11% | -0.73% |
-
 #### Performance

 Image classification models performance was measured using a single thread. The setting is included in the benchmark reproduction commands below.
@@ -164,23 +152,12 @@ Image classification models performance was measured using a single thread. The

 | Model | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32) |
 | :----------: | :-------------: | :-----------------: | :---------------: |
-| MobileNet-V1 | 77.00 | 210.76 | 2.74 |
-| MobileNet-V2 | 88.43 | 182.47 | 2.06 |
-| ResNet101 | 7.20 | 25.88 | 3.60 |
-| ResNet50 | 13.26 | 47.44 | 3.58 |
-| VGG16 | 3.48 | 10.11 | 2.90 |
-| VGG19 | 2.83 | 8.77 | 3.10 |
-
->**Intel(R) Xeon(R) Gold 6148**
-
-| Model | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32) |
-| :----------: | :-------------: | :-----------------: | :---------------: |
-| MobileNet-V1 | 75.23 | 103.63 | 1.38 |
-| MobileNet-V2 | 86.65 | 128.14 | 1.48 |
-| ResNet101 | 6.61 | 10.79 | 1.63 |
-| ResNet50 | 12.42 | 19.65 | 1.58 |
-| VGG16 | 3.31 | 4.74 | 1.43 |
-| VGG19 | 2.68 | 3.91 | 1.46 |
+| MobileNet-V1 | 74.05 | 196.98 | 2.66 |
+| MobileNet-V2 | 88.60 | 187.67 | 2.12 |
+| ResNet101 | 7.20 | 26.43 | 3.67 |
+| ResNet50 | 13.23 | 47.44 | 3.59 |
+| VGG16 | 3.47 | 10.20 | 2.94 |
+| VGG19 | 2.83 | 8.67 | 3.06 |

 Notes:

@@ -194,13 +171,8 @@ Notes:

 | Model | FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
 |:------------:|:----------------------:|:----------------------:|:---------:|
-| Ernie | 80.20% | 79.88% | -0.32% |
+| Ernie | 80.20% | 79.44% | -0.76% |

->**Intel(R) Xeon(R) Gold 6148**
-
-| Model | FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
-| :---: | :-----------: | :---------------: | :-----------: |
-| Ernie | 80.20% | 79.64% | -0.56% |

 #### Performance

@@ -209,16 +181,9 @@ Notes:

 | Model | Threads | FP32 Latency (ms) | QAT INT8 Latency (ms) | Ratio (FP32/INT8) |
 |:------------:|:----------------------:|:-------------------:|:---------:|:---------:|
-| Ernie | 1 thread | 236.72 | 83.70 | 2.82x |
-| Ernie | 20 threads | 27.40 | 15.01 | 1.83x |
-
-
->**Intel(R) Xeon(R) Gold 6148**
+| Ernie | 1 thread | 237.21 | 79.26 | 2.99x |
+| Ernie | 20 threads | 22.08 | 12.57 | 1.76x |

-| Model | Threads | FP32 Latency (ms) | QAT INT8 Latency (ms) | Ratio (FP32/INT8) |
-| :---: | :--------: | :---------------: | :-------------------: | :---------------: |
-| Ernie | 1 thread | 248.42 | 169.30 | 1.46 |
-| Ernie | 20 threads | 28.92 | 20.83 | 1.39 |

 ## 6. How to reproduce the results

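The ratio columns in the updated tables can be re-derived from the raw figures. The sketch below is a reviewer-side sanity check and is not part of the commit. It assumes the throughput ratio is INT8/FP32 and the latency ratio is FP32/INT8, as the table headers state, and that the published values are rounded to two decimals. The names `THROUGHPUT`, `LATENCY`, and `mismatches` are hypothetical and exist only for this illustration.

```python
# Rows copied from the added (+) lines of the diff above (Xeon Gold 6271).
# Each entry: label -> (FP32 value, INT8 QAT value, ratio reported in the table).
THROUGHPUT = {  # images/s, higher is better, ratio = INT8 / FP32
    "MobileNet-V1": (74.05, 196.98, 2.66),
    "MobileNet-V2": (88.60, 187.67, 2.12),
    "ResNet101": (7.20, 26.43, 3.67),
    "ResNet50": (13.23, 47.44, 3.59),
    "VGG16": (3.47, 10.20, 2.94),
    "VGG19": (2.83, 8.67, 3.06),
}

LATENCY = {  # milliseconds, lower is better, ratio = FP32 / INT8
    "Ernie, 1 thread": (237.21, 79.26, 2.99),
    "Ernie, 20 threads": (22.08, 12.57, 1.76),
}


def mismatches(rows, higher_is_better):
    """Return (label, recomputed, reported) for rows whose ratio differs.

    Rounding to two decimals is an assumption about how the table was built.
    """
    bad = []
    for label, (fp32, int8, reported) in rows.items():
        ratio = int8 / fp32 if higher_is_better else fp32 / int8
        recomputed = round(ratio, 2)
        if recomputed != reported:
            bad.append((label, recomputed, reported))
    return bad


if __name__ == "__main__":
    print("throughput mismatches:", mismatches(THROUGHPUT, higher_is_better=True))
    print("latency mismatches:", mismatches(LATENCY, higher_is_better=False))
```

Both calls return an empty list for the values in this commit, so the new ratio columns agree with their raw measurements under the stated rounding assumption.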