@@ -109,10 +109,9 @@ The code snipped shows how the `Qat2Int8MkldnnPass` can be applied to a model gr

## 5. Accuracy and Performance benchmark

- This section contain QAT2 MKL-DNN accuracy and performance benchmark results measured on two servers :
+ This section contains QAT2 MKL-DNN accuracy and performance benchmark results measured on the following server:

* Intel(R) Xeon(R) Gold 6271 (with AVX512 VNNI support).
- * Intel(R) Xeon(R) Gold 6148.

Performance benchmarks were run with the following environment settings:

@@ -144,17 +143,6 @@ Performance benchmarks were run with the following environment settings:
| VGG16 | 72.08% | 71.73% | -0.35% | 90.63% | 89.71% | -0.92% |
| VGG19 | 72.57% | 72.12% | -0.45% | 90.84% | 90.15% | -0.69% |

- > ** Intel(R) Xeon(R) Gold 6148**
-
- | Model | FP32 Top1 Accuracy | INT8 QAT Top1 Accuracy | Top1 Diff | FP32 Top5 Accuracy | INT8 QAT Top5 Accuracy | Top5 Diff |
- | :----------: | :----------------: | :--------------------: | :-------: | :----------------: | :--------------------: | :-------: |
- | MobileNet-V1 | 70.78% | 70.85% | 0.07% | 89.69% | 89.41% | -0.28% |
- | MobileNet-V2 | 71.90% | 72.08% | 0.18% | 90.56% | 90.66% | +0.10% |
- | ResNet101 | 77.50% | 77.51% | 0.01% | 93.58% | 93.50% | -0.08% |
- | ResNet50 | 76.63% | 76.55% | -0.08% | 93.10% | 92.96% | -0.14% |
- | VGG16 | 72.08% | 71.72% | -0.36% | 90.63% | 89.75% | -0.88% |
- | VGG19 | 72.57% | 72.08% | -0.49% | 90.84% | 90.11% | -0.73% |
-
#### Performance

Image classification model performance was measured using a single thread. The setting is included in the benchmark reproduction commands below.
@@ -164,23 +152,12 @@ Image classification models performance was measured using a single thread. The

| Model | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32) |
| :----------: | :-------------: | :-----------------: | :---------------: |
- | MobileNet-V1 | 77.00 | 210.76 | 2.74 |
- | MobileNet-V2 | 88.43 | 182.47 | 2.06 |
- | ResNet101 | 7.20 | 25.88 | 3.60 |
- | ResNet50 | 13.26 | 47.44 | 3.58 |
- | VGG16 | 3.48 | 10.11 | 2.90 |
- | VGG19 | 2.83 | 8.77 | 3.10 |
-
- > ** Intel(R) Xeon(R) Gold 6148**
-
- | Model | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32) |
- | :----------: | :-------------: | :-----------------: | :---------------: |
- | MobileNet-V1 | 75.23 | 103.63 | 1.38 |
- | MobileNet-V2 | 86.65 | 128.14 | 1.48 |
- | ResNet101 | 6.61 | 10.79 | 1.63 |
- | ResNet50 | 12.42 | 19.65 | 1.58 |
- | VGG16 | 3.31 | 4.74 | 1.43 |
- | VGG19 | 2.68 | 3.91 | 1.46 |
+ | MobileNet-V1 | 74.05 | 196.98 | 2.66 |
+ | MobileNet-V2 | 88.60 | 187.67 | 2.12 |
+ | ResNet101 | 7.20 | 26.43 | 3.67 |
+ | ResNet50 | 13.23 | 47.44 | 3.59 |
+ | VGG16 | 3.47 | 10.20 | 2.94 |
+ | VGG19 | 2.83 | 8.67 | 3.06 |

Notes:

@@ -194,13 +171,8 @@ Notes:

| Model | FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
| :------------:| :----------------------:| :----------------------:| :---------:|
- | Ernie | 80.20% | 79.88 % | -0.32 % |
+ | Ernie | 80.20% | 79.44% | -0.76% |

- > ** Intel(R) Xeon(R) Gold 6148**
-
- | Model | FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
- | :---: | :-----------: | :---------------: | :-----------: |
- | Ernie | 80.20% | 79.64% | -0.56% |

#### Performance

@@ -209,16 +181,9 @@ Notes:

| Model | Threads | FP32 Latency (ms) | QAT INT8 Latency (ms) | Ratio (FP32/INT8) |
| :------------:| :----------------------:| :-------------------:| :---------:| :---------:|
- | Ernie | 1 thread | 236.72 | 83.70 | 2.82x |
- | Ernie | 20 threads | 27.40 | 15.01 | 1.83x |
-
-
- > ** Intel(R) Xeon(R) Gold 6148**
+ | Ernie | 1 thread | 237.21 | 79.26 | 2.99x |
+ | Ernie | 20 threads | 22.08 | 12.57 | 1.76x |

- | Model | Threads | FP32 Latency (ms) | QAT INT8 Latency (ms) | Ratio (FP32/INT8) |
- | :---: | :--------: | :---------------: | :-------------------: | :---------------: |
- | Ernie | 1 thread | 248.42 | 169.30 | 1.46 |
- | Ernie | 20 threads | 28.92 | 20.83 | 1.39 |

## 6. How to reproduce the results
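The updated result tables use two different ratio conventions. Throughput (images/s) is higher-is-better, so the image classification Ratio is INT8/FP32. Latency (ms) is lower-is-better, so the Ernie Ratio is FP32/INT8. In both cases a value above 1 means the QAT INT8 model is faster. The following is a minimal Python consistency check, assuming only the numbers from the updated (`+`) Intel(R) Xeon(R) Gold 6271 rows. The helper names (`speedup_from_throughput`, `speedup_from_latency`, `accuracy_diff`, `check_all`) are hypothetical and are not part of the repository's benchmark tooling.

```python
# Hypothetical consistency check: recompute the derived columns of the updated
# (+) tables from their raw numbers (Intel(R) Xeon(R) Gold 6271 results only).


def speedup_from_throughput(fp32_ips: float, int8_ips: float) -> float:
    """Throughput is higher-is-better, so speedup = INT8 / FP32."""
    return int8_ips / fp32_ips


def speedup_from_latency(fp32_ms: float, int8_ms: float) -> float:
    """Latency is lower-is-better, so speedup = FP32 / INT8."""
    return fp32_ms / int8_ms


def accuracy_diff(fp32_pct: float, int8_pct: float) -> float:
    """Accuracy difference in percentage points (negative = INT8 lost accuracy)."""
    return int8_pct - fp32_pct


# (model, FP32 images/s, INT8 QAT images/s, reported Ratio (INT8/FP32))
IMAGE_CLASSIFICATION = [
    ("MobileNet-V1", 74.05, 196.98, 2.66),
    ("MobileNet-V2", 88.60, 187.67, 2.12),
    ("ResNet101", 7.20, 26.43, 3.67),
    ("ResNet50", 13.23, 47.44, 3.59),
    ("VGG16", 3.47, 10.20, 2.94),
    ("VGG19", 2.83, 8.67, 3.06),
]

# (threads, FP32 latency ms, QAT INT8 latency ms, reported Ratio (FP32/INT8))
ERNIE_LATENCY = [
    ("1 thread", 237.21, 79.26, 2.99),
    ("20 threads", 22.08, 12.57, 1.76),
]


def check_all() -> list[str]:
    """Return the rows whose recomputed value (rounded to 2 decimals) differs."""
    mismatches = []
    for name, fp32, int8, reported in IMAGE_CLASSIFICATION:
        if round(speedup_from_throughput(fp32, int8), 2) != reported:
            mismatches.append(name)
    for threads, fp32, int8, reported in ERNIE_LATENCY:
        if round(speedup_from_latency(fp32, int8), 2) != reported:
            mismatches.append(f"Ernie {threads}")
    # Ernie accuracy: FP32 80.20%, QAT INT8 79.44%, reported diff -0.76%
    if round(accuracy_diff(80.20, 79.44), 2) != -0.76:
        mismatches.append("Ernie accuracy")
    return mismatches


if __name__ == "__main__":
    print(check_all() or "all derived columns match")
```

Rounding to two decimals mirrors the precision used in the tables, so small floating-point noise does not cause false mismatches.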