| Docker OS | Ubuntu 18.04.5 LTS |
| [Spectre-Meltdown Mitigation](https://github.com/speed47/spectre-meltdown-checker) | Mitigated |

## FP32 with v1.11.200 on an AWS EC2 C6i.2xlarge instance

### Performance Numbers

<table border="1" cellpadding="10" align="center" class="perf_table">
<col>
<col>
<col>
<colgroup span="2"></colgroup>
<colgroup span="2"></colgroup>
<col>
<col>
<col>
<col>
<tbody>
<tr>
<th rowspan="2" scope="col">Hardware</th>
<th rowspan="2" scope="col">Workload<sup>1</sup></th>
<th rowspan="2" scope="col">Precision</th>
<th colspan="2" scope="colgroup">Throughput Inference<sup>2</sup></th>
<th colspan="2" scope="colgroup">Real-time Inference<sup>3</sup></th>
<th rowspan="2" scope="col">Model Type</th>
<th rowspan="2" scope="col">Dataset</th>
<th rowspan="2" scope="col">Input Data Shape</th>
<th rowspan="2" scope="col">Tunable Parameters</th>
</tr>
<tr>
<th scope="col">Batch Size</th>
<th scope="col">Boost Ratio</th>
<th scope="col">Batch Size</th>
<th scope="col">Boost Ratio</th>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle" rowspan="7">AWS EC2 C6i.2xlarge</td>
<td style="text-align: center; vertical-align: middle">ResNet50</td>
<td style="text-align: center; vertical-align: middle">Float32</td>
<td style="text-align: center; vertical-align: middle">64</td>
<td style="text-align: center; vertical-align: middle">1.24x</td>
<td style="text-align: center; vertical-align: middle">1</td>
<td style="text-align: center; vertical-align: middle">1.31x</td>
<td style="text-align: center; vertical-align: middle">Computer Vision</td>
<td style="text-align: center; vertical-align: middle">ImageNet</td>
<td style="text-align: center; vertical-align: middle">Input shape<br />[3, 224, 224]</td>
<td style="text-align: center; vertical-align: middle">Default memory allocator;<br />Intel(R) OpenMP;<br /><a href="https://github.com/IntelAI/models/tree/pytorch-r1.10-models/quickstart/image_recognition/pytorch/resnet50/inference/cpu">inference scripts</a></td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle">ResNext 32x16d</td>
<td style="text-align: center; vertical-align: middle">Float32</td>
<td style="text-align: center; vertical-align: middle">64</td>
<td style="text-align: center; vertical-align: middle">1.07x</td>
<td style="text-align: center; vertical-align: middle">1</td>
<td style="text-align: center; vertical-align: middle">1.05x</td>
<td style="text-align: center; vertical-align: middle">Computer Vision</td>
<td style="text-align: center; vertical-align: middle">ImageNet</td>
<td style="text-align: center; vertical-align: middle">Input shape<br />[3, 224, 224]</td>
<td style="text-align: center; vertical-align: middle">Default memory allocator;<br />Intel(R) OpenMP;<br /><a href="https://github.com/IntelAI/models/tree/pytorch-r1.10-models/quickstart/image_recognition/pytorch/resnext-32x16d/inference/cpu">inference scripts</a></td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle">VGG-11</td>
<td style="text-align: center; vertical-align: middle">Float32</td>
<td style="text-align: center; vertical-align: middle">64</td>
<td style="text-align: center; vertical-align: middle">1.15x</td>
<td style="text-align: center; vertical-align: middle">1</td>
<td style="text-align: center; vertical-align: middle">1.21x</td>
<td style="text-align: center; vertical-align: middle">Computer Vision</td>
<td style="text-align: center; vertical-align: middle">ImageNet</td>
<td style="text-align: center; vertical-align: middle">Input shape<br />[3, 224, 224]</td>
<td style="text-align: center; vertical-align: middle">Default memory allocator;<br />Intel(R) OpenMP;<br /><a href="https://github.com/IntelAI/models/tree/pytorch-r1.10-models/quickstart/image_recognition/pytorch/vgg11/inference/cpu">inference scripts</a></td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle">ShuffleNetv2_x1.0</td>
<td style="text-align: center; vertical-align: middle">Float32</td>
<td style="text-align: center; vertical-align: middle">64</td>
<td style="text-align: center; vertical-align: middle">1.12x</td>
<td style="text-align: center; vertical-align: middle">1</td>
<td style="text-align: center; vertical-align: middle">1.30x</td>
<td style="text-align: center; vertical-align: middle">Computer Vision</td>
<td style="text-align: center; vertical-align: middle">ImageNet</td>
<td style="text-align: center; vertical-align: middle">Input shape<br />[3, 224, 224]</td>
<td style="text-align: center; vertical-align: middle">Default memory allocator;<br />Intel(R) OpenMP</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle">MobileNet v2</td>
<td style="text-align: center; vertical-align: middle">Float32</td>
<td style="text-align: center; vertical-align: middle">64</td>
<td style="text-align: center; vertical-align: middle">1.08x</td>
<td style="text-align: center; vertical-align: middle">1</td>
<td style="text-align: center; vertical-align: middle">1.12x</td>
<td style="text-align: center; vertical-align: middle">Computer Vision</td>
<td style="text-align: center; vertical-align: middle">ImageNet</td>
<td style="text-align: center; vertical-align: middle">Input shape<br />[3, 224, 224]</td>
<td style="text-align: center; vertical-align: middle">Default memory allocator;<br />Intel(R) OpenMP</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle">BERT-Large</td>
<td style="text-align: center; vertical-align: middle">Float32</td>
<td style="text-align: center; vertical-align: middle">64</td>
<td style="text-align: center; vertical-align: middle">1.05x</td>
<td style="text-align: center; vertical-align: middle">1</td>
<td style="text-align: center; vertical-align: middle">1.03x</td>
<td style="text-align: center; vertical-align: middle">NLP</td>
<td style="text-align: center; vertical-align: middle">SQuAD</td>
<td style="text-align: center; vertical-align: middle">max_seq_len=384<br />Task: Question Answering</td>
<td style="text-align: center; vertical-align: middle">Default memory allocator;<br />Intel(R) OpenMP;<br /><a href="https://github.com/IntelAI/models/tree/pytorch-r1.10-models/quickstart/language_modeling/pytorch/bert_large/inference/cpu">inference scripts</a>;<br />Setting auto_kernel_selection to ON is recommended when seq_len exceeds 64</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle">BERT-Base</td>
<td style="text-align: center; vertical-align: middle">Float32</td>
<td style="text-align: center; vertical-align: middle">64</td>
<td style="text-align: center; vertical-align: middle">1.08x</td>
<td style="text-align: center; vertical-align: middle">1</td>
<td style="text-align: center; vertical-align: middle">1.09x</td>
<td style="text-align: center; vertical-align: middle">NLP</td>
<td style="text-align: center; vertical-align: middle">MRPC</td>
<td style="text-align: center; vertical-align: middle">max_seq_len=128<br />Task: Text Classification</td>
<td style="text-align: center; vertical-align: middle">Jemalloc;<br />Intel(R) OpenMP;<br /><a href="https://github.com/IntelAI/models/tree/pytorch-r1.10-models/quickstart/language_modeling/pytorch/bert_base/inference/cpu">inference scripts</a>;<br />Setting auto_kernel_selection to ON is recommended when seq_len exceeds 128</td>
</tr>
</tbody>
</table>

<br />
<sup>1. <a href="https://github.com/IntelAI/models/tree/pytorch-r1.11-models">Model Zoo for Intel® Architecture</a></sup>
<br />
<sup>2. Throughput inference runs with a single instance per socket.</sup>
<br />
<sup>3. Real-time inference runs with multiple instances, 4 cores per instance.</sup>
<br />

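Footnotes 2 and 3 describe two distinct launch configurations. A minimal sketch of how they map onto this instance type follows; the core count, `numactl` bindings, and script name are illustrative assumptions, not the exact commands used for these measurements:

```shell
# c6i.2xlarge is a single-socket instance with 8 vCPUs; 4 physical cores
# is an assumption based on the instance type, not a value from this page.
CORES_PER_SOCKET=4
CORES_PER_INSTANCE=4   # footnote 3: 4 cores per real-time instance
INSTANCES=$((CORES_PER_SOCKET / CORES_PER_INSTANCE))
echo "real-time instances on this instance type: $INSTANCES"

# Throughput inference: one instance bound to the whole socket, e.g.:
#   numactl --cpunodebind=0 --membind=0 python run_inference.py --batch-size 64
# Real-time inference: one instance per 4-core group, e.g.:
#   numactl -C 0-3 python run_inference.py --batch-size 1
```

On larger multi-socket machines the same scheme yields one throughput instance per socket and several 4-core real-time instances.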
*Note:* Performance numbers with stock PyTorch are measured with its most performant configuration.

*Note:* Environment variable *DNNL_PRIMITIVE_CACHE_CAPACITY* is set to *1024*.
### Configuration

#### Software Version

| Software | Version |
| :-: | :-: |
| PyTorch | [v1.11.0](https://pytorch.org/get-started/locally/) |
| Intel® Extension for PyTorch\* | [v1.11.200](https://github.com/intel/intel-extension-for-pytorch/releases) |

## FP32 and BFloat16 with v1.10

### Performance Numbers