Commit 88d4474

fix a doc hyperlink bug, add aws performance numbers (#965)
1 parent f712c55 commit 88d4474

2 files changed: +141 -1 lines changed

docs/tutorials/features.rst

Lines changed: 1 addition & 1 deletion

@@ -93,7 +93,7 @@ Optimizers are one of key parts of the training workloads. Intel® Extension for
 2. SplitSGD for BF16 training, which reduces the memory footprint of the master weights by half.
 
-For more detailed information, check `Optimizer Fusion <features/optimizer_fusion.md>`_ and `Split SGD <features/split_sgd.md>`_
+For more detailed information, check `Optimizer Fusion <features/optimizer_fusion.md>`_ and `Split SGD <features/split_sgd.html>`_
 
 .. toctree::
    :hidden:
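The Split SGD line in this hunk refers to keeping a BF16 weight plus the remaining low half of the FP32 bit pattern, instead of a separate FP32 master copy. A minimal sketch of the bit-level split (illustrative only, not the extension's actual implementation):

```python
import struct

def split_fp32(x: float) -> tuple[int, int]:
    """Split a float32 bit pattern into its top 16 bits (a BF16 value)
    and its bottom 16 bits (the 'trail' kept so updates stay exact)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16, bits & 0xFFFF

def join_fp32(top16: int, bottom16: int) -> float:
    """Recombine the two 16-bit halves into the original float32."""
    return struct.unpack("<f", struct.pack("<I", (top16 << 16) | bottom16))[0]

# 1.5 is exactly representable in float32, so the round trip is exact.
top, bottom = split_fp32(1.5)
assert join_fp32(top, bottom) == 1.5
```

Storing the two 16-bit halves costs the same memory as one FP32 value, which is how the BF16 master-weight footprint is halved relative to keeping both a BF16 weight and an FP32 master copy.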

docs/tutorials/performance.md

Lines changed: 140 additions & 0 deletions

@@ -253,6 +253,146 @@ This page shows performance boost with Intel® Extension for PyTorch\* on severa
| Docker OS | Ubuntu 18.04.5 LTS |
| [Spectre-Meltdown Mitigation](https://github.com/speed47/spectre-meltdown-checker) | Mitigated |
## FP32 with v1.11.200 on an AWS EC2 C6i.2xlarge instance

### Performance Numbers

<table border="1" cellpadding="10" align="center" class="perf_table">
<tbody>
<col>
<col>
<col>
<colgroup span="2"></colgroup>
<colgroup span="2"></colgroup>
<col>
<col>
<col>
<tr>
<th rowspan="2" scope="col">Hardware</th>
<th rowspan="2" scope="col">Workload<sup>1</sup></th>
<th rowspan="2" scope="col">Precision</th>
<th colspan="2" scope="colgroup">Throughput Inference<sup>2</sup></th>
<th colspan="2" scope="colgroup">Real-time Inference<sup>3</sup></th>
<th rowspan="2" scope="col">Model Type</th>
<th rowspan="2" scope="col">Dataset</th>
<th rowspan="2" scope="col">Input Data Shape</th>
<th rowspan="2" scope="col">Tunable Parameters</th>
</tr>
<tr>
<th scope="col">Batch Size</th>
<th scope="col">Boost Ratio</th>
<th scope="col">Batch Size</th>
<th scope="col">Boost Ratio</th>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle" rowspan="7" scope="col">AWS EC2 C6i.2xlarge</td>
<td style="text-align: center; vertical-align: middle" scope="col">ResNet50</td>
<td style="text-align: center; vertical-align: middle" scope="col">Float32</td>
<td style="text-align: center; vertical-align: middle" scope="col">64</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.24x</td>
<td style="text-align: center; vertical-align: middle" scope="col">1</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.31x</td>
<td style="text-align: center; vertical-align: middle" scope="col">Computer Vision</td>
<td style="text-align: center; vertical-align: middle" scope="col">ImageNet</td>
<td style="text-align: center; vertical-align: middle" scope="col">Input shape<br />[3, 224, 224]</td>
<td style="text-align: center; vertical-align: middle" scope="col">Default memory allocator;<br />Intel(R) OpenMP;<br /><a href="https://github.com/IntelAI/models/tree/pytorch-r1.10-models/quickstart/image_recognition/pytorch/resnet50/inference/cpu">inference scripts</a></td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle" scope="col">ResNext 32x16d</td>
<td style="text-align: center; vertical-align: middle" scope="col">Float32</td>
<td style="text-align: center; vertical-align: middle" scope="col">64</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.07x</td>
<td style="text-align: center; vertical-align: middle" scope="col">1</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.05x</td>
<td style="text-align: center; vertical-align: middle" scope="col">Computer Vision</td>
<td style="text-align: center; vertical-align: middle" scope="col">ImageNet</td>
<td style="text-align: center; vertical-align: middle" scope="col">Input shape<br />[3, 224, 224]</td>
<td style="text-align: center; vertical-align: middle" scope="col">Default memory allocator;<br />Intel(R) OpenMP;<br /><a href="https://github.com/IntelAI/models/tree/pytorch-r1.10-models/quickstart/image_recognition/pytorch/resnext-32x16d/inference/cpu">inference scripts</a></td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle" scope="col">VGG-11</td>
<td style="text-align: center; vertical-align: middle" scope="col">Float32</td>
<td style="text-align: center; vertical-align: middle" scope="col">64</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.15x</td>
<td style="text-align: center; vertical-align: middle" scope="col">1</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.21x</td>
<td style="text-align: center; vertical-align: middle" scope="col">Computer Vision</td>
<td style="text-align: center; vertical-align: middle" scope="col">ImageNet</td>
<td style="text-align: center; vertical-align: middle" scope="col">Input shape<br />[3, 224, 224]</td>
<td style="text-align: center; vertical-align: middle" scope="col">Default memory allocator;<br />Intel(R) OpenMP;<br /><a href="https://github.com/IntelAI/models/tree/pytorch-r1.10-models/quickstart/image_recognition/pytorch/vgg11/inference/cpu">inference scripts</a></td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle" scope="col">ShuffleNetv2_x1.0</td>
<td style="text-align: center; vertical-align: middle" scope="col">Float32</td>
<td style="text-align: center; vertical-align: middle" scope="col">64</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.12x</td>
<td style="text-align: center; vertical-align: middle" scope="col">1</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.30x</td>
<td style="text-align: center; vertical-align: middle" scope="col">Computer Vision</td>
<td style="text-align: center; vertical-align: middle" scope="col">ImageNet</td>
<td style="text-align: center; vertical-align: middle" scope="col">Input shape<br />[3, 224, 224]</td>
<td style="text-align: center; vertical-align: middle" scope="col">Default memory allocator;<br />Intel(R) OpenMP;</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle" scope="col">MobileNet v2</td>
<td style="text-align: center; vertical-align: middle" scope="col">Float32</td>
<td style="text-align: center; vertical-align: middle" scope="col">64</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.08x</td>
<td style="text-align: center; vertical-align: middle" scope="col">1</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.12x</td>
<td style="text-align: center; vertical-align: middle" scope="col">Computer Vision</td>
<td style="text-align: center; vertical-align: middle" scope="col">ImageNet</td>
<td style="text-align: center; vertical-align: middle" scope="col">Input shape<br />[3, 224, 224]</td>
<td style="text-align: center; vertical-align: middle" scope="col">Default memory allocator;<br />Intel(R) OpenMP;</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle" scope="col">BERT-Large</td>
<td style="text-align: center; vertical-align: middle" scope="col">Float32</td>
<td style="text-align: center; vertical-align: middle" scope="col">64</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.05x</td>
<td style="text-align: center; vertical-align: middle" scope="col">1</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.03x</td>
<td style="text-align: center; vertical-align: middle" scope="col">NLP</td>
<td style="text-align: center; vertical-align: middle" scope="col">SQuAD</td>
<td style="text-align: center; vertical-align: middle" scope="col">max_seq_len=384<br />Task: Question Answering</td>
<td style="text-align: center; vertical-align: middle" scope="col">Default memory allocator;<br />Intel(R) OpenMP;<br /><a href="https://github.com/IntelAI/models/tree/pytorch-r1.10-models/quickstart/language_modeling/pytorch/bert_large/inference/cpu">inference scripts</a>;<br />Setting auto_kernel_selection to ON is recommended when seq_len exceeds 64</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle" scope="col">BERT-Base</td>
<td style="text-align: center; vertical-align: middle" scope="col">Float32</td>
<td style="text-align: center; vertical-align: middle" scope="col">64</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.08x</td>
<td style="text-align: center; vertical-align: middle" scope="col">1</td>
<td style="text-align: center; vertical-align: middle" scope="col">1.09x</td>
<td style="text-align: center; vertical-align: middle" scope="col">NLP</td>
<td style="text-align: center; vertical-align: middle" scope="col">MRPC</td>
<td style="text-align: center; vertical-align: middle" scope="col">max_seq_len=128<br />Task: Text Classification</td>
<td style="text-align: center; vertical-align: middle" scope="col">Jemalloc;<br />Intel(R) OpenMP;<br /><a href="https://github.com/IntelAI/models/tree/pytorch-r1.10-models/quickstart/language_modeling/pytorch/bert_base/inference/cpu">inference scripts</a>;<br />Setting auto_kernel_selection to ON is recommended when seq_len exceeds 128</td>
</tr>
</tbody>
</table>

<br />
<sup>1. <a href="https://github.com/IntelAI/models/tree/pytorch-r1.11-models">Model Zoo for Intel® Architecture</a></sup>
<br />
<sup>2. Throughput inference runs with a single instance per socket.</sup>
<br />
<sup>3. Real-time inference runs with multiple instances, 4 cores per instance.</sup>
<br />

*Note:* Performance numbers with stock PyTorch are measured with its most performant configuration.

*Note:* Environment variable *DNNL_PRIMITIVE_CACHE_CAPACITY* is set to *1024*.

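Footnotes 2 and 3 and the *DNNL_PRIMITIVE_CACHE_CAPACITY* note describe the run topology. A hedged sketch of how such an environment might be set up on an 8-vCPU C6i.2xlarge; the script name, flags, and core ranges are illustrative, not taken from the commit:

```shell
# oneDNN primitive cache size used for these measurements (see note above).
export DNNL_PRIMITIVE_CACHE_CAPACITY=1024

# Throughput run: a single instance bound to one socket, batch size 64.
numactl --cpunodebind=0 --membind=0 python inference.py --batch-size 64

# Real-time run: multiple instances with 4 cores each, batch size 1
# (two instances shown here).
numactl -C 0-3 python inference.py --batch-size 1 &
numactl -C 4-7 python inference.py --batch-size 1 &
wait
```

Pinning each instance to its own cores with `numactl` avoids cross-instance contention; the Model Zoo inference scripts linked in the table wrap an equivalent launch configuration.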
### Configuration

#### Software Version

| Software | Version |
| :-: | :-: |
| PyTorch | [v1.11.0](https://pytorch.org/get-started/locally/) |
| Intel® Extension for PyTorch\* | [v1.11.200](https://github.com/intel/intel-extension-for-pytorch/releases) |
## FP32 and BFloat16 with v1.10

### Performance Numbers