Commit 12b127c

add Sapphire Rapids Max HBM to benchmarks (#617)
1 parent 2651071 commit 12b127c

docs/documentation/expectedPerformance.md

Lines changed: 36 additions & 35 deletions
@@ -20,41 +20,42 @@ Note:
 These are reported as (X/Y cores), where X is the used cores, and Y is the total on the die.
 * GPU results are for a single GPU device. For single-precision (SP) GPUs, we performed computation in double-precision via conversion in compiler/software; these numbers are _not_ for single-precision computation. AMD MI250X and MI300A GPUs have multiple graphics compute dies (GCDs) per device; we report results for one _GCD_*, though one can quickly estimate full device runtime by dividing the grind time number by the number of GCDs on the device (the MI250X has 2 GCDs). We gratefully acknowledge the permission of LLNL, HPE/Cray, and AMD for permission to release MI300A performance numbers.
 
-| Hardware | Details | Type | Usage | Grind Time [ns] | Compiler | Computer |
-| ---: | ----: | ----: | ----: | ----: | :--- | :--- |
-| NVIDIA GH200 | GPU only | APU | 1 GPU | 0.32 | NVHPC 24.1 | GT Rogues Gallery |
-| NVIDIA H100 | | GPU | 1 GPU | 0.45 | NVHPC 24.5 | GT Rogues Gallery |
-| AMD MI300A | | APU | 1 _GCD_* | 0.60 | CCE 18.0.0 | LLNL Tioga |
-| NVIDIA A100 | | GPU | 1 GPU | 0.62 | NVHPC 22.11 | GT Phoenix |
-| NVIDIA V100 | | GPU | 1 GPU | 0.99 | NVHPC 22.11 | GT Phoenix |
-| NVIDIA A30 | | GPU | 1 GPU | 1.1 | NVHPC 24.1 | GT Rogues Gallery |
-| AMD MI250X | | GPU | 1 _GCD_* | 1.1 | CCE 16.0.1 | OLCF Frontier |
-| AMD MI100 | | GPU | 1 GPU | 1.4 | CCE 16.0.1 | Cray internal system |
-| NVIDIA L40S | Single-precision GPU | GPU | 1 GPU | 1.7 | NVHPC 24.5 | GT ICE |
-| AMD EPYC 9654 | Genoa | CPU | 96/96 cores | 1.7 | Intel oneAPI 2021.9 | DOD Carpenter |
-| NVIDIA P100 | | GPU | 1 GPU | 2.4 | NVHPC 23.5 | GT CSE Internal |
-| AMD EPYC 9534 | Genoa | CPU | 64/64 cores | 2.7 | GNU 12.3.0 | GT Phoenix |
-| NVIDIA A40 | Single-precision GPU | GPU | 1 GPU | 3.3 | NVHPC 22.11 | NCSA Delta |
-| NVIDIA Grace CPU | Arm, Neoverse V2 | CPU | 72/72 cores | 3.7 | NVHPC 24.1 | GT Rogues Gallery |
-| NVIDIA RTX6000 | Single-precision GPU | GPU | 1 GPU | 3.9 | NVHPC 22.11 | GT Phoenix |
-| AMD EPYC 7763 | Milan | CPU | 64/64 cores | 4.1 | GNU 11.4.0 | NCSA Delta |
-| AMD EPYC 7713 | Milan | CPU | 64/64 cores | 5.0 | GNU 12.3.0 | GT Phoenix |
-| Intel Xeon 8480CL | Sapphire Rapids | CPU | 56/56 cores | 5.0 | NVHPC 24.5 | GT Phoenix |
-| Intel Xeon 6454S | Sapphire Rapids | CPU | 32/32 cores | 5.6 | NVHPC 24.5 | GT Rogues Gallery |
-| Intel Xeon 8462Y+ | Sapphire Rapids | CPU | 32/32 cores | 6.2 | GNU 12.3.0 | GT ICE |
-| Intel Xeon 6548Y+ | Emerald Rapids | CPU | 32/32 cores | 6.6 | Intel oneAPI 2021.9 | GT ICE |
-| Intel Xeon 8352Y | Ice Lake | CPU | 32/32 cores | 6.6 | NVHPC 24.5 | GT Rogues Gallery |
-| Ampere Altra Q80-28 | Arm, Neoverse-N1 | CPU | 80/80 cores | 6.8 | GNU 12.2.0 | OLCF Wombat |
-| AMD EPYC 7513 | Milan | CPU | 32/32 cores | 7.4 | GNU 12.3.0 | GT ICE |
-| AMD EPYC 7452 | Rome | CPU | 32/32 cores | 8.4 | GNU 12.3.0 | GT ICE |
-| IBM Power10 | | CPU | 24/24 cores | 10 | GNU 13.3.1 | GT Rogues Gallery |
-| AMD EPYC 7401 | Naples | CPU | 24/24 cores | 10 | GNU 10.3.1 | LLNL Corona |
-| Apple M1 Pro | | CPU | 8/10 cores | 14 | GNU 13.2.0 | N/A |
-| Intel Xeon 6226 | Cascade Lake | CPU | 12/12 cores | 17 | GNU 12.3.0 | GT ICE |
-| Apple M1 Max | | CPU | 8/10 cores | 18 | GNU 14.1.0 | N/A |
-| IBM Power9 | | CPU | 20/21 cores | 21 | GNU 9.1.0 | OLCF Summit |
-| Intel Xeon E5-2650V4 | Broadwell | CPU | 12/12 cores | 27 | NVHPC 23.5 | GT CSE Internal |
-| Intel Xeon E7-4850V3 | Haswell | CPU | 14/14 cores | 34 | GNU 9.4.0 | GT CSE Internal |
+| Hardware | Details | Type | Usage | Grind Time [ns] | Compiler | Computer |
+| ---: | ----: | ----: | ----: | ----: | :--- | :--- |
+| NVIDIA GH200 | GPU only | APU | 1 GPU | 0.32 | NVHPC 24.1 | GT Rogues Gallery |
+| NVIDIA H100 | | GPU | 1 GPU | 0.45 | NVHPC 24.5 | GT Rogues Gallery |
+| AMD MI300A | | APU | 1 _GCD_* | 0.60 | CCE 18.0.0 | LLNL Tioga |
+| NVIDIA A100 | | GPU | 1 GPU | 0.62 | NVHPC 22.11 | GT Phoenix |
+| NVIDIA V100 | | GPU | 1 GPU | 0.99 | NVHPC 22.11 | GT Phoenix |
+| NVIDIA A30 | | GPU | 1 GPU | 1.1 | NVHPC 24.1 | GT Rogues Gallery |
+| AMD MI250X | | GPU | 1 _GCD_* | 1.1 | CCE 16.0.1 | OLCF Frontier |
+| AMD MI100 | | GPU | 1 GPU | 1.4 | CCE 16.0.1 | Cray internal system |
+| NVIDIA L40S | FP32-only GPU | GPU | 1 GPU | 1.7 | NVHPC 24.5 | GT ICE |
+| AMD EPYC 9654 | Genoa | CPU | 96 cores | 1.7 | Intel 2021.9 | DOD Carpenter |
+| NVIDIA P100 | | GPU | 1 GPU | 2.4 | NVHPC 23.5 | GT CSE Internal |
+| AMD EPYC 9534 | Genoa | CPU | 64 cores | 2.7 | GNU 12.3.0 | GT Phoenix |
+| NVIDIA A40 | FP32-only GPU | GPU | 1 GPU | 3.3 | NVHPC 22.11 | NCSA Delta |
+| Intel Xeon Max 9468 | Sapphire Rapids HBM | CPU | 48 cores | 3.5 | NVHPC 24.5 | GT Rogues Gallery |
+| NVIDIA Grace CPU | Arm, Neoverse V2 | CPU | 72 cores | 3.7 | NVHPC 24.1 | GT Rogues Gallery |
+| NVIDIA RTX6000 | FP32-only GPU | GPU | 1 GPU | 3.9 | NVHPC 22.11 | GT Phoenix |
+| AMD EPYC 7763 | Milan | CPU | 64 cores | 4.1 | GNU 11.4.0 | NCSA Delta |
+| AMD EPYC 7713 | Milan | CPU | 64 cores | 5.0 | GNU 12.3.0 | GT Phoenix |
+| Intel Xeon 8480CL | Sapphire Rapids | CPU | 56 cores | 5.0 | NVHPC 24.5 | GT Phoenix |
+| Intel Xeon 6454S | Sapphire Rapids | CPU | 32 cores | 5.6 | NVHPC 24.5 | GT Rogues Gallery |
+| Intel Xeon 8462Y+ | Sapphire Rapids | CPU | 32 cores | 6.2 | GNU 12.3.0 | GT ICE |
+| Intel Xeon 6548Y+ | Emerald Rapids | CPU | 32 cores | 6.6 | Intel 2021.9 | GT ICE |
+| Intel Xeon 8352Y | Ice Lake | CPU | 32 cores | 6.6 | NVHPC 24.5 | GT Rogues Gallery |
+| Ampere Altra Q80-28 | Arm, Neoverse-N1 | CPU | 80 cores | 6.8 | GNU 12.2.0 | OLCF Wombat |
+| AMD EPYC 7513 | Milan | CPU | 32 cores | 7.4 | GNU 12.3.0 | GT ICE |
+| AMD EPYC 7452 | Rome | CPU | 32 cores | 8.4 | GNU 12.3.0 | GT ICE |
+| IBM Power10 | | CPU | 24 cores | 10 | GNU 13.3.1 | GT Rogues Gallery |
+| AMD EPYC 7401 | Naples | CPU | 24 cores | 10 | GNU 10.3.1 | LLNL Corona |
+| Apple M1 Pro | | CPU | 8 cores | 14 | GNU 13.2.0 | N/A |
+| Intel Xeon 6226 | Cascade Lake | CPU | 12 cores | 17 | GNU 12.3.0 | GT ICE |
+| Apple M1 Max | | CPU | 8 cores | 18 | GNU 14.1.0 | N/A |
+| IBM Power9 | | CPU | 20 cores | 21 | GNU 9.1.0 | OLCF Summit |
+| Intel Xeon E5-2650V4 | Broadwell | CPU | 12 cores | 27 | NVHPC 23.5 | GT CSE Internal |
+| Intel Xeon E7-4850V3 | Haswell | CPU | 14 cores | 34 | GNU 9.4.0 | GT CSE Internal |
 
 __All grind times are in nanoseconds (ns) per grid point (gp) per equation (eq) per right-hand side (rhs) evaluation, so X ns/gp/eq/rhs. Lower is better.__

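The grind-time unit defined in the diff (ns/gp/eq/rhs) can be turned into a rough wall-time estimate by multiplying through by the problem size. The Python sketch below is illustrative only and is not part of the commit: the function names and the example problem size (grid points, equations, RHS evaluations) are hypothetical. It also applies the note's rule of thumb of dividing a per-GCD grind time by the number of GCDs, which assumes ideal scaling across the GCDs of one device.

```python
def estimate_wall_time_s(grind_ns, grid_points, equations, rhs_evals):
    """Wall time in seconds implied by a grind time given in ns/gp/eq/rhs."""
    # 1 ns = 1e-9 s; the grind time is charged once per grid point,
    # per equation, per right-hand-side evaluation.
    return grind_ns * 1e-9 * grid_points * equations * rhs_evals


def full_device_grind_ns(per_gcd_grind_ns, num_gcds):
    """Rule-of-thumb full-device grind time, assuming ideal scaling over GCDs."""
    if num_gcds < 1:
        raise ValueError("num_gcds must be at least 1")
    return per_gcd_grind_ns / num_gcds


# Hypothetical problem size: 1e6 grid points, 8 equations, 3000 RHS evaluations.
# Grind times are taken from the table rows added in this commit.
xeon_max_s = estimate_wall_time_s(3.5, 1_000_000, 8, 3000)  # Intel Xeon Max 9468
mi250x_device_ns = full_device_grind_ns(1.1, 2)             # MI250X has 2 GCDs

print(f"Xeon Max 9468 estimate: {xeon_max_s:.1f} s")
print(f"MI250X full-device grind time: {mi250x_device_ns:.2f} ns/gp/eq/rhs")
```

Real runs will deviate from these figures, since the estimate ignores I/O, communication, and any departure from the benchmark configuration.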