@@ -20,38 +20,38 @@ Note:
 These are reported as (X/Y cores), where X is the used cores, and Y is the total on the die.
 * GPU results are for a single GPU device. For single-precision (SP) GPUs, we performed computation in double precision via conversion in compiler/software; these numbers are _not_ for single-precision computation. AMD MI250X and MI300A GPUs have multiple graphics compute dies (GCDs) per device; we report results for one _GCD_*, though one can quickly estimate the full-device grind time by dividing the reported grind time by the number of GCDs on the device (the MI250X has 2 GCDs). We gratefully acknowledge LLNL, HPE/Cray, and AMD for permission to release MI300A performance numbers.

-| Hardware | | Grind Time [ns] | Compiler | Computer |
-| ---: | ----: | ----: | :--- | :--- |
-| NVIDIA GH200 (GPU only) | 1 GPU | 0.32 | NVHPC 24.1 | GT Rogues Gallery |
-| NVIDIA H100 | 1 GPU | 0.45 | NVHPC 24.5 | GT Rogues Gallery |
-| AMD MI300A | 1 _GCD_* | 0.60 | CCE 18.0.0 | LLNL Tioga |
-| NVIDIA A100 | 1 GPU | 0.62 | NVHPC 22.11 | GT Phoenix |
-| NVIDIA V100 | 1 GPU | 0.99 | NVHPC 22.11 | GT Phoenix |
-| NVIDIA A30 | 1 GPU | 1.1 | NVHPC 24.1 | GT Rogues Gallery |
-| AMD MI250X | 1 _GCD_* | 1.1 | CCE 16.0.1 | OLCF Frontier |
-| AMD MI100 | 1 GPU | 1.4 | CCE 16.0.1 | Cray internal system |
-| NVIDIA L40S (SP GPU) | 1 GPU | 1.7 | NVHPC 24.5 | GT ICE |
-| NVIDIA P100 | 1 GPU | 2.4 | NVHPC 23.5 | GT CSE Internal |
-| AMD EPYC 9534 (Genoa) | 64/64 cores | 2.7 | GNU 12.3.0 | GT Phoenix |
-| NVIDIA A40 (SP GPU) | 1 GPU | 3.3 | NVHPC 22.11 | NCSA Delta |
-| NVIDIA Grace CPU (Arm, Neoverse V2) | 72/72 cores | 3.7 | NVHPC 24.1 | GT Rogues Gallery |
-| NVIDIA RTX6000 (SP GPU) | 1 GPU | 3.9 | NVHPC 22.11 | GT Phoenix |
-| AMD EPYC 7763 (Milan) | 64/64 cores | 4.1 | GNU 11.4.0 | NCSA Delta |
-| AMD EPYC 7713 (Milan) | 64/64 cores | 5.0 | GNU 12.3.0 | GT Phoenix |
-| Intel Xeon Platinum 8480CL (Sapphire Rapids) | 56/56 cores | 5.0 | NVHPC 24.5 | GT Phoenix |
-| Intel Xeon Gold 6454S (Sapphire Rapids) | 32/32 cores | 5.6 | NVHPC 24.5 | GT Rogues Gallery |
-| Intel Xeon Platinum 8462Y+ (Sapphire Rapids) | 32/32 cores | 6.2 | GNU 12.3.0 | GT ICE |
-| Intel Xeon Gold 6548Y+ (Emerald Rapids) | 32/32 cores | 6.6 | Intel oneAPI 2021.9 | GT ICE |
-| Intel Xeon Platinum 8352Y (Ice Lake) | 32/32 cores | 6.6 | NVHPC 24.5 | GT Rogues Gallery |
-| Ampere Altra Max Q80-28 (Arm, Neoverse-N1) | 80/80 cores | 6.8 | GNU 12.2.0 | OLCF Wombat |
-| AMD EPYC 7513 (Milan) | 32/32 cores | 7.4 | GNU 12.3.0 | GT ICE |
-| AMD EPYC 7452 (Rome) | 32/32 cores | 8.4 | GNU 12.3.0 | GT ICE |
-| AMD EPYC 7401 (Naples) | 24/24 cores | 10 | GNU 10.3.1 | LLNL Corona |
-| Apple M1 Pro | 8/10 cores | 14 | GNU 13.2.0 | N/A |
-| Intel Xeon Gold 6226 (Cascade Lake) | 12/12 cores | 17 | GNU 12.3.0 | GT ICE |
-| Apple M1 Max | 8/10 cores | 18 | GNU 14.1.0 | N/A |
-| IBM Power9 | 20/21 cores | 21 | GNU 9.1.0 | OLCF Summit |
-| Intel Xeon E5-2650V4 (Broadwell) | 12/12 cores | 27 | NVHPC 23.5 | GT CSE Internal |
+| Hardware | Details | Type | Usage | Grind Time [ns] | Compiler | Computer |
+| ---: | ----: | ----: | ----: | ----: | :--- | :--- |
+| NVIDIA GH200 | GPU only | APU | 1 GPU | 0.32 | NVHPC 24.1 | GT Rogues Gallery |
+| NVIDIA H100 | | GPU | 1 GPU | 0.45 | NVHPC 24.5 | GT Rogues Gallery |
+| AMD MI300A | | APU | 1 _GCD_* | 0.60 | CCE 18.0.0 | LLNL Tioga |
+| NVIDIA A100 | | GPU | 1 GPU | 0.62 | NVHPC 22.11 | GT Phoenix |
+| NVIDIA V100 | | GPU | 1 GPU | 0.99 | NVHPC 22.11 | GT Phoenix |
+| NVIDIA A30 | | GPU | 1 GPU | 1.1 | NVHPC 24.1 | GT Rogues Gallery |
+| AMD MI250X | | GPU | 1 _GCD_* | 1.1 | CCE 16.0.1 | OLCF Frontier |
+| AMD MI100 | | GPU | 1 GPU | 1.4 | CCE 16.0.1 | Cray internal system |
+| NVIDIA L40S | Single-precision GPU | GPU | 1 GPU | 1.7 | NVHPC 24.5 | GT ICE |
+| NVIDIA P100 | | GPU | 1 GPU | 2.4 | NVHPC 23.5 | GT CSE Internal |
+| AMD EPYC 9534 | Genoa | CPU | 64/64 cores | 2.7 | GNU 12.3.0 | GT Phoenix |
+| NVIDIA A40 | Single-precision GPU | GPU | 1 GPU | 3.3 | NVHPC 22.11 | NCSA Delta |
+| NVIDIA Grace CPU | Arm, Neoverse V2 | CPU | 72/72 cores | 3.7 | NVHPC 24.1 | GT Rogues Gallery |
+| NVIDIA RTX6000 | Single-precision GPU | GPU | 1 GPU | 3.9 | NVHPC 22.11 | GT Phoenix |
+| AMD EPYC 7763 | Milan | CPU | 64/64 cores | 4.1 | GNU 11.4.0 | NCSA Delta |
+| AMD EPYC 7713 | Milan | CPU | 64/64 cores | 5.0 | GNU 12.3.0 | GT Phoenix |
+| Intel Xeon 8480CL | Platinum, Sapphire Rapids | CPU | 56/56 cores | 5.0 | NVHPC 24.5 | GT Phoenix |
+| Intel Xeon 6454S | Gold, Sapphire Rapids | CPU | 32/32 cores | 5.6 | NVHPC 24.5 | GT Rogues Gallery |
+| Intel Xeon 8462Y+ | Platinum, Sapphire Rapids | CPU | 32/32 cores | 6.2 | GNU 12.3.0 | GT ICE |
+| Intel Xeon 6548Y+ | Gold, Emerald Rapids | CPU | 32/32 cores | 6.6 | Intel oneAPI 2021.9 | GT ICE |
+| Intel Xeon 8352Y | Platinum, Ice Lake | CPU | 32/32 cores | 6.6 | NVHPC 24.5 | GT Rogues Gallery |
+| Ampere Altra Q80-28 | Arm, Neoverse-N1 | CPU | 80/80 cores | 6.8 | GNU 12.2.0 | OLCF Wombat |
+| AMD EPYC 7513 | Milan | CPU | 32/32 cores | 7.4 | GNU 12.3.0 | GT ICE |
+| AMD EPYC 7452 | Rome | CPU | 32/32 cores | 8.4 | GNU 12.3.0 | GT ICE |
+| AMD EPYC 7401 | Naples | CPU | 24/24 cores | 10 | GNU 10.3.1 | LLNL Corona |
+| Apple M1 Pro | | CPU | 8/10 cores | 14 | GNU 13.2.0 | N/A |
+| Intel Xeon Gold 6226 | Cascade Lake | CPU | 12/12 cores | 17 | GNU 12.3.0 | GT ICE |
+| Apple M1 Max | | CPU | 8/10 cores | 18 | GNU 14.1.0 | N/A |
+| IBM Power9 | | CPU | 20/21 cores | 21 | GNU 9.1.0 | OLCF Summit |
+| Intel Xeon E5-2650V4 | Broadwell | CPU | 12/12 cores | 27 | NVHPC 23.5 | GT CSE Internal |

 __All grind times are in nanoseconds (ns) per grid point (gp) per equation (eq) per right-hand side (rhs) evaluation, so X ns/gp/eq/rhs. Lower is better.__

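As a reading aid for the units stated above (ns per grid point per equation per right-hand-side evaluation) and for the per-GCD note, the arithmetic can be sketched as follows. This is a minimal illustration, not code from the benchmarked solver: the function names and the problem size in the usage lines (8 million grid points, 8 equations, 3000 RHS evaluations) are hypothetical, and only the grind times 0.45 ns (H100) and 1.1 ns (one MI250X GCD) come from the table.

```python
import math


def grind_time_ns(wall_time_s, grid_points, equations, rhs_evals):
    """Grind time in ns per grid point per equation per RHS evaluation."""
    return wall_time_s * 1e9 / (grid_points * equations * rhs_evals)


def estimated_wall_time_s(grind_ns, grid_points, equations, rhs_evals):
    """Invert the definition: wall time implied by a given grind time."""
    return grind_ns * 1e-9 * grid_points * equations * rhs_evals


def full_device_grind_ns(per_gcd_grind_ns, num_gcds):
    """Rough full-device estimate: per-GCD grind time divided by GCD count."""
    return per_gcd_grind_ns / num_gcds


# Hypothetical problem size, chosen only for illustration.
grid_points, equations, rhs_evals = 8_000_000, 8, 3000

# Table value for one H100: 0.45 ns/gp/eq/rhs.
h100_seconds = estimated_wall_time_s(0.45, grid_points, equations, rhs_evals)

# Table value for one MI250X GCD: 1.1 ns; the MI250X has 2 GCDs.
mi250x_device_ns = full_device_grind_ns(1.1, 2)

print(f"H100 estimated wall time: {h100_seconds:.1f} s")
print(f"MI250X full-device grind time estimate: {mi250x_device_ns:.2f} ns")
```

The full-device figure is only the quick estimate described in the note; it assumes the work divides evenly across GCDs and ignores any communication cost between them.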