Commit 482f497

Overhaul performance numbers (#562)
1 parent 68a1b77 commit 482f497

1 file changed: +19 -23 lines changed

docs/documentation/expectedPerformance.md

Lines changed: 19 additions & 23 deletions
@@ -18,35 +18,31 @@ These are reported as (X/Y cores), where X is the used cores, and Y is the total
 * GPU results are for a single GPU device. For single-precision (SP) GPUs, we performed computation in double-precision via conversion in compiler/software; these numbers are _not_ for single-precision computation. AMD MI250X GPUs have two graphics compute dies (GCDs) per MI250X device; we report results for one GCD, though one can quickly estimate full MI250X runtime by halving the single GCD grind time number.

 | Hardware | | Grind Time | Compiler | Computer |
-| ---: | ----: | ----: | :--- | :--- |
+| ---: | ----: | ----: | :--- | :--- |
 | NVIDIA GH200 (GPU only) | 1 GPU | 0.32 | NVHPC 24.1 | GT Rogues Gallery |
 | NVIDIA H100 | 1 GPU | 0.45 | NVHPC 24.5 | GT Rogues Gallery |
 | NVIDIA A100 | 1 GPU | 0.62 | NVHPC 22.11 | GT Phoenix |
 | NVIDIA V100 | 1 GPU | 0.99 | NVHPC 22.11 | GT Phoenix |
-| NVIDIA A30 | 1 GPU | 1.06 | NVHPC 24.1 | GT Rogues Gallery |
-| AMD MI250X | 1 __GCD__ | 1.09 | CCE 16.0.1 | OLCF Frontier |
-| AMD MI100 | 1 GPU | 1.38 | CCE 16.0.1 | Cray internal system |
-| NVIDIA P100 | 1 GPU | 2.35 | NVHPC 23.5 | GT CSE Internal |
+| NVIDIA A30 | 1 GPU | 1.1 | NVHPC 24.1 | GT Rogues Gallery |
+| AMD MI250X | 1 __GCD__ | 1.1 | CCE 16.0.1 | OLCF Frontier |
+| AMD MI100 | 1 GPU | 1.4 | CCE 16.0.1 | Cray internal system |
+| NVIDIA P100 | 1 GPU | 2.4 | NVHPC 23.5 | GT CSE Internal |
+| AMD EPYC 9534 (Genoa) | 64/64 cores | 2.7 | GNU 12.3.0 | GT Phoenix |
 | NVIDIA A40 (SP GPU) | 1 GPU | 3.3 | NVHPC 22.11 | NCSA Delta |
+| NVIDIA Grace CPU (Arm, Neoverse V2) | 72/72 cores | 3.7 | NVHPC 24.1 | GT Rogues Gallery |
 | NVIDIA RTX6000 (SP GPU) | 1 GPU | 3.9 | NVHPC 22.11 | GT Phoenix |
-| Apple M1 Max | 8/10 cores | 14.3 | GNU 14.1.0 | N/A |
-| IBM Power9 | 20/21 cores | 21.2 | GNU 9.1.0 | OLCF Summit |
-
-Processors To-do:
-
-| Hardware | | Grind Time | Compiler | Computer |
-| ---: | ----: | ----: | :--- | :--- |
-| AMD EPYC 9534 (Genoa) | 64/64 cores | n/a | GNU 12.3.0 | GT Phoenix |
-| AMD EPYC 7763 (Milan) | 24/64 cores | n/a | GNU 11.4.0 | NCSA Delta |
-| Intel Xeon Platinum 8462Y+ (Sapphire Rapids) | 16/32 cores | n/a | GNU 12.3.0 | GT ICE |
-| Intel Xeon Gold 6454S (Sapphire Rapids) | 16/32 cores | n/a | NVHPC 24.5 | GT Rogues Gallery |
-| NVIDIA Grace CPU (Arm, Neoverse V2) | 18/72 cores | n/a | NVHPC 24.1 | GT Rogues Gallery |
-| AMD EPYC 7452 (Rome) | 16/32 cores | n/a | GNU 12.3.0 | GT ICE |
-| Intel Xeon Platinum 8352Y (Ice Lake) | 12/32 cores | n/a | NVHPC 24.5 | GT Rogues Gallery |
-| AMD EPYC 7713 (Milan) | 32/64 cores | n/a | GNU 12.1.0 | GT Phoenix |
-| Intel Xeon Gold 6226 (Cascade Lake) | 12/12 cores | n/a | Intel oneAPI 2022.1 | GT Phoenix |
-| Ampere Altra Max (Arm, Neoverse-N1) | 8/80 cores | n/a | GNU 12.2.0 | OLCF Wombat |
-| Intel Xeon E5-2650V4 (Broadwell) | 8/12 cores | n/a | NVHPC 23.5 | GT CSE Internal |
+| AMD EPYC 7763 (Milan) | 64/64 cores | 4.1 | GNU 11.4.0 | NCSA Delta |
+| AMD EPYC 7713 (Milan) | 64/64 cores | 5.0 | GNU 12.3.0 | GT Phoenix |
+| Intel Xeon Gold 6454S (Sapphire Rapids) | 32/32 cores | 5.6 | NVHPC 24.5 | GT Rogues Gallery |
+| Intel Xeon Platinum 8462Y+ (Sapphire Rapids) | 32/32 cores | 6.2 | GNU 12.3.0 | GT ICE |
+| Intel Xeon Platinum 8352Y (Ice Lake) | 32/32 cores | 6.6 | NVHPC 24.5 | GT Rogues Gallery |
+| Ampere Altra Max Q80-28 (Arm, Neoverse-N1) | 80/80 cores | 6.8 | GNU 12.2.0 | OLCF Wombat |
+| AMD EPYC 7513 (Milan) | 32/32 cores | 7.4 | GNU 12.3.0 | GT ICE |
+| AMD EPYC 7452 (Rome) | 32/32 cores | 8.4 | GNU 12.3.0 | GT ICE |
+| Apple M1 Max | 8/10 cores | 14 | GNU 14.1.0 | N/A |
+| Intel Xeon Gold 6226 (Cascade Lake) | 12/12 cores | 17 | GNU 12.3.0 | GT ICE |
+| IBM Power9 | 20/21 cores | 21 | GNU 9.1.0 | OLCF Summit |
+| Intel Xeon E5-2650V4 (Broadwell) | 12/12 cores | 27 | NVHPC 23.5 | GT CSE Internal |

 __All grind times are in nanoseconds (ns) per grid point (gp) per equation (eq) per right-hand side (rhs) evaluation, so X ns/gp/eq/rhs. Lower is better.__
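
As a rough illustration of how the ns/gp/eq/rhs unit can be used, the sketch below turns a grind time from the table into a wall-clock estimate. The function name `estimated_runtime_seconds` and the workload (grid points, equation count, number of right-hand-side evaluations) are hypothetical and not taken from the documentation. The estimate covers only right-hand-side evaluation cost and ignores I/O, communication, and startup. The MI250X line applies the halving rule of thumb from the GPU note above, which assumes both GCDs are used with ideal scaling.

```python
def estimated_runtime_seconds(grind_time_ns, grid_points, equations, rhs_evaluations):
    """Estimate wall-clock seconds from a grind time given in ns/gp/eq/rhs."""
    total_ns = grind_time_ns * grid_points * equations * rhs_evaluations
    return total_ns * 1e-9  # nanoseconds -> seconds


# Hypothetical workload (assumed values, chosen only for illustration).
GRID_POINTS = 1_000_000
EQUATIONS = 8
RHS_EVALUATIONS = 3_000

# Grind times taken from the updated table.
h100_seconds = estimated_runtime_seconds(0.45, GRID_POINTS, EQUATIONS, RHS_EVALUATIONS)
mi250x_gcd_seconds = estimated_runtime_seconds(1.1, GRID_POINTS, EQUATIONS, RHS_EVALUATIONS)

# Rule of thumb from the GPU note: halve the single-GCD grind time
# to approximate a full MI250X (two GCDs, ideal scaling assumed).
mi250x_full_seconds = estimated_runtime_seconds(1.1 / 2, GRID_POINTS, EQUATIONS, RHS_EVALUATIONS)

print(f"H100:          {h100_seconds:.1f} s")
print(f"MI250X (1 GCD): {mi250x_gcd_seconds:.1f} s")
print(f"MI250X (full):  {mi250x_full_seconds:.1f} s")
```

Because the estimate is linear in every factor, doubling the grid points or the number of equations doubles the predicted time, which makes the table convenient for quick cross-hardware comparisons of the same problem.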
