Commit 171343a

Correct performance metrics! (#537)
1 parent 24ea4af commit 171343a

1 file changed: +24 -13 lines changed

docs/documentation/expectedPerformance.md

Lines changed: 24 additions & 13 deletions
@@ -6,21 +6,32 @@ This page shows a summary of these results.
 ## Expected time-steps/hour
 
 The following table outlines observed performance as nanoseconds per grid point (ns/GP) per equation (eq) per right-hand side (rhs) evaluation (lower is better).
-We solve an example 3D, inviscid, 5-equation model problem with two advected species (a total of 8 PDEs).
-The numerics are WENO5 and the HLLC approximate Riemann solver.
+We solve an example 3D, inviscid, 5-equation model problem with two advected species (8 PDEs) and 8M grid points (158-cubed uniform grid).
+The numerics are WENO5 finite volume reconstruction and HLLC approximate Riemann solver.
 This case is located in `examples/3D_performance_test`.
 We report results for various numbers of grid points per CPU die (or GPU device) and hardware.
-
-| Hardware | | 1M GPs | 4M GPs | 8M GPs | Compiler | Computer |
-| ---: | :----: | :----: | :---: | :---: | :----: | :--- |
-| NVIDIA V100 | 1 device | 12.0 | 13.0 | 13.0 | NVHPC 22.11 | PACE Phoenix |
-| NVIDIA V100 | 1 device | 12.6 | 13.0 | 13.0 | NVHPC 22.11 | OLCF Summit |
-| NVIDIA A100 | 1 device | 8.9 | 7.0 | 7.4 | NVHPC 23.5 | Wingtip |
-| AMD MI250X | 1 GCD | 13.5 | 11.3 | 12 | CCE 16.0.1 | OLCF Frontier |
-| Intel Xeon Gold 6226 | 12 cores | 245 | 211 | 211 | GNU 10.3.0 | PACE Phoenix |
-| Apple M2 | 6 cores | 365 | 306 | 563 | GNU 13.2.0 | N/A |
-
-__All results are in nanoseconds (ns) per grid point (gp) per equation (eq) per right-hand side (rhs) evaluation, so X ns/gp/eq/rhs. Lower is better.__
+Results are for MFC v4.9.3 (July 2024 release), though numbers have not changed meaningfully since then.
+All results are for the compiler that gave the best performance.
+CPU results may be performed on CPUs with more cores than reported in the table; we report results for the best performance given the full processor die by checking the performance for different core counts on that device.
+GPU results on single-precision (SP) GPUs performed computation in double-precision via conversion in compiler/software; these numbers are _not_ for single-precision computation.
+AMD MI250X GPUs have two graphics compute dies (GCDs) per MI250X device; we report results for one GCD, though one can quickly estimate full MI250X runtime by halving the single GCD grind time number.
+
+
+| Hardware | | Grind Time | Compiler | Computer |
+| ---: | ----: | :----: | :--- | :--- |
+| NVIDIA GH200 (GPU only) | 1 GPU | 0.32 | NVHPC 24.1 | GT Rogues Gallery |
+| NVIDIA H100 | 1 GPU | 0.45 | NVHPC 24.5 | GT Rogues Gallery |
+| NVIDIA A100 | 1 GPU | 0.62 | NVHPC 22.11 | GT Phoenix |
+| NVIDIA V100 | 1 GPU | 0.99 | NVHPC 22.11 | GT Phoenix |
+| NVIDIA A30 | 1 GPU | 1.06 | NVHPC 24.1 | GT Rogues Gallery |
+| AMD MI250X | 1 __GCD__ | 1.09 | CCE 16.0.1 | OLCF Frontier |
+| NVIDIA A40 (SP GPU) | 1 GPU | 3.3 | NVHPC 22.11 | NCSA Delta |
+| NVIDIA RTX6000 (SP GPU) | 1 GPU | 3.9 | NVHPC 22.11 | GT Phoenix |
+| Apple M1 Max | 8 cores | 72 | GNU 14.1.0 | N/A |
+| AMD EPYC 7713 | 32 cores | 137 | GNU 12.1.0 | GT Phoenix |
+| Intel Xeon Gold 6226 | 12 cores | 152 | Intel oneAPI 2022.1 | GT Phoenix |
+
+__All grind times are in nanoseconds (ns) per grid point (gp) per equation (eq) per right-hand side (rhs) evaluation, so X ns/gp/eq/rhs. Lower is better.__
 
 ## Weak scaling
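The grind-time unit in the new table (ns/gp/eq/rhs) can be turned into the "time-steps/hour" figure that the section heading refers to. The following is a minimal sketch of that arithmetic, not code from MFC. The function names are hypothetical, and the count of three right-hand-side evaluations per time step (as in a three-stage Runge-Kutta scheme) is an assumption that does not appear in the diff.

```python
def time_steps_per_hour(grind_ns, n_gp, n_eq, rhs_per_step):
    """Estimate time steps per hour from a grind time in ns/gp/eq/rhs.

    grind_ns     -- grind time from the table, in nanoseconds
    n_gp         -- number of grid points on the device
    n_eq         -- number of PDEs solved
    rhs_per_step -- right-hand-side evaluations per time step (assumed)
    """
    seconds_per_step = grind_ns * 1e-9 * n_gp * n_eq * rhs_per_step
    return 3600.0 / seconds_per_step


def full_mi250x_grind(gcd_grind_ns):
    """Estimate a full MI250X grind time by halving the single-GCD value,
    following the rule of thumb stated in the added documentation text."""
    return gcd_grind_ns / 2.0


if __name__ == "__main__":
    # Table values: NVIDIA A100 at 0.62 ns/gp/eq/rhs, 8 PDEs, 8M grid points.
    # Assumption: 3 RHS evaluations per time step.
    steps = time_steps_per_hour(grind_ns=0.62, n_gp=8_000_000, n_eq=8, rhs_per_step=3)
    print(f"A100: about {steps:.0f} time steps/hour")

    # Table value: AMD MI250X at 1.09 ns/gp/eq/rhs for one GCD.
    print(f"Full MI250X estimate: {full_mi250x_grind(1.09):.3f} ns/gp/eq/rhs")
```

Because the estimate is linear in each factor, doubling the grid points or the number of equations halves the time steps per hour for a fixed grind time.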
