Update expectedPerformance.md (#539)

sbryngelson · web-flow · commit 6c18780cfdc8 · 2024-07-28T14:44:37.000-04:00
diff --git a/docs/documentation/expectedPerformance.md b/docs/documentation/expectedPerformance.md
@@ -1,22 +1,23 @@
-# Performance Results
+# Performance
 
 MFC has been benchmarked on several CPUs and GPU devices.
 This page shows a summary of these results.
 
-## Expected time-steps/hour
+## Figure of merit: Grind time performance
 
-The following table outlines observed performance as nanoseconds per grid point (ns/GP) per equation (eq) per right-hand side (rhs) evaluation (lower is better).
+The following table reports observed performance in nanoseconds per grid point per equation per right-hand side (RHS) evaluation (ns/gp/eq/rhs; lower is better), also known as the grind time.
 We solve an example 3D, inviscid, 5-equation model problem with two advected species (8 PDEs) and 8M grid points (158-cubed uniform grid).
 The numerics are WENO5 finite volume reconstruction and HLLC approximate Riemann solver.
 This case is located in `examples/3D_performance_test`.
-We report results for various numbers of grid points per CPU die (or GPU device) and hardware.
+
 Results are for MFC v4.9.3 (July 2024 release), though numbers have not changed meaningfully since then.
 All results are for the compiler that gave the best performance.
-CPU results may be performed on CPUs with more cores than reported in the table; we report results for the best performance given the full processor die by checking the performance for different core counts on that device.
-GPU results on single-precision (SP) GPUs performed computation in double-precision via conversion in compiler/software; these numbers are _not_ for single-precision computation.
+Notes:
+* CPU results may have been obtained on CPUs with more cores than reported in the table; we report the best performance achievable on the full processor die, found by measuring performance at different core counts on that device.
+  These are reported as X/Y cores, where X is the number of cores used and Y is the total number of cores on the die.
+* GPU results on single-precision (SP) GPUs were still computed in double precision, via conversion in the compiler/software; these numbers are _not_ for single-precision computation.
 AMD MI250X GPUs have two graphics compute dies (GCDs) per MI250X device; we report results for one GCD, though one can quickly estimate full MI250X runtime by halving the single GCD grind time number.
 
-
 | Hardware                  |            |   Grind Time   |    Compiler    |   Computer   |
 | ---:                      | ----:      |    :----:      |  :---         | :---         | 
 | NVIDIA GH200 (GPU only)   | 1 GPU          | 0.32       | NVHPC 24.1           | GT Rogues Gallery  |
@@ -27,9 +28,11 @@ AMD MI250X GPUs have two graphics compute dies (GCDs) per MI250X device; we repo
 | AMD MI250X                | 1 __GCD__      | 1.09       | CCE 16.0.1           | OLCF Frontier |
 | NVIDIA A40 (SP GPU)       | 1 GPU          | 3.3        | NVHPC 22.11          | NCSA Delta  |
 | NVIDIA RTX6000 (SP GPU)   | 1 GPU          | 3.9        | NVHPC 22.11          | GT Phoenix  |
-| Apple M1 Max              | 8 cores        | 72         | GNU 14.1.0           | N/A         |
-| AMD EPYC 7713             | 32 cores       | 137        | GNU 12.1.0           | GT Phoenix  |
-| Intel Xeon Gold 6226      | 12 cores       | 152        | Intel oneAPI 2022.1  | GT Phoenix  |
+| Apple M1 Max                                 | 8/10 cores      | 72         | GNU 14.1.0           | N/A         |
+| AMD EPYC 9534 (Genoa)                        | 64/64 cores     | 96         | GNU 12.3.0           | GT Phoenix  |
+| Intel Xeon Gold 6454S (Sapphire Rapids)      | 16/32 cores     | 111        | NVHPC 24.5           | GT Rogues Gallery  |
+| AMD EPYC 7713 (Milan)                        | 32/64 cores     | 137        | GNU 12.1.0           | GT Phoenix  |
+| Intel Xeon Gold 6226 (Cascade Lake)          | 12/12 cores     | 152        | Intel oneAPI 2022.1  | GT Phoenix  |
 
 __All grind times are in nanoseconds (ns) per grid point (gp) per equation (eq) per right-hand side (rhs) evaluation, so X ns/gp/eq/rhs. Lower is better.__
 
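The grind-time units above can be made concrete with a small calculation. The following is a hypothetical Python helper, not part of MFC; it assumes grind time is simply the measured wall-clock time divided by the product of grid points, equations, and RHS evaluations, which is what the ns/gp/eq/rhs units imply. It also encodes the rough "halve the single-GCD number" estimate for a full AMD MI250X described in the notes.

```python
def grind_time_ns(wall_time_s: float, grid_points: int, equations: int, rhs_evals: int) -> float:
    """Hypothetical helper: grind time in ns per grid point per equation per RHS evaluation."""
    if min(grid_points, equations, rhs_evals) <= 0:
        raise ValueError("grid_points, equations, and rhs_evals must be positive")
    # Convert seconds to nanoseconds, then normalize by the total work units.
    return wall_time_s * 1e9 / (grid_points * equations * rhs_evals)


def wall_time_s(grind_ns: float, grid_points: int, equations: int, rhs_evals: int) -> float:
    """Hypothetical helper: invert the definition to predict wall time in seconds."""
    return grind_ns * 1e-9 * grid_points * equations * rhs_evals


def full_mi250x_estimate(gcd_grind_ns: float) -> float:
    """Rough estimate only: an MI250X has two GCDs, so halve the one-GCD grind time."""
    return gcd_grind_ns / 2.0


if __name__ == "__main__":
    # Hypothetical run: 0.256 s for 100 RHS evaluations of 8 equations on a 100^3 grid.
    g = grind_time_ns(0.256, 100**3, 8, 100)
    print(round(g, 2))
    # Rough full-device estimate from the single-GCD MI250X entry in the table.
    print(full_mi250x_estimate(1.09))
```

For instance, the hypothetical run in the `__main__` block (0.256 s for 100 RHS evaluations of 8 equations on a 100³ grid) corresponds to 0.32 ns/gp/eq/rhs. Because the metric is normalized by all three counts, it can be compared across problem sizes and model complexities, provided the per-point cost stays roughly constant.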
diff --git a/docs/documentation/readme.md b/docs/documentation/readme.md
@@ -8,7 +8,7 @@
 - [Example Cases](examples.md)
 - [Running MFC](running.md)
 - [Flow Visualization](visualization.md)
-- [Performance Results](expectedPerformance.md)
+- [Performance](expectedPerformance.md)
 - [MFC's Authors](authors.md)
 - [References](references.md)