Commit dc00631

author
MFC Action
committed
Docs @ 7ec9ef2
1 parent 3bae82e commit dc00631

1 file changed

+14
-12
lines changed

documentation/md_expectedPerformance.html

Lines changed: 14 additions & 12 deletions
@@ -141,7 +141,7 @@ <h1><a class="anchor" id="autotoc_md71"></a>
141141
Figure of merit: Grind time performance</h1>
142142
<p>The following table outlines observed performance as nanoseconds per grid point (ns/gp) per equation (eq) per right-hand side (rhs) evaluation (lower is better), also known as the grind time. We solve an example 3D, inviscid, 5-equation model problem with two advected species (8 PDEs) and 8M grid points (158-cubed uniform grid). The numerics are WENO5 finite volume reconstruction and HLLC approximate Riemann solver. This case is located in <code>examples/3D_performance_test</code>. You can run it via <code>./mfc.sh run -n &lt;num_processors&gt; -j $(nproc) ./examples/3D_performance_test/case.py -t pre_process simulation --case-optimization</code> for CPU cases right after building MFC, which will build an optimized version of the code for this case then execute it. For benchmarking GPU devices, you will likely want to use <code>-n &lt;num_gpus&gt;</code> where <code>&lt;num_gpus&gt;</code> should likely be <code>1</code>. If the above does not work on your machine, see the rest of this documentation for other ways to use the <code>./mfc.sh run</code> command.</p>
143143
<p>Results are for MFC v4.9.3 (July 2024 release), though numbers have not changed meaningfully since then. Similar performance is also seen for other problem configurations, such as the Euler equations (4 PDEs). All results are for the compiler that gave the best performance. Note:</p><ul>
144-
<li>CPU results may be performed on CPUs with more cores than reported in the table; we report results for the best performance given the full processor die by checking the performance for different core counts on that device. CPU results are the best performance we achieved using a single socket (or die). These are reported as (X/Y cores), where X is the used cores, and Y is the total on the die.</li>
144+
<li>CPU results may be performed on CPUs with more cores than reported in the table; we report results for the best performance given the full processor die by checking the performance for different core counts on that device. CPU results are the best performance we achieved using a single socket (or die).</li>
145145
<li>GPU results are for a single GPU device. For single-precision (SP) GPUs, we performed computation in double-precision via conversion in compiler/software; these numbers are <em>not</em> for single-precision computation. AMD MI250X and MI300A GPUs have multiple compute dies per socket; we report results for one <em>GCD</em>* for the MI250X and the entire APU (6 XCDs) for MI300A, though one can quickly estimate full device runtime by dividing the grind time number by the number of GCDs on the device (the MI250X has 2 GCDs). We gratefully acknowledge the permission of LLNL, HPE/Cray, and AMD for permission to release MI300A performance numbers.</li>
146146
</ul>
147147
<table class="markdownTable">
@@ -178,7 +178,7 @@ <h1><a class="anchor" id="autotoc_md71"></a>
178178
<tr class="markdownTableRowOdd">
179179
<td class="markdownTableBodyRight">Intel Xeon 8592+ </td><td class="markdownTableBodyRight">Emerald Rapids </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">64 cores </td><td class="markdownTableBodyRight">2.6 </td><td class="markdownTableBodyLeft">Intel 2024.2 </td><td class="markdownTableBodyLeft">Intel AI Cloud </td></tr>
180180
<tr class="markdownTableRowEven">
181-
<td class="markdownTableBodyRight">Intel Xeon SF-AP </td><td class="markdownTableBodyRight">Sierra Forest Advanced, 2.8GHz Boost, 384 MiB L3 </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">192 cores </td><td class="markdownTableBodyRight">2.6 </td><td class="markdownTableBodyLeft">Intel 2024.2 </td><td class="markdownTableBodyLeft">Intel AI Cloud </td></tr>
181+
<td class="markdownTableBodyRight">Intel Xeon 6900E </td><td class="markdownTableBodyRight">Sierra Forest Advanced, 2.8GHz Boost, 384 MiB L3 </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">192 cores </td><td class="markdownTableBodyRight">2.6 </td><td class="markdownTableBodyLeft">Intel 2024.2 </td><td class="markdownTableBodyLeft">Intel AI Cloud </td></tr>
182182
<tr class="markdownTableRowOdd">
183183
<td class="markdownTableBodyRight">AMD EPYC 9534 </td><td class="markdownTableBodyRight">Genoa </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">64 cores </td><td class="markdownTableBodyRight">2.7 </td><td class="markdownTableBodyLeft">GNU 12.3.0 </td><td class="markdownTableBodyLeft">GT Phoenix </td></tr>
184184
<tr class="markdownTableRowEven">
@@ -216,26 +216,28 @@ <h1><a class="anchor" id="autotoc_md71"></a>
216216
<tr class="markdownTableRowEven">
217217
<td class="markdownTableBodyRight">NVIDIA T4 </td><td class="markdownTableBodyRight">FP32-only GPU </td><td class="markdownTableBodyRight">GPU </td><td class="markdownTableBodyRight">1 GPU </td><td class="markdownTableBodyRight">8.8 </td><td class="markdownTableBodyLeft">NVHPC 24.1 </td><td class="markdownTableBodyLeft">TAMU Faster </td></tr>
218218
<tr class="markdownTableRowOdd">
219-
<td class="markdownTableBodyRight">IBM Power10 </td><td class="markdownTableBodyRight"></td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">24 cores </td><td class="markdownTableBodyRight">10 </td><td class="markdownTableBodyLeft">GNU 13.3.1 </td><td class="markdownTableBodyLeft">GT Rogues Gallery </td></tr>
219+
<td class="markdownTableBodyRight">Intel Xeon 8160 </td><td class="markdownTableBodyRight">Skylake </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">24 cores </td><td class="markdownTableBodyRight">8.9 </td><td class="markdownTableBodyLeft">Intel 2024.0 </td><td class="markdownTableBodyLeft">TACC Stampede3 </td></tr>
220220
<tr class="markdownTableRowEven">
221-
<td class="markdownTableBodyRight">AMD EPYC 7401 </td><td class="markdownTableBodyRight">Naples </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">24 cores </td><td class="markdownTableBodyRight">10 </td><td class="markdownTableBodyLeft">GNU 10.3.1 </td><td class="markdownTableBodyLeft">LLNL Corona </td></tr>
221+
<td class="markdownTableBodyRight">IBM Power10 </td><td class="markdownTableBodyRight"></td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">24 cores </td><td class="markdownTableBodyRight">10 </td><td class="markdownTableBodyLeft">GNU 13.3.1 </td><td class="markdownTableBodyLeft">GT Rogues Gallery </td></tr>
222222
<tr class="markdownTableRowOdd">
223-
<td class="markdownTableBodyRight">Intel Xeon 6226 </td><td class="markdownTableBodyRight">Cascade Lake </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">12 cores </td><td class="markdownTableBodyRight">17 </td><td class="markdownTableBodyLeft">GNU 12.3.0 </td><td class="markdownTableBodyLeft">GT ICE </td></tr>
223+
<td class="markdownTableBodyRight">AMD EPYC 7401 </td><td class="markdownTableBodyRight">Naples </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">24 cores </td><td class="markdownTableBodyRight">10 </td><td class="markdownTableBodyLeft">GNU 10.3.1 </td><td class="markdownTableBodyLeft">LLNL Corona </td></tr>
224224
<tr class="markdownTableRowEven">
225-
<td class="markdownTableBodyRight">Apple M1 Max </td><td class="markdownTableBodyRight"></td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">10 cores </td><td class="markdownTableBodyRight">20 </td><td class="markdownTableBodyLeft">GNU 14.1.0 </td><td class="markdownTableBodyLeft">N/A </td></tr>
225+
<td class="markdownTableBodyRight">Intel Xeon 6226 </td><td class="markdownTableBodyRight">Cascade Lake </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">12 cores </td><td class="markdownTableBodyRight">17 </td><td class="markdownTableBodyLeft">GNU 12.3.0 </td><td class="markdownTableBodyLeft">GT ICE </td></tr>
226226
<tr class="markdownTableRowOdd">
227-
<td class="markdownTableBodyRight">IBM Power9 </td><td class="markdownTableBodyRight"></td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">20 cores </td><td class="markdownTableBodyRight">21 </td><td class="markdownTableBodyLeft">GNU 9.1.0 </td><td class="markdownTableBodyLeft">OLCF Summit </td></tr>
227+
<td class="markdownTableBodyRight">Apple M1 Max </td><td class="markdownTableBodyRight"></td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">10 cores </td><td class="markdownTableBodyRight">20 </td><td class="markdownTableBodyLeft">GNU 14.1.0 </td><td class="markdownTableBodyLeft">N/A </td></tr>
228228
<tr class="markdownTableRowEven">
229-
<td class="markdownTableBodyRight">Cavium ThunderX2 </td><td class="markdownTableBodyRight">Arm </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">32 cores </td><td class="markdownTableBodyRight">21 </td><td class="markdownTableBodyLeft">GNU 13.2.0 </td><td class="markdownTableBodyLeft">SBU Ookami </td></tr>
229+
<td class="markdownTableBodyRight">IBM Power9 </td><td class="markdownTableBodyRight"></td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">20 cores </td><td class="markdownTableBodyRight">21 </td><td class="markdownTableBodyLeft">GNU 9.1.0 </td><td class="markdownTableBodyLeft">OLCF Summit </td></tr>
230230
<tr class="markdownTableRowOdd">
231-
<td class="markdownTableBodyRight">Arm Cortex-A78AE </td><td class="markdownTableBodyRight">Arm, BlueField3 </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">16 cores </td><td class="markdownTableBodyRight">25 </td><td class="markdownTableBodyLeft">NVHPC 24.5 </td><td class="markdownTableBodyLeft">GT Rogues Gallery </td></tr>
231+
<td class="markdownTableBodyRight">Cavium ThunderX2 </td><td class="markdownTableBodyRight">Arm </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">32 cores </td><td class="markdownTableBodyRight">21 </td><td class="markdownTableBodyLeft">GNU 13.2.0 </td><td class="markdownTableBodyLeft">SBU Ookami </td></tr>
232232
<tr class="markdownTableRowEven">
233-
<td class="markdownTableBodyRight">Intel Xeon E5-2650V4 </td><td class="markdownTableBodyRight">Broadwell </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">12 cores </td><td class="markdownTableBodyRight">27 </td><td class="markdownTableBodyLeft">NVHPC 23.5 </td><td class="markdownTableBodyLeft">GT CSE Internal </td></tr>
233+
<td class="markdownTableBodyRight">Arm Cortex-A78AE </td><td class="markdownTableBodyRight">Arm, BlueField3 </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">16 cores </td><td class="markdownTableBodyRight">25 </td><td class="markdownTableBodyLeft">NVHPC 24.5 </td><td class="markdownTableBodyLeft">GT Rogues Gallery </td></tr>
234234
<tr class="markdownTableRowOdd">
235-
<td class="markdownTableBodyRight">Apple M2 </td><td class="markdownTableBodyRight"></td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">8 cores </td><td class="markdownTableBodyRight">32 </td><td class="markdownTableBodyLeft">GNU 14.1.0 </td><td class="markdownTableBodyLeft">N/A </td></tr>
235+
<td class="markdownTableBodyRight">Intel Xeon E5-2650V4 </td><td class="markdownTableBodyRight">Broadwell </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">12 cores </td><td class="markdownTableBodyRight">27 </td><td class="markdownTableBodyLeft">NVHPC 23.5 </td><td class="markdownTableBodyLeft">GT CSE Internal </td></tr>
236236
<tr class="markdownTableRowEven">
237-
<td class="markdownTableBodyRight">Intel Xeon E7-4850V3 </td><td class="markdownTableBodyRight">Haswell </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">14 cores </td><td class="markdownTableBodyRight">34 </td><td class="markdownTableBodyLeft">GNU 9.4.0 </td><td class="markdownTableBodyLeft">GT CSE Internal </td></tr>
237+
<td class="markdownTableBodyRight">Apple M2 </td><td class="markdownTableBodyRight"></td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">8 cores </td><td class="markdownTableBodyRight">32 </td><td class="markdownTableBodyLeft">GNU 14.1.0 </td><td class="markdownTableBodyLeft">N/A </td></tr>
238238
<tr class="markdownTableRowOdd">
239+
<td class="markdownTableBodyRight">Intel Xeon E7-4850V3 </td><td class="markdownTableBodyRight">Haswell </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">14 cores </td><td class="markdownTableBodyRight">34 </td><td class="markdownTableBodyLeft">GNU 9.4.0 </td><td class="markdownTableBodyLeft">GT CSE Internal </td></tr>
240+
<tr class="markdownTableRowEven">
239241
<td class="markdownTableBodyRight">Fujitsu A64FX </td><td class="markdownTableBodyRight">Arm </td><td class="markdownTableBodyRight">CPU </td><td class="markdownTableBodyRight">48 cores </td><td class="markdownTableBodyRight">63 </td><td class="markdownTableBodyLeft">GNU 13.2.0 </td><td class="markdownTableBodyLeft">SBU Ookami </td></tr>
240242
</table>
241243
<p><b>All grind times are in nanoseconds (ns) per grid point (gp) per equation (eq) per right-hand side (rhs) evaluation, so X ns/gp/eq/rhs. Lower is better.</b></p>
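
The grind-time unit described above (ns/gp/eq/rhs) can be illustrated with a small sketch. This is not MFC code: the function names, the argument names, and the sample numbers are all hypothetical. It assumes the grind time is the wall-clock time divided by the product of grid points, equations, and right-hand side evaluations. The second helper applies the rule of thumb from the notes, dividing a per-GCD grind time by the number of GCDs to estimate the full device.

```python
def grind_time_ns(wall_seconds, n_grid_points, n_equations, n_rhs_evals):
    """Grind time in ns per grid point per equation per rhs evaluation.

    Hypothetical helper: assumes wall_seconds covers only the time spent
    advancing the solution, and that n_rhs_evals is the total number of
    right-hand side evaluations performed in that time.
    """
    if min(n_grid_points, n_equations, n_rhs_evals) <= 0:
        raise ValueError("all counts must be positive")
    # Convert seconds to nanoseconds, then normalize by the total work.
    return wall_seconds * 1e9 / (n_grid_points * n_equations * n_rhs_evals)


def full_device_estimate(per_gcd_grind_time, n_gcds):
    """Estimate the full-device grind time from a single-GCD measurement.

    Follows the rule of thumb in the notes: divide by the number of GCDs.
    This assumes ideal scaling across GCDs, which is only an approximation.
    """
    if n_gcds <= 0:
        raise ValueError("n_gcds must be positive")
    return per_gcd_grind_time / n_gcds


if __name__ == "__main__":
    # Made-up inputs for illustration only; not taken from the table.
    gt = grind_time_ns(wall_seconds=20.8, n_grid_points=100**3,
                       n_equations=8, n_rhs_evals=1000)
    print(f"grind time: {gt:.2f} ns/gp/eq/rhs")
    print(f"two-GCD estimate: {full_device_estimate(gt, 2):.2f} ns/gp/eq/rhs")
```

Because the result is normalized by problem size and equation count, it can be compared across cases of different sizes, which is why the table reports a single number per device.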
