MFlowCode
diff --git a/documentation/doxygen_crawl.html b/documentation/doxygen_crawl.html
Lines changed: 18 additions & 16 deletions
diff --git a/documentation/md_case.html b/documentation/md_case.html
Lines changed: 11 additions & 9 deletions
diff --git a/documentation/md_examples.html b/documentation/md_examples.html
Lines changed: 15 additions & 10 deletions
diff --git a/documentation/md_expectedPerformance.html b/documentation/md_expectedPerformance.html
Lines changed: 11 additions & 11 deletions
@@ -98,9 +98,9 @@
 <a href="md_examples.html#autotoc_md82"/>
 <a href="md_examples.html#autotoc_md83"/>
 <a href="md_examples.html#autotoc_md84"/>
+<a href="md_examples.html#autotoc_md85"/>
+<a href="md_examples.html#autotoc_md86"/>
 <a href="md_expectedPerformance.html"/>
-<a href="md_expectedPerformance.html#autotoc_md86"/>
-<a href="md_expectedPerformance.html#autotoc_md87"/>
 <a href="md_expectedPerformance.html#autotoc_md88"/>
 <a href="md_expectedPerformance.html#autotoc_md89"/>
 <a href="md_expectedPerformance.html#autotoc_md90"/>
@@ -109,15 +109,15 @@
 <a href="md_expectedPerformance.html#autotoc_md93"/>
 <a href="md_expectedPerformance.html#autotoc_md94"/>
 <a href="md_expectedPerformance.html#autotoc_md95"/>
+<a href="md_expectedPerformance.html#autotoc_md96"/>
+<a href="md_expectedPerformance.html#autotoc_md97"/>
 <a href="md_getting-started.html"/>
 <a href="md_getting-started.html#autotoc_md100"/>
 <a href="md_getting-started.html#autotoc_md101"/>
-<a href="md_getting-started.html#autotoc_md97"/>
-<a href="md_getting-started.html#autotoc_md98"/>
+<a href="md_getting-started.html#autotoc_md102"/>
+<a href="md_getting-started.html#autotoc_md103"/>
 <a href="md_getting-started.html#autotoc_md99"/>
 <a href="md_gpuDebugging.html"/>
-<a href="md_gpuDebugging.html#autotoc_md103"/>
-<a href="md_gpuDebugging.html#autotoc_md104"/>
 <a href="md_gpuDebugging.html#autotoc_md105"/>
 <a href="md_gpuDebugging.html#autotoc_md106"/>
 <a href="md_gpuDebugging.html#autotoc_md107"/>
@@ -126,39 +126,41 @@
 <a href="md_gpuDebugging.html#autotoc_md110"/>
 <a href="md_gpuDebugging.html#autotoc_md111"/>
 <a href="md_gpuDebugging.html#autotoc_md112"/>
+<a href="md_gpuDebugging.html#autotoc_md113"/>
+<a href="md_gpuDebugging.html#autotoc_md114"/>
 <a href="md_gpuParallelization.html"/>
-<a href="md_gpuParallelization.html#autotoc_md115"/>
-<a href="md_gpuParallelization.html#autotoc_md116"/>
 <a href="md_gpuParallelization.html#autotoc_md117"/>
+<a href="md_gpuParallelization.html#autotoc_md118"/>
 <a href="md_gpuParallelization.html#autotoc_md119"/>
 <a href="md_gpuParallelization.html#autotoc_md121"/>
 <a href="md_gpuParallelization.html#autotoc_md123"/>
 <a href="md_gpuParallelization.html#autotoc_md125"/>
+<a href="md_gpuParallelization.html#autotoc_md127"/>
 <a href="md_papers.html"/>
 <a href="md_readme.html"/>
-<a href="md_readme.html#autotoc_md129"/>
-<a href="md_readme.html#autotoc_md130"/>
+<a href="md_readme.html#autotoc_md131"/>
+<a href="md_readme.html#autotoc_md132"/>
 <a href="md_references.html"/>
 <a href="md_running.html"/>
-<a href="md_running.html#autotoc_md133"/>
-<a href="md_running.html#autotoc_md134"/>
 <a href="md_running.html#autotoc_md135"/>
 <a href="md_running.html#autotoc_md136"/>
 <a href="md_running.html#autotoc_md137"/>
 <a href="md_running.html#autotoc_md138"/>
 <a href="md_running.html#autotoc_md139"/>
+<a href="md_running.html#autotoc_md140"/>
+<a href="md_running.html#autotoc_md141"/>
 <a href="md_testing.html"/>
-<a href="md_testing.html#autotoc_md141"/>
-<a href="md_testing.html#autotoc_md142"/>
+<a href="md_testing.html#autotoc_md143"/>
+<a href="md_testing.html#autotoc_md144"/>
 <a href="md_visualization.html"/>
-<a href="md_visualization.html#autotoc_md144"/>
-<a href="md_visualization.html#autotoc_md145"/>
 <a href="md_visualization.html#autotoc_md146"/>
 <a href="md_visualization.html#autotoc_md147"/>
 <a href="md_visualization.html#autotoc_md148"/>
 <a href="md_visualization.html#autotoc_md149"/>
 <a href="md_visualization.html#autotoc_md150"/>
 <a href="md_visualization.html#autotoc_md151"/>
+<a href="md_visualization.html#autotoc_md152"/>
+<a href="md_visualization.html#autotoc_md153"/>
 <a href="pages.html"/>
 </body>
 </html>
@@ -705,22 +705,24 @@ <h2><a class="anchor" id="autotoc_md19"></a>
 <tr class="markdownTableRowOdd">
 <td class="markdownTableBodyRight"><code>qm_wrt</code>   </td><td class="markdownTableBodyCenter">Logical   </td><td class="markdownTableBodyLeft">Add the Q-criterion to the database    </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyRight"><code>tau_wrt</code>   </td><td class="markdownTableBodyCenter">Logical   </td><td class="markdownTableBodyLeft">Add the elastic stress components to the database    </td></tr>
+<td class="markdownTableBodyRight"><code>liutex_wrt</code>   </td><td class="markdownTableBodyCenter">Logical   </td><td class="markdownTableBodyLeft">Add the Liutex to the database    </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyRight"><code>fd_order</code>   </td><td class="markdownTableBodyCenter">Integer   </td><td class="markdownTableBodyLeft">Order of finite differences for computing the vorticity and the numerical Schlieren function [1,2,4]    </td></tr>
+<td class="markdownTableBodyRight"><code>tau_wrt</code>   </td><td class="markdownTableBodyCenter">Logical   </td><td class="markdownTableBodyLeft">Add the elastic stress components to the database    </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyRight"><code>schlieren_alpha(i)</code>   </td><td class="markdownTableBodyCenter">Real   </td><td class="markdownTableBodyLeft">Intensity of the numerical Schlieren computed via <code>alpha(i)</code>    </td></tr>
+<td class="markdownTableBodyRight"><code>fd_order</code>   </td><td class="markdownTableBodyCenter">Integer   </td><td class="markdownTableBodyLeft">Order of finite differences for computing the vorticity and the numerical Schlieren function [1,2,4]    </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyRight"><code>probe_wrt</code>   </td><td class="markdownTableBodyCenter">Logical   </td><td class="markdownTableBodyLeft">Write the flow chosen probes data files for each time step    </td></tr>
+<td class="markdownTableBodyRight"><code>schlieren_alpha(i)</code>   </td><td class="markdownTableBodyCenter">Real   </td><td class="markdownTableBodyLeft">Intensity of the numerical Schlieren computed via <code>alpha(i)</code>    </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyRight"><code>num_probes</code>   </td><td class="markdownTableBodyCenter">Integer   </td><td class="markdownTableBodyLeft">Number of probes    </td></tr>
+<td class="markdownTableBodyRight"><code>probe_wrt</code>   </td><td class="markdownTableBodyCenter">Logical   </td><td class="markdownTableBodyLeft">Write the flow chosen probes data files for each time step    </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyRight"><code>probe(i)%[x,y,z]</code>   </td><td class="markdownTableBodyCenter">Real   </td><td class="markdownTableBodyLeft">Coordinates of probe $i$    </td></tr>
+<td class="markdownTableBodyRight"><code>num_probes</code>   </td><td class="markdownTableBodyCenter">Integer   </td><td class="markdownTableBodyLeft">Number of probes    </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyRight"><code>output_partial_domain</code>   </td><td class="markdownTableBodyCenter">Logical   </td><td class="markdownTableBodyLeft">Output part of the domain    </td></tr>
+<td class="markdownTableBodyRight"><code>probe(i)%[x,y,z]</code>   </td><td class="markdownTableBodyCenter">Real   </td><td class="markdownTableBodyLeft">Coordinates of probe $i$    </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyRight"><code>[x,y,z]_outputbeg</code>   </td><td class="markdownTableBodyCenter">Real   </td><td class="markdownTableBodyLeft">Beginning of the output domain in the [x,y,z]-direction    </td></tr>
+<td class="markdownTableBodyRight"><code>output_partial_domain</code>   </td><td class="markdownTableBodyCenter">Logical   </td><td class="markdownTableBodyLeft">Output part of the domain    </td></tr>
 <tr class="markdownTableRowEven">
+<td class="markdownTableBodyRight"><code>[x,y,z]_outputbeg</code>   </td><td class="markdownTableBodyCenter">Real   </td><td class="markdownTableBodyLeft">Beginning of the output domain in the [x,y,z]-direction    </td></tr>
+<tr class="markdownTableRowOdd">
 <td class="markdownTableBodyRight"><code>[x,y,z]_outputend</code>   </td><td class="markdownTableBodyCenter">Real   </td><td class="markdownTableBodyLeft">End of the output domain in the [x,y,z]-direction   </td></tr>
 </table>
 <p>The table lists formatted database output parameters. These parameters define which variables the simulation outputs, the file types and formats of the data, and the options for post-processing.</p>
@@ -734,7 +736,7 @@ <h2><a class="anchor" id="autotoc_md19"></a>
 <li><code>schlieren_alpha(i)</code> specifies the intensity of the numerical Schlieren of $i$-th component.</li>
 <li><code>fd_order</code> specifies the order of the finite difference scheme used to compute the vorticity from the velocity field and the numerical Schlieren from the density field. Valid values are the integers 1, 2, and 4: <code>fd_order = 1</code>, <code>2</code>, and <code>4</code> correspond to the first-, second-, and fourth-order finite difference schemes.</li>
 <li><code>probe_wrt</code> activates the output of state variables at coordinates specified by <code>probe(i)%[x,y,z]</code>.</li>
-<li><code>output_partial_domain</code> activates the output of part of the domain specified by <code>[x,y,z]_outputbeg</code> and <code>[x,y,z]_outputend</code>. This is useful for large domains where only a portion of the domain is of interest. It is not supported when <code>precision = 1</code> and <code>format = 1</code>. It also cannot be enabled with <code>flux_wrt</code>, <code>heat_ratio_wrt</code>, <code>pres_inf_wrt</code>, <code>c_wrt</code>, <code>omega_wrt</code>, <code>ib</code>, <code>schlieren_wrt</code>, or <code>qm_wrt</code>.</li>
+<li><code>output_partial_domain</code> activates the output of part of the domain specified by <code>[x,y,z]_outputbeg</code> and <code>[x,y,z]_outputend</code>. This is useful for large domains where only a portion of the domain is of interest. It is not supported when <code>precision = 1</code> and <code>format = 1</code>. It also cannot be enabled with <code>flux_wrt</code>, <code>heat_ratio_wrt</code>, <code>pres_inf_wrt</code>, <code>c_wrt</code>, <code>omega_wrt</code>, <code>ib</code>, <code>schlieren_wrt</code>, <code>qm_wrt</code>, or <code>liutex_wrt</code>.</li>
 </ul>
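The restrictions on `output_partial_domain` listed above can be checked before a run. The following is a minimal sketch, not part of MFC: the helper `partial_domain_conflicts` is hypothetical, and it assumes the case is a flat Python dictionary in which logical parameters are stored as the strings `'T'` and `'F'` and indexed flags are written like `omega_wrt(1)`.

```python
# Hypothetical pre-run check; not an MFC function.
# Flag names are taken from the parameter descriptions above.
INCOMPATIBLE_FLAGS = (
    "flux_wrt", "heat_ratio_wrt", "pres_inf_wrt", "c_wrt",
    "omega_wrt", "ib", "schlieren_wrt", "qm_wrt", "liutex_wrt",
)


def partial_domain_conflicts(case):
    """Return the settings that conflict with output_partial_domain."""
    # Assumption: logical parameters are the strings 'T' and 'F'.
    if case.get("output_partial_domain") != "T":
        return []

    problems = []
    if case.get("precision") == 1 and case.get("format") == 1:
        problems.append("precision = 1 with format = 1")

    for key, value in case.items():
        # Assumption: indexed flags are written like "omega_wrt(1)",
        # so the text before "(" is compared against the flag list.
        base_name = key.split("(", 1)[0]
        if base_name in INCOMPATIBLE_FLAGS and value == "T":
            problems.append(key)
    return problems


# Illustrative case dictionary; the values are made up.
case = {
    "output_partial_domain": "T",
    "x_outputbeg": 0.0,
    "x_outputend": 0.5,
    "precision": 2,
    "format": 1,
    "qm_wrt": "T",
}
print(partial_domain_conflicts(case))  # ['qm_wrt']
```

An empty list means none of the documented restrictions are violated; it does not guarantee that the case is otherwise valid.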
 <h2><a class="anchor" id="acoustic-source"></a>
 8. Acoustic Source</h2>
 
@@ -278,38 +278,43 @@ <h2><a class="anchor" id="autotoc_md73"></a>
 Initial Condition and Result</h2>
 <p><img src="initial-2D_hardcoded_ic-example.png" alt="" width="45%" class="inline"/> <img src="result-2D_hardcoded_ic-example.png" alt="" width="45%" class="inline"/></p>
 <h1><a class="anchor" id="autotoc_md74"></a>
-Rayleigh-Taylor Instability (3D)</h1>
+3D Turbulent Mixing Layer (3D)</h1>
 <h2><a class="anchor" id="autotoc_md75"></a>
+Liutex visualization at transitional state</h2>
+<p><img src="result-3D_turb_mixing-example.png" alt="" height="400" class="inline"/></p>
+<h1><a class="anchor" id="autotoc_md76"></a>
+Rayleigh-Taylor Instability (3D)</h1>
+<h2><a class="anchor" id="autotoc_md77"></a>
 Final Condition and Linear Theory</h2>
 <p><img src="final_condition-3D_rayleigh_taylor-example.png" alt="" height="400" class="inline"/> <img src="linear_theory-3D_rayleigh_taylor-example.png" alt="" height="400" class="inline"/></p>
-<h1><a class="anchor" id="autotoc_md76"></a>
+<h1><a class="anchor" id="autotoc_md78"></a>
 Taylor-Green Vortex (3D)</h1>
 <p>Reference: </p><blockquote class="doxtable">
 <p>Hillewaert, K. (2013). TestCase C3.5 - DNS of the transition of the Taylor-Green vortex, Re=1600 - Introduction and result summary. 2nd International Workshop on high-order methods for CFD. </p>
 </blockquote>
-<h2><a class="anchor" id="autotoc_md77"></a>
+<h2><a class="anchor" id="autotoc_md79"></a>
 Final Condition</h2>
 <p>This figure shows the isosurface of zero Q-criterion.</p>
 <p><img src="result-3D_TaylorGreenVortex-example.png" alt="" height="400" class="inline"/></p>
-<h1><a class="anchor" id="autotoc_md78"></a>
+<h1><a class="anchor" id="autotoc_md80"></a>
 Gas Jet (2D)</h1>
-<h2><a class="anchor" id="autotoc_md79"></a>
+<h2><a class="anchor" id="autotoc_md81"></a>
 Final Condition</h2>
 <p><img src="final_condition-2D_jet-example.png" alt="" height="400" class="inline"/></p>
-<h1><a class="anchor" id="autotoc_md80"></a>
+<h1><a class="anchor" id="autotoc_md82"></a>
 1D Multi-Component Inert Shock Tube</h1>
 <p>Reference: </p><blockquote class="doxtable">
 <p>P. J. Martínez Ferrer, R. Buttay, G. Lehnasch, and A. Mura, “A detailed verification procedure for compressible reactive multicomponent Navier–Stokes solvers”, Computers &amp; Fluids, vol. 89, pp. 88–110, Jan. 2014. Accessed: Oct. 13, 2024. [Online]. Available: <a href="https://doi.org/10.1016/j.compfluid.2013.10.014">https://doi.org/10.1016/j.compfluid.2013.10.014</a> </p>
 </blockquote>
-<h2><a class="anchor" id="autotoc_md81"></a>
+<h2><a class="anchor" id="autotoc_md83"></a>
 Initial Condition</h2>
 <p><img src="initial-1D_inert_shocktube-example.png" alt="" height="400" class="inline"/></p>
-<h2><a class="anchor" id="autotoc_md82"></a>
+<h2><a class="anchor" id="autotoc_md84"></a>
 Results</h2>
 <p><img src="result-1D_inert_shocktube-example.png" alt="" height="400" class="inline"/></p>
-<h1><a class="anchor" id="autotoc_md83"></a>
+<h1><a class="anchor" id="autotoc_md85"></a>
 2D IBM CFL dt (2D)</h1>
-<h2><a class="anchor" id="autotoc_md84"></a>
+<h2><a class="anchor" id="autotoc_md86"></a>
 Result</h2>
 <p><img src="result-2D_ibm_cfl_dt-example.png" alt="" height="400" class="inline"/> </p>
 </div></div><!-- contents -->
 
@@ -135,9 +135,9 @@
   <div class="headertitle"><div class="title">Performance</div></div>
 </div><!--header-->
 <div class="contents">
-<div class="textblock"><p><a class="anchor" id="autotoc_md85"></a></p>
+<div class="textblock"><p><a class="anchor" id="autotoc_md87"></a></p>
 <p>MFC has been benchmarked on several CPUs and GPU devices. This page is a summary of these results.</p>
-<h1><a class="anchor" id="autotoc_md86"></a>
+<h1><a class="anchor" id="autotoc_md88"></a>
 Figure of merit: Grind time performance</h1>
 <p>The following table outlines observed performance as nanoseconds per grid point (ns/gp) per equation (eq) per right-hand side (rhs) evaluation (lower is better), also known as the grind time. We solve an example 3D, inviscid, 5-equation model problem with two advected species (8 PDEs) and 8M grid points (158-cubed uniform grid). The numerics are WENO5 finite volume reconstruction and HLLC approximate Riemann solver. This case is located in <code>examples/3D_performance_test</code>. You can run it via <code>./mfc.sh run -n &lt;num_processors&gt; -j $(nproc) ./examples/3D_performance_test/case.py -t pre_process simulation --case-optimization</code> for CPU cases right after building MFC, which will build an optimized version of the code for this case then execute it. For benchmarking GPU devices, you will likely want to use <code>-n &lt;num_gpus&gt;</code> where <code>&lt;num_gpus&gt;</code> should likely be <code>1</code>. If the above does not work on your machine, see the rest of this documentation for other ways to use the <code>./mfc.sh run</code> command.</p>
 <p>Results are for MFC v4.9.3 (July 2024 release), though numbers have not changed meaningfully since then. Similar performance is also seen for other problem configurations, such as the Euler equations (4 PDEs). All results are for the compiler that gave the best performance. Note:</p><ul>
@@ -249,34 +249,34 @@ <h1><a class="anchor" id="autotoc_md86"></a>
 <td class="markdownTableBodyRight">Fujitsu A64FX   </td><td class="markdownTableBodyRight">Arm   </td><td class="markdownTableBodyRight">CPU   </td><td class="markdownTableBodyRight">48 cores   </td><td class="markdownTableBodyRight">63   </td><td class="markdownTableBodyLeft">GNU 13.2.0   </td><td class="markdownTableBodyLeft">SBU Ookami   </td></tr>
 </table>
 <p><b>All grind times are in nanoseconds (ns) per grid point (gp) per equation (eq) per right-hand side (rhs) evaluation, so X ns/gp/eq/rhs. Lower is better.</b></p>
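As a worked illustration of the unit, the grind time is the elapsed time divided by the total number of grid-point, equation, and right-hand-side evaluations. The sketch below is not MFC code: the function name and the sample numbers are hypothetical, and it assumes the measured wall time covers only the right-hand-side evaluations.

```python
def grind_time_ns(wall_time_s, grid_points, equations, rhs_evaluations):
    """Grind time in ns per grid point per equation per rhs evaluation."""
    total_work = grid_points * equations * rhs_evaluations
    # Convert seconds to nanoseconds before normalizing by the work done.
    return wall_time_s * 1.0e9 / total_work


# Made-up measurement: 1 s for 125 rhs evaluations of 8 equations
# on a 100-cubed grid.
example = grind_time_ns(
    wall_time_s=1.0,
    grid_points=100**3,
    equations=8,
    rhs_evaluations=125,
)
print(f"{example:.2f} ns/gp/eq/rhs")  # 1.00 ns/gp/eq/rhs
```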
-<h1><a class="anchor" id="autotoc_md87"></a>
+<h1><a class="anchor" id="autotoc_md89"></a>
 Weak scaling</h1>
 <p>Weak scaling results are obtained by increasing the problem size with the number of processes so that work per process remains constant.</p>
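Because the work per process is held constant, ideal weak scaling means the run time does not change as processes are added. A common way to express this, sketched here with a hypothetical helper and made-up timings rather than MFC measurements, is the ratio of the base-case time to the scaled-case time.

```python
def weak_scaling_efficiency(base_time_s, scaled_time_s):
    """Ratio of base-case to scaled-case run time; 1.0 is ideal."""
    return base_time_s / scaled_time_s


# Made-up timings: the scaled run takes 12.5 s against a 10 s base case.
efficiency = weak_scaling_efficiency(base_time_s=10.0, scaled_time_s=12.5)
print(f"{efficiency:.0%}")  # 80%
```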
-<h2><a class="anchor" id="autotoc_md88"></a>
+<h2><a class="anchor" id="autotoc_md90"></a>
 AMD MI250X GPU</h2>
 <p>MFC weak scales to (at least) 65,536 AMD MI250X GPUs on OLCF Frontier with 96% efficiency. This corresponds to 87% of the entire machine.</p>
 <p><img src="../res/weakScaling/frontier.svg" alt="" style="height: 50%; width:50%; border-radius: 10pt pointer-events: none;" class="inline"/></p>
-<h2><a class="anchor" id="autotoc_md89"></a>
+<h2><a class="anchor" id="autotoc_md91"></a>
 NVIDIA V100 GPU</h2>
 <p>MFC weak scales to (at least) 13,824 NVIDIA V100 GPUs on OLCF Summit with 97% efficiency. This corresponds to 50% of the entire machine.</p>
 <p><img src="../res/weakScaling/summit.svg" alt="" style="height: 50%; width:50%; border-radius: 10pt pointer-events: none;" class="inline"/></p>
-<h2><a class="anchor" id="autotoc_md90"></a>
+<h2><a class="anchor" id="autotoc_md92"></a>
 IBM Power9 CPU</h2>
 <p>MFC weak scales to 13,824 Power9 CPU cores on OLCF Summit to within 1% of ideal scaling.</p>
 <p><img src="../res/weakScaling/cpuScaling.svg" alt="" style="height: 50%; width:50%; border-radius: 10pt pointer-events: none;" class="inline"/></p>
-<h1><a class="anchor" id="autotoc_md91"></a>
+<h1><a class="anchor" id="autotoc_md93"></a>
 Strong scaling</h1>
 <p>Strong scaling results are obtained by keeping the problem size constant and increasing the number of processes so that work per process decreases.</p>
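For a fixed problem size, the usual figures of merit are the speedup relative to a base case and the fraction of the ideal speedup that is achieved. The sketch below uses a hypothetical helper and made-up timings, not MFC results.

```python
def strong_scaling(base_procs, base_time_s, procs, time_s):
    """Return (speedup, efficiency) relative to the base case."""
    speedup = base_time_s / time_s
    # Ideal speedup equals the factor by which the process count grew.
    ideal_speedup = procs / base_procs
    return speedup, speedup / ideal_speedup


# Made-up timings: 100 s on 8 processes, 16 s on 64 processes.
speedup, efficiency = strong_scaling(
    base_procs=8, base_time_s=100.0, procs=64, time_s=16.0
)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
# speedup 6.25x, efficiency 78.1%
```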
-<h2><a class="anchor" id="autotoc_md92"></a>
+<h2><a class="anchor" id="autotoc_md94"></a>
 NVIDIA V100 GPU</h2>
 <p>These tests use a base case of 8 GPUs with one MPI process per GPU. Performance is analyzed at two problem sizes, 16M and 64M grid points, so the base case has 2M and 8M grid points per process, respectively.</p>
-<h3><a class="anchor" id="autotoc_md93"></a>
+<h3><a class="anchor" id="autotoc_md95"></a>
 16M Grid Points</h3>
 <p><img src="../res/strongScaling/strongScaling16.svg" alt="" style="width: 50%; border-radius: 10pt pointer-events: none;" class="inline"/></p>
-<h3><a class="anchor" id="autotoc_md94"></a>
+<h3><a class="anchor" id="autotoc_md96"></a>
 64M Grid Points</h3>
 <p><img src="../res/strongScaling/strongScaling64.svg" alt="" style="width: 50%; border-radius: 10pt pointer-events: none;" class="inline"/></p>
-<h2><a class="anchor" id="autotoc_md95"></a>
+<h2><a class="anchor" id="autotoc_md97"></a>
 IBM Power9 CPU</h2>
 <p>CPU strong scaling tests are done with problem sizes of 16, 32, and 64M grid points, with the base case using 2, 4, and 8M cells per process.</p>
 <p><img src="../res/strongScaling/cpuStrongScaling.svg" alt="" style="width: 50%; border-radius: 10pt pointer-events: none;" class="inline"/> </p>