MFlowCode
diff --git a/.cursor/rules/mfc-agent-rules.mdc b/.cursor/rules/mfc-agent-rules.mdc
Lines changed: 126 additions & 31 deletions
diff --git a/.fortlsrc b/.fortlsrc
Lines changed: 2 additions & 2 deletions
diff --git a/.github/workflows/frontier/submit-bench.sh b/.github/workflows/frontier/submit-bench.sh
Lines changed: 1 addition & 1 deletion
diff --git a/.github/workflows/frontier/submit.sh b/.github/workflows/frontier/submit.sh
Lines changed: 1 addition & 1 deletion
diff --git a/.github/workflows/phoenix/submit-bench.sh b/.github/workflows/phoenix/submit-bench.sh
Lines changed: 6 additions & 6 deletions
diff --git a/.github/workflows/phoenix/submit.sh b/.github/workflows/phoenix/submit.sh
Lines changed: 6 additions & 6 deletions
diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 9 additions & 9 deletions
diff --git a/README.md b/README.md
Lines changed: 2 additions & 0 deletions
diff --git a/benchmarks/5eq_rk3_weno3_hllc/case.py b/benchmarks/5eq_rk3_weno3_hllc/case.py
Lines changed: 2 additions & 2 deletions
diff --git a/benchmarks/hypo_hll/case.py b/benchmarks/hypo_hll/case.py
Lines changed: 2 additions & 2 deletions
@@ -4,29 +4,27 @@ alwaysApply: true
 ---
 
 # 0  Purpose & Scope
-Consolidated guidance for the MFC exascale, many-physics solver.  
-Written primarily for Fortran/Fypp; the OpenACC and style sections matter only when
-`.fpp` / `.f90` files are in view.
+Consolidated guidance for the MFC exascale, many-physics solver.
+Written primarily for Fortran/Fypp; the GPU and style sections matter only when `.fpp` / `.f90` files are in view.
 
 ---
 
 # 1  Global Project Context (always)
-- **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.  
-  - Sources `src/`, tests `tests/`, examples `examples/`.  
-  - Most sources are `.fpp`; CMake transpiles them to `.f90`.  
-- **Fypp macros** live in `src/<subprogram>/include/` you should scan these first.  
-  `<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.  
-- Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC**.  
-- Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern
-  intrinsics.  
-- Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and
-  file-level `include` files.  
-- **Read the full codebase and docs *before* changing code.**  
-  Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the respository root `README.md`.  
+- **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.
+  - Sources `src/`, tests `tests/`, examples `examples/`.
+  - Most sources are `.fpp`; CMake transpiles them to `.f90`.
+- **Fypp macros** live in `src/<subprogram>/include/`; scan these first.
+  `<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.
+- Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC** or **OpenMP**.
+- Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern intrinsics.
+- Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and file-level `include` files.
+- **Read the full codebase and docs *before* changing code.**
+  - Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the repository root `README.md`.
 
 ### Incremental-change workflow
-1. Draft a step-by-step plan.  
-2. After each step, build:  
+
+1. Draft a step-by-step plan.
+2. After each step, build:
    ```bash
    ./mfc.sh build -t pre_process simulation -j $(nproc)
     ```
@@ -49,34 +47,131 @@ Written primarily for Fortran/Fypp; the OpenACC and style sections matter only w
   * Subroutine → `s_<verb>_<noun>` (e.g. `s_compute_flux`)
   * Function   → `f_<verb>_<noun>`
 * Private helpers stay in the module; avoid nested procedures.
-* **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100,
-  module/file ≤ 1000.
-* ≤ 6 arguments per routine; otherwise pass a derived-type “params” struct.
+* **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100, module/file ≤ 1000.
+* ≤ 6 arguments per routine; otherwise pass a derived-type "params" struct.
 * No `goto` (except unavoidable legacy); no global state (`COMMON`, `save`).
-* Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable`
-  / `pointer`.
+* Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable` / `pointer`.
 * Use `s_mpi_abort(<msg>)` for errors, not `stop`.
-* Mark OpenACC-callable helpers that are called from OpenACC parallel loops immediately after declaration:
+* Mark GPU-callable helpers that are called from GPU parallel loops immediately after declaration:
   ```fortran
   subroutine s_flux_update(...)
-    !$acc routine seq
+    $:GPU_ROUTINE(function_name='s_flux_update', parallelism='[seq]')
     ...
   end subroutine
   ```
 
 ---
 
-# 3  OpenACC Programming Guidelines (for kernels)
+# 3  File & Module Structure
 
-Wrap tight loops with
+- **File Naming**:
+  - `.fpp` files: Fypp preprocessed files that get translated to `.f90`
+  - Modules are named with `m_` prefix followed by feature name: `m_helper_basic`, `m_viscous`
+  - Primary program file is named `p_main.fpp`
+
+- **Module Layout**:
+  - Start with Fypp include for macros: `#:include 'macros.fpp'`
+  - Header comments using `!>` style documentation
+  - `module` declaration with name matching filename
+  - `use` statements for dependencies
+  - `implicit none` statement
+  - `private` declaration followed by explicit `public` exports
+  - `contains` section
+  - Implementation of subroutines and functions
+
+---
+
+# 4  Fypp Macros
+
+- **Fypp Directives**:
+  - Start with `#:` (e.g., `#:include`, `#:def`, `#:enddef`)
+  - Macros defined in `include/*.fpp` files
+  - Used for code generation, conditional compilation, and GPU offloading
+
+---
 
+# 5  GPU Programming Guidelines (Fypp macros for GPU kernels)
+
+- Do not use OpenACC or OpenMP directives directly.
+- Instead, use the Fypp macros contained in `src/common/include/parallel_macros.fpp`.
+- Documentation on how to use the Fypp macros for GPU offloading is available at <https://mflowcode.github.io/documentation/md_gpuParallelization.html>.
+
+Wrap tight loops with
 ```fortran
-!$acc parallel loop gang vector default(present) reduction(...)
+$:GPU_PARALLEL_LOOP(private='[...]', copy='[...]')
 ```
-* Add `collapse(n)` to merge nested loops when safe.
-* Declare loop-local variables with `private(...)`.
+* Add `collapse=n` to merge nested loops when safe.
+* Declare loop-local variables with `private='[...]'`.
 * Allocate large arrays with `managed` or move them into a persistent
-  `!$acc enter data` region at start-up.
+  `$:GPU_ENTER_DATA(...)` region at start-up.
 * **Do not** place `stop` / `error stop` inside device code.
-* Must compile with Cray `ftn` and NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
+* Must compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
   GNU `gfortran` and Intel `ifx`/`ifort`.
+
+- Example GPU macros include the below, among others:
+  - `$:GPU_ROUTINE(parallelism='[seq]')` - Marks GPU-callable routines
+  - `$:GPU_PARALLEL_LOOP(collapse=N)` - Parallelizes loops
+  - `$:GPU_LOOP(parallelism='[seq]')` - Marks sequential loops
+  - `$:GPU_UPDATE(device='[var1,var2]')` - Updates device data
+  - `$:GPU_ENTER_DATA(copyin='[var]')` - Copies data to device
+  - `$:GPU_EXIT_DATA(delete='[var]')` - Removes data from device
+
+---
+
+# 6  Documentation Style
+
+- **Subroutine/Function Documentation**:
+  ```fortran
+  !> This procedure <description>
+  !! @param param_name Description of the parameter
+  !! @return Description of the return value (for functions)
+  ```
+which conforms to the Doxygen Fortran format.
+
+# 7  Error Handling
+
+- **Assertions**:
+  - Use the Fypp `ASSERT` macro for validating conditions
+  - Example: `@:ASSERT(predicate, message)`
+
+- **Error Reporting**:
+  - Use `s_mpi_abort(error_message)` for error termination, not `stop`
+  - No `stop` / `error stop` inside device code
+
+# 8  Memory Management
+
+- **Allocation/Deallocation**:
+  - Use the Fypp macro `@:ALLOCATE(var1, var2)` for device-aware allocation
+  - Use the Fypp macro `@:DEALLOCATE(var1, var2)` for device-aware deallocation
+
+# 9  Additional Observed Patterns
+
+- **Derived Types**:
+  - Extensive use of derived types for encapsulation
+  - Use pointers within derived types (e.g., `pointer, dimension(:,:,:) => null()`)
+  - Clear documentation of derived type components
+
+- **Pure & Elemental Functions**:
+  - Use `pure` and `elemental` attributes for side-effect-free functions
+  - Combine them for operations on arrays (`pure elemental function`)
+
+- **Precision Handling**:
+  - Use `wp` (working precision) parameter from `m_precision_select`
+  - Never hardcode precision with `real*8` or similar
+
+- **Loop Optimization**:
+  - Favor array operations over explicit loops when possible
+  - Use `collapse=N` directive to optimize nested loops
+
+# 10  Fortran Practices to Avoid
+
+- **Fixed Format**: Only free-form Fortran is used
+  - No column-position dependent code
+
+- **Older Features**: Avoid outdated Fortran features like:
+  - `equivalence` statements
+  - `data` statements (use initialization expressions)
+  - `character*N` declarations (use `character(len=N)` instead)
+
+- **Using same variable for multiple purposes**: Maintain single responsibility
+  - Each variable should have one clear purpose
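
The naming rules stated in the rules file above (`s_<verb>_<noun>` for subroutines, `f_<verb>_<noun>` for functions, `m_`-prefixed modules) are mechanical enough to check with a regular expression. The following is a minimal sketch, not part of MFC: the function `follows_convention` and the exact patterns (lower-case words joined by underscores) are assumptions made for illustration.

```python
import re

# Hypothetical checker; the patterns assume lower-case words joined by underscores.
_PATTERNS = {
    # s_<verb>_<noun>: prefix plus at least two words
    "subroutine": re.compile(r"^s_[a-z0-9]+(_[a-z0-9]+)+$"),
    # f_<verb>_<noun>: prefix plus at least two words
    "function": re.compile(r"^f_[a-z0-9]+(_[a-z0-9]+)+$"),
    # m_<feature>: prefix plus one or more words
    "module": re.compile(r"^m_[a-z0-9]+(_[a-z0-9]+)*$"),
}


def follows_convention(kind: str, name: str) -> bool:
    """Return True if `name` matches the assumed pattern for `kind`."""
    return _PATTERNS[kind].match(name) is not None


print(follows_convention("subroutine", "s_compute_flux"))
print(follows_convention("module", "m_viscous"))
print(follows_convention("subroutine", "compute_flux"))
```

The names `s_compute_flux`, `m_viscous`, and `m_helper_basic` come from the rules text; everything else here is illustrative.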
@@ -29,7 +29,7 @@
         "MFC_DOUBLE_PRECISION": 1
     },
     "lowercase_intrinsics": true,
-    "debug_log": true,
+    "debug_log": false,
     "disable_diagnostics": false,
     "use_signature_help": true,
     "variable_hover": true,
@@ -93,4 +93,4 @@
         "**/m_nvtx*",
         "**/syscheck.fpp"
     ]
-} 
+} 
@@ -32,7 +32,7 @@ sbatch <<EOT
 #SBATCH -A CFD154                  # charge account
 #SBATCH -N 1                       # Number of nodes required
 $sbatch_device_opts
-#SBATCH -t 01:59:00                # Duration of the job (Ex: 15 mins)
+#SBATCH -t 02:59:00                # Duration of the job (Ex: 15 mins)
 #SBATCH -o$job_slug.out            # Combined output and error messages file
 #SBATCH -p extended                # Extended partition for shorter queues
 #SBATCH -W                         # Do not exit until the submitted job terminates.
 
@@ -33,7 +33,7 @@ sbatch <<EOT
 #SBATCH -A CFD154                  # charge account
 #SBATCH -N 1                       # Number of nodes required
 $sbatch_device_opts
-#SBATCH -t 01:59:00                # Duration of the job (Ex: 15 mins)
+#SBATCH -t 02:59:00                # Duration of the job (Ex: 15 mins)
 #SBATCH -o$job_slug.out            # Combined output and error messages file
 #SBATCH -p extended                # Extended partition for shorter queues
 #SBATCH -W                         # Do not exit until the submitted job terminates.
 
@@ -69,7 +69,7 @@ JOBID=$(sbatch <<-EOT | awk '{print $4}'
 EOT
 )
 
-echo "🚀 Submitted SLURM job $JOBID"
+echo "Submitted: SLURM job $JOBID"
 
 # if this wrapper is killed/canceled, make sure SLURM job is cleaned up
 trap '[[ -n "${JOBID:-}" ]] && scancel "$JOBID" >/dev/null 2>&1 || :' EXIT
@@ -86,22 +86,22 @@ while :; do
 
   # If it’s one of SLURM’s terminal states, break immediately
   case "$STATE" in
-    COMPLETED|FAILED|CANCELLED|TIMEOUT)
-      echo "✅ SLURM job $JOBID reached terminal state: $STATE"
+    COMPLETED|FAILED|CANCELLED|TIMEOUT|PREEMPTED)
+      echo "Completed: SLURM job $JOBID reached terminal state: $STATE"
       break
       ;;
     "")
-      echo "✅ SLURM job $JOBID no longer in queue; assuming finished"
+      echo "Completed: SLURM job $JOBID no longer in queue; assuming finished"
       break
       ;;
     *)
-      echo "⏳ SLURM job $JOBID state: $STATE"
+      echo "Waiting: SLURM job $JOBID state: $STATE"
       sleep 10
       ;;
   esac
 done
 
 # Now retrieve the exit code and exit with it
 EXIT_CODE=$(sacct -j "$JOBID" --noheader --format=ExitCode | head -1 | cut -d: -f1)
-echo "🔚 SLURM job $JOBID exit code: $EXIT_CODE"
+echo "Completed: SLURM job $JOBID exit code: $EXIT_CODE"
 exit "$EXIT_CODE"
@@ -62,7 +62,7 @@ JOBID=$(sbatch <<-EOT | awk '{print $4}'
 EOT
 )
 
-echo "🚀 Submitted SLURM job $JOBID"
+echo "Submitted: SLURM job $JOBID"
 
 # if this wrapper is killed/canceled, make sure SLURM job is cleaned up
 trap '[[ -n "${JOBID:-}" ]] && scancel "$JOBID" >/dev/null 2>&1 || :' EXIT
@@ -79,22 +79,22 @@ while :; do
 
   # If it’s one of SLURM’s terminal states, break immediately
   case "$STATE" in
-    COMPLETED|FAILED|CANCELLED|TIMEOUT)
-      echo "✅ SLURM job $JOBID reached terminal state: $STATE"
+    COMPLETED|FAILED|CANCELLED|TIMEOUT|PREEMPTED)
+      echo "Completed: SLURM job $JOBID reached terminal state: $STATE"
       break
       ;;
     "")
-      echo "✅ SLURM job $JOBID no longer in queue; assuming finished"
+      echo "Completed: SLURM job $JOBID no longer in queue; assuming finished"
       break
       ;;
     *)
-      echo "⏳ SLURM job $JOBID state: $STATE"
+      echo "Waiting: SLURM job $JOBID state: $STATE"
       sleep 10
       ;;
   esac
 done
 
 # Now retrieve the exit code and exit with it
 EXIT_CODE=$(sacct -j "$JOBID" --noheader --format=ExitCode | head -1 | cut -d: -f1)
-echo "🔚 SLURM job $JOBID exit code: $EXIT_CODE"
+echo "Completed: SLURM job $JOBID exit code: $EXIT_CODE"
 exit "$EXIT_CODE"
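
The polling loop in the hunk above treats a fixed set of SLURM states as terminal (now including `PREEMPTED`), treats an empty state as "job left the queue", and takes the exit status from the part of `sacct`'s `ExitCode` field before the colon. A minimal Python sketch of that decision logic, for illustration only; `classify_state` and `parse_exit_code` are hypothetical names, not functions in the repository.

```python
# Hypothetical helpers mirroring the shell logic in the hunk above; not part of MFC.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "PREEMPTED"}


def classify_state(state: str) -> str:
    """Return 'terminal', 'gone', or 'waiting' for a SLURM state string."""
    state = state.strip()
    if state in TERMINAL_STATES:
        return "terminal"  # the shell loop breaks immediately
    if state == "":
        return "gone"      # job no longer in the queue; assumed finished
    return "waiting"       # the shell loop sleeps 10 seconds and polls again


def parse_exit_code(sacct_field: str) -> int:
    """Take the part before ':' of an ExitCode field such as '0:0'."""
    return int(sacct_field.strip().split(":")[0])


print(classify_state("PREEMPTED"), classify_state(""), classify_state("RUNNING"))
print(parse_exit_code("1:0"))
```

Before this change, a job reporting `PREEMPTED` would have fallen through to the default branch and been polled again rather than ending the loop.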
@@ -135,17 +135,17 @@ if (CMAKE_Fortran_COMPILER_ID STREQUAL "GNU")
     if (CMAKE_BUILD_TYPE STREQUAL "Debug")
         add_compile_options(
             -Wall
-            -Wextra	
+            -Wextra
             -fcheck=all,no-array-temps
             -fbacktrace
             -fimplicit-none
             -fsignaling-nans
             -finit-real=snan
             -finit-integer=-99999999
-            -Wintrinsic-shadow	
-            -Wunderflow	
-            -Wrealloc-lhs	
-            -Wsurprising	
+            -Wintrinsic-shadow
+            -Wunderflow
+            -Wrealloc-lhs
+            -Wsurprising
 	    )
     endif()
 
@@ -163,6 +163,7 @@ elseif (CMAKE_Fortran_COMPILER_ID STREQUAL "Cray")
         "SHELL:-h acc_model=auto_async_none"
         "SHELL: -h acc_model=no_fast_addr"
         "SHELL: -h list=adm"
+        "SHELL: -munsafe-fp-atomics" # Not unsafe for operations we do
     )
 
     add_link_options("SHELL:-hkeepfiles")
@@ -172,7 +173,6 @@ elseif (CMAKE_Fortran_COMPILER_ID STREQUAL "Cray")
                 "SHELL:-h acc_model=auto_async_none"
                 "SHELL: -h acc_model=no_fast_addr"
                 "SHELL: -K trap=fp" "SHELL: -G2"
-
         )
         add_link_options("SHELL: -K trap=fp" "SHELL: -G2")
     endif()
@@ -200,10 +200,10 @@ elseif ((CMAKE_Fortran_COMPILER_ID STREQUAL "NVHPC") OR (CMAKE_Fortran_COMPILER_
     if (CMAKE_BUILD_TYPE STREQUAL "Debug")
         add_compile_options(
             $<$<COMPILE_LANGUAGE:Fortran>:-O0>
-            $<$<COMPILE_LANGUAGE:Fortran>:-C> 
+            $<$<COMPILE_LANGUAGE:Fortran>:-C>
             $<$<COMPILE_LANGUAGE:Fortran>:-g>
-            $<$<COMPILE_LANGUAGE:Fortran>:-traceback> 
-            $<$<COMPILE_LANGUAGE:Fortran>:-Minform=inform> 
+            $<$<COMPILE_LANGUAGE:Fortran>:-traceback>
+            $<$<COMPILE_LANGUAGE:Fortran>:-Minform=inform>
             $<$<COMPILE_LANGUAGE:Fortran>:-Mbounds>
         )
     endif()
 
@@ -155,6 +155,8 @@ They are organized below.
 * Runge-Kutta orders 1-3 (SSP TVD), adaptive time stepping
 * RK4-5 operator splitting for Euler-Lagrange modeling
 * Interface sharpening (THINC-like)
+* Information geometric regularization (IGR)
+    * Shock capturing without WENO and Riemann solvers
 
 ### Large-scale and accelerated simulation
 
 
@@ -188,8 +188,8 @@
             "cyl_coord": "F",
             "dt": dt,
             "t_step_start": 0,
-            "t_step_stop": int(30 * (95 * size + 5)),
-            "t_step_save": int(30 * (95 * size + 5)),
+            "t_step_stop": int(20 * (5 * size + 5)),
+            "t_step_save": int(20 * (5 * size + 5)),
             # Simulation Algorithm Parameters
             "num_patches": 3,
             "model_eqns": 2,
 
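
The effect of the step-count change in the hunk above is easier to judge with numbers. A small sketch comparing the old and new expressions; the sample values of `size` (0 and 1) are assumptions chosen for illustration, since the hunk does not show how `size` is set.

```python
def steps_old(size: int) -> int:
    # Expression removed by this change
    return int(30 * (95 * size + 5))


def steps_new(size: int) -> int:
    # Expression added by this change
    return int(20 * (5 * size + 5))


for size in (0, 1):  # assumed sample values
    print(size, steps_old(size), steps_new(size))
```

With `size = 0` the step count drops from 150 to 100, and with `size = 1` from 3000 to 200, so the new expression grows much more slowly with `size`.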
@@ -41,8 +41,8 @@
             "p": Nz,
             "dt": 1e-8,
             "t_step_start": 0,
-            "t_step_stop": int(30 * (95 * size + 5)),
-            "t_step_save": int(30 * (95 * size + 5)),
+            "t_step_stop": int(20 * (5 * size + 5)),
+            "t_step_save": int(20 * (5 * size + 5)),
             # Simulation Algorithm Parameters
             "num_patches": 2,
             "model_eqns": 2,