MFlowCode
diff --git a/.cursor/rules/mfc-agent-rules.mdc b/.cursor/rules/mfc-agent-rules.mdc
Lines changed: 82 additions & 0 deletions
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
Lines changed: 1 addition & 1 deletion
diff --git a/.github/workflows/frontier/build.sh b/.github/workflows/frontier/build.sh
Lines changed: 6 additions & 1 deletion
diff --git a/.github/workflows/frontier/submit.sh b/.github/workflows/frontier/submit.sh
Lines changed: 14 additions & 1 deletion
diff --git a/.github/workflows/frontier/test.sh b/.github/workflows/frontier/test.sh
Lines changed: 5 additions & 2 deletions
diff --git a/.github/workflows/phoenix/bench.sh b/.github/workflows/phoenix/bench.sh
Lines changed: 7 additions & 2 deletions
diff --git a/.github/workflows/phoenix/submit.sh b/.github/workflows/phoenix/submit.sh
Lines changed: 1 addition & 1 deletion
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
Lines changed: 1 addition & 4 deletions
diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 5 additions & 1 deletion
diff --git a/docs/documentation/case.md b/docs/documentation/case.md
Lines changed: 2 additions & 6 deletions
@@ -0,0 +1,82 @@
+---
+description: Full MFC project rules – consolidated for Agent Mode
+alwaysApply: true
+---
+
+# 0  Purpose & Scope
+Consolidated guidance for the MFC exascale, many-physics solver.  
+Written primarily for Fortran/Fypp; the OpenACC and style sections matter only when
+`.fpp` / `.f90` files are in view.
+
+---
+
+# 1  Global Project Context (always)
+- **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.  
+  - Sources `src/`, tests `tests/`, examples `examples/`.  
+  - Most sources are `.fpp`; CMake transpiles them to `.f90`.  
+- **Fypp macros** live in `src/<subprogram>/include/`; scan these first.  
+  `<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.  
+- Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC**.  
+- Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern
+  intrinsics.  
+- Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and
+  file-level `include` files.  
+- **Read the full codebase and docs *before* changing code.**  
+  Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the repository root `README.md`.  
+
+### Incremental-change workflow
+1. Draft a step-by-step plan.  
+2. After each step, build:  
+   ```bash
+   ./mfc.sh build -t pre_process simulation -j $(nproc)
+   ```
+3. If it compiles, run focused tests:
+   ```bash
+   ./mfc.sh test -j $(nproc) -f EA8FA07E -t 9E2CA336
+   ```
+4. Roll back & fix if a step fails.
+
+* Do not run `./mfc.sh test -j $(nproc)` without additional arguments; running the full test suite takes too long.
+
+---
+
+# 2  Style & Naming Conventions (for \*.fpp / \*.f90)
+
+* **Indent 2 spaces**; continuation lines align under `&`.
+* Lower-case keywords and intrinsics (`do`, `end subroutine`, …).
+* **Modules**: `m_<feature>` (e.g. `m_transport`).
+* **Public procedures**:
+  * Subroutine → `s_<verb>_<noun>` (e.g. `s_compute_flux`)
+  * Function   → `f_<verb>_<noun>`
+* Private helpers stay in the module; avoid nested procedures.
+* **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100,
+  module/file ≤ 1000.
+* ≤ 6 arguments per routine; otherwise pass a derived-type “params” struct.
+* No `goto` (except unavoidable legacy); no global state (`COMMON`, `save`).
+* Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable`
+  / `pointer`.
+* Use `s_mpi_abort(<msg>)` for errors, not `stop`.
+* Mark helpers that are called from OpenACC parallel loops with `!$acc routine seq` immediately after the declaration:
+  ```fortran
+  subroutine s_flux_update(...)
+    !$acc routine seq
+    ...
+  end subroutine
+  ```
+
+---
+
+# 3  OpenACC Programming Guidelines (for kernels)
+
+Wrap tight loops with
+
+```fortran
+!$acc parallel loop gang vector default(present) reduction(...)
+```
+* Add `collapse(n)` to merge nested loops when safe.
+* Declare loop-local variables with `private(...)`.
+* Allocate large arrays with `managed` or move them into a persistent
+  `!$acc enter data` region at start-up.
+* **Do not** place `stop` / `error stop` inside device code.
+* Must compile with Cray `ftn` and NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
+  GNU `gfortran` and Intel `ifx`/`ifort`.
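
Reviewer sketch (not part of the diff): the incremental-change workflow in section 1 of the rules file amounts to "build, then run the focused tests, and stop at the first failure". The helper name `check_step` and the `MFC` override variable are hypothetical; the `./mfc.sh` arguments are the ones quoted in the rules.

```bash
#!/bin/bash
# Hypothetical helper, not part of MFC: one "step check" of the workflow.
# MFC defaults to ./mfc.sh; set MFC=echo to preview the commands instead.
MFC="${MFC:-./mfc.sh}"

check_step() {
    local jobs
    jobs="$(nproc)"
    # Step 2 of the rules: build after each change.
    if ! "$MFC" build -t pre_process simulation -j "$jobs"; then
        echo "build failed: roll back and fix" >&2
        return 1
    fi
    # Step 3 of the rules: run only the focused test range.
    if ! "$MFC" test -j "$jobs" -f EA8FA07E -t 9E2CA336; then
        echo "focused tests failed: roll back and fix" >&2
        return 2
    fi
    echo "step OK"
}
```

Calling `check_step` after each edit keeps the loop short; the distinct return codes (1 for build, 2 for tests) are an illustrative choice, not an MFC convention.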
@@ -54,5 +54,5 @@ To make sure the code is performing as expected on GPU devices, I have:
 - [ ] Ran the code on MI200+ GPUs and ensured the new features performed as expected (the GPU results match the CPU results)
 - [ ] Enclosed the new feature via `nvtx` ranges so that they can be identified in profiles
 - [ ] Ran a Nsight Systems profile using `./mfc.sh run XXXX --gpu -t simulation --nsys`, and have attached the output file (`.nsys-rep`) and plain text results to this PR
-- [ ] Ran an Omniperf profile using `./mfc.sh run XXXX --gpu -t simulation --omniperf`, and have attached the output file and plain text results to this PR.
+- [ ] Ran a Rocprof Systems profile using `./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace`, and have attached the output file and plain text results to this PR.
 - [ ] Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature
@@ -1,4 +1,9 @@
 #!/bin/bash
 
+build_opts=""
+if [ "$1" == "gpu" ]; then
+    build_opts="--gpu"
+fi
+
 . ./mfc.sh load -c f -m g
-./mfc.sh test --dry-run -j 8 --gpu
+./mfc.sh test --dry-run -j 8 $build_opts
@@ -13,16 +13,29 @@ else
     exit 1
 fi
 
+if [ "$2" == "cpu" ]; then
+    sbatch_device_opts="\
+#SBATCH -n 32                       # Number of cores required"
+elif [ "$2" == "gpu" ]; then
+    sbatch_device_opts="\
+#SBATCH -n 8                       # Number of cores required"
+else
+    usage
+    exit 1
+fi
+
+
 job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2"
 
 sbatch <<EOT
 #!/bin/bash
 #SBATCH -JMFC-$job_slug            # Job name
 #SBATCH -A CFD154                  # charge account
 #SBATCH -N 1                       # Number of nodes required
-#SBATCH -n 8                       # Number of cores required
+$sbatch_device_opts
 #SBATCH -t 01:59:00                # Duration of the job (Ex: 15 mins)
 #SBATCH -o$job_slug.out            # Combined output and error messages file
+#SBATCH -p extended                # Extended partition for shorter queues
 #SBATCH -q debug                   # Use debug QOS - only one job per user allowed in queue!
 #SBATCH -W                         # Do not exit until the submitted job terminates.
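
Reviewer sketch (not part of the diff): the `job_slug` pipeline and the new device branch in this hunk can be exercised in isolation. The function names `make_job_slug` and `tasks_for_device` and the script name `my_bench.sh` are hypothetical; the pipeline and the task counts (32 for `cpu`, 8 for `gpu`) are copied from the hunk.

```bash
#!/bin/bash
# Hypothetical wrapper around the job_slug pipeline from submit.sh.
make_job_slug() {
    local script="$1" device="$2"
    # basename drops the directory, the first sed drops a trailing ".sh",
    # the second sed replaces every non-alphanumeric character with "-".
    echo "$(basename "$script" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g')-$device"
}

# Hypothetical wrapper around the cpu/gpu branch: prints the -n task count.
tasks_for_device() {
    case "$1" in
        cpu) echo 32 ;;
        gpu) echo 8 ;;
        *)   return 1 ;;   # the real script prints usage and exits here
    esac
}

make_job_slug .github/workflows/frontier/test.sh gpu      # prints "test-gpu"
make_job_slug .github/workflows/frontier/my_bench.sh cpu  # prints "my-bench-cpu"
```

Because the device name is part of the slug, the `cpu` and `gpu` jobs for the same script write to different `.out` files.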
 
 
@@ -3,5 +3,8 @@
 gpus=`rocm-smi --showid | awk '{print $1}' | grep -Eo '[0-9]+' | uniq | tr '\n' ' '`
 ngpus=`echo "$gpus" | tr -d '[:space:]' | wc -c`
 
-./mfc.sh test --max-attempts 3 -j $ngpus -- -c frontier
-
+if [ "$job_device" == "gpu" ]; then
+    ./mfc.sh test --max-attempts 3 -j $ngpus -- -c frontier
+else
+    ./mfc.sh test --max-attempts 3 -j 32 -- -c frontier
+fi
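
Reviewer sketch (not part of the diff): `ngpus` is derived by stripping whitespace from the ID list and counting characters, which equals the number of IDs only while every ID is a single digit. The ID strings below are hypothetical samples, not real `rocm-smi` output, and `count_by_words` is a suggested alternative rather than what the script does.

```bash
#!/bin/bash
# Counting as in test.sh: strip whitespace, then count characters.
count_by_chars() {
    echo $(( $(echo "$1" | tr -d '[:space:]' | wc -c) ))
}

# Alternative: count whitespace-separated words instead.
count_by_words() {
    echo $(( $(echo "$1" | wc -w) ))
}

single="0 1 2 3 4 5 6 7 "   # hypothetical ID list, all single-digit
double="8 9 10 11 "         # hypothetical ID list containing two-digit IDs

echo "chars: $(count_by_chars "$single") words: $(count_by_words "$single")"
echo "chars: $(count_by_chars "$double") words: $(count_by_words "$double")"
```

The first line prints `chars: 8 words: 8`; the second prints `chars: 6 words: 4`, so the character count overestimates once an ID reaches 10. On nodes with at most ten devices numbered from 0 the two approaches agree.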
@@ -8,8 +8,13 @@ if [ "$job_device" == "gpu" ]; then
     device_opts="--gpu -g $gpu_ids"
 fi
 
+mkdir -p /storage/scratch1/6/sbryngelson3/mytmp_build
+export TMPDIR=/storage/scratch1/6/sbryngelson3/mytmp_build
+
 if [ "$job_device" == "gpu" ]; then
-    ./mfc.sh bench --mem 12 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix $device_opts -n $n_ranks
+    ./mfc.sh bench --mem 12 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
 else
-    ./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix $device_opts -n $n_ranks
+    ./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
 fi
+
+unset TMPDIR
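
Reviewer sketch (not part of the diff): the `export TMPDIR=...` / `unset TMPDIR` pair scopes the scratch directory to the benchmark run. One alternative, shown here under the assumption that nothing after the benchmark needs the variable, is to set it inside a subshell so it cannot leak even if the command fails. The helper name `with_tmpdir` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: run a command with TMPDIR pointing at a scratch directory.
# The parentheses start a subshell, so the export disappears when it exits and
# any TMPDIR value the caller already had is left untouched.
with_tmpdir() {
    local dir="$1"
    shift
    (
        mkdir -p "$dir" || exit 1
        export TMPDIR="$dir"
        "$@"
    )
}
```

Usage would look like `with_tmpdir "$scratch_dir" ./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml"`, where `scratch_dir` is a placeholder for the path chosen in the hunk. Unlike `unset`, this preserves a pre-existing `TMPDIR` instead of clearing it.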
@@ -42,7 +42,7 @@ sbatch <<EOT
 #SBATCH --account=gts-sbryngelson3 # charge account
 #SBATCH -N1                        # Number of nodes required
 $sbatch_device_opts
-#SBATCH -t 02:00:00                # Duration of the job (Ex: 15 mins)
+#SBATCH -t 03:00:00                # Duration of the job (Ex: 15 mins)
 #SBATCH -q embers                  # QOS Name
 #SBATCH -o$job_slug.out            # Combined output and error messages file
 #SBATCH -W                         # Do not exit until the submitted job terminates.
 
@@ -97,9 +97,6 @@ jobs:
       matrix:
         device: ['cpu', 'gpu']
         lbl: ['gt', 'frontier']
-        exclude:
-          - device:   cpu
-            lbl: frontier
     runs-on:
       group:  phoenix
       labels: ${{ matrix.lbl }}
@@ -116,7 +113,7 @@ jobs:
 
       - name: Build
         if:   matrix.lbl == 'frontier'
-        run:  bash .github/workflows/frontier/build.sh
+        run:  bash .github/workflows/frontier/build.sh ${{ matrix.device }}
 
       - name: Test
         if:   matrix.lbl == 'frontier'
 
@@ -135,13 +135,17 @@ if (CMAKE_Fortran_COMPILER_ID STREQUAL "GNU")
     if (CMAKE_BUILD_TYPE STREQUAL "Debug")
         add_compile_options(
             -Wall
+            -Wextra
             -fcheck=all,no-array-temps
             -fbacktrace
             -fimplicit-none
-            #-ffpe-trap=invalid,zero,denormal,overflow
             -fsignaling-nans
             -finit-real=snan
             -finit-integer=-99999999
+            -Wintrinsic-shadow
+            -Wunderflow
+            -Wrealloc-lhs
+            -Wsurprising
 	    )
     endif()
 
 
@@ -436,7 +436,7 @@ The effect and use of the source term are assessed by [Schmidmayer et al., 2019]
 - `time_stepper` specifies the order of the Runge-Kutta (RK) time integration scheme that is used for temporal integration in simulation, from the 1st to 5th order by corresponding integer.
 Note that `time_stepper = 3` specifies the total variation diminishing (TVD), third order RK scheme ([Gottlieb and Shu, 1998](references.md)).
 
-- `adap_dt` activates the Strang operator splitting scheme which splits flux and source terms in time marching, and an adaptive time stepping strategy is implemented for the source term. It requires ``bubbles = 'T'``, ``polytropic = 'T'``, ``adv_n = 'T'`` and `time_stepper = 3`.
+- `adap_dt` activates the Strang operator splitting scheme, which splits flux and source terms in time marching; an adaptive time stepping strategy is used for the source term. It requires ``bubbles_euler = 'T'``, ``polytropic = 'T'``, ``adv_n = 'T'`` and `time_stepper = 3`. It can also be used with ``bubbles_lagrange = 'T'`` and `time_stepper = 3`.
 
 - `weno_order` specifies the order of WENO scheme that is used for spatial reconstruction of variables by an integer of 1, 3, 5, and 7, that correspond to the 1st, 3rd, 5th, and 7th order, respectively.
 
@@ -461,7 +461,7 @@ It is recommended to set `weno_eps` to $10^{-6}$ for WENO-JS, and to $10^{-40}$
 `riemann_solver = 1`, `2`, and `3` correspond to HLL, HLLC, and Exact Riemann solver, respectively ([Toro, 2013](references.md)).
 `riemann_solver = 4` is only for MHD simulations. It resolves 5 of the full seven-wave structure of the MHD equations ([Miyoshi and Kusano, 2005](references.md)).
 
-- `low_Mach` specifies the choice of the low Mach number correction scheme for the HLLC Riemann solver. `low_Mach = 0` is default value and does not apply any correction scheme. `low_Mach = 1` and `2` apply the anti-dissipation pressure correction method ([Chen et al., 2022](references.md)) and the improved velocity reconstruction method ([Thornber et al., 2008](references.md)). This feature requires `riemann_solver = 2` and `model_eqns = 2`.
+- `low_Mach` specifies the choice of the low Mach number correction scheme for the HLLC Riemann solver. `low_Mach = 0` is the default value and does not apply any correction scheme. `low_Mach = 1` and `2` apply the anti-dissipation pressure correction method ([Chen et al., 2022](references.md)) and the improved velocity reconstruction method ([Thornber et al., 2008](references.md)), respectively. This feature requires `model_eqns = 2` or `3`. `low_Mach = 1` works for `riemann_solver = 1` and `2`, but `low_Mach = 2` only works for `riemann_solver = 2`.
 
 - `avg_state` specifies the choice of the method to compute averaged variables at the cell-boundaries from the left and the right states in the Riemann solver by an integer of 1 or 2.
 `avg_state = 1` and `2` correspond to Roe- and arithmetic averages, respectively.
@@ -790,8 +790,6 @@ When ``polytropic = 'F'``, the gas compression is modeled as non-polytropic due
 | `x0`                  | Real    | Reference length                                          |
 | `Thost`               | Real    | Temperature of the surrounding liquid (host)              |
 | `diffcoefvap`         | Real    | Vapor diffusivity in the gas                              |
-| `rkck_adap_dt`        | Logical | Activates the adaptive rkck time stepping algorithm       |
-| `rkck_tolerance`      | Real    | Admissible error truncation tolerance in the rkck stepper  |
 
 - `nBubs_glb` Total number of bubbles. Their initial conditions need to be specified in the ./input/lag_bubbles.dat file. See the example cases for additional information.
 
@@ -805,8 +803,6 @@ When ``polytropic = 'F'``, the gas compression is modeled as non-polytropic due
 
 - `massTransfer_model` Activates the mass transfer model at the bubble's interface based on ([Preston et al., 2007](references.md)).
 
-- `rkck_adap_dt` Activates the adaptive 4th/5th order Runge—Kutta–Cash–Karp (RKCK) time-stepping algorithm (requires `time_stepper ==4`). A maximum error between the 4th and 5th order Runge-Kutta-Cash-Karp solutions for the same time step size is calculated. If the error is smaller than a tolerance (`rkck_tolerance`), then the algorithm employs the 5th order solution, while if not, both eulerian/lagrangian variables are re-calculated with a smaller time step size.
-
 ### 10. Velocity Field Setup
 
 | Parameter              | Type    | Description |