MFlowCode
diff --git a/.cursor/rules/mfc-agent-rules.mdc b/.cursor/rules/mfc-agent-rules.mdc
Lines changed: 119 additions & 26 deletions
diff --git a/.fortls.json b/.fortls.json
Lines changed: 0 additions & 94 deletions
diff --git a/.fortlsrc b/.fortlsrc
Lines changed: 4 additions & 17 deletions
diff --git a/.github/workflows/frontier/submit.sh b/.github/workflows/frontier/submit.sh
Lines changed: 1 addition & 1 deletion
diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 0 additions & 11 deletions
@@ -4,29 +4,27 @@ alwaysApply: true
 ---
 
 # 0  Purpose & Scope
-Consolidated guidance for the MFC exascale, many-physics solver.  
-Written primarily for Fortran/Fypp; the OpenACC and style sections matter only when
-`.fpp` / `.f90` files are in view.
+Consolidated guidance for the MFC exascale, many-physics solver.
+Written primarily for Fortran/Fypp; the GPU and style sections matter only when `.fpp` / `.f90` files are in view.
 
 ---
 
 # 1  Global Project Context (always)
-- **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.  
-  - Sources `src/`, tests `tests/`, examples `examples/`.  
-  - Most sources are `.fpp`; CMake transpiles them to `.f90`.  
-- **Fypp macros** live in `src/<subprogram>/include/` you should scan these first.  
-  `<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.  
-- Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC** or **OpenMP**.  
-- Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern
-  intrinsics.  
-- Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and
-  file-level `include` files.  
-- **Read the full codebase and docs *before* changing code.**  
-  Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the respository root `README.md`.  
+- **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.
+  - Sources `src/`, tests `tests/`, examples `examples/`.
+  - Most sources are `.fpp`; CMake transpiles them to `.f90`.
+- **Fypp macros** live in `src/<subprogram>/include/`; scan these first.
+  `<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.
+- Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC** or **OpenMP**.
+- Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern intrinsics.
+- Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and file-level `include` files.
+- **Read the full codebase and docs *before* changing code.**
+  - Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the repository root `README.md`.
 
 ### Incremental-change workflow
-1. Draft a step-by-step plan.  
-2. After each step, build:  
+
+1. Draft a step-by-step plan.
+2. After each step, build:
    ```bash
    ./mfc.sh build -t pre_process simulation -j $(nproc)
     ```
@@ -49,12 +47,10 @@ Written primarily for Fortran/Fypp; the OpenACC and style sections matter only w
   * Subroutine → `s_<verb>_<noun>` (e.g. `s_compute_flux`)
   * Function   → `f_<verb>_<noun>`
 * Private helpers stay in the module; avoid nested procedures.
-* **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100,
-  module/file ≤ 1000.
-* ≤ 6 arguments per routine; otherwise pass a derived-type “params” struct.
+* **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100, module/file ≤ 1000.
+* ≤ 6 arguments per routine; otherwise pass a derived-type "params" struct.
 * No `goto` (except unavoidable legacy); no global state (`COMMON`, `save`).
-* Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable`
-  / `pointer`.
+* Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable` / `pointer`.
 * Use `s_mpi_abort(<msg>)` for errors, not `stop`.
 * Mark GPU-callable helpers that are called from GPU parallel loops immediately after declaration:
   ```fortran
@@ -66,12 +62,41 @@ Written primarily for Fortran/Fypp; the OpenACC and style sections matter only w
 
 ---
 
-# 3  FYPP Macros for GPU acceleration Pogramming Guidelines (for kernels)
+# 3  File & Module Structure
 
-Do not directly use OpenACC or OpenMP directives directly. Instead, use the FYPP macros contained in src/common/include/parallel_macros.fpp
+- **File Naming**:
+  - `.fpp` files: Fypp preprocessed files that get translated to `.f90`
+  - Modules are named with an `m_` prefix followed by the feature name: `m_helper_basic`, `m_viscous`
+  - Primary program file is named `p_main.fpp`
 
-Wrap tight loops with
+- **Module Layout**:
+  - Start with Fypp include for macros: `#:include 'macros.fpp'`
+  - Header comments using `!>` style documentation
+  - `module` declaration with name matching filename
+  - `use` statements for dependencies
+  - `implicit none` statement
+  - `private` declaration followed by explicit `public` exports
+  - `contains` section
+  - Implementation of subroutines and functions
+
+---
+
+# 4  Fypp Macros
+
+- **Fypp Directives**:
+  - Start with `#:` (e.g., `#:include`, `#:def`, `#:enddef`)
+  - Macros defined in `include/*.fpp` files
+  - Used for code generation, conditional compilation, and GPU offloading
+
+---
 
+# 5  Fypp Macro Guidelines for GPU Acceleration (for GPU kernels)
+
+- Do not use OpenACC or OpenMP directives directly.
+- Instead, use the Fypp macros contained in `src/common/include/parallel_macros.fpp`.
+- Documentation on how to use the Fypp macros for GPU offloading is available at <https://mflowcode.github.io/documentation/md_gpuParallelization.html>.
+
+Wrap tight loops with
 ```fortran
 $:GPU_PARALLEL_FOR(private='[...]', copy='[...]')
 ```
@@ -80,5 +105,73 @@ $:GPU_PARALLEL_FOR(private='[...]', copy='[...]')
 * Allocate large arrays with `managed` or move them into a persistent
   `$:GPU_ENTER_DATA(...)` region at start-up.
 * **Do not** place `stop` / `error stop` inside device code.
-* Must compile with Cray `ftn` and NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
+* Must compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
   GNU `gfortran` and Intel `ifx`/`ifort`.
+
+- Example GPU macros include the following, among others:
+  - `$:GPU_ROUTINE(parallelism='[seq]')` - Marks GPU-callable routines
+  - `$:GPU_PARALLEL_LOOP(collapse=N)` - Parallelizes loops
+  - `$:GPU_LOOP(parallelism='[seq]')` - Marks sequential loops
+  - `$:GPU_UPDATE(device='[var1,var2]')` - Updates device data
+  - `$:GPU_ENTER_DATA(copyin='[var]')` - Copies data to device
+  - `$:GPU_EXIT_DATA(delete='[var]')` - Removes data from device
+
+---
+
+# 6  Documentation Style
+
+- **Subroutine/Function Documentation**:
+  ```fortran
+  !> This procedure <description>
+  !! @param param_name Description of the parameter
+  !! @return Description of the return value (for functions)
+  ```
+The format above conforms to the Doxygen Fortran format.
+
+# 7  Error Handling
+
+- **Assertions**:
+  - Use the Fypp `ASSERT` macro for validating conditions
+  - Example: `@:ASSERT(predicate, message)`
+
+- **Error Reporting**:
+  - Use `s_mpi_abort(error_message)` for error termination, not `stop`
+  - No `stop` / `error stop` inside device code
+
+# 8  Memory Management
+
+- **Allocation/Deallocation**:
+  - Use the Fypp macro `@:ALLOCATE(var1, var2)` for device-aware allocation
+  - Use the Fypp macro `@:DEALLOCATE(var1, var2)` for device-aware deallocation
+
+# 9  Additional Observed Patterns
+
+- **Derived Types**:
+  - Extensive use of derived types for encapsulation
+  - Use pointers within derived types (e.g., `pointer, dimension(:,:,:) => null()`)
+  - Clear documentation of derived type components
+
+- **Pure & Elemental Functions**:
+  - Use `pure` and `elemental` attributes for side-effect-free functions
+  - Combine them for operations on arrays (`pure elemental function`)
+
+- **Precision Handling**:
+  - Use `wp` (working precision) parameter from `m_precision_select`
+  - Never hardcode precision with `real*8` or similar
+
+- **Loop Optimization**:
+  - Favor array operations over explicit loops when possible
+  - Use the `collapse=N` macro argument (e.g., in `$:GPU_PARALLEL_LOOP`) to collapse nested loops
+
+# 10  Fortran Practices to Avoid
+
+- **Fixed Format**: Only free-form Fortran is used
+  - No column-position dependent code
+
+- **Obsolete Features**: Avoid outdated Fortran features such as:
+  - `equivalence` statements
+  - `data` statements (use initialization expressions)
+  - `character*N` declarations (use `character(len=N)` instead)
+
+- **Using the same variable for multiple purposes**: Maintain single responsibility
+  - Each variable should have one clear purpose
@@ -26,7 +26,9 @@
     "pp_suffixes": [".fpp"],
     "pp_defs": {
         "MFC": 1,
-        "MFC_DOUBLE_PRECISION": 1
+        "MFC_SINGLE_PRECISION": 1,
+        "MFC_OPENACC": 1,
+        "MFC_MPI": 1
     },
     "lowercase_intrinsics": true,
     "debug_log": false,
@@ -60,26 +62,11 @@
     "disable_diagnostics_for_external_modules": true,
     "max_line_length": -1,
     "max_comment_line_length": -1,
-    "symbol_skip_mem": [
-        "mpi_*"
-    ],
     "disable_var_diagnostics": false,
     "disable_fypp": false,
     "fypp_strict": false,
-    "error_suppression_list": [
-        "include-not-found",
-        "mod-not-found",
-        "module-not-found",
-        "declared-twice",
-        "no-matching-declaration",
-        "invalid-parent",
-        "parsing-error",
-        "fypp-error",
-        "preprocessor-error",
-        "implicit-type"
-    ],
     "incremental_sync": false,
-    "debug_parser": false,
+    "debug_parser": true,
     "skip_parse_errors": true,
     "disable_parser": [
         "src/post_process/m_data_output.fpp",
 
@@ -33,7 +33,7 @@ sbatch <<EOT
 #SBATCH -A CFD154                  # charge account
 #SBATCH -N 1                       # Number of nodes required
 $sbatch_device_opts
-#SBATCH -t 02:59:00                # Duration of the job (Ex: 15 mins)
+#SBATCH -t 03:59:00                # Duration of the job (Ex: 15 mins)
 #SBATCH -o$job_slug.out            # Combined output and error messages file
 #SBATCH -p extended                # Extended partition for shorter queues
 #SBATCH -W                         # Do not exit until the submitted job terminates.
 
@@ -479,17 +479,6 @@ function(MFC_SETUP_TARGET)
                     "-foffload-options=-lgfortran\ -lm"
                     "-fno-exceptions")
             elseif(CMAKE_Fortran_COMPILER_ID STREQUAL "NVHPC" OR CMAKE_Fortran_COMPILER_ID STREQUAL "PGI")
-                find_package(cuTENSOR)
-                if (NOT cuTENSOR_FOUND)
-                    message(WARNING
-                        "Failed to locate the NVIDIA cuTENSOR library. MFC will be "
-                        "built without support for it, disallowing the use of "
-                        "cu_tensor=T. This can result in degraded performance.")
-                else()
-                    target_link_libraries     (${a_target} PRIVATE cuTENSOR::cuTENSOR)
-                    target_compile_definitions(${a_target} PRIVATE MFC_cuTENSOR)
-                endif()
-
                 foreach (cc ${MFC_CUDA_CC})
                     target_compile_options(${a_target}
                         PRIVATE -gpu=cc${cc}