1- ---
2- description: Full MFC project rules – consolidated for Agent Mode
3- alwaysApply: true
4- ---
1+ ----
2+ - description: Full MFC project rules – consolidated for Agent Mode
3+ - alwaysApply: true
4+ ----
55
66# 0 Purpose & Scope
77Consolidated guidance for the MFC exascale, many-physics solver.
8- Written primarily for Fortran/Fypp; the OpenACC and style sections matter only when `.fpp` / `.f90` files are in view.
8+ Written primarily for Fortran/Fypp; the GPU and style sections matter only when `.fpp` / `.f90` files are in view.
99
1010---
1111
@@ -15,15 +15,14 @@ Written primarily for Fortran/Fypp; the OpenACC and style sections matter only w
1515 - Most sources are `.fpp`; CMake transpiles them to `.f90`.
1616- **Fypp macros** live in `src/<subprogram>/include/`; scan these first.
1717 `<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.
18- - Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC**.
19- - Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern
20- intrinsics.
21- - Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and
22- file-level `include` files.
18+ - Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC** or **OpenMP**.
19+ - Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern intrinsics.
20+ - Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and file-level `include` files.
2321- **Read the full codebase and docs *before* changing code.**
24- Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the repository root `README.md`.
22+ - Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the repository root `README.md`.
2523
2624### Incremental-change workflow
25+
27261. Draft a step-by-step plan.
28272. After each step, build:
2928 ```bash
@@ -48,34 +47,35 @@ Written primarily for Fortran/Fypp; the OpenACC and style sections matter only w
4847 * Subroutine → `s_<verb>_<noun>` (e.g. `s_compute_flux`)
4948 * Function → `f_<verb>_<noun>`
5049* Private helpers stay in the module; avoid nested procedures.
51- * **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100,
52- module/file ≤ 1000.
50+ * **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100, module/file ≤ 1000.
5351* ≤ 6 arguments per routine; otherwise pass a derived-type "params" struct.
5452* No `goto` (except unavoidable legacy); no global state (`COMMON`, `save`).
55- * Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable`
56- / `pointer`.
53+ * Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable` / `pointer`.
5754* Use `s_mpi_abort(<msg>)` for errors, not `stop`.
58- * Mark OpenACC-callable helpers that are called from OpenACC parallel loops immediately after declaration:
55+ * Mark GPU-callable helpers that are called from GPU parallel loops immediately after declaration:
5956 ```fortran
6057 subroutine s_flux_update(...)
61- !$acc routine seq
58+ $:GPU_ROUTINE(function_name='s_flux_update', parallelism='[seq]')
6259 ...
6360 end subroutine
6461 ```
6562
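The naming, `intent`, and error-handling conventions above can be sketched in plain Fortran as follows. This is a minimal illustration, not code from MFC: the module `m_flux_demo`, the program `p_demo`, and the argument lists are hypothetical, and `s_mpi_abort` is replaced by a local stand-in so the sketch builds without MPI.

```fortran
module m_flux_demo
    implicit none
    private
    public :: s_compute_flux, f_compute_average

contains

    ! Hypothetical stand-in for MFC's s_mpi_abort so this sketch runs without MPI.
    subroutine s_mpi_abort(msg)
        character(len=*), intent(in) :: msg
        print '(a)', 'ABORT: '//msg
        error stop 1
    end subroutine s_mpi_abort

    ! Subroutine name follows s_<verb>_<noun>; every dummy argument has an intent.
    subroutine s_compute_flux(n, q, vel, flux)
        integer, intent(in) :: n
        real(kind(0d0)), dimension(n), intent(in) :: q, vel
        real(kind(0d0)), dimension(n), intent(out) :: flux

        ! Report errors through the abort routine rather than a bare stop.
        if (n < 1) call s_mpi_abort('s_compute_flux: n must be positive')
        flux = q*vel
    end subroutine s_compute_flux

    ! Function name follows f_<verb>_<noun>.
    function f_compute_average(n, q) result(avg)
        integer, intent(in) :: n
        real(kind(0d0)), dimension(n), intent(in) :: q
        real(kind(0d0)) :: avg

        avg = sum(q)/real(n, kind(0d0))
    end function f_compute_average

end module m_flux_demo

program p_demo
    use m_flux_demo, only: s_compute_flux, f_compute_average
    implicit none
    real(kind(0d0)) :: q(3), vel(3), flux(3)

    q = [1d0, 2d0, 3d0]
    vel = [2d0, 2d0, 2d0]
    call s_compute_flux(3, q, vel, flux)
    print *, 'flux    =', flux
    print *, 'average =', f_compute_average(3, flux)
end program p_demo
```

Keeping the helper private to the module and exposing only the `s_`/`f_` entry points mirrors the "private helpers stay in the module" rule.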
6663---
6764
68- # 3 OpenACC Programming Guidelines (for kernels)
65+ # 3 Fypp Macros for GPU Acceleration Programming Guidelines (for GPU kernels)
66+
67+ Do not use OpenACC or OpenMP directives directly.
68+ Instead, use the Fypp macros contained in `src/common/include/parallel_macros.fpp`.
6969
7070Wrap tight loops with
7171
7272```fortran
73- !$acc parallel loop gang vector default(present) reduction(...)
73+ $:GPU_PARALLEL_FOR(private='[...]', copy='[...]')
7474```
75- * Add `collapse(n)` to merge nested loops when safe.
76- * Declare loop-local variables with `private(...)`.
75+ * Add `collapse=n` to merge nested loops when safe.
76+ * Declare loop-local variables with `private='[...]'`.
7777* Allocate large arrays with `managed` or move them into a persistent
78- `!$acc enter data` region at start-up.
78+ `$:GPU_ENTER_DATA(...)` region at start-up.
7979* **Do not** place `stop` / `error stop` inside device code.
8080* Must compile with Cray `ftn` and NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
8181 GNU `gfortran` and Intel `ifx`/`ifort`.
@@ -101,18 +101,23 @@ Wrap tight loops with
101101
102102# 5 Fypp Macros and GPU Acceleration
103103
104+ ## Use of Fypp
104105- **Fypp Directives**:
105106 - Start with `#:` (e.g., `#:include`, `#:def`, `#:enddef`)
106107 - Macros defined in `include/*.fpp` files
107108 - Used for code generation, conditional compilation, and GPU offloading
108109
109- - **GPU Macros**:
110- - `$:GPU_ROUTINE(parallelism='[seq]')` - Marks GPU-callable routines
111- - `$:GPU_PARALLEL_LOOP(collapse=N)` - Parallelizes loops
112- - `$:GPU_LOOP(parallelism='[seq]')` - Marks sequential loops
113- - `$:GPU_UPDATE(device='[var1,var2]')` - Updates device data
114- - `$:GPU_ENTER_DATA(copyin='[var]')` - Copies data to device
115- - `$:GPU_EXIT_DATA(delete='[var]')` - Removes data from device
110+ ## Some examples
111+
112+ Documentation on how to use the Fypp macros for GPU offloading is available at <https://mflowcode.github.io/documentation/md_gpuParallelization.html>.
113+
114+ Some examples include:
115+ - `$:GPU_ROUTINE(parallelism='[seq]')` - Marks GPU-callable routines
116+ - `$:GPU_PARALLEL_LOOP(collapse=N)` - Parallelizes loops
117+ - `$:GPU_LOOP(parallelism='[seq]')` - Marks sequential loops
118+ - `$:GPU_UPDATE(device='[var1,var2]')` - Updates device data
119+ - `$:GPU_ENTER_DATA(copyin='[var]')` - Copies data to device
120+ - `$:GPU_EXIT_DATA(delete='[var]')` - Removes data from device
116121
117122# 6 Documentation Style
118123
@@ -122,11 +127,12 @@ Wrap tight loops with
122127 !! @param param_name Description of the parameter
123128 !! @return Description of the return value (for functions)
124129 ```
130+ which conforms to the Doxygen Fortran format.
125131
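A short sketch of how the `@param` / `@return` comment style might look on a complete function. The module `m_doc_demo`, the function `f_compute_mean`, and the driver program are hypothetical names chosen for illustration. The `!>` opener and `!!` continuation lines are the standard Doxygen markers for Fortran.

```fortran
module m_doc_demo
    implicit none

contains

    !> Computes the arithmetic mean of a one-dimensional array.
    !! @param n Number of elements in q
    !! @param q Input values
    !! @return Mean of the n values in q
    function f_compute_mean(n, q) result(mean)
        integer, intent(in) :: n
        real(kind(0d0)), dimension(n), intent(in) :: q
        real(kind(0d0)) :: mean

        mean = sum(q)/real(n, kind(0d0))
    end function f_compute_mean

end module m_doc_demo

program p_doc_demo
    use m_doc_demo, only: f_compute_mean
    implicit none

    ! Exercise the documented function on a small array.
    print *, 'mean =', f_compute_mean(4, [1d0, 2d0, 3d0, 4d0])
end program p_doc_demo
```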
126132# 7 Error Handling
127133
128134- **Assertions**:
129- - Use `ASSERT` macro for validating conditions
135+ - Use the Fypp `ASSERT` macro for validating conditions
130136 - Example: `@:ASSERT(predicate, message)`
131137
132138- **Error Reporting**:
@@ -136,8 +142,8 @@ Wrap tight loops with
136142# 8 Memory Management
137143
138144- **Allocation/Deallocation**:
139- - Use `@:ALLOCATE(var1, var2)` macro for device-aware allocation
140- - Use `@:DEALLOCATE(var1, var2)` macro for device-aware deallocation
145+ - Use the Fypp macro `@:ALLOCATE(var1, var2)` for device-aware allocation
146+ - Use the Fypp macro `@:DEALLOCATE(var1, var2)` for device-aware deallocation
141147
142148# 9. Additional Observed Patterns
143149
@@ -156,7 +162,7 @@ Wrap tight loops with
156162
157163- **Loop Optimization**:
158164 - Favor array operations over explicit loops when possible
159- - Use `collapse(N)` directive to optimize nested loops
165+ - Use the `collapse=N` macro argument to merge nested loops
160166
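To illustrate "favor array operations over explicit loops", the sketch below computes the same result both ways and prints the largest difference between them. The program name `p_array_ops` and the variable names are hypothetical. This is plain host-side Fortran and does not show GPU offloading.

```fortran
program p_array_ops
    implicit none
    integer, parameter :: n = 4
    integer :: i
    real(kind(0d0)) :: a(n), b(n), c_loop(n), c_array(n)

    ! Fill a with 1, 2, ..., n and b with a constant.
    a = [(real(i, kind(0d0)), i=1, n)]
    b = 2d0

    ! Explicit loop form.
    do i = 1, n
        c_loop(i) = a(i)*b(i) + 1d0
    end do

    ! Equivalent whole-array form: one statement, no index bookkeeping.
    c_array = a*b + 1d0

    print *, 'max difference =', maxval(abs(c_loop - c_array))
end program p_array_ops
```

The whole-array form states the intent in a single line and leaves loop ordering to the compiler. Inside device kernels, explicit loops wrapped by the GPU macros remain the pattern described in section 3.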
161167# 10. Fortran Practices to Avoid
162168