@@ -4,29 +4,27 @@ alwaysApply: true
44---
55
66# 0 Purpose & Scope
7- Consolidated guidance for the MFC exascale, many-physics solver.
8- Written primarily for Fortran/Fypp; the OpenACC and style sections matter only when
9- `.fpp` / `.f90` files are in view.
7+ Consolidated guidance for the MFC exascale, many-physics solver.
8+ Written primarily for Fortran/Fypp; the GPU and style sections matter only when `.fpp` / `.f90` files are in view.
109
1110---
1211
1312# 1 Global Project Context (always)
14- - **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.
15- - Sources `src/`, tests `tests/`, examples `examples/`.
16- - Most sources are `.fpp`; CMake transpiles them to `.f90`.
17- - **Fypp macros** live in `src/<subprogram>/include/` you should scan these first.
18- `<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.
19- - Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC**.
20- - Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern
21- intrinsics.
22- - Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and
23- file-level `include` files.
24- - **Read the full codebase and docs *before* changing code.**
25- Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the respository root `README.md`.
13+ - **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.
14+ - Sources `src/`, tests `tests/`, examples `examples/`.
15+ - Most sources are `.fpp`; CMake transpiles them to `.f90`.
16+ - **Fypp macros** live in `src/<subprogram>/include/`; scan these first.
17+ `<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.
18+ - Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC** or **OpenMP**.
19+ - Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern intrinsics.
20+ - Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and file-level `include` files.
21+ - **Read the full codebase and docs *before* changing code.**
22+ - Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the repository root `README.md`.
2623
2724### Incremental-change workflow
28- 1. Draft a step-by-step plan.
29- 2. After each step, build:
25+
26+ 1. Draft a step-by-step plan.
27+ 2. After each step, build:
3028 ```bash
3129 ./mfc.sh build -t pre_process simulation -j $(nproc)
3230 ```
@@ -49,34 +47,131 @@ Written primarily for Fortran/Fypp; the OpenACC and style sections matter only w
4947 * Subroutine → `s_<verb>_<noun>` (e.g. `s_compute_flux`)
5048 * Function → `f_<verb>_<noun>`
5149* Private helpers stay in the module; avoid nested procedures.
52- * **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100,
53- module/file ≤ 1000.
54- * ≤ 6 arguments per routine; otherwise pass a derived-type “params” struct.
50+ * **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100, module/file ≤ 1000.
51+ * ≤ 6 arguments per routine; otherwise pass a derived-type "params" struct.
5552* No `goto` (except unavoidable legacy); no global state (`COMMON`, `save`).
56- * Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable`
57- / `pointer`.
53+ * Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable` / `pointer`.
5854* Use `s_mpi_abort(<msg>)` for errors, not `stop`.
59- * Mark OpenACC -callable helpers that are called from OpenACC parallel loops immediately after declaration:
55+ * Mark helpers that are called from GPU parallel loops as GPU-callable immediately after the routine declaration:
6056 ```fortran
6157 subroutine s_flux_update(...)
62- !$acc routine seq
58+ $:GPU_ROUTINE(function_name='s_flux_update', parallelism='[seq]')
6359 ...
6460 end subroutine
6561 ```
6662
6763---
6864
69- # 3 OpenACC Programming Guidelines (for kernels)
65+ # 3 File & Module Structure
7066
71- Wrap tight loops with
67+ - **File Naming**:
68+ - `.fpp` files: Fypp preprocessed files that get translated to `.f90`
69+ - Modules are named with `m_` prefix followed by feature name: `m_helper_basic`, `m_viscous`
70+ - Primary program file is named `p_main.fpp`
71+
72+ - **Module Layout**:
73+ - Start with Fypp include for macros: `#:include 'macros.fpp'`
74+ - Header comments using `!>` style documentation
75+ - `module` declaration with name matching filename
76+ - `use` statements for dependencies
77+ - `implicit none` statement
78+ - `private` declaration followed by explicit `public` exports
79+ - `contains` section
80+ - Implementation of subroutines and functions
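
The module layout above can be sketched as follows. This is a minimal illustration, not code from MFC: the names `m_example_feature`, `s_scale_field`, and `p_example` are hypothetical, and `wp` is defined locally as a stand-in so the file compiles on its own as plain Fortran. In an actual `.fpp` source the file would begin with `#:include 'macros.fpp'`, and `wp` would come from `m_precision_select`.

```fortran
!> Hypothetical module illustrating the layout listed above.
!! In an MFC .fpp source the first line would be: #:include 'macros.fpp'
module m_example_feature

    ! Dependencies come first (an intrinsic module serves as a stand-in here)
    use, intrinsic :: iso_fortran_env, only: real64

    implicit none

    ! Everything is private unless explicitly exported
    private
    public :: wp, s_scale_field

    !> Stand-in for the `wp` kind that MFC takes from m_precision_select
    integer, parameter :: wp = real64

contains

    !> This procedure scales every entry of a field by a constant factor
    !! @param field  Field to scale in place
    !! @param factor Multiplicative factor
    subroutine s_scale_field(field, factor)
        real(wp), dimension(:), intent(inout) :: field
        real(wp), intent(in) :: factor

        field = factor*field
    end subroutine s_scale_field

end module m_example_feature

!> Small driver so the sketch can be compiled and run standalone
program p_example
    use m_example_feature, only: wp, s_scale_field
    implicit none

    real(wp) :: q(3)

    q = [1.0_wp, 2.0_wp, 3.0_wp]
    call s_scale_field(q, 2.0_wp)
    print *, q
end program p_example
```

The subroutine name follows the `s_<verb>_<noun>` convention, and the header comment uses the `!>` / `!!` Doxygen style.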
81+
82+ ---
83+
84+ # 4 Fypp Macros
85+
86+ - **Fypp Directives**:
87+ - Start with `#:` (e.g., `#:include`, `#:def`, `#:enddef`)
88+ - Macros defined in `include/*.fpp` files
89+ - Used for code generation, conditional compilation, and GPU offloading
90+
91+ ---
7292
93+ # 5 Fypp Macros for GPU Acceleration (for GPU kernels)
94+
95+ - Do not use OpenACC or OpenMP directives directly.
96+ - Instead, use the Fypp macros defined in `src/common/include/parallel_macros.fpp`.
97+ - The Fypp macros for GPU offloading are documented at <https://mflowcode.github.io/documentation/md_gpuParallelization.html>.
98+
99+ Wrap tight loops with
73100```fortran
74- !$acc parallel loop gang vector default(present) reduction( ...)
101+ $:GPU_PARALLEL_LOOP(private='[...]', copy='[...]')
75102```
76- * Add `collapse(n) ` to merge nested loops when safe.
77- * Declare loop-local variables with `private( ...) `.
103+ * Add `collapse=n` to merge nested loops when safe.
104+ * Declare loop-local variables with `private='[...]'`.
78105* Allocate large arrays with `managed` or move them into a persistent
79- `!$acc enter data ` region at start-up.
106+ `$:GPU_ENTER_DATA(...)` region at start-up.
80107* **Do not** place `stop` / `error stop` inside device code.
81- * Must compile with Cray `ftn` and NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
108+ * Must compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
82109 GNU `gfortran` and Intel `ifx`/`ifort`.
110+
111+ - Example GPU macros include the below, among others:
112+ - `$:GPU_ROUTINE(parallelism='[seq]')` - Marks GPU-callable routines
113+ - `$:GPU_PARALLEL_LOOP(collapse=N)` - Parallelizes loops
114+ - `$:GPU_LOOP(parallelism='[seq]')` - Marks sequential loops
115+ - `$:GPU_UPDATE(device='[var1,var2]')` - Updates device data
116+ - `$:GPU_ENTER_DATA(copyin='[var]')` - Copies data to device
117+ - `$:GPU_EXIT_DATA(delete='[var]')` - Removes data from device
118+
119+ ---
120+
121+ # 6 Documentation Style
122+
123+ - **Subroutine/Function Documentation**:
124+ ```fortran
125+ !> This procedure <description>
126+ !! @param param_name Description of the parameter
127+ !! @return Description of the return value (for functions)
128+ ```
129+ which conforms to the Doxygen Fortran format.
130+
131+ # 7 Error Handling
132+
133+ - **Assertions**:
134+ - Use the Fypp `ASSERT` macro for validating conditions
135+ - Example: `@:ASSERT(predicate, message)`
136+
137+ - **Error Reporting**:
138+ - Use `s_mpi_abort(error_message)` for error termination, not `stop`
139+ - No `stop` / `error stop` inside device code
140+
141+ # 8 Memory Management
142+
143+ - **Allocation/Deallocation**:
144+ - Use the Fypp macro `@:ALLOCATE(var1, var2)` for device-aware allocation
145+ - Use the Fypp macro `@:DEALLOCATE(var1, var2)` for device-aware deallocation
146+
147+ # 9 Additional Observed Patterns
148+
149+ - **Derived Types**:
150+ - Extensive use of derived types for encapsulation
151+ - Use pointers within derived types (e.g., `pointer, dimension(:,:,:) => null()`)
152+ - Clear documentation of derived type components
153+
154+ - **Pure & Elemental Functions**:
155+ - Use `pure` and `elemental` attributes for side-effect-free functions
156+ - Combine them for operations on arrays (`pure elemental function`)
157+
158+ - **Precision Handling**:
159+ - Use `wp` (working precision) parameter from `m_precision_select`
160+ - Never hardcode precision with `real*8` or similar
161+
162+ - **Loop Optimization**:
163+ - Favor array operations over explicit loops when possible
164+ - Pass `collapse=N` to the GPU loop macros to merge nested loops
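
The patterns in this section can be combined in one short sketch. All names here (`m_example_patterns`, `t_example_field`, `f_compute_pressure`) are hypothetical and chosen only for illustration. `wp` is defined locally as a stand-in for the parameter from `m_precision_select`, and plain `allocate` / `deallocate` are used instead of the `@:ALLOCATE` / `@:DEALLOCATE` macros so that the example compiles as standalone Fortran.

```fortran
module m_example_patterns

    implicit none

    private
    public :: wp, t_example_field, f_compute_pressure

    !> Stand-in for the `wp` kind that MFC takes from m_precision_select
    integer, parameter :: wp = kind(1.0d0)

    !> Hypothetical container holding a 3D field behind a pointer component
    type t_example_field
        real(wp), pointer, dimension(:, :, :) :: sf => null() !< Field data
    end type t_example_field

contains

    !> This procedure computes an ideal-gas pressure from density and energy
    !! @param rho   Density
    !! @param e     Specific internal energy
    !! @param gamma Ratio of specific heats
    !! @return Pressure (gamma - 1)*rho*e
    pure elemental function f_compute_pressure(rho, e, gamma) result(p)
        real(wp), intent(in) :: rho, e, gamma
        real(wp) :: p

        p = (gamma - 1.0_wp)*rho*e
    end function f_compute_pressure

end module m_example_patterns

program p_example_patterns
    use m_example_patterns, only: wp, t_example_field, f_compute_pressure
    implicit none

    type(t_example_field) :: rho, energy, pres

    ! Plain allocate is used here only so the sketch builds without MFC macros
    allocate (rho%sf(2, 2, 2), energy%sf(2, 2, 2), pres%sf(2, 2, 2))
    rho%sf = 1.0_wp
    energy%sf = 2.5_wp

    ! The elemental function is applied to whole arrays; no explicit loop
    pres%sf = f_compute_pressure(rho%sf, energy%sf, 1.4_wp)

    print *, pres%sf(1, 1, 1)

    deallocate (rho%sf, energy%sf, pres%sf)
end program p_example_patterns
```

Because the function is elemental, the same definition serves scalar and array arguments, which is what makes the whole-array call possible.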
165+
166+ # 10 Fortran Practices to Avoid
167+
168+ - **Fixed Format**: Only free-form Fortran is used
169+ - No column-position dependent code
170+
171+ - **Outdated Features**: Avoid legacy Fortran constructs such as:
172+ - `equivalence` statements
173+ - `data` statements (use initialization expressions)
174+ - `character*N` declarations (use `character(len=N)` instead)
175+
176+ - **Using same variable for multiple purposes**: Maintain single responsibility
177+ - Each variable should have one clear purpose
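
As a brief illustration of the replacements suggested above, the sketch below shows the modern forms next to the legacy ones they replace (the legacy forms appear only in comments). The program and variable names are hypothetical, and `wp` is again a local stand-in for the parameter from `m_precision_select`.

```fortran
program p_example_declarations
    implicit none

    ! Stand-in for the `wp` kind that MFC takes from m_precision_select
    integer, parameter :: wp = kind(1.0d0)

    ! Instead of: character*16 case_name
    character(len=16) :: case_name

    ! Instead of: data weights /0.25, 0.5, 0.25/
    real(wp), dimension(3), parameter :: weights = [0.25_wp, 0.5_wp, 0.25_wp]

    ! One variable per purpose rather than a single reused scratch variable
    real(wp) :: weight_sum
    integer :: num_weights

    case_name = 'example'
    weight_sum = sum(weights)
    num_weights = size(weights)

    print *, trim(case_name), num_weights, weight_sum
end program p_example_declarations
```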