Merged
75 changes: 37 additions & 38 deletions .cursor/rules/mfc-agent-rules.mdc
@@ -1,7 +1,7 @@
----
-description: Full MFC project rules – consolidated for Agent Mode
-alwaysApply: true
----
---
description: Full MFC project rules – consolidated for Agent Mode
alwaysApply: true
---

# 0 Purpose & Scope
Consolidated guidance for the MFC exascale, many-physics solver.
@@ -19,7 +19,7 @@ Written primarily for Fortran/Fypp; the GPU and style sections matter only when
- Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern intrinsics.
- Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and file-level `include` files.
- **Read the full codebase and docs *before* changing code.**
- Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the respository root `README.md`.
- Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the repository root `README.md`.

### Incremental-change workflow

@@ -62,27 +62,7 @@ Written primarily for Fortran/Fypp; the GPU and style sections matter only when

---

# 3 FYPP Macros for GPU acceleration Pogramming Guidelines (for GPU kernels)

Do not directly use OpenACC or OpenMP directives directly.
Instead, use the FYPP macros contained in src/common/include/parallel_macros.fpp

Wrap tight loops with

```fortran
$:GPU_PARALLEL_FOR(private='[...]', copy='[...]')
```
* Add `collapse=n` to merge nested loops when safe.
* Declare loop-local variables with `private='[...]'`.
* Allocate large arrays with `managed` or move them into a persistent
`$:GPU_ENTER_DATA(...)` region at start-up.
* **Do not** place `stop` / `error stop` inside device code.
* Must compile with Cray `ftn` and NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
GNU `gfortran` and Intel `ifx`/`ifort`.

---

# 4 File & Module Structure
# 3 File & Module Structure

- **File Naming**:
- `.fpp` files: Fypp preprocessed files that get translated to `.f90`
@@ -99,25 +79,44 @@ $:GPU_PARALLEL_FOR(private='[...]', copy='[...]')
- `contains` section
- Implementation of subroutines and functions
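
The layout above (declarations first, then a `contains` section holding the implementations) might look like the minimal sketch below. The module name `m_example_scale` and the subroutine `s_scale_array` are hypothetical and chosen only for illustration; they are not taken from the MFC source.

```fortran
! Hypothetical module used only to illustrate the layout; not part of MFC.
module m_example_scale

    implicit none

    private
    public :: s_scale_array

contains

    !> Multiplies every element of an array by a constant factor.
    !! @param factor Scaling factor
    !! @param values Array scaled in place
    subroutine s_scale_array(factor, values)

        real(kind(0d0)), intent(in) :: factor
        real(kind(0d0)), dimension(:), intent(inout) :: values

        values = factor*values

    end subroutine s_scale_array

end module m_example_scale
```

The sketch uses free-form Fortran with `implicit none` and explicit `intent` on every dummy argument, as the general rules in this file require.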

# 5 Fypp Macros and GPU Acceleration
---

# 4 Fypp Macros

## Use of Fypp
- **Fypp Directives**:
- Start with `#:` (e.g., `#:include`, `#:def`, `#:enddef`)
- Macros defined in `include/*.fpp` files
- Used for code generation, conditional compilation, and GPU offloading

## Some examples
---

Documentation on how to use the Fypp macros for GPU offloading is available at https://mflowcode.github.io/documentation/md_gpuParallelization.html
# 5 FYPP Macros for GPU Acceleration Programming Guidelines (for GPU kernels)

Some examples include:
- `$:GPU_ROUTINE(parallelism='[seq]')` - Marks GPU-callable routines
- `$:GPU_PARALLEL_LOOP(collapse=N)` - Parallelizes loops
- `$:GPU_LOOP(parallelism='[seq]')` - Marks sequential loops
- `$:GPU_UPDATE(device='[var1,var2]')` - Updates device data
- `$:GPU_ENTER_DATA(copyin='[var]')` - Copies data to device
- `$:GPU_EXIT_DATA(delete='[var]')` - Removes data from device
- Do not use OpenACC or OpenMP directives directly.
- Instead, use the FYPP macros contained in `src/common/include/parallel_macros.fpp`
- Documentation on how to use the Fypp macros for GPU offloading is available at https://mflowcode.github.io/documentation/md_gpuParallelization.html

Wrap tight loops with
```fortran
$:GPU_PARALLEL_FOR(private='[...]', copy='[...]')
```
* Add `collapse=n` to merge nested loops when safe.
* Declare loop-local variables with `private='[...]'`.
* Allocate large arrays with `managed` or move them into a persistent
`$:GPU_ENTER_DATA(...)` region at start-up.
* **Do not** place `stop` / `error stop` inside device code.
* Must compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
GNU `gfortran` and Intel `ifx`/`ifort`.

- Example GPU macros include the below, among others:
- `$:GPU_ROUTINE(parallelism='[seq]')` - Marks GPU-callable routines
- `$:GPU_PARALLEL_LOOP(collapse=N)` - Parallelizes loops
- `$:GPU_LOOP(parallelism='[seq]')` - Marks sequential loops
- `$:GPU_UPDATE(device='[var1,var2]')` - Updates device data
- `$:GPU_ENTER_DATA(copyin='[var]')` - Copies data to device
- `$:GPU_EXIT_DATA(delete='[var]')` - Removes data from device
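
As a rough sketch of how these macros might be combined, the fragment below applies `$:GPU_PARALLEL_LOOP(collapse=2)` to a two-level loop nest in a `.fpp` file. It is not standalone: it assumes the macros from `src/common/include/parallel_macros.fpp` are available through the `#:include` line shown (the exact include form is an assumption) and that the arrays are already resident on the device. The module and subroutine names are hypothetical. Check the exact macro arguments against the linked GPU parallelization documentation before relying on this.

```fortran
! Sketch only: requires Fypp and MFC's macro definitions to preprocess.
#:include 'parallel_macros.fpp'

! Hypothetical module; names are illustrative, not taken from MFC.
module m_example_axpy

    implicit none

    private
    public :: s_example_axpy

contains

    !> Computes y = y + a*x on a 2D index space.
    subroutine s_example_axpy(m, n, a, x, y)

        integer, intent(in) :: m, n
        real(kind(0d0)), intent(in) :: a
        real(kind(0d0)), dimension(0:m, 0:n), intent(in) :: x
        real(kind(0d0)), dimension(0:m, 0:n), intent(inout) :: y

        integer :: j, k

        ! Assumption: x and y were moved to the device beforehand,
        ! e.g. in a persistent $:GPU_ENTER_DATA(...) region at start-up.
        $:GPU_PARALLEL_LOOP(collapse=2)
        do k = 0, n
            do j = 0, m
                y(j, k) = y(j, k) + a*x(j, k)
            end do
        end do

    end subroutine s_example_axpy

end module m_example_axpy
```

The two loops are independent and tightly nested, which is the case in which `collapse` is safe. No `stop` or `error stop` appears inside the offloaded region.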

---

# 6 Documentation Style

@@ -136,7 +135,7 @@ which conforms to the Doxygen Fortran format.
- Example: `@:ASSERT(predicate, message)`

- **Error Reporting**:
- Use `s_mpi_abort(<msg>)` for error termination, not `stop`
- Use `s_mpi_abort(error_message)` for error termination, not `stop`
- No `stop` / `error stop` inside device code

# 8 Memory Management