Commit c9a76d2

merge conflict resolution

Author: Benjamin Wilfong
Parents: b4e0272 + 16de11c

185 files changed (+13254 -2338 lines changed)


.cursor/rules/mfc-agent-rules.mdc

Lines changed: 126 additions & 31 deletions
@@ -4,29 +4,27 @@ alwaysApply: true
 ---
 
 # 0 Purpose & Scope
-Consolidated guidance for the MFC exascale, many-physics solver.
-Written primarily for Fortran/Fypp; the OpenACC and style sections matter only when
-`.fpp` / `.f90` files are in view.
+Consolidated guidance for the MFC exascale, many-physics solver.
+Written primarily for Fortran/Fypp; the GPU and style sections matter only when `.fpp` / `.f90` files are in view.
 
 ---
 
 # 1 Global Project Context (always)
-- **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.
-- Sources `src/`, tests `tests/`, examples `examples/`.
-- Most sources are `.fpp`; CMake transpiles them to `.f90`.
-- **Fypp macros** live in `src/<subprogram>/include/` you should scan these first.
-`<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.
-- Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC**.
-- Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern
-intrinsics.
-- Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and
-file-level `include` files.
-- **Read the full codebase and docs *before* changing code.**
-Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the respository root `README.md`.
+- **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.
+- Sources `src/`, tests `tests/`, examples `examples/`.
+- Most sources are `.fpp`; CMake transpiles them to `.f90`.
+- **Fypp macros** live in `src/<subprogram>/include/` you should scan these first.
+`<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.
+- Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC** or **OpenMP**.
+- Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern intrinsics.
+- Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and file-level `include` files.
+- **Read the full codebase and docs *before* changing code.**
+- Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the repository root `README.md`.
 
 ### Incremental-change workflow
-1. Draft a step-by-step plan.
-2. After each step, build:
+
+1. Draft a step-by-step plan.
+2. After each step, build:
 ```bash
 ./mfc.sh build -t pre_process simulation -j $(nproc)
 ```
@@ -49,34 +47,131 @@ Written primarily for Fortran/Fypp; the OpenACC and style sections matter only w
 * Subroutine → `s_<verb>_<noun>` (e.g. `s_compute_flux`)
 * Function → `f_<verb>_<noun>`
 * Private helpers stay in the module; avoid nested procedures.
-* **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100,
-module/file ≤ 1000.
-* ≤ 6 arguments per routine; otherwise pass a derived-type “params” struct.
+* **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100, module/file ≤ 1000.
+* ≤ 6 arguments per routine; otherwise pass a derived-type "params" struct.
 * No `goto` (except unavoidable legacy); no global state (`COMMON`, `save`).
-* Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable`
-/ `pointer`.
+* Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable` / `pointer`.
 * Use `s_mpi_abort(<msg>)` for errors, not `stop`.
-* Mark OpenACC-callable helpers that are called from OpenACC parallel loops immediately after declaration:
+* Mark GPU-callable helpers that are called from GPU parallel loops immediately after declaration:
 ```fortran
 subroutine s_flux_update(...)
-!$acc routine seq
+$:GPU_ROUTINE(function_name='s_flux_update', parallelism='[seq]')
 ...
 end subroutine
 ```
 
 ---
 
-# 3 OpenACC Programming Guidelines (for kernels)
+# 3 File & Module Structure
 
-Wrap tight loops with
+- **File Naming**:
+- `.fpp` files: Fypp preprocessed files that get translated to `.f90`
+- Modules are named with `m_` prefix followed by feature name: `m_helper_basic`, `m_viscous`
+- Primary program file is named `p_main.fpp`
+
+- **Module Layout**:
+- Start with Fypp include for macros: `#:include 'macros.fpp'`
+- Header comments using `!>` style documentation
+- `module` declaration with name matching filename
+- `use` statements for dependencies
+- `implicit none` statement
+- `private` declaration followed by explicit `public` exports
+- `contains` section
+- Implementation of subroutines and functions
+
+---
+
+# 4 Fypp Macros
+
+- **Fypp Directives**:
+- Start with `#:` (e.g., `#:include`, `#:def`, `#:enddef`)
+- Macros defined in `include/*.fpp` files
+- Used for code generation, conditional compilation, and GPU offloading
+
+---
 
+# 5 FYPP Macros for GPU Acceleration Programming Guidelines (for GPU kernels)
+
+- Do not use OpenACC or OpenMP directives directly.
+- Instead, use the FYPP macros contained in `src/common/include/parallel_macros.fpp`
+- Documentation on how to use the Fypp macros for GPU offloading is available at https://mflowcode.github.io/documentation/md_gpuParallelization.html
+
+Wrap tight loops with
 ```fortran
-!$acc parallel loop gang vector default(present) reduction(...)
+$:GPU_PARALLEL_FOR(private='[...]', copy='[...]')
 ```
-* Add `collapse(n)` to merge nested loops when safe.
-* Declare loop-local variables with `private(...)`.
+* Add `collapse=n` to merge nested loops when safe.
+* Declare loop-local variables with `private='[...]'`.
 * Allocate large arrays with `managed` or move them into a persistent
-`!$acc enter data` region at start-up.
+`$:GPU_ENTER_DATA(...)` region at start-up.
 * **Do not** place `stop` / `error stop` inside device code.
-* Must compile with Cray `ftn` and NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
+* Must compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
 GNU `gfortran` and Intel `ifx`/`ifort`.
+
+- Example GPU macros include the below, among others:
+- `$:GPU_ROUTINE(parallelism='[seq]')` - Marks GPU-callable routines
+- `$:GPU_PARALLEL_LOOP(collapse=N)` - Parallelizes loops
+- `$:GPU_LOOP(parallelism='[seq]')` - Marks sequential loops
+- `$:GPU_UPDATE(device='[var1,var2]')` - Updates device data
+- `$:GPU_ENTER_DATA(copyin='[var]')` - Copies data to device
+- `$:GPU_EXIT_DATA(delete='[var]')` - Removes data from device
+
+---
+
+# 6 Documentation Style
+
+- **Subroutine/Function Documentation**:
+```fortran
+!> This procedure <description>
+!! @param param_name Description of the parameter
+!! @return Description of the return value (for functions)
+```
+which conforms to the Doxygen Fortran format.
+
+# 7 Error Handling
+
+- **Assertions**:
+- Use the fypp `ASSERT` macro for validating conditions
+- Example: `@:ASSERT(predicate, message)`
+
+- **Error Reporting**:
+- Use `s_mpi_abort(error_message)` for error termination, not `stop`
+- No `stop` / `error stop` inside device code
+
+# 8 Memory Management
+
+- **Allocation/Deallocation**:
+- Use fypp macro `@:ALLOCATE(var1, var2)` macro for device-aware allocation
+- Use fypp macro `@:DEALLOCATE(var1, var2)` macro for device-aware deallocation
+
+# 9. Additional Observed Patterns
+
+- **Derived Types**:
+- Extensive use of derived types for encapsulation
+- Use pointers within derived types (e.g., `pointer, dimension(:,:,:) => null()`)
+- Clear documentation of derived type components
+
+- **Pure & Elemental Functions**:
+- Use `pure` and `elemental` attributes for side-effect-free functions
+- Combine them for operations on arrays (`pure elemental function`)
+
+- **Precision Handling**:
+- Use `wp` (working precision) parameter from `m_precision_select`
+- Never hardcode precision with `real*8` or similar
+
+- **Loop Optimization**:
+- Favor array operations over explicit loops when possible
+- Use `collapse=N` directive to optimize nested loops
+
+# 10. Fortran Practices to Avoid
+
+- **Fixed Format**: Only free-form Fortran is used
+- No column-position dependent code
+
+- **Older Intrinsics**: Avoid outdated Fortran features like:
+- `equivalence` statements
+- `data` statements (use initialization expressions)
+- Character*N (use `character(len=N)` instead)
+
+- **Using same variable for multiple purposes**: Maintain single responsibility
+- Each variable should have one clear purpose

.fortlsrc

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+{
+"source_dirs": [
+"src/",
+"src/common/",
+"src/simulation/",
+"src/pre_process/",
+"src/post_process/"
+],
+"excl_paths": [
+"benchmarks/",
+"examples/",
+"tests/",
+"misc/",
+"src/pre_process/include/2dHardcodedIC.fpp",
+"src/pre_process/include/3dHardcodedIC.fpp",
+"src/pre_process/include/ExtrusionHardcodedIC.fpp",
+"**/m_nvtx*",
+"**/syscheck.fpp"
+],
+"include_dirs": [
+"src/common/include/",
+"src/simulation/include/",
+"src/pre_process/include/",
+"src/post_process/include/"
+],
+"pp_suffixes": [".fpp"],
+"pp_defs": {
+"MFC": 1,
+"MFC_SINGLE_PRECISION": 1,
+"MFC_OPENACC": 1,
+"MFC_MPI": 1
+},
+"lowercase_intrinsics": true,
+"debug_log": false,
+"disable_diagnostics": false,
+"use_signature_help": true,
+"variable_hover": true,
+"hover_signature": true,
+"enable_code_actions": true,
+"mod_dirs": [
+"build/pre_process/",
+"build/simulation/",
+"build/post_process/",
+"build/common/"
+],
+"ext_mod_dirs": [
+"/usr/include/",
+"/usr/local/include/",
+"/opt/homebrew/include/"
+],
+"implicit_external_mods": [
+"mpi",
+"m_thermochem",
+"m_variables_conversion",
+"hipfort",
+"hipfort_check",
+"hipfort_hipfft",
+"cutensorex",
+"silo_f9x",
+"m_model"
+],
+"disable_diagnostics_for_external_modules": true,
+"max_line_length": -1,
+"max_comment_line_length": -1,
+"disable_var_diagnostics": false,
+"disable_fypp": false,
+"fypp_strict": false,
+"incremental_sync": false,
+"debug_parser": true,
+"skip_parse_errors": true,
+"disable_parser": [
+"src/post_process/m_data_output.fpp",
+"src/pre_process/include/ExtrusionHardcodedIC.fpp",
+"src/pre_process/m_checker.fpp",
+"src/pre_process/include/2dHardcodedIC.fpp",
+"src/pre_process/include/3dHardcodedIC.fpp",
+"src/simulation/m_qbmm.fpp",
+"src/common/m_variables_conversion.fpp",
+"src/simulation/m_global_parameters.fpp",
+"**/m_nvtx*",
+"**/syscheck.fpp"
+]
+}

.github/workflows/bench.yml

Lines changed: 4 additions & 3 deletions
@@ -97,10 +97,11 @@ jobs:
 run: |
 cat pr/bench-${{ matrix.device }}.* 2>/dev/null || true
 cat master/bench-${{ matrix.device }}.* 2>/dev/null || true
-
-- name: Archive Logs
+
+# All other runners (non-Phoenix) just run without special env
+- name: Archive Logs (Frontier)
+if: always() && matrix.cluster != 'phoenix'
 uses: actions/upload-artifact@v4
-if: always()
 with:
 name: ${{ matrix.cluster }}-${{ matrix.device }}
 path: |

.github/workflows/frontier/submit-bench.sh

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ sbatch <<EOT
 #SBATCH -A CFD154 # charge account
 #SBATCH -N 1 # Number of nodes required
 $sbatch_device_opts
-#SBATCH -t 03:59:00 # Duration of the job (Ex: 15 mins)
+#SBATCH -t 02:59:00 # Duration of the job (Ex: 15 mins)
 #SBATCH -o$job_slug.out # Combined output and error messages file
 #SBATCH -p extended # Extended partition for shorter queues
 #SBATCH -W # Do not exit until the submitted job terminates.

.github/workflows/lint-source.yml

Lines changed: 4 additions & 0 deletions
@@ -34,6 +34,10 @@ jobs:
 find ./src -type f -not -name '*nvtx*' -exec sh -c 'fortitude check "$1" | grep -v E001' _ {} \;
 find ./src -type f -not -name '*nvtx*' -exec sh -c 'fortitude check "$1" | grep -v E001' _ {} \; | wc -l | xargs -I{} sh -c '[ {} -gt 0 ] && exit 1 || exit 0'
 
+- name: Looking for raw directives
+run: |
+! grep -iR '!\$acc\|!\$omp' --exclude="parallel_macros.fpp" --exclude="syscheck.fpp" ./src/*
+
 - name: No double precision intrinsics
 run: |
 ! grep -iR 'double_precision\|dsqrt\|dexp\|dlog\|dble\|dabs\|double\ precision\|real(8)\|real(4)\|dprod\|dmin\|dmax\|dfloat\|dreal\|dcos\|dsin\|dtan\|dsign\|dtanh\|dsinh\|dcosh\|d0' --exclude-dir=syscheck --exclude="*nvtx*" --exclude="*precision_select*" ./src/*
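
The new "Looking for raw directives" step relies on `! grep`, which inverts grep's exit status so that the step fails when a match is found. A minimal sketch of that behavior follows, assuming a POSIX shell and a `grep` that supports `-R` and `--exclude`. The function name `check_raw_directives` and the scratch files are hypothetical and exist only for illustration; the two `-e` patterns are a portable stand-in for the `\|` alternation used in the workflow.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): returns 0 only when no raw
# OpenACC/OpenMP sentinel is found under the directory given as $1.
check_raw_directives() {
    # Two -e patterns stand in for the workflow's '!\$acc\|!\$omp' alternation.
    ! grep -iR -e '!\$acc' -e '!\$omp' \
        --exclude="parallel_macros.fpp" --exclude="syscheck.fpp" "$1"
}

# Illustration on a scratch directory (file names and contents are made up).
scratch=$(mktemp -d)
printf '%s\n' '$:GPU_PARALLEL_LOOP(collapse=2)' > "$scratch/m_clean.fpp"
# Excluded by name, so a raw sentinel here does not trip the check.
printf '%s\n' '!$omp declare target' > "$scratch/parallel_macros.fpp"

if check_raw_directives "$scratch"; then
    echo "no raw directives"
fi

# A raw sentinel in an ordinary source file makes the check fail.
printf '%s\n' '!$acc parallel loop' > "$scratch/m_raw.fpp"
if ! check_raw_directives "$scratch"; then
    echo "raw directive found"
fi

rm -rf "$scratch"
```

One consequence of the `! grep` idiom worth keeping in mind: grep's error status (for example, an unreadable path) is also non-zero, so it is inverted to success just like "no match".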

.github/workflows/phoenix/bench.sh

Lines changed: 3 additions & 2 deletions
@@ -2,7 +2,8 @@
 
 n_ranks=12
 
-if [ "$job_device" = "gpu" ]; then
+echo "My benchmarking device is:" $device
+if [ "$device" = "gpu" ]; then
 n_ranks=$(nvidia-smi -L | wc -l) # number of GPUs on node
 gpu_ids=$(seq -s ' ' 0 $(($n_ranks-1))) # 0,1,2,...,gpu_count-1
 device_opts="--gpu -g $gpu_ids"
@@ -15,7 +16,7 @@ mkdir -p $currentdir
 
 export TMPDIR=$currentdir
 
-if [ "$job_device" = "gpu" ]; then
+if [ "$device" = "gpu" ]; then
 ./mfc.sh bench --mem 12 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
 else
 ./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
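
The change above switches both tests from `$job_device` to `$device`. A rough sketch of the GPU branch's option construction is given below, with the device and GPU count passed as arguments so it can be tried without `nvidia-smi`. The helper names `gpu_id_list` and `device_opts_for` are hypothetical, and the empty result for the non-GPU case is an assumption, since that branch is not shown in the hunk.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the commit) mirroring the GPU branch.

# Print the space-separated ids 0..n-1, the same list the script builds
# with `seq -s ' ' 0 $(($n_ranks-1))`.
gpu_id_list() {
    n="$1"; ids=""; i=0
    while [ "$i" -lt "$n" ]; do
        ids="${ids:+$ids }$i"   # prepend a space only after the first id
        i=$((i + 1))
    done
    printf '%s\n' "$ids"
}

# $1: device name, $2: number of GPUs (only used when $1 is "gpu").
device_opts_for() {
    if [ "$1" = "gpu" ]; then
        printf '%s\n' "--gpu -g $(gpu_id_list "$2")"
    else
        # Assumption: the CPU branch, not shown in the hunk, adds no options.
        printf '\n'
    fi
}

device_opts_for gpu 4   # prints: --gpu -g 0 1 2 3
device_opts_for cpu 0   # prints an empty line
```

Quoting the variable in the test, as in `[ "$1" = "gpu" ]`, keeps the comparison well-formed when the value is empty or unset, which matters if the calling environment does not export it.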
