Commit b365477

Merge branch 'master' into mixlayer
2 parents 98e854c + b263cf3 commit b365477

File tree

162 files changed

+8068
-7414
lines changed


.cursor/rules/mfc-agent-rules.mdc

Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,82 @@
1+
---
2+
description: Full MFC project rules – consolidated for Agent Mode
3+
alwaysApply: true
4+
---
5+
6+
# 0 Purpose & Scope
7+
Consolidated guidance for the MFC exascale, many-physics solver.
8+
Written primarily for Fortran/Fypp; the OpenACC and style sections matter only when
9+
`.fpp` / `.f90` files are in view.
10+
11+
---
12+
13+
# 1 Global Project Context (always)
14+
- **Project**: *MFC* is modern Fortran 2008+ generated with **Fypp**.
15+
- Sources `src/`, tests `tests/`, examples `examples/`.
16+
- Most sources are `.fpp`; CMake transpiles them to `.f90`.
17+
- **Fypp macros** live in `src/<subprogram>/include/`; scan these first.
18+
`<subprogram>` ∈ {`simulation`,`common`,`pre_process`,`post_process`}.
19+
- Only `simulation` (+ its `common` calls) is GPU-accelerated via **OpenACC**.
20+
- Assume free-form Fortran 2008+, `implicit none`, explicit `intent`, and modern
21+
intrinsics.
22+
- Prefer `module … contains … subroutine foo()`; avoid `COMMON` blocks and
23+
file-level `include` files.
24+
- **Read the full codebase and docs *before* changing code.**
25+
Docs: <https://mflowcode.github.io/documentation/md_readme.html> and the repository root `README.md`.
26+
27+
### Incremental-change workflow
28+
1. Draft a step-by-step plan.
29+
2. After each step, build:
30+
```bash
31+
./mfc.sh build -t pre_process simulation -j $(nproc)
32+
```
33+
3. If it compiles, run focused tests:
34+
```bash
35+
./mfc.sh test -j $(nproc) -f EA8FA07E -t 9E2CA336
36+
```
37+
4. Roll back & fix if a step fails.
38+
39+
* Do not run `./mfc.sh test -j $(nproc)` without other arguments; running the full test suite takes too long.
40+
41+
---
42+
43+
# 2 Style & Naming Conventions (for \*.fpp / \*.f90)
44+
45+
* **Indent 2 spaces**; continuation lines align under `&`.
46+
* Lower-case keywords and intrinsics (`do`, `end subroutine`, …).
47+
* **Modules**: `m_<feature>` (e.g. `m_transport`).
48+
* **Public procedures**:
49+
* Subroutine → `s_<verb>_<noun>` (e.g. `s_compute_flux`)
50+
* Function → `f_<verb>_<noun>`
51+
* Private helpers stay in the module; avoid nested procedures.
52+
* **Size limits**: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100,
53+
module/file ≤ 1000.
54+
* ≤ 6 arguments per routine; otherwise pass a derived-type “params” struct.
55+
* No `goto` (except unavoidable legacy); no global state (`COMMON`, `save`).
56+
* Every variable: `intent(in|out|inout)` + appropriate `dimension` / `allocatable`
57+
/ `pointer`.
58+
* Use `s_mpi_abort(<msg>)` for errors, not `stop`.
59+
* Mark helpers that are called from OpenACC parallel loops with `!$acc routine seq` immediately after the declaration:
60+
```fortran
61+
subroutine s_flux_update(...)
62+
!$acc routine seq
63+
...
64+
end subroutine
65+
```
66+
67+
---
68+
69+
# 3 OpenACC Programming Guidelines (for kernels)
70+
71+
Wrap tight loops with
72+
73+
```fortran
74+
!$acc parallel loop gang vector default(present) reduction(...)
75+
```
76+
* Add `collapse(n)` to merge nested loops when safe.
77+
* Declare loop-local variables with `private(...)`.
78+
* Allocate large arrays with `managed` or move them into a persistent
79+
`!$acc enter data` region at start-up.
80+
* **Do not** place `stop` / `error stop` inside device code.
81+
* Must compile with Cray `ftn` and NVIDIA `nvfortran` for GPU offloading; also build CPU-only with
82+
GNU `gfortran` and Intel `ifx`/`ifort`.

.github/pull_request_template.md

Lines changed: 1 addition & 1 deletion
@@ -54,5 +54,5 @@ To make sure the code is performing as expected on GPU devices, I have:
5454
- [ ] Ran the code on MI200+ GPUs and ensured the new features performed as expected (the GPU results match the CPU results)
5555
- [ ] Enclosed the new feature via `nvtx` ranges so that they can be identified in profiles
5656
- [ ] Ran a Nsight Systems profile using `./mfc.sh run XXXX --gpu -t simulation --nsys`, and have attached the output file (`.nsys-rep`) and plain text results to this PR
57-
- [ ] Ran an Omniperf profile using `./mfc.sh run XXXX --gpu -t simulation --omniperf`, and have attached the output file and plain text results to this PR.
57+
- [ ] Ran a Rocprof Systems profile using `./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace`, and have attached the output file and plain text results to this PR.
5858
- [ ] Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature
Lines changed: 6 additions & 1 deletion
@@ -1,4 +1,9 @@
11
#!/bin/bash
22

3+
build_opts=""
4+
if [ "$1" == "gpu" ]; then
5+
build_opts="--gpu"
6+
fi
7+
38
. ./mfc.sh load -c f -m g
4-
./mfc.sh test --dry-run -j 8 --gpu
9+
./mfc.sh test --dry-run -j 8 $build_opts
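
The argument handling in this hunk can be checked in isolation, without invoking `./mfc.sh`. The following is a minimal sketch, assuming a hypothetical helper named `select_build_opts` that is not part of the repository; it reproduces the mapping from the first positional argument to the build options.

```shell
#!/bin/bash
# Hypothetical helper (not in the repository): reproduces the script's
# mapping from the first positional argument to the build options.
# "=" is the POSIX spelling of the "==" comparison used in the script.
select_build_opts() {
    build_opts=""
    if [ "$1" = "gpu" ]; then
        build_opts="--gpu"
    fi
    echo "$build_opts"
}

# Any argument other than "gpu" (including none) yields an empty string,
# so no --gpu flag would be passed to the dry-run test command.
echo "gpu    -> '$(select_build_opts gpu)'"
echo "cpu    -> '$(select_build_opts cpu)'"
echo "(none) -> '$(select_build_opts)'"
```

Note that `$build_opts` is left unquoted in the script itself, which is what lets an empty value expand to no argument at all rather than to an empty string argument.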

.github/workflows/frontier/submit.sh

Lines changed: 14 additions & 1 deletion
@@ -13,16 +13,29 @@ else
1313
exit 1
1414
fi
1515

16+
if [ "$2" == "cpu" ]; then
17+
sbatch_device_opts="\
18+
#SBATCH -n 32 # Number of cores required"
19+
elif [ "$2" == "gpu" ]; then
20+
sbatch_device_opts="\
21+
#SBATCH -n 8 # Number of cores required"
22+
else
23+
usage
24+
exit 1
25+
fi
26+
27+
1628
job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2"
1729

1830
sbatch <<EOT
1931
#!/bin/bash
2032
#SBATCH -JMFC-$job_slug # Job name
2133
#SBATCH -A CFD154 # charge account
2234
#SBATCH -N 1 # Number of nodes required
23-
#SBATCH -n 8 # Number of cores required
35+
$sbatch_device_opts
2436
#SBATCH -t 01:59:00 # Duration of the job (Ex: 15 mins)
2537
#SBATCH -o$job_slug.out # Combined output and error messages file
38+
#SBATCH -p extended # Extended partition for shorter queues
2639
#SBATCH -q debug # Use debug QOS - only one job per user allowed in queue!
2740
#SBATCH -W # Do not exit until the submitted job terminates.
2841
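
The `$sbatch_device_opts` line works because the here-document delimiter `EOT` is unquoted, so variables are expanded before the text reaches `sbatch`. The sketch below illustrates this under stated assumptions: `cat` stands in for `sbatch`, and the `device` variable stands in for the script's `$2` argument.

```shell
#!/bin/bash
# Stand-in for the script's second positional argument (assumed value).
device="cpu"

# Same selection as in the hunk: the backslash-newline inside the double
# quotes is a line continuation, so the value starts directly at "#SBATCH".
if [ "$device" = "cpu" ]; then
    sbatch_device_opts="\
#SBATCH -n 32 # Number of cores required"
else
    sbatch_device_opts="\
#SBATCH -n 8 # Number of cores required"
fi

# "cat" replaces "sbatch" here so the generated job header can be inspected.
# Because EOT is unquoted, $sbatch_device_opts is expanded in the here-document.
script=$(cat <<EOT
#!/bin/bash
#SBATCH -N 1 # Number of nodes required
$sbatch_device_opts
EOT
)

echo "$script"
```

Quoting the delimiter (`<<'EOT'`) would disable the expansion and pass the literal text `$sbatch_device_opts` to the scheduler.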

.github/workflows/frontier/test.sh

Lines changed: 5 additions & 2 deletions
@@ -3,5 +3,8 @@
33
gpus=`rocm-smi --showid | awk '{print $1}' | grep -Eo '[0-9]+' | uniq | tr '\n' ' '`
44
ngpus=`echo "$gpus" | tr -d '[:space:]' | wc -c`
55

6-
./mfc.sh test --max-attempts 3 -j $ngpus -- -c frontier
7-
6+
if [ "$job_device" == "gpu" ]; then
7+
./mfc.sh test --max-attempts 3 -j $ngpus -- -c frontier
8+
else
9+
./mfc.sh test --max-attempts 3 -j 32 -- -c frontier
10+
fi
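
The `ngpus` line in this script counts the characters left after stripping whitespace from the ID list, which equals the number of GPUs only while every ID is a single digit. The following sketch compares that approach with a word count; the ID lists are assumed sample values standing in for the `rocm-smi` pipeline output, and the `wc -w` variant is an alternative for illustration, not something the commit uses.

```shell
#!/bin/bash
# Assumed sample ID lists (the real script derives them from rocm-smi).
gpus_single="0 1 2 3 4 5 6 7 "
gpus_double="8 9 10 11 "

# Approach used in the script: strip all whitespace, count remaining characters.
count_chars() { echo "$1" | tr -d '[:space:]' | wc -c; }

# Alternative: count whitespace-separated words instead of characters.
count_words() { echo "$1" | wc -w; }

# Single-digit IDs: both approaches agree.
chars_single=$(count_chars "$gpus_single")
words_single=$(count_words "$gpus_single")

# Two-digit IDs: the character count overshoots the number of devices.
chars_double=$(count_chars "$gpus_double")
words_double=$(count_words "$gpus_double")

echo "single-digit IDs: chars=$chars_single words=$words_single"
echo "two-digit IDs:    chars=$chars_double words=$words_double"
```

For a node whose device IDs all fall in the range 0 to 9, the two counts coincide, so the existing line is unaffected in that case.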

.github/workflows/phoenix/bench.sh

Lines changed: 7 additions & 2 deletions
@@ -8,8 +8,13 @@ if [ "$job_device" == "gpu" ]; then
88
device_opts="--gpu -g $gpu_ids"
99
fi
1010

11+
mkdir -p /storage/scratch1/6/sbryngelson3/mytmp_build
12+
export TMPDIR=/storage/scratch1/6/sbryngelson3/mytmp_build
13+
1114
if [ "$job_device" == "gpu" ]; then
12-
./mfc.sh bench --mem 12 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix $device_opts -n $n_ranks
15+
./mfc.sh bench --mem 12 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
1316
else
14-
./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix $device_opts -n $n_ranks
17+
./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c phoenix-bench $device_opts -n $n_ranks
1518
fi
19+
20+
unset TMPDIR
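
The device check that selects the `--mem` value relies on `[` being parsed as a command, so the spaces just inside the brackets are significant. The sketch below shows the difference; the variable value is an assumed example, and the branch labels are only for illustration.

```shell
#!/bin/bash
job_device="gpu"

# Without spaces, the shell looks for a command literally named "[gpu".
# No such command exists, so the condition fails and the else branch runs.
if ["$job_device" == "gpu"] 2>/dev/null; then
    unspaced="then"
else
    unspaced="else"
fi

# With spaces, "[" is the test command and "]" is its closing argument.
if [ "$job_device" = "gpu" ]; then
    spaced="then"
else
    spaced="else"
fi

echo "unspaced=$unspaced spaced=$spaced"
```

With the unspaced form, the else branch is taken regardless of the value of `job_device`.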

.github/workflows/phoenix/submit.sh

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ sbatch <<EOT
4242
#SBATCH --account=gts-sbryngelson3 # charge account
4343
#SBATCH -N1 # Number of nodes required
4444
$sbatch_device_opts
45-
#SBATCH -t 02:00:00 # Duration of the job (Ex: 15 mins)
45+
#SBATCH -t 03:00:00 # Duration of the job (Ex: 15 mins)
4646
#SBATCH -q embers # QOS Name
4747
#SBATCH -o$job_slug.out # Combined output and error messages file
4848
#SBATCH -W # Do not exit until the submitted job terminates.

.github/workflows/test.yml

Lines changed: 1 addition & 4 deletions
@@ -97,9 +97,6 @@ jobs:
9797
matrix:
9898
device: ['cpu', 'gpu']
9999
lbl: ['gt', 'frontier']
100-
exclude:
101-
- device: cpu
102-
lbl: frontier
103100
runs-on:
104101
group: phoenix
105102
labels: ${{ matrix.lbl }}
@@ -116,7 +113,7 @@ jobs:
116113

117114
- name: Build
118115
if: matrix.lbl == 'frontier'
119-
run: bash .github/workflows/frontier/build.sh
116+
run: bash .github/workflows/frontier/build.sh ${{ matrix.device }}
120117

121118
- name: Test
122119
if: matrix.lbl == 'frontier'

CMakeLists.txt

Lines changed: 5 additions & 1 deletion
@@ -135,13 +135,17 @@ if (CMAKE_Fortran_COMPILER_ID STREQUAL "GNU")
135135
if (CMAKE_BUILD_TYPE STREQUAL "Debug")
136136
add_compile_options(
137137
-Wall
138+
-Wextra
138139
-fcheck=all,no-array-temps
139140
-fbacktrace
140141
-fimplicit-none
141-
#-ffpe-trap=invalid,zero,denormal,overflow
142142
-fsignaling-nans
143143
-finit-real=snan
144144
-finit-integer=-99999999
145+
-Wintrinsic-shadow
146+
-Wunderflow
147+
-Wrealloc-lhs
148+
-Wsurprising
145149
)
146150
endif()
147151

docs/documentation/case.md

Lines changed: 2 additions & 6 deletions
@@ -436,7 +436,7 @@ The effect and use of the source term are assessed by [Schmidmayer et al., 2019]
436436
- `time_stepper` specifies the order of the Runge-Kutta (RK) time integration scheme that is used for temporal integration in simulation, from the 1st to 5th order by corresponding integer.
437437
Note that `time_stepper = 3` specifies the total variation diminishing (TVD), third order RK scheme ([Gottlieb and Shu, 1998](references.md)).
438438

439-
- `adap_dt` activates the Strang operator splitting scheme which splits flux and source terms in time marching, and an adaptive time stepping strategy is implemented for the source term. It requires ``bubbles = 'T'``, ``polytropic = 'T'``, ``adv_n = 'T'`` and `time_stepper = 3`.
439+
- `adap_dt` activates the Strang operator splitting scheme which splits flux and source terms in time marching, and an adaptive time stepping strategy is implemented for the source term. It requires ``bubbles_euler = 'T'``, ``polytropic = 'T'``, ``adv_n = 'T'`` and `time_stepper = 3`. Additionally, it can be used with ``bubbles_lagrange = 'T'`` and `time_stepper = 3`.
440440

441441
- `weno_order` specifies the order of WENO scheme that is used for spatial reconstruction of variables by an integer of 1, 3, 5, and 7, that correspond to the 1st, 3rd, 5th, and 7th order, respectively.
442442

@@ -461,7 +461,7 @@ It is recommended to set `weno_eps` to $10^{-6}$ for WENO-JS, and to $10^{-40}$
461461
`riemann_solver = 1`, `2`, and `3` correspond to HLL, HLLC, and Exact Riemann solver, respectively ([Toro, 2013](references.md)).
462462
`riemann_solver = 4` is only for MHD simulations. It resolves 5 of the full seven-wave structure of the MHD equations ([Miyoshi and Kusano, 2005](references.md)).
463463

464-
- `low_Mach` specifies the choice of the low Mach number correction scheme for the HLLC Riemann solver. `low_Mach = 0` is the default value and does not apply any correction scheme. `low_Mach = 1` and `2` apply the anti-dissipation pressure correction method ([Chen et al., 2022](references.md)) and the improved velocity reconstruction method ([Thornber et al., 2008](references.md)), respectively. This feature requires `riemann_solver = 2` and `model_eqns = 2`.
464+
- `low_Mach` specifies the choice of the low Mach number correction scheme for the HLLC Riemann solver. `low_Mach = 0` is the default value and does not apply any correction scheme. `low_Mach = 1` and `2` apply the anti-dissipation pressure correction method ([Chen et al., 2022](references.md)) and the improved velocity reconstruction method ([Thornber et al., 2008](references.md)), respectively. This feature requires `model_eqns = 2` or `3`. `low_Mach = 1` works for `riemann_solver = 1` and `2`, but `low_Mach = 2` only works for `riemann_solver = 2`.
465465

466466
- `avg_state` specifies the choice of the method to compute averaged variables at the cell-boundaries from the left and the right states in the Riemann solver by an integer of 1 or 2.
467467
`avg_state = 1` and `2` correspond to Roe- and arithmetic averages, respectively.
@@ -790,8 +790,6 @@ When ``polytropic = 'F'``, the gas compression is modeled as non-polytropic due
790790
| `x0` | Real | Reference length |
791791
| `Thost` | Real | Temperature of the surrounding liquid (host) |
792792
| `diffcoefvap` | Real | Vapor diffusivity in the gas |
793-
| `rkck_adap_dt` | Logical | Activates the adaptive rkck time stepping algorithm |
794-
| `rkck_tolerance` | Real | Admissible error truncation tolerance in the rkck stepper |
795793

796794
- `nBubs_glb` Total number of bubbles. Their initial conditions need to be specified in the ./input/lag_bubbles.dat file. See the example cases for additional information.
797795

@@ -805,8 +803,6 @@ When ``polytropic = 'F'``, the gas compression is modeled as non-polytropic due
805803

806804
- `massTransfer_model` Activates the mass transfer model at the bubble's interface based on ([Preston et al., 2007](references.md)).
807805

808-
- `rkck_adap_dt` Activates the adaptive 4th/5th order Runge—Kutta–Cash–Karp (RKCK) time-stepping algorithm (requires `time_stepper ==4`). A maximum error between the 4th and 5th order Runge-Kutta-Cash-Karp solutions for the same time step size is calculated. If the error is smaller than a tolerance (`rkck_tolerance`), then the algorithm employs the 5th order solution, while if not, both eulerian/lagrangian variables are re-calculated with a smaller time step size.
809-
810806
### 10. Velocity Field Setup
811807

812808
| Parameter | Type | Description |
