
Commit b01d6fa

Merge branch 'master' into Diffusion
2 parents 100309e + eb61616

35 files changed: +1655 -1307 lines

.github/workflows/frontier/build.sh

Lines changed: 1 addition & 1 deletion
@@ -13,6 +13,6 @@ if [ "$2" == "bench" ]; then
         ./mfc.sh run "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts
     done
 else
-    ./mfc.sh test -a --dry-run --rdma-mpi --generate -j 8 $build_opts
+    ./mfc.sh test -a --dry-run --rdma-mpi -j 8 $build_opts
 fi

.github/workflows/frontier/test.sh

Lines changed: 1 addition & 1 deletion
@@ -6,5 +6,5 @@ ngpus=`echo "$gpus" | tr -d '[:space:]' | wc -c`
 if [ "$job_device" = "gpu" ]; then
     ./mfc.sh test -a --rdma-mpi --max-attempts 3 -j $ngpus -- -c frontier
 else
-    ./mfc.sh test -a --rdma-mpi --max-attempts 3 -j 32 -- -c frontier
+    ./mfc.sh test -a --max-attempts 3 -j 32 -- -c frontier
 fi

.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ jobs:
       run: |
         brew update
         brew upgrade
-        brew install coreutils python cmake fftw hdf5 gcc@15 boost open-mpi lapack
+        brew install coreutils python fftw hdf5 gcc@15 boost open-mpi lapack
         echo "FC=gfortran-15" >> $GITHUB_ENV
         echo "BOOST_INCLUDE=/opt/homebrew/include/" >> $GITHUB_ENV

.pr_agent.toml

Lines changed: 4 additions & 4 deletions
@@ -4,12 +4,12 @@
 pr_commands = ["/describe", "/review", "/improve"]

 [pr_reviewer] # (all fields optional)
-num_max_findings = 5 # how many items to surface
-require_tests_review = true
+num_max_findings = 10 # how many items to surface
+require_tests_review = true
 extra_instructions = """
 Focus on duplicate code, the possibility of bugs, and if the PR added appropriate tests if it added a simulation feature.
 """

 [pr_code_suggestions]
-commitable_code_suggestions = false # purely advisory, no write ops
-apply_suggestions_checkbox = false # hides the “Apply/Chat” boxes
+commitable_code_suggestions = true
+apply_suggestions_checkbox = true

README.md

Lines changed: 3 additions & 3 deletions
@@ -28,7 +28,7 @@
 **Welcome!**
 MFC simulates compressible multi-phase flows, [among other things](#what-else-can-this-thing-do).
 It uses metaprogramming to stay short and portable (~20K lines).
-MFC conducted the largest known, open CFD simulation at <a href="https://arxiv.org/abs/2505.07392" target="_blank">101 trillion grid points</a> (as of July 2025).
+MFC conducted the largest known, open CFD simulation at <a href="https://arxiv.org/abs/2505.07392" target="_blank">200 trillion grid points</a>, and 1 quadrillion degrees of freedom (as of September 2025), and is a 2025 Gordon Bell Prize finalist.

 <p align="center">
   <a href="https://doi.org/10.48550/arXiv.2503.07953" target="_blank">
@@ -187,7 +187,7 @@ They are organized below.

 * GPU compatible on NVIDIA ([P/V/A/H]100, GH200, etc.) and AMD (MI[1/2/3]00+) GPU and APU hardware
 * Ideal weak scaling to 100% of the largest GPU and superchip supercomputers
-  * \>36K AMD APUs (MI300A) on [LLNL El Capitan](https://hpc.llnl.gov/hardware/compute-platforms/el-capitan)
+  * \>43K AMD APUs (MI300A) on [LLNL El Capitan](https://hpc.llnl.gov/hardware/compute-platforms/el-capitan)
   * \>3K AMD APUs (MI300A) on [LLNL Tuolumne](https://hpc.llnl.gov/hardware/compute-platforms/tuolumne)
   * \>33K AMD GPUs (MI250X) on [OLCF Frontier](https://www.olcf.ornl.gov/frontier/)
   * \>10K NVIDIA GPUs (V100) on [OLCF Summit](https://www.olcf.ornl.gov/summit/)
@@ -199,7 +199,7 @@ They are organized below.

 * [Fypp](https://fypp.readthedocs.io/en/stable/fypp.html) metaprogramming for code readability, performance, and portability
 * Continuous Integration (CI)
-  * > 500 Regression tests with each PR.
+  * \>500 Regression tests with each PR.
   * Performed with GNU (GCC), Intel (oneAPI), Cray (CCE), and NVIDIA (NVHPC) compilers on NVIDIA and AMD GPUs.
   * Line-level test coverage reports via [Codecov](https://app.codecov.io/gh/MFlowCode/MFC) and `gcov`
   * Benchmarking to avoid performance regressions and identify speed-ups

examples/scaling/FRONTIER_BENCH.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
# Description

The scripts and case file in this directory are set up to benchmark strong
and weak scaling performance, as well as single-device absolute performance, on
OLCF Frontier. The case file is for a three-dimensional, two-fluid liquid--gas
problem without viscosity or surface tension. The scripts contained here have
been tested for the default node counts and problem sizes in the scripts. The
reference data in `reference.dat` also makes use of the default node counts and
problem sizes and will need to be regenerated if either changes. The benchmarks
can be run with the following steps:

## Getting the code

The code is hosted on GitHub and can be cloned with the following command:

```bash
git clone git@github.com:MFlowCode/MFC.git; cd MFC; chmod u+x examples/scaling/*.sh;
```

The above command clones the repository, changes directory to the repository
root, and makes the benchmark scripts executable.

## Running the benchmarks

### Step 1: Building

The code for the benchmarks is built with the following command:
```
./examples/scaling/build.sh
```

### Step 2: Running

The benchmarks can be run in their default configuration with the following:
```
./examples/scaling/submit_all.sh --account <account_name>
```
By default this will submit the following jobs for benchmarking:

| Job | Nodes | Description |
| ------------------ | ----- | ------------------------------------------------------------------- |
| `MFC-W-16-64` | 16 | Weak scaling calculation with a ~64GB problem per GCD on 16 nodes |
| `MFC-W-128-64` | 128 | Weak scaling calculation with a ~64GB problem per GCD on 128 nodes |
| `MFC-W-1024-64` | 1024 | Weak scaling calculation with a ~64GB problem per GCD on 1024 nodes |
| `MFC-W-8192-64` | 8192 | Weak scaling calculation with a ~64GB problem per GCD on 8192 nodes |
| `MFC-S-8-4096` | 8 | Strong scaling calculation with a ~4096GB problem on 8 nodes |
| `MFC-S-64-4096` | 64 | Strong scaling calculation with a ~4096GB problem on 64 nodes |
| `MFC-S-512-4096` | 512 | Strong scaling calculation with a ~4096GB problem on 512 nodes |
| `MFC-S-4096-4096` | 4096 | Strong scaling calculation with a ~4096GB problem on 4096 nodes |
| `MFC-G-8` | 1 | Single device grind time calculation with ~8GB per GCD |
| `MFC-G-16` | 1 | Single device grind time calculation with ~16GB per GCD |
| `MFC-G-32` | 1 | Single device grind time calculation with ~32GB per GCD |
| `MFC-G-64` | 1 | Single device grind time calculation with ~64GB per GCD |

Strong and weak scaling cases run `pre_process` once and then run `simulation`
with and without GPU-aware MPI in a single job. Individual benchmarks can be run
by calling the `submit_[strong,weak,grind].sh` scripts directly, or by modifying
the `submit_all.sh` script to fit your needs.

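For instance, a single grind-time sweep could be submitted directly. The sketch below assumes `submit_grind.sh` accepts the same `--account` option as `submit_all.sh` and the `--mem` list described in the next subsection; check the script header before relying on it:

```bash
./examples/scaling/submit_grind.sh --account <account_name> --mem "8,16,32,64"
```
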
#### Modifying the benchmarks

The submitted jobs can be modified by appending options to the `submit_all.sh`
script. For example, appending
```
--nodes "1,2,4,8"
```
to the `submit_strong.sh` and `submit_weak.sh` scripts will run the strong and
weak scaling benchmarks on 1, 2, 4, and 8 nodes. Appending
```
--mem "x,y"
```
will modify the approximate problem size in terms of GB of memory
(see the `submit_[strong,weak,grind].sh` scripts for details on what this number
refers to for the different types of tests). A combined invocation is sketched
below.

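As a minimal sketch, a small weak-scaling sweep might look like the following (with the same caveat as above about whether the per-script submitters accept `--account`):

```bash
./examples/scaling/submit_weak.sh --account <account_name> --nodes "1,2,4,8" --mem "64"
```

This keeps the ~64GB-per-GCD size of the default weak-scaling jobs but runs it on smaller node counts.
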
### Step 3: Post processing

The log files can be post-processed into a more human-readable format with
```
python3 examples/scaling/analyze.py
```
This Python script prints a table of results to the command line with a
comparison to the reference data in `reference.dat`. The `rel_perf` column
compares the raw run times of the current results to the reference data.
Relative performance numbers smaller than 1.0 indicate a speedup and numbers
larger than 1.0 indicate a slowdown relative to the reference data. The selected
problem sizes are intended to be comparable to the tiny, small, medium, and large
labels used by the SPEChpc benchmark.

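If a single run needs to be checked by hand, `analyze.py` reads the last `Time Avg =` entry from each simulation log in `examples/scaling/logs/`, with log names following a `<scaling>-<nodes>-<memory>-<rdma>-<phase>.out` pattern (this naming is inferred from `analyze.py`, not separately documented). A minimal sketch, with the log name left as a placeholder:

```bash
# Print the final "Time Avg" reported in one simulation log
grep "Time Avg" examples/scaling/logs/<scaling>-<nodes>-<mem>-<rdma>-sim.out | tail -n 1
```
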
## Common errors

The only common failure point identified during testing was "text file busy"
errors, which cause job failures. These errors are intermittent and are usually
resolved by resubmitting the test.

examples/scaling/README.md

Lines changed: 3 additions & 4 deletions
@@ -1,11 +1,10 @@
-# Strong- & Weak-scaling
+# Scaling and Performance test

 The scaling case can exercise both weak- and strong-scaling. It
 adjusts itself depending on the number of requested ranks.

-This directory also contains a collection of scripts used to test strong-scaling
-on OLCF Frontier. They required modifying MFC to collect some metrics but are
-meant to serve as a reference to users wishing to run similar experiments.
+This directory also contains a collection of scripts used to test strong and weak
+scaling on OLCF Frontier.

 ## Weak Scaling

examples/scaling/analyze.py

Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
import os, re
import pandas as pd
from io import StringIO


def parse_time_avg(path):
    last_val = None
    pattern = re.compile(r"Time Avg =\s*([0-9.E+-]+)")
    with open(path) as f:
        for line in f:
            match = pattern.search(line)
            if match:
                last_val = float(match.group(1))
    return last_val


def parse_grind_time(path):
    last_val = None
    pattern = re.compile(r"Performance: \s*([0-9.E+-]+)")
    with open(path) as f:
        for line in f:
            match = pattern.search(line)
            if match:
                last_val = float(match.group(1))
    return last_val


def parse_reference_file(filename):
    with open(filename) as f:
        content = f.read()

    records = []
    blocks = re.split(r"\n(?=Weak|Strong|Grind)", content.strip())

    for block in blocks:
        lines = block.strip().splitlines()
        header = lines[0].strip()
        body = "\n".join(lines[1:])

        df = pd.read_csv(StringIO(body), delim_whitespace=True)

        if header.startswith("Weak Scaling"):
            # Parse metadata from header
            mem_match = re.search(r"Memory: ~(\d+)GB", header)
            rdma_match = re.search(r"RDMA: (\w)", header)
            memory = int(mem_match.group(1)) if mem_match else None
            rdma = rdma_match.group(1) if rdma_match else None

            for _, row in df.iterrows():
                records.append({"scaling": "weak", "nodes": int(row["nodes"]), "memory": memory, "rdma": rdma, "phase": "sim", "time_avg": row["time_avg"], "efficiency": row["efficiency"]})

        elif header.startswith("Strong Scaling"):
            mem_match = re.search(r"Memory: ~(\d+)GB", header)
            rdma_match = re.search(r"RDMA: (\w)", header)
            memory = int(mem_match.group(1)) if mem_match else None
            rdma = rdma_match.group(1) if rdma_match else None

            for _, row in df.iterrows():
                records.append(
                    {
                        "scaling": "strong",
                        "nodes": int(row["nodes"]),
                        "memory": memory,
                        "rdma": rdma,
                        "phase": "sim",
                        "time_avg": row["time_avg"],
                        "speedup": row["speedup"],
                        "efficiency": row["efficiency"],
                    }
                )

        elif header.startswith("Grind Time"):
            for _, row in df.iterrows():
                records.append({"scaling": "grind", "memory": int(row["memory"]), "grind_time": row["grind_time"]})

    return pd.DataFrame(records)


# Get log files and filter for simulation logs
files = os.listdir("examples/scaling/logs/")
files = [f for f in files if "sim" in f]

records = []
for fname in files:
    # Remove extension
    parts = fname.replace(".out", "").split("-")
    scaling, nodes, memory, rdma, phase = parts
    records.append({"scaling": scaling, "nodes": int(nodes), "memory": int(memory), "rdma": rdma, "phase": phase, "file": fname})

df = pd.DataFrame(records)

ref_data = parse_reference_file("examples/scaling/reference.dat")

print()

weak_df = df[df["scaling"] == "weak"]
strong_df = df[df["scaling"] == "strong"]
grind_df = df[df["scaling"] == "grind"]

weak_ref_df = ref_data[ref_data["scaling"] == "weak"]
strong_ref_df = ref_data[ref_data["scaling"] == "strong"]
grind_ref_df = ref_data[ref_data["scaling"] == "grind"]

weak_scaling_mem = weak_df["memory"].unique()
weak_scaling_rdma = weak_df["rdma"].unique()

for mem in weak_scaling_mem:
    for rdma in weak_scaling_rdma:
        subset = weak_df[(weak_df["memory"] == mem) & (weak_df["rdma"] == rdma)]
        subset = subset.sort_values(by="nodes")
        ref = weak_ref_df[(weak_ref_df["memory"] == mem) & (weak_ref_df["rdma"] == rdma) & (weak_ref_df["nodes"].isin(subset["nodes"]))]
        ref = ref.sort_values(by="nodes")

        times = []
        for _, row in subset.iterrows():
            time_avg = parse_time_avg(os.path.join("examples/scaling/logs", row["file"]))
            times.append(time_avg)

        subset = subset.copy()
        ref = ref.copy()
        subset["time_avg"] = times
        base_time = subset.iloc[0]["time_avg"]

        subset["efficiency"] = base_time / subset["time_avg"]
        subset["rel_perf"] = subset["time_avg"] / ref["time_avg"].values
        print(f"Weak Scaling - Memory: ~{mem}GB, RDMA: {rdma}")
        print(subset[["nodes", "time_avg", "efficiency", "rel_perf"]].to_string(index=False))
        print()

strong_scaling_mem = strong_df["memory"].unique()
strong_scaling_rdma = strong_df["rdma"].unique()

for mem in strong_scaling_mem:
    for rdma in strong_scaling_rdma:
        subset = strong_df[(strong_df["memory"] == mem) & (strong_df["rdma"] == rdma)]
        subset = subset.sort_values(by="nodes")

        ref = strong_ref_df[(strong_ref_df["memory"] == mem) & (strong_ref_df["rdma"] == rdma) & (strong_ref_df["nodes"].isin(subset["nodes"]))]
        ref = ref.sort_values(by="nodes")

        times = []
        for _, row in subset.iterrows():
            time_avg = parse_time_avg(os.path.join("examples/scaling/logs", row["file"]))
            times.append(time_avg)

        subset = subset.copy()
        ref = ref.copy()
        subset["time_avg"] = times
        base_time = subset.iloc[0]["time_avg"]

        subset["speedup"] = base_time / subset["time_avg"]
        subset["efficiency"] = base_time / ((subset["nodes"] / subset.iloc[0]["nodes"]) * subset["time_avg"])
        subset["rel_perf"] = subset["time_avg"] / ref["time_avg"].values
        print(f"Strong Scaling - Memory: ~{mem}GB, RDMA: {rdma}")
        print(subset[["nodes", "time_avg", "speedup", "efficiency", "rel_perf"]].to_string(index=False))
        print()

if not grind_df.empty:
    grind_mem = grind_df["memory"].unique()
    subset = grind_df.sort_values(by="memory")
    ref = grind_ref_df[(grind_ref_df["memory"].isin(subset["memory"]))]
    ref = ref.sort_values(by="memory")

    times = []
    for _, row in subset.iterrows():
        grind_time = parse_grind_time(os.path.join("examples/scaling/logs", row["file"]))
        times.append(grind_time)

    subset = subset.copy()
    ref = ref.copy()

    subset["grind_time"] = times
    subset["rel_perf"] = subset["grind_time"] / ref["grind_time"].values
    print(f"Grind Time - Single Device")
    print(subset[["memory", "grind_time", "rel_perf"]].to_string(index=False))

    print()

examples/scaling/build.sh

File mode changed: 100644 → 100755
Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,6 @@
 #!/bin/bash

+. ./mfc.sh load -c f -m g
+
 ./mfc.sh build -t pre_process simulation --case-optimization -i examples/scaling/case.py \
-    -j 8 --gpu --mpi --no-debug -- -s strong -m 512
+    -j 8 --gpu --mpi --no-debug -- -s strong -m 512
