Commit 8339fc0

authored
Update expectedPerformance.md
1 parent 73c455e commit 8339fc0

1 file changed

+33
-31
lines changed

@@ -1,76 +1,78 @@
11
# Performance Results
22

3-
MFC has been extensively benchmarked on both CPUs and GPUs. A summary of these results follow.
3+
MFC has been extensively benchmarked on CPUs and GPU devices.
4+
A summary of these results follows.
45

56
## Expected time-steps/hour
67

7-
The following table outlines expected performance in terms of number of time-steps per hour
8-
(rounded to the nearest hundred) for various problem sizes and hardware for a inviscid, 6-equation,
9-
3D simulation. CPU results utilize an entire die.
8+
The following table outlines expected performance in terms of the number of time steps per hour
9+
(rounded to the nearest hundred) for various problem sizes (grid cells) and hardware for an inviscid, 6-equation (`'model_eqns' : 3`), 3D simulation.
10+
CPU results utilize an entire die.
1011

1112
| Hardware | # Ranks | 1M Cells | 4M Cells | 8M Cells | Compiler | Computer |
1213
| ---: | :----: | :----: | :---: | :---: | :----: | :--- |
13-
| Nvidia V100 | 1 | 88.5k | 18.7k | N/A | NVHPC 22.11 | PACE Phoenix |
14-
| Nvidia V100 | 1 | 78.8k | 18.8k | N/A | NVHPC 22.11 | OLCF Summit |
15-
| Nvidia A100 | 1 | 114.4k | 34.6k | 16.5k | NVHPC 23.5 | Wingtip |
16-
| AMD MI250x | 1 | 77.5k | 22.3k | 11.2k | CCE 16.0.1 | OLCF Frontier |
14+
| NVIDIA V100 | 1 | 88.5k | 18.7k | N/A | NVHPC 22.11 | PACE Phoenix |
15+
| NVIDIA V100 | 1 | 78.8k | 18.8k | N/A | NVHPC 22.11 | OLCF Summit |
16+
| NVIDIA A100 | 1 | 114.4k | 34.6k | 16.5k | NVHPC 23.5 | Wingtip |
17+
| AMD MI250X | 1 | 77.5k | 22.3k | 11.2k | CCE 16.0.1 | OLCF Frontier |
1718
| Intel Xeon Gold 6226 | 12 | 2.5k | 0.7k | 0.4k | GNU 10.3.0 | PACE Phoenix |
1819
| Apple Silicon M2 | 6 | 2.8k | 0.6k | 0.2k | GNU 13.2.0 | N/A |
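
The table entries can be turned into a rough wall-time estimate for a planned run. The sketch below is illustrative only: the function names and the 500,000-step run length are assumptions, not part of MFC. The throughput value is the NVIDIA A100 entry for 1M cells from the table above.

```python
def wall_time_hours(n_steps: int, steps_per_hour: float) -> float:
    """Estimated wall time (hours) for n_steps at a given throughput."""
    if steps_per_hour <= 0:
        raise ValueError("steps_per_hour must be positive")
    return n_steps / steps_per_hour


def ns_per_cell_step(steps_per_hour: float, n_cells: int) -> float:
    """Average nanoseconds spent per grid cell per time step."""
    if steps_per_hour <= 0 or n_cells <= 0:
        raise ValueError("inputs must be positive")
    return 3600e9 / (steps_per_hour * n_cells)


# Table value: NVIDIA A100, 1M cells, 6-equation model.
a100_steps_per_hour = 114.4e3

# Hypothetical run length, chosen only for illustration.
planned_steps = 500_000

print(f"{wall_time_hours(planned_steps, a100_steps_per_hour):.2f} hours")
print(f"{ns_per_cell_step(a100_steps_per_hour, 1_000_000):.1f} ns per cell per step")
```

Because the table values are rounded to the nearest hundred, estimates like these are approximate.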
1920

2021
If `'model_eqns' : 3` is replaced by `'model_eqns' : 2`, an inviscid 5-equation model is used.
21-
The following table outlines expected performance in terms of number of time-steps per hour
22-
(rounded to the nearest hundred) for various problem sizes and hardware for a inviscid, 5-equation,
23-
3D simulation. CPU results utilize an entire die.
22+
The following table outlines expected performance in terms of the number of time-steps per hour (rounded to the nearest hundred) for various problem sizes and hardware for an inviscid, 5-equation,
23+
3D simulation.
24+
CPU results utilize an entire die.
2425

2526
| Hardware | # Ranks | 1M Cells | 4M Cells | 8M Cells | Compiler | Computer |
2627
| ---: | :----: | :----: | :---: | :---: | :----: | :--- |
27-
| Nvidia V100 | 1 | 113.4k | 26.2k | 13.0k | NVHPC 22.11 | PACE Phoenix |
28-
| Nvidia V100 | 1 | 107.7k | 26.3k | 13.1k | NVHPC 22.11 | OLCF Summit |
29-
| Nvidia A100 | 1 | 153.5k | 48.0k | 22.5k | NVHPC 23.5 | Wingtip |
30-
| AMD MI250x | 1 | 104.2k | 31.0k | 14.8k | CCE 16.0.1 | OLCF Frontier |
28+
| NVIDIA V100 | 1 | 113.4k | 26.2k | 13.0k | NVHPC 22.11 | PACE Phoenix |
29+
| NVIDIA V100 | 1 | 107.7k | 26.3k | 13.1k | NVHPC 22.11 | OLCF Summit |
30+
| NVIDIA A100 | 1 | 153.5k | 48.0k | 22.5k | NVHPC 23.5 | Wingtip |
31+
| AMD MI250X | 1 | 104.2k | 31.0k | 14.8k | CCE 16.0.1 | OLCF Frontier |
3132
| Intel Xeon Gold 6226 | 12 | 5.4k | 1.6k | 0.8k | GNU 10.3.0 | PACE Phoenix |
3233
| Apple Silicon M2 | 6 | 3.7k | 11.0k | 0.3k | GNU 13.2.0 | N/A |
3334

3435
## Weak scaling
3536

36-
Strong scaling results are obtained by increasing the problem size with the number of processes
37-
so that work per process remains constant.
37+
Weak scaling results are obtained by increasing the problem size with the number of processes so that work per process remains constant.
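
Because work per process is held constant, ideal weak scaling means the wall time does not change as processes are added. A minimal sketch of one common way to compute the efficiency from two timings follows. The function name and the timings are hypothetical and are not MFC measurements.

```python
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Ratio of base-case to scaled-case wall time at fixed work per process."""
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("timings must be positive")
    return t_base / t_scaled


# Hypothetical wall times in seconds (assumed values, for illustration only).
t_base = 100.0    # smallest process count
t_scaled = 104.0  # largest process count, same work per process

print(f"{weak_scaling_efficiency(t_base, t_scaled):.1%}")  # prints 96.2%
```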
3838

3939
### AMD MI250X GPU
40-
MFC weask scales to 65,536 AMD MI250X GPUs on OLCF Frontier with 96% efficiency. This corresponds to 87% of the entire machine.
40+
41+
MFC weak scales to (at least) 65,536 AMD MI250X GPUs on OLCF Frontier with 96% efficiency.
42+
This corresponds to 87% of the entire machine.
4143

4244
<img src="../res/weakScaling/frontier.svg" style="height: 50%; width:50%; border-radius: 10pt"/>
4345

44-
### Nvidia V100 GPU
45-
MFC weak scales to 13,824 V100 Nvidia V100 GPUs on OLCF Summit with 97% efficiency. This corresponds to 50% of the entire machine.
46+
### NVIDIA V100 GPU
47+
48+
MFC weak scales to (at least) 13,824 NVIDIA V100 GPUs on OLCF Summit with 97% efficiency.
49+
This corresponds to 50% of the entire machine.
4650

4751
<img src="../res/weakScaling/summit.svg" style="height: 50%; width:50%; border-radius: 10pt"/>
4852

49-
### IMB Power9 CPU
53+
### IBM Power9 CPU
5054
MFC weak scales to 13,824 Power9 CPU cores on OLCF Summit to within 1% of ideal scaling.
5155

5256
<img src="../res/weakScaling/cpuScaling.svg" style="height: 50%; width:50%; border-radius: 10pt"/>
5357

5458
## Strong scaling
5559

56-
Strong scaling results are obtained by keeping the problem size constant and increasing
57-
the number of process so that work per process decreases.
60+
Strong scaling results are obtained by keeping the problem size constant and increasing the number of processes so that work per process decreases.
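
For a fixed problem size, the usual figures of merit are the speedup relative to the base case and the efficiency relative to ideal (linear) speedup. The sketch below shows that arithmetic. The function name and all timings are hypothetical and are not MFC measurements.

```python
def strong_scaling(t_base: float, t_n: float, n_base: int, n: int) -> tuple[float, float]:
    """Return (speedup, efficiency) relative to the base case."""
    if min(t_base, t_n) <= 0 or min(n_base, n) <= 0:
        raise ValueError("timings and process counts must be positive")
    speedup = t_base / t_n
    ideal_speedup = n / n_base
    return speedup, speedup / ideal_speedup


# Hypothetical wall times in seconds for a fixed problem size.
speedup, efficiency = strong_scaling(t_base=80.0, t_n=12.5, n_base=8, n=64)
print(f"speedup {speedup:.1f}x, efficiency {efficiency:.0%}")
```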
5861

59-
### Nvidia V100 GPU
62+
### NVIDIA V100 GPU
6063

61-
For these tests, the base case utilizes 8 GPUs with one MPI process per GPU. The performance
62-
is analyzed at two different problem sizes of 16 and 64M grid points, with the base case using
63-
2 and 8M grid points per process.
64+
For these tests, the base case utilizes 8 GPUs with one MPI process per GPU.
65+
The performance is analyzed at two different problem sizes of 16M and 64M grid points, with the base case using 2M and 8M grid points per process.
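
The per-process figures follow from dividing the total grid size by the number of processes: 16M / 8 = 2M and 64M / 8 = 8M at the base case. The short sketch below repeats that arithmetic for larger process counts. The counts beyond 8 are assumed for illustration and are not taken from the tests.

```python
def cells_per_process(total_cells: int, n_procs: int) -> float:
    """Grid points handled by each process for a fixed total problem size."""
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    return total_cells / n_procs


for total in (16_000_000, 64_000_000):
    # 8 is the base case from the text; 16, 32, and 64 are illustrative.
    for n_procs in (8, 16, 32, 64):
        per_proc = cells_per_process(total, n_procs) / 1e6
        print(f"{total // 1_000_000}M points on {n_procs} processes: {per_proc:g}M each")
```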
6466

6567
#### 16M Grid Points
68+
6669
<img src="../res/strongScaling/strongScaling16.svg" style="width: 50%; border-radius: 10pt"/>
6770

6871
#### 64M Grid Points
6972
<img src="../res/strongScaling/strongScaling64.svg" style="width: 50%; border-radius: 10pt"/>
7073

71-
### IBM Power 9 CPU
74+
### IBM Power9 CPU
7275

73-
CPU strong scaling tests are done with problem sizes of 16, 32, and 64M grid points, with the
74-
base case using 2, 4, and 8M cells per process.
76+
CPU strong scaling tests are done with problem sizes of 16M, 32M, and 64M grid points, with the base case using 2M, 4M, and 8M cells per process.
7577

76-
<img src="../res/strongScaling/cpuStrongScaling.svg" style="width: 50%; border-radius: 10pt"/>
78+
<img src="../res/strongScaling/cpuStrongScaling.svg" style="width: 50%; border-radius: 10pt"/>
