| 1 | +# Performance Results |
| 2 | + |
| 3 | +MFC has been extensively benchmarked on CPU and GPU devices. |
| 4 | +A summary of these results follows. |
| 5 | + |
| 6 | +## Expected time-steps/hour |
| 7 | + |
| 8 | +The following table outlines expected performance in terms of the number of time-steps per hour |
| 9 | +(rounded to the nearest hundred) for various problem sizes (grid cells) and hardware for an inviscid, 6-equation (`'model_eqns' : 3`), 3D simulation. |
| 10 | +CPU results utilize an entire die. |
| 11 | + |
| 12 | +| Hardware | # Ranks | 1M Cells | 4M Cells | 8M Cells | Compiler | Computer | |
| 13 | +| ---: | :----: | :----: | :---: | :---: | :----: | :--- | |
| 14 | +| NVIDIA V100 | 1 | 88.5k | 18.7k | N/A | NVHPC 22.11 | PACE Phoenix | |
| 15 | +| NVIDIA V100 | 1 | 78.8k | 18.8k | N/A | NVHPC 22.11 | OLCF Summit | |
| 16 | +| NVIDIA A100 | 1 | 114.4k | 34.6k | 16.5k | NVHPC 23.5 | Wingtip | |
| 17 | +| AMD MI250X | 1 | 77.5k | 22.3k | 11.2k | CCE 16.0.1 | OLCF Frontier | |
| 18 | +| Intel Xeon Gold 6226 | 12 | 2.5k | 0.7k | 0.4k | GNU 10.3.0 | PACE Phoenix | |
| 19 | +| Apple Silicon M2 | 6 | 2.8k | 0.6k | 0.2k | GNU 13.2.0 | N/A | |
| 20 | + |
| 21 | +If `'model_eqns' : 3` is replaced by `'model_eqns' : 2`, an inviscid 5-equation model is used. |
| 22 | +The following table outlines expected performance in terms of the number of time-steps per hour (rounded to the nearest hundred) for various problem sizes and hardware for an inviscid, 5-equation, |
| 23 | +3D simulation. |
| 24 | +CPU results utilize an entire die. |
| 25 | + |
| 26 | +| Hardware | # Ranks | 1M Cells | 4M Cells | 8M Cells | Compiler | Computer | |
| 27 | +| ---: | :----: | :----: | :---: | :---: | :----: | :--- | |
| 28 | +| NVIDIA V100 | 1 | 113.4k | 26.2k | 13.0k | NVHPC 22.11 | PACE Phoenix | |
| 29 | +| NVIDIA V100 | 1 | 107.7k | 26.3k | 13.1k | NVHPC 22.11 | OLCF Summit | |
| 30 | +| NVIDIA A100 | 1 | 153.5k | 48.0k | 22.5k | NVHPC 23.5 | Wingtip | |
| 31 | +| AMD MI250X | 1 | 104.2k | 31.0k | 14.8k | CCE 16.0.1 | OLCF Frontier | |
| 32 | +| Intel Xeon Gold 6226 | 12 | 5.4k | 1.6k | 0.8k | GNU 10.3.0 | PACE Phoenix | |
| 33 | +| Apple Silicon M2 | 6 | 3.7k | 11.0k | 0.3k | GNU 13.2.0 | N/A | |
| 34 | + |
| 35 | +## Weak scaling |
| 36 | + |
| 37 | +Weak scaling results are obtained by increasing the problem size with the number of processes so that work per process remains constant. |
| 38 | + |
| 39 | +### AMD MI250X GPU |
| 40 | + |
| 41 | +MFC weak scales to (at least) 65,536 AMD MI250X GPUs on OLCF Frontier with 96% efficiency. |
| 42 | +This corresponds to 87% of the entire machine. |
| 43 | + |
| 44 | +<img src="../res/weakScaling/frontier.svg" style="height: 50%; width:50%; border-radius: 10pt"/> |
| 45 | + |
| 46 | +### NVIDIA V100 GPU |
| 47 | + |
| 48 | +MFC weak scales to (at least) 13,824 NVIDIA V100 GPUs on OLCF Summit with 97% efficiency. |
| 49 | +This corresponds to 50% of the entire machine. |
| 50 | + |
| 51 | +<img src="../res/weakScaling/summit.svg" style="height: 50%; width:50%; border-radius: 10pt"/> |
| 52 | + |
| 53 | +### IBM Power9 CPU |
| 54 | +MFC weak scales to 13,824 Power9 CPU cores on OLCF Summit to within 1% of ideal scaling. |
| 55 | + |
| 56 | +<img src="../res/weakScaling/cpuScaling.svg" style="height: 50%; width:50%; border-radius: 10pt"/> |
| 57 | + |
| 58 | +## Strong scaling |
| 59 | + |
| 60 | +Strong scaling results are obtained by keeping the problem size constant and increasing the number of processes so that work per process decreases. |
| 61 | + |
| 62 | +### NVIDIA V100 GPU |
| 63 | + |
| 64 | +For these tests, the base case utilizes 8 GPUs with one MPI process per GPU. |
| 65 | +The performance is analyzed at two different problem sizes of 16M and 64M grid points, with the base case using 2M and 8M grid points per process. |
| 66 | + |
| 67 | +#### 16M Grid Points |
| 68 | + |
| 69 | +<img src="../res/strongScaling/strongScaling16.svg" style="width: 50%; border-radius: 10pt"/> |
| 70 | + |
| 71 | +#### 64M Grid Points |
| 72 | +<img src="../res/strongScaling/strongScaling64.svg" style="width: 50%; border-radius: 10pt"/> |
| 73 | + |
| 74 | +### IBM Power9 CPU |
| 75 | + |
| 76 | +CPU strong scaling tests are done with problem sizes of 16, 32, and 64M grid points, with the base case using 2, 4, and 8M cells per process. |
| 77 | + |
| 78 | +<img src="../res/strongScaling/cpuStrongScaling.svg" style="width: 50%; border-radius: 10pt"/> |
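
The weak- and strong-scaling definitions given in the sections above can be turned into efficiency numbers with a few lines of arithmetic. The following is a minimal sketch, not part of MFC: the function names are hypothetical, the timings in the usage section are made up for illustration, and it assumes efficiency is measured from wall-clock time relative to a chosen base case (for example, the 8-GPU base case used for the V100 strong-scaling tests).

```python
def weak_scaling_efficiency(base_time: float, scaled_time: float) -> float:
    """Weak scaling keeps work per process constant, so the ideal wall time
    stays flat as processes are added. Efficiency is base time / scaled time."""
    return base_time / scaled_time


def strong_scaling_efficiency(base_procs: int, base_time: float,
                              procs: int, time: float) -> float:
    """Strong scaling keeps the total problem size fixed, so the ideal wall
    time falls in proportion to the process count. Efficiency is the measured
    speedup divided by the ideal speedup, both relative to the base case."""
    measured_speedup = base_time / time
    ideal_speedup = procs / base_procs
    return measured_speedup / ideal_speedup


def cells_per_process(total_cells: int, procs: int) -> float:
    """Grid cells handled by each process for a given decomposition."""
    return total_cells / procs


if __name__ == "__main__":
    # Hypothetical timings in seconds; these are NOT MFC measurements.
    print(f"weak:   {weak_scaling_efficiency(10.0, 10.4):.3f}")
    print(f"strong: {strong_scaling_efficiency(8, 100.0, 64, 14.0):.3f}")

    # Base-case work per process for the strong-scaling problem sizes above:
    # 16M and 64M grid points spread over 8 processes.
    print(cells_per_process(16_000_000, 8))
    print(cells_per_process(64_000_000, 8))
```

An efficiency of 1.0 corresponds to ideal scaling in both definitions. Values below 1.0 indicate communication or other overheads that grow with the process count.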