Commit 364e3cf

AiredaleDev, BucketofJava, and sbryngelson authored
Finalize #411, which addresses the performance counter and statistics. (#432)
Co-authored-by: Chladni <[email protected]>
Co-authored-by: Spencer Bryngelson <[email protected]>
1 parent 4f89f33 commit 364e3cf

8 files changed: +150 -57 lines changed

docs/documentation/expectedPerformance.md

Lines changed: 10 additions & 9 deletions
@@ -5,21 +5,22 @@ This page shows a summary of these results.
 
 ## Expected time-steps/hour
 
-The following table outlines observed performance as nanoseconds per grid point (ns/GP) per right-hand side evaluation (lower is better).
+The following table outlines observed performance as nanoseconds per grid point (ns/GP) per equation (eq) per right-hand side (rhs) evaluation (lower is better).
 We solve an example 3D, inviscid, 5-equation model problem with two advected species (a total of 8 PDEs).
 The numerics are WENO5 and the HLLC approximate Riemann solver.
+This case is located in `examples/3D_performance_test`.
 We report results for various numbers of grid points per CPU die (or GPU device) and hardware.
 
 | Hardware | | 1M GPs | 4M GPs | 8M GPs | Compiler | Computer |
 | ---: | :----: | :----: | :---: | :---: | :----: | :--- |
-| NVIDIA V100 | 1 device | 96 | 104 | 104 | NVHPC 22.11 | PACE Phoenix |
-| NVIDIA V100 | 1 device | 101 | 104 | 104 | NVHPC 22.11 | OLCF Summit |
-| NVIDIA A100 | 1 device | 71 | 56 | 59 | NVHPC 23.5 | Wingtip |
-| AMD MI250X | 1 GCD | 108 | 90 | 96 | CCE 16.0.1 | OLCF Frontier |
-| Intel Xeon Gold 6226 | 12 cores | 1963 | 1688 | 1686 | GNU 10.3.0 | PACE Phoenix |
-| Apple M2 | 6 cores | 2919 | 245 | 4500 | GNU 13.2.0 | N/A |
-
-__All results are in nanoseconds (ns) per grid point (gp) per right-hand side (rhs) evaluation. Lower is better.__
+| NVIDIA V100 | 1 device | 12.0 | 13.0 | 13.0 | NVHPC 22.11 | PACE Phoenix |
+| NVIDIA V100 | 1 device | 12.6 | 13.0 | 13.0 | NVHPC 22.11 | OLCF Summit |
+| NVIDIA A100 | 1 device | 8.9 | 7.0 | 7.4 | NVHPC 23.5 | Wingtip |
+| AMD MI250X | 1 GCD | 13.5 | 11.3 | 12 | CCE 16.0.1 | OLCF Frontier |
+| Intel Xeon Gold 6226 | 12 cores | 245 | 211 | 211 | GNU 10.3.0 | PACE Phoenix |
+| Apple M2 | 6 cores | 365 | 306 | 563 | GNU 13.2.0 | N/A |
+
+__All results are in nanoseconds (ns) per grid point (gp) per equation (eq) per right-hand side (rhs) evaluation, so X ns/gp/eq/rhs. Lower is better.__
 
 ## Weak scaling
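
Note on reading the updated table: the per-equation figure can be turned back into a rough wall-clock estimate, which is what the "Expected time-steps/hour" heading is about. The sketch below is an idealization, not part of the commit. It assumes that a time step costs only its right-hand side evaluations (three per step, on the assumption that `time_stepper` 3 in the case file means a three-stage scheme) and it ignores I/O and communication. The function name `steps_per_hour` is hypothetical.

```python
def steps_per_hour(ns_per_gp_eq_rhs, grid_points, equations, rhs_per_step):
    """Estimate time steps per hour from a ns/gp/eq/rhs figure.

    Assumes the RHS evaluations dominate the cost of a step (no I/O,
    no communication), which is an idealization.
    """
    seconds_per_step = (
        ns_per_gp_eq_rhs * 1e-9 * grid_points * equations * rhs_per_step
    )
    return 3600.0 / seconds_per_step


# NVIDIA A100 row, 1M grid points, 8 PDEs; 3 RHS evaluations per step is assumed.
estimate = steps_per_hour(8.9, 1_000_000, 8, 3)
print(f"{estimate:.0f} time steps/hour (idealized)")
```

Because the metric is already normalized by grid points and equations, the same function can be reused for other rows and problem sizes by changing only its arguments.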

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+#!/usr/bin/env python3
+
+import json
+
+# Configuring case dictionary
+print(json.dumps({
+    # Logistics ================================================================
+    'run_time_info' : 'T',
+    # ==========================================================================
+
+    # Computational Domain Parameters ==========================================
+    'x_domain%beg' : 0.E+00,
+    'x_domain%end' : 4.E-03/1.E-03,
+    'y_domain%beg' : 0.E+00,
+    'y_domain%end' : 4.E-03/1.E-03,
+    'z_domain%beg' : 0.E+00,
+    'z_domain%end' : 4.E-03/1.E-03,
+    'stretch_x' : 'T',
+    'a_x' : 4.E+00,
+    'x_a' : -1.5E-03/1.E-03,
+    'x_b' : 1.5E-03/1.E-03,
+    'stretch_y' : 'T',
+    'a_y' : 4.E+00,
+    'y_a' : -1.5E-03/1.E-03,
+    'y_b' : 1.5E-03/1.E-03,
+    'stretch_z' : 'T',
+    'a_z' : 4.E+00,
+    'z_a' : -1.5E-03/1.E-03,
+    'z_b' : 1.5E-03/1.E-03,
+    'cyl_coord' : 'F',
+    'm' : 200,
+    'n' : 200,
+    'p' : 200,
+    'dt' : 0.2E-09/1.E-03,
+    't_step_start' : 0,
+    't_step_stop' : 30,
+    't_step_save' : 30,
+    # ==========================================================================
+
+    # Simulation Algorithm Parameters ==========================================
+    'num_patches' : 2,
+    'model_eqns' : 2,
+    'alt_soundspeed' : 'F',
+    'num_fluids' : 2,
+    'adv_alphan' : 'T',
+    'mpp_lim' : 'T',
+    'mixture_err' : 'T',
+    'time_stepper' : 3,
+    'weno_order' : 5,
+    'weno_eps' : 1.E-16,
+    'weno_Re_flux' : 'F',
+    'weno_avg' : 'F',
+    'avg_state' : 2,
+    'mapped_weno' : 'T',
+    'null_weights' : 'F',
+    'mp_weno' : 'F',
+    'riemann_solver' : 2,
+    'wave_speeds' : 1,
+    'bc_x%beg' : -2,
+    'bc_x%end' : -6,
+    'bc_y%beg' : -2,
+    'bc_y%end' : -6,
+    'bc_z%beg' : -2,
+    'bc_z%end' : -6,
+    # ==========================================================================
+
+    # Formatted Database Files Structure Parameters ============================
+    'format' : 1,
+    'precision' : 2,
+    'prim_vars_wrt' :'T',
+    'parallel_io' :'T',
+    # ==========================================================================
+
+    # Patch 1: High pressured water ============================================
+    'patch_icpp(1)%geometry' : 9,
+    'patch_icpp(1)%x_centroid' : 80.E-03/1.E-03,
+    'patch_icpp(1)%y_centroid' : 80.E-03/1.E-03,
+    'patch_icpp(1)%z_centroid' : 80.E-03/1.E-03,
+    'patch_icpp(1)%length_x' : 160.E-03/1.E-03,
+    'patch_icpp(1)%length_y' : 160.E-03/1.E-03,
+    'patch_icpp(1)%length_z' : 160.E-03/1.E-03,
+    'patch_icpp(1)%vel(1)' : 0.E+00,
+    'patch_icpp(1)%vel(2)' : 0.E+00,
+    'patch_icpp(1)%vel(3)' : 0.E+00,
+    'patch_icpp(1)%pres' : 1.E+05,
+    'patch_icpp(1)%alpha_rho(1)' : 1000.E+00,
+    'patch_icpp(1)%alpha_rho(2)' : 0.1E+00,
+    'patch_icpp(1)%alpha(1)' : 0.9E+00,
+    'patch_icpp(1)%alpha(2)' : 0.1E+00,
+    # ==========================================================================
+
+    # Patch 3: Air bubble ======================================================
+    'patch_icpp(2)%geometry' : 8,
+    'patch_icpp(2)%smoothen' : 'T',
+    'patch_icpp(2)%smooth_patch_id' : 1,
+    'patch_icpp(2)%smooth_coeff' : 0.5E+00,
+    'patch_icpp(2)%x_centroid' : 0.E+00,
+    'patch_icpp(2)%y_centroid' : 0.E+00,
+    'patch_icpp(2)%z_centroid' : 0.E+00,
+    'patch_icpp(2)%radius' : 1.E-03/1.E-03,
+    'patch_icpp(2)%alter_patch(1)' : 'T',
+    'patch_icpp(2)%vel(1)' : 0.E+00,
+    'patch_icpp(2)%vel(2)' : 0.E+00,
+    'patch_icpp(2)%vel(3)' : 0.E+00,
+    'patch_icpp(2)%pres' : 1.E+03,
+    'patch_icpp(2)%alpha_rho(1)' : 100.E+00,
+    'patch_icpp(2)%alpha_rho(2)' : 0.9E+00,
+    'patch_icpp(2)%alpha(1)' : 0.1E+00,
+    'patch_icpp(2)%alpha(2)' : 0.9E+00,
+    # ==========================================================================
+
+    # Fluids Physical Parameters ===============================================
+    'fluid_pp(1)%gamma' : 1.E+00/(4.4E+00-1.E+00),
+    'fluid_pp(1)%pi_inf' : 4.4E+00*6.E+08/(4.4E+00-1.E+00),
+    'fluid_pp(2)%gamma' : 1.E+00/(1.4E+00-1.E+00),
+    'fluid_pp(2)%pi_inf' : 0.E+00,
+    # ==========================================================================
+}))
+
+# ==============================================================================
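
Note on the `fluid_pp` entries at the end of the case file: they are not written as raw constants but as 1/(γ - 1) and γ·π∞/(γ - 1). The sketch below reproduces that transformation. It assumes that 4.4 and 6.E+08 are the specific-heat ratio and stiffness pressure of a stiffened-gas model for the first fluid, and 1.4 and 0 those of the second; that interpretation and the helper name `stiffened_gas_inputs` are assumptions, not something the case file states.

```python
def stiffened_gas_inputs(gamma, pi_inf):
    """Return the (gamma, pi_inf) values in the form used by the case file.

    The case file stores 1/(gamma - 1) and gamma*pi_inf/(gamma - 1)
    rather than gamma and pi_inf themselves.
    """
    if gamma == 1.0:
        raise ValueError("gamma must differ from 1")
    return 1.0 / (gamma - 1.0), gamma * pi_inf / (gamma - 1.0)


# Values copied from the case file: fluid 1 (4.4, 6.E+08), fluid 2 (1.4, 0).
fluid_1 = stiffened_gas_inputs(4.4, 6.0e8)
fluid_2 = stiffened_gas_inputs(1.4, 0.0)
print({"fluid_pp(1)%gamma": fluid_1[0], "fluid_pp(1)%pi_inf": fluid_1[1]})
print({"fluid_pp(2)%gamma": fluid_2[0], "fluid_pp(2)%pi_inf": fluid_2[1]})
```

Keeping the transformation in one helper makes it harder for the two `fluid_pp` entries of a fluid to drift apart when a constant is changed.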

src/pre_process/m_patches.fpp

Lines changed: 0 additions & 2 deletions
@@ -1544,8 +1544,6 @@ contains
 radius = patch_ib(patch_id)%radius
 end if
 
-print *, x_centroid, y_centroid, z_centroid, radius
-
 ! Initializing the pseudo volume fraction value to 1. The value will
 ! be modified as the patch is laid out on the grid, but only in the
 ! case that smoothing of the spherical patch's boundary is enabled.

src/pre_process/m_start_up.fpp

Lines changed: 2 additions & 2 deletions
@@ -861,10 +861,10 @@ contains
 time_final = 0d0
 if (num_procs == 1) then
     time_final = time_avg
-    print *, "Final Time", time_final
+    print *, "Elapsed Time", time_final
 else
     time_final = maxval(proc_time)
-    print *, "Final Time", time_final
+    print *, "Elapsed Time", time_final
 end if
 inquire (FILE='pre_time_data.dat', EXIST=file_exists)
 if (file_exists) then

src/simulation/m_rhs.fpp

Lines changed: 11 additions & 5 deletions
@@ -636,12 +636,15 @@ contains
 
 end subroutine s_initialize_rhs_module ! -------------------------------
 
-subroutine s_compute_rhs(q_cons_vf, q_prim_vf, rhs_vf, pb, rhs_pb, mv, rhs_mv, t_step) ! -------
+subroutine s_compute_rhs(q_cons_vf, q_prim_vf, rhs_vf, pb, rhs_pb, mv, rhs_mv, t_step, time_avg) ! -------
 
 type(scalar_field), dimension(sys_size), intent(INOUT) :: q_cons_vf
 type(scalar_field), dimension(sys_size), intent(INOUT) :: q_prim_vf
 type(scalar_field), dimension(sys_size), intent(INOUT) :: rhs_vf
 real(kind(0d0)), dimension(startx:, starty:, startz:, 1:, 1:), intent(INOUT) :: pb, mv
+real(kind(0d0)), intent(INOUT) :: time_avg
+real(kind(0d0)) :: t_start, t_finish
+real(kind(0d0)) :: gp_sum
 real(kind(0d0)), dimension(startx:, starty:, startz:, 1:, 1:), intent(INOUT) :: rhs_pb, rhs_mv
 integer, intent(IN) :: t_step
 
@@ -676,7 +679,7 @@ contains
 ! ==================================================================
 
 !$acc update device(ix, iy, iz)
-
+call cpu_time(t_start)
 ! Association/Population of Working Variables ======================
 !$acc parallel loop collapse(4) gang vector default(present)
 do i = 1, sys_size
@@ -919,7 +922,6 @@ contains
 ! END: Additional physics and source terms =========================
 
 end do
-
 if (ib) then
 !$acc parallel loop collapse(3) gang vector default(present)
 do l = 0, p
@@ -975,9 +977,13 @@ contains
 end do
 end do
 end do
-
 end if
-
+call cpu_time(t_finish)
+if (t_step >= 4) then
+    time_avg = (abs(t_finish - t_start)/((ix%end - ix%beg)*(iy%end - iy%beg)*(iz%end - iz%beg)) + (t_step - 4)*time_avg)/(t_step - 3)
+else
+    time_avg = 0d0
+end if
 ! ==================================================================
 
 end subroutine s_compute_rhs ! -----------------------------------------
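
Note on the update added at the end of `s_compute_rhs`: it is an incremental mean that discards the first three time steps as warm-up and divides each timing sample by a product of index extents. The Python sketch below restates the same recurrence under the assumption of one update per time step. The Fortran routine is called once per Runge-Kutta stage with the same `t_step`, so its behavior within a step may differ from this simplified picture. The names `update_time_avg`, `elapsed`, and `grid_points` are illustrative only.

```python
def update_time_avg(time_avg, elapsed, grid_points, t_step):
    """One step of the running mean of elapsed/grid_points.

    Steps 1-3 are treated as warm-up and reset the average to zero,
    mirroring the `t_step >= 4` guard in the diff.
    """
    if t_step < 4:
        return 0.0
    sample = abs(elapsed) / grid_points
    return (sample + (t_step - 4) * time_avg) / (t_step - 3)


# Feed synthetic per-step timings (seconds); the first three are discarded.
timings = [9.0, 9.0, 9.0, 2.0, 4.0, 6.0]
avg = 0.0
for step, elapsed in enumerate(timings, start=1):
    avg = update_time_avg(avg, elapsed, grid_points=2, t_step=step)
print(avg)
```

With one update per step, the result equals the arithmetic mean of the post-warm-up samples (here 1.0, 2.0, and 3.0), without storing the individual samples.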

src/simulation/m_start_up.fpp

Lines changed: 1 addition & 2 deletions
@@ -1128,12 +1128,11 @@ contains
 if (num_procs == 1) then
     time_final = time_avg
     io_time_final = io_time_avg
-    print *, "Final Time", time_final
 else
     time_final = maxval(proc_time)
     io_time_final = maxval(io_proc_time)
-    print *, "Final Time", time_final
 end if
+print *, "Performance: ", time_final*1.0d9/sys_size, " ns/gp/eq/rhs"
 inquire (FILE='time_data.dat', EXIST=file_exists)
 if (file_exists) then
 open (1, file='time_data.dat', position='append', status='old')
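
Note on the new `Performance` line: multiplying by `1.0d9` converts seconds to nanoseconds, and dividing by `sys_size` spreads the cost over the equations. The Python sketch below repeats that arithmetic. It assumes that `time_final` already holds seconds per grid point per right-hand side evaluation and that `sys_size` equals the number of PDEs (8 for the documented test case); the function name `ns_per_gp_eq_rhs` is hypothetical.

```python
def ns_per_gp_eq_rhs(time_final_seconds, sys_size):
    """Convert seconds/gp/rhs into nanoseconds/gp/eq/rhs."""
    return time_final_seconds * 1.0e9 / sys_size


# 96 ns/gp/rhs (the old V100 figure from the docs table) spread over 8 equations.
value = ns_per_gp_eq_rhs(96.0e-9, 8)
print(f"Performance: {value:.1f} ns/gp/eq/rhs")
```

This is the same rescaling that takes the old documentation figures (ns/gp/rhs) to the new ones (ns/gp/eq/rhs).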

src/simulation/m_time_steppers.fpp

Lines changed: 6 additions & 36 deletions
@@ -286,16 +286,13 @@ contains
 real(kind(0d0)), intent(INOUT) :: time_avg
 
 integer :: i, j, k, l, q!< Generic loop iterator
-real(kind(0d0)) :: start, finish
 real(kind(0d0)) :: nR3bar
 
 ! Stage 1 of 1 =====================================================
 
-call cpu_time(start)
-
 call nvtxStartRange("Time_Step")
 
-call s_compute_rhs(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, pb_ts(1)%sf, rhs_pb, mv_ts(1)%sf, rhs_mv, t_step)
+call s_compute_rhs(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, pb_ts(1)%sf, rhs_pb, mv_ts(1)%sf, rhs_mv, t_step, time_avg)
 
 if (ib .and. t_step == 1) then
 if (qbmm .and. .not. polytropic) then
@@ -387,14 +384,6 @@ contains
 
 call nvtxEndRange
 
-call cpu_time(finish)
-
-if (t_step >= 4) then
-    time_avg = (abs(finish - start) + (t_step - 4)*time_avg)/(t_step - 3)
-else
-    time_avg = 0d0
-end if
-
 ! ==================================================================
 
 end subroutine s_1st_order_tvd_rk ! ------------------------------------
@@ -416,7 +405,7 @@ contains
 
 call nvtxStartRange("Time_Step")
 
-call s_compute_rhs(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, pb_ts(1)%sf, rhs_pb, mv_ts(1)%sf, rhs_mv, t_step)
+call s_compute_rhs(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, pb_ts(1)%sf, rhs_pb, mv_ts(1)%sf, rhs_mv, t_step, time_avg)
 
 if (ib .and. t_step == 1) then
 if (qbmm .and. .not. polytropic) then
@@ -503,7 +492,7 @@ contains
 
 ! Stage 2 of 2 =====================================================
 
-call s_compute_rhs(q_cons_ts(2)%vf, q_prim_vf, rhs_vf, pb_ts(2)%sf, rhs_pb, mv_ts(2)%sf, rhs_mv, t_step)
+call s_compute_rhs(q_cons_ts(2)%vf, q_prim_vf, rhs_vf, pb_ts(2)%sf, rhs_pb, mv_ts(2)%sf, rhs_mv, t_step, time_avg)
 
 !$acc parallel loop collapse(4) gang vector default(present)
 do i = 1, sys_size
@@ -574,13 +563,6 @@ contains
 call nvtxEndRange
 
 call cpu_time(finish)
-
-if (t_step >= 4) then
-    time_avg = (abs(finish - start) + (t_step - 4)*time_avg)/(t_step - 3)
-else
-    time_avg = 0d0
-end if
-
 ! ==================================================================
 
 end subroutine s_2nd_order_tvd_rk ! ------------------------------------
@@ -605,7 +587,7 @@ contains
 call nvtxStartRange("Time_Step")
 end if
 
-call s_compute_rhs(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, pb_ts(1)%sf, rhs_pb, mv_ts(1)%sf, rhs_mv, t_step)
+call s_compute_rhs(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, pb_ts(1)%sf, rhs_pb, mv_ts(1)%sf, rhs_mv, t_step, time_avg)
 
 if (ib .and. t_step == 1) then
 if (qbmm .and. .not. polytropic) then
@@ -693,7 +675,7 @@ contains
 
 ! Stage 2 of 3 =====================================================
 
-call s_compute_rhs(q_cons_ts(2)%vf, q_prim_vf, rhs_vf, pb_ts(2)%sf, rhs_pb, mv_ts(2)%sf, rhs_mv, t_step)
+call s_compute_rhs(q_cons_ts(2)%vf, q_prim_vf, rhs_vf, pb_ts(2)%sf, rhs_pb, mv_ts(2)%sf, rhs_mv, t_step, time_avg)
 
 !$acc parallel loop collapse(4) gang vector default(present)
 do i = 1, sys_size
@@ -764,7 +746,7 @@ contains
 ! ==================================================================
 
 ! Stage 3 of 3 =====================================================
-call s_compute_rhs(q_cons_ts(2)%vf, q_prim_vf, rhs_vf, pb_ts(2)%sf, rhs_pb, mv_ts(2)%sf, rhs_mv, t_step)
+call s_compute_rhs(q_cons_ts(2)%vf, q_prim_vf, rhs_vf, pb_ts(2)%sf, rhs_pb, mv_ts(2)%sf, rhs_mv, t_step, time_avg)
 
 !$acc parallel loop collapse(4) gang vector default(present)
 do i = 1, sys_size
@@ -837,12 +819,6 @@ contains
 call cpu_time(finish)
 
 time = time + (finish - start)
-
-if (t_step >= 4) then
-    time_avg = (abs(finish - start) + (t_step - 4)*time_avg)/(t_step - 3)
-else
-    time_avg = 0d0
-end if
 end if
 ! ==================================================================
 
@@ -879,12 +855,6 @@ contains
 
 time = time + (finish - start)
 
-if (t_step >= 4) then
-    time_avg = (abs(finish - start) + (t_step - 4)*time_avg)/(t_step - 3)
-else
-    time_avg = 0d0
-end if
-
 ! ==================================================================
 
 end subroutine s_strang_splitting ! ------------------------------------

toolchain/bench.yaml

Lines changed: 0 additions & 1 deletion
@@ -35,4 +35,3 @@
 path: benchmarks/hypo_hll/case.py
 args: []
 
-
