Add line numbering to gpu loops #1029
base: master
Conversation
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
if (mhd) then
    if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const.
        ! B_y flux = v_x * B_y - v_y * Bx0
        ! B_z flux = v_x * B_z - v_z * Bx0
        $:GPU_LOOP(parallelism='[seq]')
        do i = 1, 3
            ! Flux of rho*v_i in the ${XYZ}$ direction
            ! = rho * v_i * v_${XYZ}$ - B_i * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
            flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
                (s_M*(rho_R*vel_R(i)*vel_R(norm_dir) &
                      - B%R(i)*B%R(norm_dir) &
                      + dir_flg(i)*(pres_R + pres_mag%R)) &
                 - s_P*(rho_L*vel_L(i)*vel_L(norm_dir) &
                        - B%L(i)*B%L(norm_dir) &
                        + dir_flg(i)*(pres_L + pres_mag%L)) &
                 + s_M*s_P*(rho_L*vel_L(i) - rho_R*vel_R(i))) &
                /(s_M - s_P)
        do i = 0, 1
            flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) &
                                                   - s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) &
                                                   + s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P)
        end do
    elseif (mhd .and. relativity) then
    else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction
        ! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$)
        ! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$)
        ! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$)
        $:GPU_LOOP(parallelism='[seq]')
        do i = 1, 3
            ! Flux of m_i in the ${XYZ}$ direction
            ! = m_i * v_${XYZ}$ - b_i/Gamma * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
            flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
                (s_M*(cm%R(i)*vel_R(norm_dir) &
                      - b4%R(i)/Ga%R*B%R(norm_dir) &
                      + dir_flg(i)*(pres_R + pres_mag%R)) &
                 - s_P*(cm%L(i)*vel_L(norm_dir) &
                        - b4%L(i)/Ga%L*B%L(norm_dir) &
                        + dir_flg(i)*(pres_L + pres_mag%L)) &
                 + s_M*s_P*(cm%L(i) - cm%R(i))) &
                /(s_M - s_P)
        end do
    elseif (bubbles_euler) then
        $:GPU_LOOP(parallelism='[seq]')
        do i = 1, num_vels
            flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
                (s_M*(rho_R*vel_R(dir_idx(1)) &
                      *vel_R(dir_idx(i)) &
                      + dir_flg(dir_idx(i))*(pres_R - ptilde_R)) &
                 - s_P*(rho_L*vel_L(dir_idx(1)) &
                        *vel_L(dir_idx(i)) &
                        + dir_flg(dir_idx(i))*(pres_L - ptilde_L)) &
                 + s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
                            - rho_R*vel_R(dir_idx(i)))) &
                /(s_M - s_P) &
                + (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
        end do
    else if (hypoelasticity) then
        $:GPU_LOOP(parallelism='[seq]')
        do i = 1, num_vels
            flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
                (s_M*(rho_R*vel_R(dir_idx(1)) &
                      *vel_R(dir_idx(i)) &
                      + dir_flg(dir_idx(i))*pres_R &
                      - tau_e_R(dir_idx_tau(i))) &
                 - s_P*(rho_L*vel_L(dir_idx(1)) &
                        *vel_L(dir_idx(i)) &
                        + dir_flg(dir_idx(i))*pres_L &
                        - tau_e_L(dir_idx_tau(i))) &
                 + s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
                            - rho_R*vel_R(dir_idx(i)))) &
                /(s_M - s_P)
        end do
    else
        $:GPU_LOOP(parallelism='[seq]')
        do i = 1, num_vels
            flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
                (s_M*(rho_R*vel_R(dir_idx(1)) &
                      *vel_R(dir_idx(i)) &
                      + dir_flg(dir_idx(i))*pres_R) &
                 - s_P*(rho_L*vel_L(dir_idx(1)) &
                        *vel_L(dir_idx(i)) &
                        + dir_flg(dir_idx(i))*pres_L) &
                 + s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
                            - rho_R*vel_R(dir_idx(i)))) &
                /(s_M - s_P) &
                + (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
        do i = 0, 2
            flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( &
                s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - &
                s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + &
                s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P)
        end do
    end if
    flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp
end if
Suggestion: Restrict the 1D MHD flux calculation to only execute for the x-direction (NORM_DIR == 1) to prevent incorrect flux calculations for the y and z directions. [possible issue, importance: 9]
if (mhd) then
    if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const.
        #:if (NORM_DIR == 1)
            ! B_y flux = v_x * B_y - v_y * Bx0
            ! B_z flux = v_x * B_z - v_z * Bx0
            $:GPU_LOOP(parallelism='[seq]')
            do i = 0, 1
                flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) &
                                                       - s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) &
                                                       + s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P)
            end do
        #:endif
    else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction
        ! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$)
        ! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$)
        ! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$)
        $:GPU_LOOP(parallelism='[seq]')
        do i = 0, 2
            flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( &
                s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - &
                s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + &
                s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P)
        end do
    end if
    flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp
end if
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
Suggestion: Add the loop variable q to the private clause of the GPU_PARALLEL_LOOP directive to prevent a race condition. [possible issue, importance: 9]
- $:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
+ $:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,q,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
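For context, a minimal sketch of why the inner index matters, assuming a generic OpenACC expansion of the macro (the exact directive text MFC emits, and the array and bound names below, are illustrative rather than copied from m_viscous.fpp): if q is left out of the private list, every thread shares one copy of q, and concurrent (j, k, l) iterations race on it.

    ! Illustrative only: a generic directive of the kind the macro might emit.
    !$acc parallel loop collapse(3) default(present) &
    !$acc private(i, j, k, l, q, alpha_visc, alpha_rho_visc, Re_visc, tau_Re)
    do l = 0, p
        do k = 0, n
            do j = 0, m
                ! q indexes a sequential inner loop; it must be private so that
                ! concurrent (j, k, l) iterations do not overwrite each other's q
                do q = 1, num_fluids
                    alpha_rho_visc(q) = 0._wp
                end do
            end do
        end do
    end do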
I just built this on Frontier manually and both the OpenMP and OpenACC builds compile fine. I am going to rerun the build for Frontier.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1029      +/-   ##
==========================================
- Coverage   46.02%   44.39%     -1.64%
==========================================
  Files          67       70         +3
  Lines       13437    20330      +6893
  Branches     1550     1947       +397
==========================================
+ Hits         6185     9025      +2840
- Misses       6362    10211      +3849
- Partials      890     1094       +204

☔ View full report in Codecov by Sentry.
Running this by @anandrdbz and @prathi-wind. Also, the MHD parts might need to be examined by @ChrisZYJ.
We can get their input. Luckily a few tests are actually passing on Frontier, and the ones that are failing appear to mostly have
…ectives getting printed. Thanks Anand for looking at things with me
…was never called, breaking the IO
User description
Description
I have refactored the GPU macros to separate out the OpenMP and OpenACC implementations into two separate macros: one for starting a parallel loop, GPU_PARALLEL_LOOP, and one for ending a parallel region, END_GPU_PARALLEL_LOOP. This allows the preprocessor to insert new lines only at the start and end of GPU parallel loops, which means line markers can progress naturally instead of all lines inside a loop having the same line number. I have also adjusted the macro files to remove the additional warnings that were being printed with every single file that has GPU macros.
This eliminates over half of the warnings printed when the code compiles, and the line numbers reported for errors are correct. I tested this locally and found that the reported line numbers are correct, at least in the regions I checked. The result should be a much improved developer experience.
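As a minimal sketch of what this looks like at a call site (the loop body, array name, and private list here are illustrative, not taken from a particular file): before, the loop body was passed through the macro as a block, so fypp re-emitted it and every line of the body carried the same line marker; after, the macro emits only the opening directive, the body follows as ordinary source, and a separate macro closes the region, so line markers advance through the body.

    #! Before: the body passes through the macro, so its line markers collapse
    #:call GPU_PARALLEL_LOOP(collapse=3)
        do l = 0, p
            do k = 0, n
                do j = 0, m
                    q_example(j, k, l) = 0._wp
                end do
            end do
        end do
    #:endcall GPU_PARALLEL_LOOP

    #! After: only the directives come from macros; the body is plain source
    $:GPU_PARALLEL_LOOP(collapse=3, private='[j,k,l]')
    do l = 0, p
        do k = 0, n
            do j = 0, m
                q_example(j, k, l) = 0._wp
            end do
        end do
    end do
    $:END_GPU_PARALLEL_LOOP()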
Fixes #1028
Type of change
Please delete options that are not relevant.
Scope
If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.
How Has This Been Tested?
I compiled locally and saw that the warnings were removed. I also tried making intentional changes and saw that the correct line numbers were reported.
Test Configuration:
Checklist
I ran ./mfc.sh format before committing my code
If your code changes any code source files (anything in src/simulation)
To make sure the code is performing as expected on GPU devices, I have:
PR Type
Enhancement
Description
Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() / #:endcall GPU_PARALLEL_LOOP syntax to $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP() syntax
Separated OpenMP and OpenACC macro implementations into distinct start and end directives (OMP_PARALLEL_LOOP / END_OMP_PARALLEL_LOOP and ACC_PARALLEL_LOOP / END_GPU_PARALLEL_LOOP)
Added explicit private clause parameters to all GPU parallel loop macros to specify privatized loop variables
Modified macro definitions to emit only opening directives, allowing loop code to follow separately for improved line numbering accuracy
Applied changes consistently across multiple simulation and common modules including viscous stress, RHS, pressure relaxation, CBC, levelset computations, and MPI common utilities
Adjusted indentation of loop bodies to align with new macro structure
Removed embedded code execution from macro implementations to allow line markers to progress naturally
Result eliminates over half of compilation warnings and provides correct line numbers for error reporting
Diagram Walkthrough
File Walkthrough
8 files
m_viscous.fpp
Refactor GPU parallel loop macros for improved line numbering
src/simulation/m_viscous.fpp
Changed #:call GPU_PARALLEL_LOOP() / #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP() syntax
Added private parameter declarations to all GPU_PARALLEL_LOOP macro calls to specify loop variables
syntax requirements
s_compute_viscous_stress_tensor and gradient computation functions
acc_macros.fpp
Separate ACC parallel loop opening directive from code
src/common/include/acc_macros.fpp
Modified ACC_PARALLEL_LOOP macro definition to remove code parameter and acc_end_directive from macro body
follow separately
implementation
m_rhs.fpp
Refactor GPU loop macros with explicit private variables
src/simulation/m_rhs.fpp
Changed #:call GPU_PARALLEL_LOOP() and #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() and $:END_GPU_PARALLEL_LOOP() syntax
Added private clause to all GPU_PARALLEL_LOOP macros specifying loop variables to be privatized
(removed extra indentation level)
throughout the file
m_mpi_common.fpp
Update GPU macros with explicit private variable declarations
src/common/m_mpi_common.fpp
Changed #:call GPU_PARALLEL_LOOP() to $:GPU_PARALLEL_LOOP() syntax
Added private clause listing all loop variables (i, j, k, l, q, r) to GPU parallel regions
Changed #:endcall GPU_PARALLEL_LOOP to $:END_GPU_PARALLEL_LOOP()
m_pressure_relaxation.fpp
Refactor GPU parallel loop macros with private variables
src/simulation/m_pressure_relaxation.fpp
Replaced #:call GPU_PARALLEL_LOOP() with $:GPU_PARALLEL_LOOP() macro syntax
Added private='[j,k,l]' clause to specify privatized variables
Changed #:endcall GPU_PARALLEL_LOOP to $:END_GPU_PARALLEL_LOOP()
m_cbc.fpp
Refactor GPU loop macros to separate start and end directives
src/simulation/m_cbc.fpp
Changed #:call GPU_PARALLEL_LOOP() / #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP() syntax
Added private clause parameters to all GPU_PARALLEL_LOOP calls specifying loop variables
removing extra indentation from the old call-based approach
flux reshaping, and output operations
m_compute_levelset.fpp
Refactor GPU parallel loop macros in levelset computations
src/common/m_compute_levelset.fpp
Changed #:call GPU_PARALLEL_LOOP() / #:endcall GPU_PARALLEL_LOOP invocations to $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP() syntax
Added private clause with loop variable lists to all GPU parallel loop macros
structure
airfoil, 3D airfoil, rectangle, cuboid, sphere, cylinder)
omp_macros.fpp
Split GPU parallel loop macro into separate start and end macros
src/common/include/omp_macros.fpp
Split OMP_PARALLEL_LOOP macro into two parts: start directive generation and end directive generation
Removed code parameter from OMP_PARALLEL_LOOP macro definition
Added END_OMP_PARALLEL_LOOP macro to emit appropriate end directives based on compiler type
Removed $:code execution from OMP_PARALLEL_LOOP, allowing loop bodies to be written directly in code
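A simplified sketch of that split, assuming a single OpenMP offload directive and a stripped-down argument list (the real omp_macros.fpp selects directives per compiler and supports more clauses):

    #! Hypothetical, reduced version of the two macros: the start macro returns
    #! only the opening directive, the end macro only the closing one, so no
    #! user code passes through a macro and fypp line markers stay accurate.
    #:def OMP_PARALLEL_LOOP(collapse=1, private='[]')
    !$omp target teams distribute parallel do collapse(${collapse}$) private(${private.strip('[]')}$)
    #:enddef

    #:def END_OMP_PARALLEL_LOOP()
    !$omp end target teams distribute parallel do
    #:enddef

A call site would then read $:OMP_PARALLEL_LOOP(collapse=3, private='[j,k,l]'), followed by the plain Fortran loop, followed by $:END_OMP_PARALLEL_LOOP(), which is the shape the new GPU_PARALLEL_LOOP / END_GPU_PARALLEL_LOOP wrappers give the simulation code.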
1 files
shared_parallel_macros.fpp
Add required newline at end of file
src/common/include/shared_parallel_macros.fpp
27 files