
Conversation

@danieljvickers
Member

@danieljvickers danieljvickers commented Nov 6, 2025

User description

Description

I have refactored the GPU macros to separate the OpenMP and OpenACC wrappers into two macros: one for starting a parallel loop, GPU_PARALLEL_LOOP, and one for ending a parallel region, END_GPU_PARALLEL_LOOP. This allows the preprocessor to insert new lines only at the start and end of GPU parallel loops, which means line markers are now able to progress naturally instead of every line inside a loop sharing the same line number.

I have also adjusted the macro files to remove the additional warnings that were printed for every file that uses GPU macros.

The result is that this eliminates over half of the warnings printed when the code compiles, and compiler errors now report the correct line numbers. I have tested this locally and found that the reported line numbers are correct, at least in the regions I checked. Overall, this should be a much-improved developer experience.
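
For illustration, a minimal sketch of the call-site change (the loop bounds, array name, and clauses here are placeholders, not code from this diff):

! Sketch only: placeholder loop, array, and clauses.
! Old wrapper style: the loop body was passed into the macro, so every line
! inside the region ended up carrying the same FYPP line marker.
#:call GPU_PARALLEL_LOOP(collapse=3)
    do l = 0, p
        do k = 0, n
            do j = 0, m
                q(j, k, l) = 0._wp
            end do
        end do
    end do
#:endcall GPU_PARALLEL_LOOP

! New start/end style: only the two macro lines are expanded, so line markers
! advance naturally through the loop body.
$:GPU_PARALLEL_LOOP(collapse=3, private='[j,k,l]')
do l = 0, p
    do k = 0, n
        do j = 0, m
            q(j, k, l) = 0._wp
        end do
    end do
end do
$:END_GPU_PARALLEL_LOOP()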

Fixes #1028

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

I compiled locally and confirmed that the warnings were removed. I also made intentional changes that trigger errors and confirmed that the correct line numbers were reported.

Test Configuration:

Checklist

  • I ran ./mfc.sh format before committing my code
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:


PR Type

Enhancement


Description

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() / #:endcall GPU_PARALLEL_LOOP syntax to $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP() syntax

  • Separated OpenMP and OpenACC macro implementations into distinct start and end directives (OMP_PARALLEL_LOOP / END_OMP_PARALLEL_LOOP and ACC_PARALLEL_LOOP / END_GPU_PARALLEL_LOOP)

  • Added explicit private clause parameters to all GPU parallel loop macros to specify privatized loop variables

  • Modified macro definitions to emit only opening directives, allowing loop code to follow separately for improved line numbering accuracy

  • Applied changes consistently across multiple simulation and common modules including viscous stress, RHS, pressure relaxation, CBC, levelset computations, and MPI common utilities

  • Adjusted indentation of loop bodies to align with new macro structure

  • Removed embedded code execution from macro implementations to allow line markers to progress naturally

  • Result eliminates over half of compilation warnings and provides correct line numbers for error reporting


Diagram Walkthrough

flowchart LR
  A["Old Macro Style<br/>#:call GPU_PARALLEL_LOOP()"] -->|Refactor| B["New Macro Style<br/>$:GPU_PARALLEL_LOOP()"]
  B -->|Add| C["Explicit private<br/>clause"]
  B -->|Add| D["$:END_GPU_PARALLEL_LOOP()"]
  C -->|Result| E["Correct line numbers<br/>Fewer warnings"]
  D -->|Result| E

File Walkthrough

Relevant files
Enhancement
8 files
m_viscous.fpp
Refactor GPU parallel loop macros for improved line numbering

src/simulation/m_viscous.fpp

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() /
    #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() /
    $:END_GPU_PARALLEL_LOOP() syntax
  • Added explicit private parameter declarations to all GPU_PARALLEL_LOOP
    macro calls to specify loop variables
  • Restructured loop nesting and indentation to align with new macro
    syntax requirements
  • Applied changes consistently across multiple subroutines including
    s_compute_viscous_stress_tensor and gradient computation functions
+786/-786
acc_macros.fpp
Separate ACC parallel loop opening directive from code     

src/common/include/acc_macros.fpp

  • Modified ACC_PARALLEL_LOOP macro definition to remove code parameter
    and acc_end_directive from macro body
  • Changed macro to only emit the opening directive, allowing loop code to
    follow separately (sketched below)
  • Removed embedded code execution and end directive from macro
    implementation
+2/-5     
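
Roughly, the shape of this change (a hypothetical sketch, not the actual definition in acc_macros.fpp; the argument names follow the bullets above):

#! Hypothetical sketch only -- see src/common/include/acc_macros.fpp for the real macro.
#! Previously the macro also received the loop body (`code`) and an `acc_end_directive`
#! and expanded all three together; it now emits only the opening directive:
#:def ACC_PARALLEL_LOOP(acc_directive)
    ${acc_directive}$
#:enddef
#! The Fortran loop body and the matching end directive now follow as ordinary
#! source lines, which is what lets FYPP line markers advance through the loop.
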
m_rhs.fpp
Refactor GPU loop macros with explicit private variables 

src/simulation/m_rhs.fpp

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP()
    and #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() and
    $:END_GPU_PARALLEL_LOOP() syntax
  • Added explicit private clause to all GPU_PARALLEL_LOOP macros
    specifying loop variables to be privatized
  • Adjusted indentation of loop bodies to align with new macro syntax
    (removed extra indentation level)
  • Applied changes consistently across all GPU parallel regions
    throughout the file
+572/-572
m_mpi_common.fpp
Update GPU macros with explicit private variable declarations

src/common/m_mpi_common.fpp

  • Converted GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() to
    $:GPU_PARALLEL_LOOP() syntax
  • Added explicit private clause listing all loop variables (i, j, k, l,
    q, r) to GPU parallel regions
  • Updated macro closing from #:endcall GPU_PARALLEL_LOOP to
    $:END_GPU_PARALLEL_LOOP()
  • Corrected indentation of loop bodies to match new macro structure
+222/-222
m_pressure_relaxation.fpp
Refactor GPU parallel loop macros with private variables 

src/simulation/m_pressure_relaxation.fpp

  • Replaced #:call GPU_PARALLEL_LOOP() with $:GPU_PARALLEL_LOOP() macro
    syntax
  • Added explicit private='[j,k,l]' clause to specify privatized
    variables
  • Changed closing macro from #:endcall GPU_PARALLEL_LOOP to
    $:END_GPU_PARALLEL_LOOP()
  • Adjusted loop body indentation to align with new macro format
+7/-7     
m_cbc.fpp
Refactor GPU loop macros to separate start and end directives

src/simulation/m_cbc.fpp

  • Refactored GPU parallel loop macros from #:call GPU_PARALLEL_LOOP() /
    #:endcall GPU_PARALLEL_LOOP to $:GPU_PARALLEL_LOOP() /
    $:END_GPU_PARALLEL_LOOP() syntax
  • Added explicit private clause parameters to all GPU_PARALLEL_LOOP
    calls specifying loop variables
  • Adjusted indentation of loop bodies to align with new macro structure,
    removing extra indentation from the old call-based approach
  • Updated numerous GPU parallel regions throughout CBC calculations,
    flux reshaping, and output operations
+590/-590
m_compute_levelset.fpp
Refactor GPU parallel loop macros in levelset computations

src/common/m_compute_levelset.fpp

  • Converted all #:call GPU_PARALLEL_LOOP() / #:endcall GPU_PARALLEL_LOOP
    invocations to $:GPU_PARALLEL_LOOP() / $:END_GPU_PARALLEL_LOOP()
    syntax
  • Added explicit private clause with loop variable lists to all GPU
    parallel loop macros
  • Reduced indentation of loop bodies by one level to match new macro
    structure
  • Applied changes across all levelset computation functions (circle,
    airfoil, 3D airfoil, rectangle, cuboid, sphere, cylinder)
+274/-274
omp_macros.fpp
Split GPU parallel loop macro into separate start and end macros

src/common/include/omp_macros.fpp

  • Separated OMP_PARALLEL_LOOP macro into two parts: start directive
    generation and end directive generation
  • Removed code parameter from OMP_PARALLEL_LOOP macro definition
  • Created new END_OMP_PARALLEL_LOOP macro to emit appropriate end
    directives based on compiler type (sketched below)
  • Removed inline $:code execution from OMP_PARALLEL_LOOP, allowing loop
    bodies to be written directly in code
+15/-6   
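
A rough illustration of the new end macro (a hypothetical sketch; the directive spellings and the compiler check are placeholders, not taken from the diff):

#! Hypothetical sketch only -- see src/common/include/omp_macros.fpp for the real macro.
#! Which end directive a given compiler needs is an assumption in this sketch.
#:def END_OMP_PARALLEL_LOOP()
    #:if defined('MFC_CRAY_COMPILER')
        !$omp end target teams distribute parallel do simd
    #:else
        !$omp end target teams distribute parallel do
    #:endif
#:enddef
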
Formatting
1 files
shared_parallel_macros.fpp
Add required newline at end of file                                           

src/common/include/shared_parallel_macros.fpp

  • Added newline at end of file to comply with FYPP requirements
+1/-1     
Additional files
27 files
parallel_macros.fpp +17/-7   
m_boundary_common.fpp +345/-345
m_chemistry.fpp +128/-128
m_finite_differences.fpp +31/-31 
m_ib_patches.fpp +189/-189
m_phase_change.fpp +118/-118
m_variables_conversion.fpp +316/-316
m_acoustic_src.fpp +105/-105
m_body_forces.fpp +48/-48 
m_bubbles_EE.fpp +171/-171
m_bubbles_EL.fpp +285/-285
m_bubbles_EL_kernels.fpp +101/-101
m_data_output.fpp +13/-13 
m_derived_variables.fpp +217/-217
m_fftw.fpp +59/-59 
m_hyperelastic.fpp +93/-93 
m_hypoelastic.fpp +255/-255
m_ibm.fpp +158/-158
m_igr.fpp +1728/-1728
m_mhd.fpp +48/-48 
m_mpi_proxy.fpp +53/-53 
m_muscl.fpp +141/-141
m_qbmm.fpp +218/-218
m_riemann_solvers.fpp +3155/-3155
m_surface_tension.fpp +157/-157
m_time_steppers.fpp +103/-103
m_weno.fpp +494/-494

@danieljvickers danieljvickers requested a review from a team as a code owner November 6, 2025 16:37
@qodo-merge-pro
Contributor

qodo-merge-pro bot commented Nov 6, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

1028 - Partially compliant

Compliant requirements:

  • Ensure FYPP-generated line markers advance correctly inside GPU parallel regions.
  • Refactor GPU loop macros so that start/end markers are separate (no single wrapper causing same-line markers).
  • Preserve functional equivalence of existing GPU loops (OpenACC/OpenMP paths).
  • Reduce excessive FYPP/compile-time warnings caused by previous macro usage.

Non-compliant requirements:

  • Validate that compiler error line numbers inside GPU loops point to the correct source lines.

Requires further human verification:

  • Validate that compiler error line numbers inside GPU loops point to the correct source lines.
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

Many new GPU_PARALLEL_LOOP insertions use arrays indexed at k-1 or similar (e.g., flux at k-1) while outer loops start at 0 or include boundaries. Please verify boundary ranges to avoid out-of-bounds when k=0 (examples include flux access patterns around lines adding inv_ds with flux_face1 at index-1).

        call s_cbc(q_prim_vf%vf, flux_n(idir)%vf, flux_src_n_vf%vf, idir, 1, irx, iry, irz)
    end if

    $:GPU_PARALLEL_LOOP(collapse=4,private='[j,k_loop,l_loop,q_loop,inv_ds,flux_face1,flux_face2]')
    do j = 1, sys_size
        do q_loop = 0, p
            do l_loop = 0, n
                do k_loop = 0, m
                    inv_ds = 1._wp/dx(k_loop)
                    flux_face1 = flux_n(1)%vf(j)%sf(k_loop - 1, l_loop, q_loop)
                    flux_face2 = flux_n(1)%vf(j)%sf(k_loop, l_loop, q_loop)
                    rhs_vf(j)%sf(k_loop, l_loop, q_loop) = inv_ds*(flux_face1 - flux_face2)
                end do
            end do
        end do
    end do
    $:END_GPU_PARALLEL_LOOP()

    if (model_eqns == 3) then
        $:GPU_PARALLEL_LOOP(collapse=4,private='[i_fluid_loop,k_loop,l_loop,q_loop,inv_ds,advected_qty_val, pressure_val,flux_face1,flux_face2]')
        do q_loop = 0, p
            do l_loop = 0, n
                do k_loop = 0, m
                    do i_fluid_loop = 1, num_fluids
                        inv_ds = 1._wp/dx(k_loop)
                        advected_qty_val = q_cons_vf%vf(i_fluid_loop + advxb - 1)%sf(k_loop, l_loop, q_loop)
                        pressure_val = q_prim_vf%vf(E_idx)%sf(k_loop, l_loop, q_loop)
                        flux_face1 = flux_src_n_vf%vf(advxb)%sf(k_loop, l_loop, q_loop)
                        flux_face2 = flux_src_n_vf%vf(advxb)%sf(k_loop - 1, l_loop, q_loop)
                        rhs_vf(i_fluid_loop + intxb - 1)%sf(k_loop, l_loop, q_loop) = &
                            rhs_vf(i_fluid_loop + intxb - 1)%sf(k_loop, l_loop, q_loop) - &
                            inv_ds*advected_qty_val*pressure_val*(flux_face1 - flux_face2)
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if

    call s_add_directional_advection_source_terms(idir, rhs_vf, q_cons_vf, q_prim_vf, flux_src_n_vf, Kterm)

case (2) ! y-direction
    if (bc_y%beg <= BC_CHAR_SLIP_WALL .and. bc_y%beg >= BC_CHAR_SUP_OUTFLOW) then
        call s_cbc(q_prim_vf%vf, flux_n(idir)%vf, flux_src_n_vf%vf, idir, -1, irx, iry, irz)
    end if
    if (bc_y%end <= BC_CHAR_SLIP_WALL .and. bc_y%end >= BC_CHAR_SUP_OUTFLOW) then
        call s_cbc(q_prim_vf%vf, flux_n(idir)%vf, flux_src_n_vf%vf, idir, 1, irx, iry, irz)
    end if

    $:GPU_PARALLEL_LOOP(collapse=4,private='[j,k,l,q,inv_ds,flux_face1,flux_face2]')
    do j = 1, sys_size
        do l = 0, p
            do k = 0, n
                do q = 0, m
                    inv_ds = 1._wp/dy(k)
                    flux_face1 = flux_n(2)%vf(j)%sf(q, k - 1, l)
                    flux_face2 = flux_n(2)%vf(j)%sf(q, k, l)
                    rhs_vf(j)%sf(q, k, l) = rhs_vf(j)%sf(q, k, l) + inv_ds*(flux_face1 - flux_face2)
                end do
            end do
        end do
    end do
    $:END_GPU_PARALLEL_LOOP()

    if (model_eqns == 3) then
        $:GPU_PARALLEL_LOOP(collapse=4,private='[i_fluid_loop,k,l,q,inv_ds,advected_qty_val, pressure_val,flux_face1,flux_face2]')
        do l = 0, p
            do k = 0, n
                do q = 0, m
                    do i_fluid_loop = 1, num_fluids
                        inv_ds = 1._wp/dy(k)
                        advected_qty_val = q_cons_vf%vf(i_fluid_loop + advxb - 1)%sf(q, k, l)
                        pressure_val = q_prim_vf%vf(E_idx)%sf(q, k, l)
                        flux_face1 = flux_src_n_vf%vf(advxb)%sf(q, k, l)
                        flux_face2 = flux_src_n_vf%vf(advxb)%sf(q, k - 1, l)
                        rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) = &
                            rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) - &
                            inv_ds*advected_qty_val*pressure_val*(flux_face1 - flux_face2)
                        if (cyl_coord) then
                            rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) = &
                                rhs_vf(i_fluid_loop + intxb - 1)%sf(q, k, l) - &
                                5.e-1_wp/y_cc(k)*advected_qty_val*pressure_val*(flux_face1 + flux_face2)
                        end if
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if

    if (cyl_coord) then
        $:GPU_PARALLEL_LOOP(collapse=4,private='[j,k,l,q,flux_face1,flux_face2]')
        do j = 1, sys_size
            do l = 0, p
                do k = 0, n
                    do q = 0, m
                        flux_face1 = flux_gsrc_n(2)%vf(j)%sf(q, k - 1, l)
                        flux_face2 = flux_gsrc_n(2)%vf(j)%sf(q, k, l)
                        rhs_vf(j)%sf(q, k, l) = rhs_vf(j)%sf(q, k, l) - &
                                                5.e-1_wp/y_cc(k)*(flux_face1 + flux_face2)
                    end do
                end do
            end do
        end do
        $:END_GPU_PARALLEL_LOOP()
    end if
Off-by-one Risk

Writes to data_real_gpu use +1 offsets and ranges l=0..p; ensure array extents and leading indices match device allocations after macro refactor, especially with firstprivate and collapse usage.

#endif
                #:endcall GPU_HOST_DATA

                $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3)
                do k = 1, sys_size
                    do j = 0, m
                        do l = 0, p
                            data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)/real(real_size, dp)
                            q_cons_vf(k)%sf(j, 0, l) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)
                        end do
                    end do
                end do
                $:END_GPU_PARALLEL_LOOP()

                do i = 1, fourier_rings

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3)
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 1, cmplx_size
                                data_fltr_cmplx_gpu(l + j*cmplx_size + (k - 1)*cmplx_size*x_size) = (0_dp, 0_dp)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3, firstprivate='[i]')
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 0, p
                                data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size) = q_cons_vf(k)%sf(j, i, l)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    #:call GPU_HOST_DATA(use_device_ptr='[p_real, p_cmplx]')
#if defined(__PGI)
                        ierr = cufftExecD2Z(fwd_plan_gpu, data_real_gpu, data_cmplx_gpu)
#else
                        ierr = hipfftExecD2Z(fwd_plan_gpu, c_loc(p_real), c_loc(p_cmplx))
                        call hipCheck(hipDeviceSynchronize())
#endif
                    #:endcall GPU_HOST_DATA

                    Nfq = min(floor(2_dp*real(i, dp)*pi), cmplx_size)
                    $:GPU_UPDATE(device='[Nfq]')

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3)
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 1, Nfq
                                data_fltr_cmplx_gpu(l + j*cmplx_size + (k - 1)*cmplx_size*x_size) = data_cmplx_gpu(l + j*cmplx_size + (k - 1)*cmplx_size*x_size)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    #:call GPU_HOST_DATA(use_device_ptr='[p_real, p_fltr_cmplx]')
#if defined(__PGI)
                        ierr = cufftExecZ2D(bwd_plan_gpu, data_fltr_cmplx_gpu, data_real_gpu)
#else
                        ierr = hipfftExecZ2D(bwd_plan_gpu, c_loc(p_fltr_cmplx), c_loc(p_real))
                        call hipCheck(hipDeviceSynchronize())
#endif
                    #:endcall GPU_HOST_DATA

                    $:GPU_PARALLEL_LOOP(private='[j,k,l]', collapse=3, firstprivate='[i]')
                    do k = 1, sys_size
                        do j = 0, m
                            do l = 0, p
                                data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)/real(real_size, dp)
                                q_cons_vf(k)%sf(j, i, l) = data_real_gpu(l + j*real_size + 1 + (k - 1)*real_size*x_size)
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                end do
Macro Consistency

A comment line was moved to ensure newline at EOF. Confirm that macro endings and inserted END_GPU_PARALLEL_LOOP everywhere match the corresponding start to avoid unterminated regions in some files not shown here.

    #:endif
    $:extraArgs_val
#:enddef
! New line at end of file is required for FYPP

Comment on lines +917 to +940
if (mhd) then
if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const.
! B_y flux = v_x * B_y - v_y * Bx0
! B_z flux = v_x * B_z - v_z * Bx0
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of rho*v_i in the ${XYZ}$ direction
! = rho * v_i * v_${XYZ}$ - B_i * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(rho_R*vel_R(i)*vel_R(norm_dir) &
- B%R(i)*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(rho_L*vel_L(i)*vel_L(norm_dir) &
- B%L(i)*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(rho_L*vel_L(i) - rho_R*vel_R(i))) &
/(s_M - s_P)
do i = 0, 1
flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) &
- s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) &
+ s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P)
end do
elseif (mhd .and. relativity) then
else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction
! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$)
! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$)
! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of m_i in the ${XYZ}$ direction
! = m_i * v_${XYZ}$ - b_i/Gamma * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(cm%R(i)*vel_R(norm_dir) &
- b4%R(i)/Ga%R*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(cm%L(i)*vel_L(norm_dir) &
- b4%L(i)/Ga%L*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(cm%L(i) - cm%R(i))) &
/(s_M - s_P)
end do
elseif (bubbles_euler) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_R - ptilde_R)) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_L - ptilde_L)) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
end do
else if (hypoelasticity) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R &
- tau_e_R(dir_idx_tau(i))) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L &
- tau_e_L(dir_idx_tau(i))) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P)
end do
else
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
do i = 0, 2
flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( &
s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - &
s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + &
s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P)
end do
end if
flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp
end if

Suggestion: Restrict the 1D MHD flux calculation to only execute for the x-direction (NORM_DIR == 1) to prevent incorrect flux calculations for the y and z directions. [possible issue, importance: 9]

Suggested change
if (mhd) then
if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const.
! B_y flux = v_x * B_y - v_y * Bx0
! B_z flux = v_x * B_z - v_z * Bx0
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of rho*v_i in the ${XYZ}$ direction
! = rho * v_i * v_${XYZ}$ - B_i * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(rho_R*vel_R(i)*vel_R(norm_dir) &
- B%R(i)*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(rho_L*vel_L(i)*vel_L(norm_dir) &
- B%L(i)*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(rho_L*vel_L(i) - rho_R*vel_R(i))) &
/(s_M - s_P)
do i = 0, 1
flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) &
- s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) &
+ s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P)
end do
elseif (mhd .and. relativity) then
else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction
! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$)
! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$)
! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 3
! Flux of m_i in the ${XYZ}$ direction
! = m_i * v_${XYZ}$ - b_i/Gamma * B_${XYZ}$ + delta_(${XYZ}$,i) * p_tot
flux_rs${XYZ}$_vf(j, k, l, contxe + i) = &
(s_M*(cm%R(i)*vel_R(norm_dir) &
- b4%R(i)/Ga%R*B%R(norm_dir) &
+ dir_flg(i)*(pres_R + pres_mag%R)) &
- s_P*(cm%L(i)*vel_L(norm_dir) &
- b4%L(i)/Ga%L*B%L(norm_dir) &
+ dir_flg(i)*(pres_L + pres_mag%L)) &
+ s_M*s_P*(cm%L(i) - cm%R(i))) &
/(s_M - s_P)
end do
elseif (bubbles_euler) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_R - ptilde_R)) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*(pres_L - ptilde_L)) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
end do
else if (hypoelasticity) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R &
- tau_e_R(dir_idx_tau(i))) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L &
- tau_e_L(dir_idx_tau(i))) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P)
end do
else
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_vels
flux_rs${XYZ}$_vf(j, k, l, contxe + dir_idx(i)) = &
(s_M*(rho_R*vel_R(dir_idx(1)) &
*vel_R(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_R) &
- s_P*(rho_L*vel_L(dir_idx(1)) &
*vel_L(dir_idx(i)) &
+ dir_flg(dir_idx(i))*pres_L) &
+ s_M*s_P*(rho_L*vel_L(dir_idx(i)) &
- rho_R*vel_R(dir_idx(i)))) &
/(s_M - s_P) &
+ (s_M/s_L)*(s_P/s_R)*pcorr*(vel_R(dir_idx(i)) - vel_L(dir_idx(i)))
do i = 0, 2
flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( &
s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - &
s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + &
s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P)
end do
end if
flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp
end if
if (mhd) then
if (n == 0) then ! 1D: d/dx flux only & Bx = Bx0 = const.
#:if (NORM_DIR == 1)
! B_y flux = v_x * B_y - v_y * Bx0
! B_z flux = v_x * B_z - v_z * Bx0
$:GPU_LOOP(parallelism='[seq]')
do i = 0, 1
flux_rsx_vf(j, k, l, B_idx%beg + i) = (s_M*(vel_R(1)*B%R(2 + i) - vel_R(2 + i)*Bx0) &
- s_P*(vel_L(1)*B%L(2 + i) - vel_L(2 + i)*Bx0) &
+ s_M*s_P*(B%L(2 + i) - B%R(2 + i)))/(s_M - s_P)
end do
#:endif
else ! 2D/3D: Bx, By, Bz /= const. but zero flux component in the same direction
! B_x d/d${XYZ}$ flux = (1 - delta(x,${XYZ}$)) * (v_${XYZ}$ * B_x - v_x * B_${XYZ}$)
! B_y d/d${XYZ}$ flux = (1 - delta(y,${XYZ}$)) * (v_${XYZ}$ * B_y - v_y * B_${XYZ}$)
! B_z d/d${XYZ}$ flux = (1 - delta(z,${XYZ}$)) * (v_${XYZ}$ * B_z - v_z * B_${XYZ}$)
$:GPU_LOOP(parallelism='[seq]')
do i = 0, 2
flux_rs${XYZ}$_vf(j, k, l, B_idx%beg + i) = (1 - dir_flg(i + 1))*( &
s_M*(vel_R(dir_idx(1))*B%R(i + 1) - vel_R(i + 1)*B%R(norm_dir)) - &
s_P*(vel_L(dir_idx(1))*B%L(i + 1) - vel_L(i + 1)*B%L(norm_dir)) + &
s_M*s_P*(B%L(i + 1) - B%R(i + 1)))/(s_M - s_P)
end do
end if
flux_src_rs${XYZ}$_vf(j, k, l, advxb) = 0._wp
end if

rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')

Suggestion: Add the loop variable q to the private clause of the GPU_PARALLEL_LOOP directive to prevent a race condition. [possible issue, importance: 9]

Suggested change
$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,l,q,alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')

@danieljvickers danieljvickers requested a review from a team as a code owner November 6, 2025 16:43
@danieljvickers
Member Author

I just built this on Frontier manually and both OMP and ACC compile fine. I am going to rerun the build for Frontier.

@codecov

codecov bot commented Nov 6, 2025

Codecov Report

❌ Patch coverage is 39.75456% with 1620 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.39%. Comparing base (bfd732c) to head (723822d).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_cbc.fpp 45.03% 162 Missing and 4 partials ⚠️
src/simulation/m_rhs.fpp 44.40% 128 Missing and 11 partials ⚠️
src/simulation/m_derived_variables.fpp 8.73% 114 Missing and 1 partial ⚠️
src/simulation/m_bubbles_EL.fpp 37.27% 105 Missing and 1 partial ⚠️
src/common/m_compute_levelset.fpp 40.00% 82 Missing and 20 partials ⚠️
src/common/m_mpi_common.fpp 18.03% 100 Missing ⚠️
src/simulation/m_hypoelastic.fpp 30.06% 96 Missing and 4 partials ⚠️
src/simulation/m_weno.fpp 57.97% 87 Missing ⚠️
src/simulation/m_surface_tension.fpp 32.07% 68 Missing and 4 partials ⚠️
src/common/m_ib_patches.fpp 29.78% 60 Missing and 6 partials ⚠️
... and 18 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1029      +/-   ##
==========================================
- Coverage   46.02%   44.39%   -1.64%     
==========================================
  Files          67       70       +3     
  Lines       13437    20330    +6893     
  Branches     1550     1947     +397     
==========================================
+ Hits         6185     9025    +2840     
- Misses       6362    10211    +3849     
- Partials      890     1094     +204     

☔ View full report in Codecov by Sentry.

@sbryngelson
Member

running this by @anandrdbz and @prathi-wind. also the MHD parts might need to be examined by @ChrisZYJ

@danieljvickers
Member Author

We can get their input. Luckily a few tests are actually passing on Frontier, and the ones that are failing appear to mostly have "Adaptive time stepping failed to converge." as the error message, which is a workable debug point.


Development

Successfully merging this pull request may close these issues.

add-correct-line-numbering-to-fypp
