Skip to content

Conversation

@danieljvickers
Copy link
Member

@danieljvickers danieljvickers commented Oct 24, 2025

User description

Description

This fixes an issue with IO that does not properly export the IB markers at each time step during post processing, allowing us to see the IB markers move for MIBM cases. This also supports and initial port of the IB Marker, levelset, and levelset Norm calculations to the GPU for improved performance.

Fixes #1013
Fixes #1010

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

  • I reran both of the MIBM cases on the GPU with noted improved performance. I also observed the IB markers moving in the IO.
  • What computers and compilers did you use to test this: GNU and NVHPC

Checklist

  • I have added comments for the new code
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensure the new features performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Enhancement, Bug fix


Description

  • Port IB marker, levelset, and levelset norm calculations to GPU

  • Add GPU parallelization directives to all levelset computation routines

  • Fix IO bug: read IB files from correct time step directory

  • Fix IO bug: properly export IB markers at each time step

  • Refactor centroid variables into arrays for GPU compatibility

  • Add moving IBM velocity correction for slip boundary conditions


Diagram Walkthrough

flowchart LR
  A["Levelset Calculations"] -->|GPU_PARALLEL_LOOP| B["GPU Execution"]
  C["IB Marker Computation"] -->|GPU_PARALLEL_LOOP| B
  D["IO Data Reading"] -->|Fix t_step Path| E["Correct IB File Loading"]
  F["Data Output"] -->|Export at Each Step| G["IB Markers Saved"]
  H["Moving IBM"] -->|Add Rotation Velocity| I["Slip BC Correction"]
  B --> J["Improved Performance"]
Loading

File Walkthrough

Relevant files
Enhancement
m_compute_levelset.fpp
GPU parallelization of all levelset computation routines 

src/common/m_compute_levelset.fpp

  • Refactored centroid variables (x_centroid, y_centroid, z_centroid)
    into center arrays for GPU memory efficiency
  • Added $:GPU_PARALLEL_LOOP directives with appropriate
    private/copy/copyin clauses to all levelset subroutines
  • Consolidated levelset normal vector calculations using temporary
    dist_vec variables before assignment
  • Fixed cuboid levelset bug: corrected index from (i, j, 0) to (i, j, k)
    in normal vector assignment
  • Simplified vector operations using array subtraction syntax
+95/-101
m_ib_patches.fpp
GPU parallelization of IB marker computation routines       

src/common/m_ib_patches.fpp

  • Refactored centroid variables into center arrays across all IB patch
    subroutines
  • Added $:GPU_PARALLEL_LOOP directives to circle, airfoil, 3D airfoil,
    rectangle, sphere, cuboid, and cylinder marker routines
  • Moved airfoil grid allocation inside conditional block to prevent
    redundant allocation
  • Replaced boundary structure variables with direct array calculations
    for GPU compatibility
  • Simplified coordinate transformations using array subtraction syntax
+164/-163
m_ibm.fpp
Add moving IBM slip boundary condition support                     

src/simulation/m_ibm.fpp

  • Fixed ghost point array allocation to accommodate both ghost and inner
    points with 2x buffer
  • Added moving IBM velocity correction for slip boundary conditions
  • Compute rotation velocity contribution from angular velocity and
    radial vector
  • Apply only normal component of IB motion to slip boundary condition
+12/-2   
Bug fix
m_data_input.f90
Fix IB data file reading from correct time step                   

src/post_process/m_data_input.f90

  • Added optional t_step parameter to s_read_ib_data_files subroutine for
    correct file offset calculation
  • Fixed IO bug: compute proper MPI file displacement based on time step
    and save index
  • Updated all calls to s_read_ib_data_files to pass t_step parameter
  • Removed hardcoded IB directory path that always read from t_step=0
  • Removed redundant IB directory existence checks
+17/-18 
m_data_output.fpp
Fix IB marker export at each time step                                     

src/simulation/m_data_output.fpp

  • Fixed IB marker output to explicitly write array bounds (0:m, 0:n,
    0:p) instead of entire array
  • Added parallel IO support for IB marker data with proper MPI file
    displacement calculation
  • Implemented IB data writing in restart files with correct time step
    indexing
+19/-1   
p_main.fpp
Save initial data at t_step zero                                                 

src/simulation/p_main.fpp

  • Added explicit data save call at t_step=0 before main simulation loop
  • Ensures initial IB marker configuration is exported to output files
+3/-0     

@danieljvickers danieljvickers requested a review from a team as a code owner October 24, 2025 17:39
@qodo-merge-pro
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

1010 - Partially compliant

Compliant requirements:

  • Move remaining MIBM calculations to the GPU; avoid serial CPU sections and host-device copies.
  • Add GPU-parallelization where straightforward.
  • Improve overall performance of MIBM, with profiling support if applicable.

Non-compliant requirements:

  • Consider adding time-step movement (time evolution) for ghost points to avoid full recalculation each step, if feasible.

Requires further human verification:

  • Performance profiling evidence across GPU types and scales (Nsight/Rocprof reports, scaling).
  • Numerical parity checks CPU vs GPU for all ported routines.

1013 - PR Code Verified

Compliant requirements:

  • Ensure IB markers are saved at each time step, not only at the beginning, so post-processing shows moving IBs correctly.
  • Read IB markers from the correct time-step directory during post-processing/restarts.

Requires further human verification:

  • Validate in ParaView that IB markers animate correctly over time for moving cases.
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

In s_3D_airfoil_levelset, the condition changed to "if (xyz_local(2) >= center(2))" compares a local-frame coordinate to a global-center component; likely should be a local-threshold (e.g., 0._wp). This may flip side-selection and normal directions.

if (xyz_local(2) >= center(2)) then
    do k = 1, Np
        dist_vec(1) = xyz_local(1) - airfoil_grid_u(k)%x
        dist_vec(2) = xyz_local(2) - airfoil_grid_u(k)%y
Logic Ambiguity

Cuboid normal selection uses first-equal min with a note that rotations by 90° cause test failures; code now computes a dist_vec then applies rotation, but tie-breaking remains order-dependent. Needs deterministic tie-break to avoid orientation-dependent results.

    side_dists(6) = xyz_local(3) - Front
    min_dist = minval(abs(side_dists))

    ! TODO :: The way that this is written, it looks like we will
    ! trigger at the first size that is close to the minimum distance,
    ! meaning corners where side_dists are the same will
    ! trigger on what may not actually be the minimum,
    ! leading to undesired behavior. This should be resolved
    ! and this code should be cleaned up. It also means that
    ! rotating the box 90 degrees will cause tests to fail.
    dist_vec = 0._wp
    if (f_approx_equal(min_dist, abs(side_dists(1)))) then
        levelset%sf(i, j, k, ib_patch_id) = side_dists(1)
        if (.not. f_approx_equal(side_dists(1), 0._wp)) then
            dist_vec(1) = side_dists(1)/abs(side_dists(1))
        end if

    else if (f_approx_equal(min_dist, abs(side_dists(2)))) then
        levelset%sf(i, j, k, ib_patch_id) = side_dists(2)
        if (.not. f_approx_equal(side_dists(2), 0._wp)) then
            dist_vec(1) = -side_dists(2)/abs(side_dists(2))
        end if

    else if (f_approx_equal(min_dist, abs(side_dists(3)))) then
        levelset%sf(i, j, k, ib_patch_id) = side_dists(3)
        if (.not. f_approx_equal(side_dists(3), 0._wp)) then
            dist_vec(2) = side_dists(3)/abs(side_dists(3))
        end if

    else if (f_approx_equal(min_dist, abs(side_dists(4)))) then
        levelset%sf(i, j, k, ib_patch_id) = side_dists(4)
        if (.not. f_approx_equal(side_dists(4), 0._wp)) then
            dist_vec(2) = -side_dists(4)/abs(side_dists(4))
        end if

    else if (f_approx_equal(min_dist, abs(side_dists(5)))) then
        levelset%sf(i, j, k, ib_patch_id) = side_dists(5)
        if (.not. f_approx_equal(side_dists(5), 0._wp)) then
            dist_vec(3) = side_dists(5)/abs(side_dists(5))
        end if

    else if (f_approx_equal(min_dist, abs(side_dists(6)))) then
        levelset%sf(i, j, k, ib_patch_id) = side_dists(6)
        if (.not. f_approx_equal(side_dists(6), 0._wp)) then
            dist_vec(3) = -side_dists(6)/abs(side_dists(6))
        end if
    end if
    levelset_norm%sf(i, j, k, ib_patch_id, :) = matmul(rotation, dist_vec)
end if
API Assumption

Slip BC correction uses cross_product(matmul(rotation_matrix, angular_vel), radial_vector). Verify angular_vel frame and units; rotation_matrix application order may be incorrect if angular_vel already in global frame. Could cause incorrect rotation velocity.

    ! compute the linear velocity of the ghost point due to rotation
    radial_vector = physical_loc - [patch_ib(patch_id)%x_centroid, &
                                    patch_ib(patch_id)%y_centroid, patch_ib(patch_id)%z_centroid]
    rotation_velocity = cross_product(matmul(patch_ib(patch_id)%rotation_matrix, patch_ib(patch_id)%angular_vel), radial_vector)

    ! add only the component of the IB's motion that is normal to the surface
    vel_g = vel_g + sum((patch_ib(patch_id)%vel + rotation_velocity)*norm)*norm
end if

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High-level Suggestion

To improve code clarity and reduce redundancy, refactor the patch_ib derived type to store geometric properties like centroids as arrays directly. This avoids repeatedly creating local temporary arrays within each GPU-accelerated subroutine. [High-level, importance: 7]

Solution Walkthrough:

Before:

subroutine s_sphere_levelset(ib_patch_id, levelset, levelset_norm)
    ...
    real(wp), dimension(3) :: dist_vec, center
    ...
    center(1) = patch_ib(ib_patch_id)%x_centroid
    center(2) = patch_ib(ib_patch_id)%y_centroid
    center(3) = patch_ib(ib_patch_id)%z_centroid

    $:GPU_PARALLEL_LOOP(..., copyin='[...,center,...]')
    do i = 0, m
        ...
        dist_vec(1) = x_cc(i) - center(1)
        ...
    end do
end subroutine

After:

! In the module defining patch_ib type
type t_patch_ib
  ...
  ! real(wp) :: x_centroid, y_centroid, z_centroid
  real(wp), dimension(3) :: center
  ...
end type

! In the subroutine
subroutine s_sphere_levelset(ib_patch_id, levelset, levelset_norm)
    ...
    ! No local 'center' array and manual copy needed.
    ...
    $:GPU_PARALLEL_LOOP(..., copyin='[...,patch_ib(ib_patch_id)%center,...]')
    do i = 0, m
        ...
        dist_vec(1) = x_cc(i) - patch_ib(ib_patch_id)%center(1)
        ...
    end do
end subroutine

Comment on lines 185 to 186
var_MOK = int(sys_size + 1, MPI_OFFSET_KIND)
disp = m_MOK*max(MOK, n_MOK)*max(MOK, p_MOK)*WP_MOK*(var_MOK - 1 + int(save_index, MPI_OFFSET_KIND))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Correct the MPI file displacement calculation for reading IB data. The current formula incorrectly uses sys_size in the offset, which should be based only on the data slice size and the save index. [possible issue, importance: 9]

Suggested change
var_MOK = int(sys_size + 1, MPI_OFFSET_KIND)
disp = m_MOK*max(MOK, n_MOK)*max(MOK, p_MOK)*WP_MOK*(var_MOK - 1 + int(save_index, MPI_OFFSET_KIND))
disp = m_MOK*max(MOK, n_MOK)*max(MOK, p_MOK)*int(4_wp, MPI_OFFSET_KIND)*int(save_index, MPI_OFFSET_KIND)

@codecov
Copy link

codecov bot commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 49.27536% with 105 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.66%. Comparing base (4f8fb91) to head (1cdbbe3).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/common/m_ib_patches.fpp 41.22% 66 Missing and 1 partial ⚠️
src/common/m_compute_levelset.fpp 52.38% 28 Missing and 2 partials ⚠️
src/post_process/m_data_input.f90 78.57% 2 Missing and 1 partial ⚠️
src/simulation/m_ibm.fpp 50.00% 3 Missing ⚠️
src/simulation/m_data_output.fpp 80.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1019      +/-   ##
==========================================
+ Coverage   41.60%   41.66%   +0.06%     
==========================================
  Files          70       70              
  Lines       20783    20769      -14     
  Branches     2616     2618       +2     
==========================================
+ Hits         8647     8654       +7     
+ Misses      10499    10477      -22     
- Partials     1637     1638       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sbryngelson
Copy link
Member

looks good, i'll wait to see some benchmarks then merge. ping me if they complete and i don't see it (seems likely) and I will merge

@danieljvickers
Copy link
Member Author

danieljvickers commented Oct 30, 2025

@sbryngelson Looks like there is some missing YAML file on frontier that caused a failure. I saw the same thing on another branch too.

@wilfonba
Copy link
Collaborator

saw

It appears the IGR test failed on master. I'll download the logs and take a closer look when I get to the office.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Fix IB marker IO MIBM Performance Optimization

3 participants