Mibm gpu optimization and io bugs #1019

danieljvickers · 2025-10-24T17:39:55Z

User description

Description

This fixes an issue with IO that does not properly export the IB markers at each time step during post processing, allowing us to see the IB markers move for MIBM cases. This also supports and initial port of the IB Marker, levelset, and levelset Norm calculations to the GPU for improved performance.

Fixes #1013
Fixes #1010

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)

Scope

This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

I reran both of the MIBM cases on the GPU with noted improved performance. I also observed the IB markers moving in the IO.

What computers and compilers did you use to test this: GNU and NVHPC

Checklist

I have added comments for the new code
I ran ./mfc.sh format before committing my code
New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in `src/simulation`)

To make sure the code is performing as expected on GPU devices, I have:

Checked that the code compiles using NVHPC compilers
Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
Ran the code on MI200+ GPUs and ensure the new features performed as expected (the GPU results match the CPU results)
Enclosed the new feature via nvtx ranges so that they can be identified in profiles
Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Enhancement, Bug fix

Description

Port IB marker, levelset, and levelset norm calculations to GPU
Add GPU parallelization directives to all levelset computation routines
Fix IO bug: read IB files from correct time step directory
Fix IO bug: properly export IB markers at each time step
Refactor centroid variables into arrays for GPU compatibility
Add moving IBM velocity correction for slip boundary conditions

Diagram Walkthrough

flowchart LR
  A["Levelset Calculations"] -->|GPU_PARALLEL_LOOP| B["GPU Execution"]
  C["IB Marker Computation"] -->|GPU_PARALLEL_LOOP| B
  D["IO Data Reading"] -->|Fix t_step Path| E["Correct IB File Loading"]
  F["Data Output"] -->|Export at Each Step| G["IB Markers Saved"]
  H["Moving IBM"] -->|Add Rotation Velocity| I["Slip BC Correction"]
  B --> J["Improved Performance"]

File Walkthrough

Relevant files

Enhancement

m_compute_levelset.fpp `GPU parallelization of all levelset computation routines` src/common/m_compute_levelset.fpp Refactored centroid variables (x_centroid, y_centroid, z_centroid) into `center` arrays for GPU memory efficiency Added `$:GPU_PARALLEL_LOOP` directives with appropriate private/copy/copyin clauses to all levelset subroutines Consolidated levelset normal vector calculations using temporary `dist_vec` variables before assignment Fixed cuboid levelset bug: corrected index from `(i, j, 0)` to `(i, j, k)` in normal vector assignment Simplified vector operations using array subtraction syntax	+95/-101
m_ib_patches.fpp `GPU parallelization of IB marker computation routines` src/common/m_ib_patches.fpp Refactored centroid variables into `center` arrays across all IB patch subroutines Added `$:GPU_PARALLEL_LOOP` directives to circle, airfoil, 3D airfoil, rectangle, sphere, cuboid, and cylinder marker routines Moved airfoil grid allocation inside conditional block to prevent redundant allocation Replaced boundary structure variables with direct array calculations for GPU compatibility Simplified coordinate transformations using array subtraction syntax	+164/-163
m_ibm.fpp `Add moving IBM slip boundary condition support` src/simulation/m_ibm.fpp Fixed ghost point array allocation to accommodate both ghost and inner points with 2x buffer Added moving IBM velocity correction for slip boundary conditions Compute rotation velocity contribution from angular velocity and radial vector Apply only normal component of IB motion to slip boundary condition	+12/-2

Bug fix

m_data_input.f90 `Fix IB data file reading from correct time step` src/post_process/m_data_input.f90 Added optional `t_step` parameter to `s_read_ib_data_files` subroutine for correct file offset calculation Fixed IO bug: compute proper MPI file displacement based on time step and save index Updated all calls to `s_read_ib_data_files` to pass `t_step` parameter Removed hardcoded IB directory path that always read from t_step=0 Removed redundant IB directory existence checks	+17/-18
m_data_output.fpp `Fix IB marker export at each time step` src/simulation/m_data_output.fpp Fixed IB marker output to explicitly write array bounds `(0:m, 0:n,` `0:p)` instead of entire array Added parallel IO support for IB marker data with proper MPI file displacement calculation Implemented IB data writing in restart files with correct time step indexing	+19/-1
p_main.fpp `Save initial data at t_step zero` src/simulation/p_main.fpp Added explicit data save call at t_step=0 before main simulation loop Ensures initial IB marker configuration is exported to output files	+3/-0

… step

…step

…ions

qodo-merge-pro · 2025-10-24T17:40:54Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶 1010 - Partially compliant Compliant requirements: Move remaining MIBM calculations to the GPU; avoid serial CPU sections and host-device copies. Add GPU-parallelization where straightforward. Improve overall performance of MIBM, with profiling support if applicable. Non-compliant requirements: Consider adding time-step movement (time evolution) for ghost points to avoid full recalculation each step, if feasible. Requires further human verification: Performance profiling evidence across GPU types and scales (Nsight/Rocprof reports, scaling). Numerical parity checks CPU vs GPU for all ported routines. 1013 - PR Code Verified Compliant requirements: Ensure IB markers are saved at each time step, not only at the beginning, so post-processing shows moving IBs correctly. Read IB markers from the correct time-step directory during post-processing/restarts. Requires further human verification: Validate in ParaView that IB markers animate correctly over time for moving cases.
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Possible Issue In s_3D_airfoil_levelset, the condition changed to "if (xyz_local(2) >= center(2))" compares a local-frame coordinate to a global-center component; likely should be a local-threshold (e.g., 0._wp). This may flip side-selection and normal directions. if (xyz_local(2) >= center(2)) then do k = 1, Np dist_vec(1) = xyz_local(1) - airfoil_grid_u(k)%x dist_vec(2) = xyz_local(2) - airfoil_grid_u(k)%y Logic Ambiguity Cuboid normal selection uses first-equal min with a note that rotations by 90° cause test failures; code now computes a dist_vec then applies rotation, but tie-breaking remains order-dependent. Needs deterministic tie-break to avoid orientation-dependent results. side_dists(6) = xyz_local(3) - Front min_dist = minval(abs(side_dists)) ! TODO :: The way that this is written, it looks like we will ! trigger at the first size that is close to the minimum distance, ! meaning corners where side_dists are the same will ! trigger on what may not actually be the minimum, ! leading to undesired behavior. This should be resolved ! and this code should be cleaned up. It also means that ! rotating the box 90 degrees will cause tests to fail. dist_vec = 0._wp if (f_approx_equal(min_dist, abs(side_dists(1)))) then levelset%sf(i, j, k, ib_patch_id) = side_dists(1) if (.not. f_approx_equal(side_dists(1), 0._wp)) then dist_vec(1) = side_dists(1)/abs(side_dists(1)) end if else if (f_approx_equal(min_dist, abs(side_dists(2)))) then levelset%sf(i, j, k, ib_patch_id) = side_dists(2) if (.not. f_approx_equal(side_dists(2), 0._wp)) then dist_vec(1) = -side_dists(2)/abs(side_dists(2)) end if else if (f_approx_equal(min_dist, abs(side_dists(3)))) then levelset%sf(i, j, k, ib_patch_id) = side_dists(3) if (.not. f_approx_equal(side_dists(3), 0._wp)) then dist_vec(2) = side_dists(3)/abs(side_dists(3)) end if else if (f_approx_equal(min_dist, abs(side_dists(4)))) then levelset%sf(i, j, k, ib_patch_id) = side_dists(4) if (.not. f_approx_equal(side_dists(4), 0._wp)) then dist_vec(2) = -side_dists(4)/abs(side_dists(4)) end if else if (f_approx_equal(min_dist, abs(side_dists(5)))) then levelset%sf(i, j, k, ib_patch_id) = side_dists(5) if (.not. f_approx_equal(side_dists(5), 0._wp)) then dist_vec(3) = side_dists(5)/abs(side_dists(5)) end if else if (f_approx_equal(min_dist, abs(side_dists(6)))) then levelset%sf(i, j, k, ib_patch_id) = side_dists(6) if (.not. f_approx_equal(side_dists(6), 0._wp)) then dist_vec(3) = -side_dists(6)/abs(side_dists(6)) end if end if levelset_norm%sf(i, j, k, ib_patch_id, :) = matmul(rotation, dist_vec) end if API Assumption Slip BC correction uses cross_product(matmul(rotation_matrix, angular_vel), radial_vector). Verify angular_vel frame and units; rotation_matrix application order may be incorrect if angular_vel already in global frame. Could cause incorrect rotation velocity. ! compute the linear velocity of the ghost point due to rotation radial_vector = physical_loc - [patch_ib(patch_id)%x_centroid, & patch_ib(patch_id)%y_centroid, patch_ib(patch_id)%z_centroid] rotation_velocity = cross_product(matmul(patch_ib(patch_id)%rotation_matrix, patch_ib(patch_id)%angular_vel), radial_vector) ! add only the component of the IB's motion that is normal to the surface vel_g = vel_g + sum((patch_ib(patch_id)%vel + rotation_velocity)norm)norm end if

qodo-merge-pro · 2025-10-24T17:41:58Z

src/common/m_compute_levelset.fpp

High-level Suggestion

To improve code clarity and reduce redundancy, refactor the patch_ib derived type to store geometric properties like centroids as arrays directly. This avoids repeatedly creating local temporary arrays within each GPU-accelerated subroutine. [High-level, importance: 7]

Solution Walkthrough:

Before:

subroutine s_sphere_levelset(ib_patch_id, levelset, levelset_norm) ... real(wp), dimension(3) :: dist_vec, center ... center(1) = patch_ib(ib_patch_id)%x_centroid center(2) = patch_ib(ib_patch_id)%y_centroid center(3) = patch_ib(ib_patch_id)%z_centroid $:GPU_PARALLEL_LOOP(..., copyin='[...,center,...]') do i = 0, m ... dist_vec(1) = x_cc(i) - center(1) ... end do end subroutine

After:

! In the module defining patch_ib type type t_patch_ib ... ! real(wp) :: x_centroid, y_centroid, z_centroid real(wp), dimension(3) :: center ... end type ! In the subroutine subroutine s_sphere_levelset(ib_patch_id, levelset, levelset_norm) ... ! No local 'center' array and manual copy needed. ... $:GPU_PARALLEL_LOOP(..., copyin='[...,patch_ib(ib_patch_id)%center,...]') do i = 0, m ... dist_vec(1) = x_cc(i) - patch_ib(ib_patch_id)%center(1) ... end do end subroutine

qodo-merge-pro · 2025-10-24T17:41:58Z

src/post_process/m_data_input.f90

+                var_MOK = int(sys_size + 1, MPI_OFFSET_KIND)
+                disp = m_MOK*max(MOK, n_MOK)*max(MOK, p_MOK)*WP_MOK*(var_MOK - 1 + int(save_index, MPI_OFFSET_KIND))


Suggestion: Correct the MPI file displacement calculation for reading IB data. The current formula incorrectly uses sys_size in the offset, which should be based only on the data slice size and the save index. [possible issue, importance: 9]

Suggested change

var_MOK = int(sys_size + 1, MPI_OFFSET_KIND)

disp = m_MOK*max(MOK, n_MOK)*max(MOK, p_MOK)*WP_MOK*(var_MOK - 1 + int(save_index, MPI_OFFSET_KIND))

disp = m_MOK*max(MOK, n_MOK)*max(MOK, p_MOK)*int(4_wp, MPI_OFFSET_KIND)*int(save_index, MPI_OFFSET_KIND)

…. Need another solution

codecov · 2025-10-24T22:56:55Z

Codecov Report

❌ Patch coverage is 49.27536% with 105 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.66%. Comparing base (4f8fb91) to head (1cdbbe3).
⚠️ Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
src/common/m_ib_patches.fpp	41.22%	66 Missing and 1 partial ⚠️
src/common/m_compute_levelset.fpp	52.38%	28 Missing and 2 partials ⚠️
src/post_process/m_data_input.f90	78.57%	2 Missing and 1 partial ⚠️
src/simulation/m_ibm.fpp	50.00%	3 Missing ⚠️
src/simulation/m_data_output.fpp	80.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1019      +/-   ##
==========================================
+ Coverage   41.60%   41.66%   +0.06%     
==========================================
  Files          70       70              
  Lines       20783    20769      -14     
  Branches     2616     2618       +2     
==========================================
+ Hits         8647     8654       +7     
+ Misses      10499    10477      -22     
- Partials     1637     1638       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…jvickers/MFC into mibm-gpu-optimization-and-io-bugs

sbryngelson · 2025-10-30T12:36:45Z

looks good, i'll wait to see some benchmarks then merge. ping me if they complete and i don't see it (seems likely) and I will merge

danieljvickers · 2025-10-30T13:45:28Z

@sbryngelson Looks like there is some missing YAML file on frontier that caused a failure. I saw the same thing on another branch too.

wilfonba · 2025-10-30T13:58:31Z

saw

It appears the IGR test failed on master. I'll download the logs and take a closer look when I get to the office.

danieljvickers added 13 commits October 17, 2025 11:46

Replaced ib step dir to always read the correct IB files for the time…

450ff9f

… step

Resolved serial IO issue:

1257732

Successfully writing out markers with parallel io except for initial …

d6e1976

…step

IO is working

a24ff39

mostly working, but original levelset not changing

64a57bb

Back to working rectangle case with moving IBs

71dc556

Finished porting all IB marker compute to the GPU

f02636d

Full GPU support working for airfoils

8965ae2

Removed depricated code

e87fbff

Removed some unused variables and parallelized more levelset calculat…

240c588

…ions

All levelsets updated for GPU compute

77c1432

Passed test suite of IBM cases

bd6096b

Ran formatting

91b256f

danieljvickers requested a review from a team as a code owner October 24, 2025 17:39

qodo-merge-pro bot added the Review effort 4/5 label Oct 24, 2025

qodo-merge-pro bot reviewed Oct 24, 2025

View reviewed changes

The premature data save fixes MIBM frame 0 data, but ruins test suite…

0389b67

…. Need another solution

danieljvickers and others added 8 commits October 27, 2025 13:59

Added something that may fix the cray issue

eae1974

Intermittend commit

3dc61e0

Passes cray compiler tests

e1591e3

Modified all GPU parallel statements and ran formatting

a301fbc

Fixed IB marker write issue with something more sustainable

7a05539

Merge branch 'master' into mibm-gpu-optimization-and-io-bugs

21b79e4

Fixed bug with centroid and length not being comunicated over MPI

76ec5ed

Merge branch 'mibm-gpu-optimization-and-io-bugs' of github.com:daniel…

1cdbbe3

…jvickers/MFC into mibm-gpu-optimization-and-io-bugs

sbryngelson approved these changes Oct 30, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mibm gpu optimization and io bugs #1019

Mibm gpu optimization and io bugs #1019

danieljvickers commented Oct 24, 2025 •

edited by qodo-merge-pro bot

Loading

Uh oh!

qodo-merge-pro bot commented Oct 24, 2025

Uh oh!

qodo-merge-pro bot Oct 24, 2025

Uh oh!

qodo-merge-pro bot Oct 24, 2025

Uh oh!

codecov bot commented Oct 24, 2025 •

edited

Loading

Uh oh!

sbryngelson commented Oct 30, 2025

Uh oh!

danieljvickers commented Oct 30, 2025 •

edited

Loading

Uh oh!

wilfonba commented Oct 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

		var_MOK = int(sys_size + 1, MPI_OFFSET_KIND)
		disp = m_MOKmax(MOK, n_MOK)max(MOK, p_MOK)WP_MOK(var_MOK - 1 + int(save_index, MPI_OFFSET_KIND))

Mibm gpu optimization and io bugs #1019

Are you sure you want to change the base?

Mibm gpu optimization and io bugs #1019

Conversation

danieljvickers commented Oct 24, 2025 • edited by qodo-merge-pro bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

Description

Type of change

Scope

How Has This Been Tested?

Checklist

If your code changes any code source files (anything in src/simulation)

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

qodo-merge-pro bot commented Oct 24, 2025

PR Reviewer Guide 🔍

Uh oh!

qodo-merge-pro bot Oct 24, 2025

Choose a reason for hiding this comment

High-level Suggestion

Solution Walkthrough:

Before:

After:

Uh oh!

qodo-merge-pro bot Oct 24, 2025

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sbryngelson commented Oct 30, 2025

Uh oh!

danieljvickers commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wilfonba commented Oct 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

danieljvickers commented Oct 24, 2025 •

edited by qodo-merge-pro bot

Loading

If your code changes any code source files (anything in `src/simulation`)

codecov bot commented Oct 24, 2025 •

edited

Loading

danieljvickers commented Oct 30, 2025 •

edited

Loading