Mibm gpu optimization and io bugs #1019
base: master
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
High-level Suggestion
To improve code clarity and reduce redundancy, refactor the patch_ib derived type to store geometric properties like centroids as arrays directly. This avoids repeatedly creating local temporary arrays within each GPU-accelerated subroutine. [High-level, importance: 7]
Solution Walkthrough:
Before:
subroutine s_sphere_levelset(ib_patch_id, levelset, levelset_norm)
    ...
    real(wp), dimension(3) :: dist_vec, center
    ...
    center(1) = patch_ib(ib_patch_id)%x_centroid
    center(2) = patch_ib(ib_patch_id)%y_centroid
    center(3) = patch_ib(ib_patch_id)%z_centroid
    $:GPU_PARALLEL_LOOP(..., copyin='[...,center,...]')
    do i = 0, m
        ...
        dist_vec(1) = x_cc(i) - center(1)
        ...
    end do
end subroutine
After:
! In the module defining patch_ib type
type t_patch_ib
  ...
  ! real(wp) :: x_centroid, y_centroid, z_centroid
  real(wp), dimension(3) :: center
  ...
end type
! In the subroutine
subroutine s_sphere_levelset(ib_patch_id, levelset, levelset_norm)
    ...
    ! No local 'center' array and manual copy needed.
    ...
    $:GPU_PARALLEL_LOOP(..., copyin='[...,patch_ib(ib_patch_id)%center,...]')
    do i = 0, m
        ...
        dist_vec(1) = x_cc(i) - patch_ib(ib_patch_id)%center(1)
        ...
    end do
end subroutine
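As a rough illustration of the "After" pattern, the following is a minimal standalone sketch, not the MFC implementation. The names `m_levelset_sketch`, `t_patch_sketch`, `f_sphere_levelset`, and `f_sphere_normal` are hypothetical, the GPU macro is left out so the file compiles without Fypp, and the sign convention (negative inside the sphere) is an assumption.

```fortran
module m_levelset_sketch
    implicit none
    integer, parameter :: wp = kind(1.0d0)

    ! Hypothetical, reduced stand-in for the patch type: the centroid is one array.
    type :: t_patch_sketch
        real(wp), dimension(3) :: center
        real(wp) :: radius
    end type t_patch_sketch

contains

    ! Signed distance from point x to the sphere surface
    ! (assumed convention: negative inside, positive outside).
    pure function f_sphere_levelset(patch, x) result(levelset)
        type(t_patch_sketch), intent(in) :: patch
        real(wp), dimension(3), intent(in) :: x
        real(wp) :: levelset
        real(wp), dimension(3) :: dist_vec

        dist_vec = x - patch%center ! whole-array operation, no per-component copy
        levelset = sqrt(sum(dist_vec**2)) - patch%radius
    end function f_sphere_levelset

    ! Unit vector pointing from the sphere center toward x (zero at the center).
    pure function f_sphere_normal(patch, x) result(normal)
        type(t_patch_sketch), intent(in) :: patch
        real(wp), dimension(3), intent(in) :: x
        real(wp), dimension(3) :: normal, dist_vec
        real(wp) :: dist

        dist_vec = x - patch%center
        dist = sqrt(sum(dist_vec**2))
        if (dist > 0._wp) then
            normal = dist_vec/dist
        else
            normal = 0._wp
        end if
    end function f_sphere_normal

end module m_levelset_sketch

program p_levelset_sketch
    use m_levelset_sketch
    implicit none
    type(t_patch_sketch) :: patch
    real(wp), dimension(3) :: x

    patch = t_patch_sketch(center=[0._wp, 0._wp, 0._wp], radius=1._wp)
    x = [3._wp, 0._wp, 0._wp]
    print *, 'levelset =', f_sphere_levelset(patch, x)
    print *, 'normal   =', f_sphere_normal(patch, x)
end program p_levelset_sketch
```

Storing the centroid as a single array lets the distance vector be formed in one array expression. Whether a derived-type component can be passed directly in a `copyin` clause depends on the compiler and directive backend, so that part of the suggestion would need checking on the target platforms.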
        
          
src/post_process/m_data_input.f90 (Outdated)
var_MOK = int(sys_size + 1, MPI_OFFSET_KIND)
disp = m_MOK*max(MOK, n_MOK)*max(MOK, p_MOK)*WP_MOK*(var_MOK - 1 + int(save_index, MPI_OFFSET_KIND))
Suggestion: Correct the MPI file displacement calculation for reading IB data. The current formula incorrectly uses sys_size in the offset, which should be based only on the data slice size and the save index. [possible issue, importance: 9]
- var_MOK = int(sys_size + 1, MPI_OFFSET_KIND)
- disp = m_MOK*max(MOK, n_MOK)*max(MOK, p_MOK)*WP_MOK*(var_MOK - 1 + int(save_index, MPI_OFFSET_KIND))
+ disp = m_MOK*max(MOK, n_MOK)*max(MOK, p_MOK)*int(4_wp, MPI_OFFSET_KIND)*int(save_index, MPI_OFFSET_KIND)
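To make the difference between the two displacement formulas concrete, here is a small standalone sketch that evaluates both for made-up sizes. It assumes `int64` as a stand-in for `MPI_OFFSET_KIND` (so no MPI library is needed), and the module and function names (`m_disp_sketch`, `f_slice_cells`, `f_disp_current`, `f_disp_suggested`) are hypothetical. It does not say which formula matches the file layout, which is not shown in this thread.

```fortran
module m_disp_sketch
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    ! int64 stands in for MPI_OFFSET_KIND so the sketch builds without MPI.
    integer(int64), parameter :: MOK = 1_int64

contains

    ! Number of cells in one global slice; unused dimensions count as 1.
    pure function f_slice_cells(m_MOK, n_MOK, p_MOK) result(cells)
        integer(int64), intent(in) :: m_MOK, n_MOK, p_MOK
        integer(int64) :: cells
        cells = m_MOK*max(MOK, n_MOK)*max(MOK, p_MOK)
    end function f_slice_cells

    ! Displacement in the form of the lines under review: the offset skips
    ! sys_size slices and then save_index further slices.
    pure function f_disp_current(cells, bytes_per_value, sys_size, save_index) result(disp)
        integer(int64), intent(in) :: cells, bytes_per_value
        integer, intent(in) :: sys_size, save_index
        integer(int64) :: disp, var_MOK
        var_MOK = int(sys_size + 1, int64)
        disp = cells*bytes_per_value*(var_MOK - 1 + int(save_index, int64))
    end function f_disp_current

    ! Displacement in the form of the bot's suggestion: save_index slices only.
    pure function f_disp_suggested(cells, bytes_per_value, save_index) result(disp)
        integer(int64), intent(in) :: cells, bytes_per_value
        integer, intent(in) :: save_index
        integer(int64) :: disp
        disp = cells*bytes_per_value*int(save_index, int64)
    end function f_disp_suggested

end module m_disp_sketch

program p_disp_sketch
    use, intrinsic :: iso_fortran_env, only: int64
    use m_disp_sketch
    implicit none
    integer(int64) :: cells

    ! Hypothetical 2D case: 100 x 50 cells, sys_size = 6, save_index = 2.
    cells = f_slice_cells(100_int64, 50_int64, 1_int64)
    print *, 'current form   (8-byte values):', f_disp_current(cells, 8_int64, 6, 2)
    print *, 'suggested form (4-byte values):', f_disp_suggested(cells, 4_int64, 2)
end program p_disp_sketch
```

The two forms differ in both the per-value byte width and in whether `sys_size` slices are skipped, so either change alone moves the read position. Any fix would have to agree with how the writer side lays out the IB file.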
…. Need another solution
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1019      +/-   ##
==========================================
+ Coverage   41.60%   41.66%   +0.06%     
==========================================
  Files          70       70              
  Lines       20783    20769      -14     
  Branches     2616     2618       +2     
==========================================
+ Hits         8647     8654       +7     
+ Misses      10499    10477      -22     
- Partials     1637     1638       +1     
☔ View full report in Codecov by Sentry.
…jvickers/MFC into mibm-gpu-optimization-and-io-bugs
looks good, i'll wait to see some benchmarks then merge. ping me if they complete and i don't see it (seems likely) and I will merge
@sbryngelson Looks like there is some missing YAML file on frontier that caused a failure. I saw the same thing on another branch too.
It appears the IGR test failed on master. I'll download the logs and take a closer look when I get to the office.
User description
Description
This fixes an IO issue in which the IB markers were not properly exported at each time step during post-processing. With the fix, the IB markers can be seen moving in MIBM cases. It also adds an initial port of the IB marker, levelset, and levelset norm calculations to the GPU for improved performance.
Fixes #1013
Fixes #1010
Type of change
Please delete options that are not relevant.
Scope
If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration
Checklist
… ./mfc.sh format before committing my code
If your code changes any code source files (anything in src/simulation)
To make sure the code is performing as expected on GPU devices, I have:
… nvtx ranges so that they can be identified in profiles
… ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
… ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
PR Type
Enhancement, Bug fix
Description
Port IB marker, levelset, and levelset norm calculations to GPU
Add GPU parallelization directives to all levelset computation routines
Fix IO bug: read IB files from correct time step directory
Fix IO bug: properly export IB markers at each time step
Refactor centroid variables into arrays for GPU compatibility
Add moving IBM velocity correction for slip boundary conditions
File Walkthrough
m_compute_levelset.fpp
GPU parallelization of all levelset computation routines
src/common/m_compute_levelset.fpp
… into center arrays for GPU memory efficiency
… $:GPU_PARALLEL_LOOP directives with appropriate private/copy/copyin clauses to all levelset subroutines
… dist_vec variables before assignment
… (i, j, 0) to (i, j, k) in normal vector assignment
m_ib_patches.fpp
GPU parallelization of IB marker computation routines
src/common/m_ib_patches.fpp
… center arrays across all IB patch subroutines
… $:GPU_PARALLEL_LOOP directives to circle, airfoil, 3D airfoil, rectangle, sphere, cuboid, and cylinder marker routines
… redundant allocation
… for GPU compatibility
m_ibm.fpp
Add moving IBM slip boundary condition support
src/simulation/m_ibm.fpp
… points with 2x buffer
… radial vector
m_data_input.f90
Fix IB data file reading from correct time step
src/post_process/m_data_input.f90
… t_step parameter to s_read_ib_data_files subroutine for correct file offset calculation
… and save index
… s_read_ib_data_files to pass t_step parameter
m_data_output.fpp
Fix IB marker export at each time step
src/simulation/m_data_output.fpp
… (0:m, 0:n, 0:p) instead of entire array
… displacement calculation
… indexing
p_main.fpp
Save initial data at t_step zero
src/simulation/p_main.fpp