Commit 3446dfb

jaharris87 and Harris, Austin authored
Cray MPICH/ROCm compatibility for GPU-aware MPI (#4729)
## Summary

Implements a check in the OLCF makefile that compares the ROCm version supported by Cray MPICH with the one in the current environment. If the major versions differ, the library for GPU-aware MPI is not linked, because linking it would cause an error at runtime.

## Additional background

* If the ROCm versions are incompatible, a user will see an error like the following at runtime: `error while loading shared libraries: libamdhip64.so.6: cannot open shared object file: No such file or directory`.
* The major ROCm version supported by Cray MPICH is determined by parsing `ldd` output for `libmpi_gtl_hsa.so`.
* The user's ROCm version is determined by parsing the output of `hipconfig --version`.
* The user must also make sure the `craype-accel-amd-gfx90a` module (on Frontier) is not loaded, as this will automatically try to link GTL.
* If running with a ROCm version that is not supported by the Cray MPICH version, the user must also prepend `CRAY_LD_LIBRARY_PATH` to `LD_LIBRARY_PATH` and disable GPU-aware MPI with `export MPICH_GPU_SUPPORT_ENABLED=0`.

## Checklist

The proposed changes:

- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] include documentation in the code and/or rst files, if appropriate

Co-authored-by: Harris, Austin <[email protected]>
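The runtime workaround described in the background notes could look roughly like the following job-script fragment. This is a minimal sketch, not part of the commit: `prepend_path` is a hypothetical helper introduced here to avoid a stray leading or trailing colon, and the `module unload` step assumes an Lmod or Environment Modules setup where `module` is available.

```shell
#!/bin/sh
# Sketch of the environment setup for running with a ROCm version that the
# installed Cray MPICH GTL library does not support (assumptions noted inline).

# Hypothetical helper: join two colon-separated path lists, skipping empties.
prepend_path() {
    # $1 = directories to prepend, $2 = existing value (either may be empty)
    if [ -z "$1" ]; then
        printf '%s\n' "$2"
    elif [ -z "$2" ]; then
        printf '%s\n' "$1"
    else
        printf '%s:%s\n' "$1" "$2"
    fi
}

# Make sure the accelerator module is not loaded, since it tries to link GTL.
# Guarded so the fragment is harmless where no `module` command exists.
if command -v module >/dev/null 2>&1; then
    module unload craype-accel-amd-gfx90a
fi

# Prepend CRAY_LD_LIBRARY_PATH to LD_LIBRARY_PATH, as the commit message asks.
LD_LIBRARY_PATH="$(prepend_path "${CRAY_LD_LIBRARY_PATH:-}" "${LD_LIBRARY_PATH:-}")"
export LD_LIBRARY_PATH

# Disable GPU-aware MPI at runtime.
export MPICH_GPU_SUPPORT_ENABLED=0
```

The helper is only a convenience; a plain `export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}` has the same effect when both variables are already set.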
1 parent 9a2bff0 commit 3446dfb

1 file changed: +5 -1 lines changed


Tools/GNUMake/sites/Make.olcf

Lines changed: 5 additions & 1 deletion
@@ -43,7 +43,11 @@ ifeq ($(which_computer),frontier)
       endif
       # for gpu aware mpi
       ifeq ($(USE_HIP),TRUE)
-        LIBRARIES += $(PE_MPICH_GTL_DIR_amd_gfx90a) -lmpi_gtl_hsa
+        ROCM_MAJOR_VERSION = $(shell hipconfig --version | cut -d. -f1)
+        MPICH_ROCM_VERSION = $(shell ldd ${CRAY_MPICH_ROOTDIR}/gtl/lib/libmpi_gtl_hsa.so | grep libamdhip64.so | cut -d" " -f1 | cut -d. -f3 )
+        ifeq ($(ROCM_MAJOR_VERSION),$(MPICH_ROCM_VERSION))
+          LIBRARIES += $(PE_MPICH_GTL_DIR_amd_gfx90a) -lmpi_gtl_hsa
+        endif
       endif
     endif
   endif
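
The version check in this diff relies on two `cut` pipelines. A minimal shell sketch of what they extract is given here, assuming sample input strings: the `hipconfig` and `ldd` outputs are illustrative, not captured from Frontier, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch of the parsing used in the makefile check, run on sample strings
# instead of live `hipconfig` / `ldd` output (the samples are assumptions).

# Major version from `hipconfig --version`-style output: text before the first dot.
rocm_major_from_hipconfig() {
    printf '%s\n' "$1" | cut -d. -f1
}

# Major version of libamdhip64 from ldd-style output: keep the matching line,
# take the first space-separated field (the soname), then its third
# dot-separated field.
hip_major_from_ldd() {
    printf '%s\n' "$1" | grep 'libamdhip64.so' | cut -d" " -f1 | cut -d. -f3
}

# Succeeds only when both values are non-empty and equal.
same_major() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

sample_hipconfig='6.2.41134-65d174c3e'
sample_ldd_line="$(printf '\tlibamdhip64.so.6 => /opt/rocm/lib/libamdhip64.so.6 (0x0)')"

env_major="$(rocm_major_from_hipconfig "$sample_hipconfig")"
gtl_major="$(hip_major_from_ldd "$sample_ldd_line")"

if same_major "$env_major" "$gtl_major"; then
    echo "majors match ($env_major): link -lmpi_gtl_hsa"
else
    echo "majors differ (env=$env_major, gtl=$gtl_major): skip -lmpi_gtl_hsa"
fi
```

One deliberate difference from the makefile: the sketch refuses to link when either value is empty, whereas GNU Make's `ifeq` treats two empty strings as equal.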
