Commit 3446dfb
Cray MPICH/ROCm compatibility for GPU-aware MPI (#4729)
## Summary
Implements a check in the OLCF makefile that compares the ROCm version
supported by Cray MPICH with the version in the current environment. If
the major versions differ, the library for GPU-aware MPI is not linked,
because linking it would cause an error at runtime.
## Additional background
* If the ROCm versions are incompatible, a user will see an error like
the following at runtime: `error while loading shared libraries:
libamdhip64.so.6: cannot open shared object file: No such file or
directory`.
* The major ROCm version supported by Cray MPICH is determined by parsing
the `ldd` output for `libmpi_gtl_hsa.so`.
* The user's ROCm version is determined by parsing the output of
`hipconfig --version`.
* The user must also make sure the `craype-accel-amd-gfx90a` module (on
Frontier) is not loaded, as this will automatically try to link GTL.
* If running with a ROCm version that is not supported by the Cray MPICH
version, the user must also prepend `CRAY_LD_LIBRARY_PATH` to
`LD_LIBRARY_PATH` and disable GPU-aware MPI with `export
MPICH_GPU_SUPPORT_ENABLED=0`.
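
The check and the runtime workaround described above could be sketched in shell roughly as follows. This is an illustrative sketch, not the code added to the OLCF makefile: all function names are hypothetical, `GTL_LIB` in the usage comment stands for wherever `libmpi_gtl_hsa.so` lives on the system, and the parsing assumes that `ldd` lists a `libamdhip64.so.<major>` dependency and that `hipconfig --version` prints a dotted version string whose first field is the major version.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not taken from the AMReX makefile.

# Read an `ldd` listing on stdin and print the major version found in the
# libamdhip64.so.<major> dependency (assumed to appear in the listing).
gtl_rocm_major() {
    sed -n 's/.*libamdhip64\.so\.\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Read `hipconfig --version` output on stdin and print the text before the
# first dot, assumed to be the major version.
env_rocm_major() {
    head -n 1 | cut -d. -f1
}

# Succeed only when both major versions are non-empty and identical.
rocm_compatible() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Runtime workaround from the notes above: prepend CRAY_LD_LIBRARY_PATH to
# LD_LIBRARY_PATH and disable GPU-aware MPI. Assumes CRAY_LD_LIBRARY_PATH is
# already set by the Cray programming environment.
use_non_gpu_aware_mpi() {
    export LD_LIBRARY_PATH="${CRAY_LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    export MPICH_GPU_SUPPORT_ENABLED=0
}

# Example use on a system with ROCm and Cray MPICH (GTL_LIB is a placeholder):
#   gtl_major=$(ldd "$GTL_LIB" | gtl_rocm_major)
#   env_major=$(hipconfig --version | env_rocm_major)
#   rocm_compatible "$gtl_major" "$env_major" || use_non_gpu_aware_mpi
```

Keeping the parsers as stdin filters means they can be exercised with captured `ldd` or `hipconfig` output on a machine that has neither ROCm nor Cray MPICH installed.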
## Checklist
The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] change answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate
Co-authored-by: Harris, Austin <[email protected]>

1 parent 9a2bff0 commit 3446dfb
1 file changed (+5, -1): original line 46 is replaced by new lines 46-50.