[](){#ref-communication-cray-mpich}
# Cray MPICH

Cray MPICH is the recommended MPI implementation on Alps.
It is available through uenvs such as [prgenv-gnu][ref-uenv-prgenv-gnu] and [the application-specific uenvs][ref-software-sciapps].
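
For example, a uenv that provides Cray MPICH can be started with the `uenv` tool. A minimal sketch, assuming `prgenv-gnu` is available on your cluster (the version and view names below are placeholders; use `uenv image find` to see the images actually available):

```bash
# Start an interactive session with the prgenv-gnu uenv and its default view
# (image name and version are placeholders)
uenv start prgenv-gnu/24.11:v1 --view=default
```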

The [Cray MPICH documentation](https://cpe.ext.hpe.com/docs/latest/mpt/mpich/index.html) describes the implementation in detail.
On this page we outline the most common workflows and issues that you may encounter on Alps.

## GPU-aware MPI

We recommend using GPU-aware MPI whenever possible, as it almost always provides a significant performance improvement compared to communication through CPU memory.
To use GPU-aware MPI with Cray MPICH:

1. the application must be linked against the GTL library, and
2. the `MPICH_GPU_SUPPORT_ENABLED=1` environment variable must be set.

If either of these is missing, the application will fail when communicating GPU buffers.

In supported uenvs, Cray MPICH is built with GPU support (on clusters that have GPUs).
This means that Cray MPICH will automatically be linked against the GTL library, which implements the GPU support for Cray MPICH.

??? info "Checking that the application links to the GTL library"

    To check if your application is linked against the required GTL library, run `ldd` on your executable; it should print something similar to:

    ```bash
    $ ldd myexecutable | grep gtl
    libmpi_gtl_cuda.so => /user-environment/linux-sles15-neoverse_v2/gcc-13.2.0/cray-gtl-8.1.30-fptqzc5u6t4nals5mivl75nws2fb5vcq/lib/libmpi_gtl_cuda.so (0x0000ffff82aa0000)
    ```

    The path may differ, but the `libmpi_gtl_cuda.so` library should be printed when using CUDA.
    In ROCm environments the `libmpi_gtl_hsa.so` library should be linked instead.
    If the GTL library is not linked, nothing will be printed.

In addition to linking against the GTL library, Cray MPICH must be configured to be GPU-aware at runtime by setting the `MPICH_GPU_SUPPORT_ENABLED=1` environment variable.
On some CSCS systems this option is set by default.
See [this page][ref-slurm-gh200] for more information on configuring SLURM to use GPUs.
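
For example, a minimal sketch of a GPU-aware launch (the executable name and `srun` resource options are placeholders; adapt them to your job):

```bash
# Enable GPU-aware MPI at runtime; the application must also link the GTL library
export MPICH_GPU_SUPPORT_ENABLED=1
srun --ntasks=4 ./myexecutable
```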

!!! warning "Segmentation faults when communicating GPU buffers without `MPICH_GPU_SUPPORT_ENABLED=1`"
    If you attempt to communicate GPU buffers through MPI without setting `MPICH_GPU_SUPPORT_ENABLED=1`, the application will crash with segmentation faults, usually without any indication that the communication is what failed.
    Make sure that the option is set whenever you communicate GPU buffers through MPI.
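
    A quick sanity check, assuming a Slurm allocation (a sketch; adjust the launch options to your job), is to verify that the variable is visible to the launched ranks:

    ```bash
    srun --ntasks=1 env | grep MPICH_GPU_SUPPORT_ENABLED
    ```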

!!! warning "Error: `GPU_SUPPORT_ENABLED` is requested, but GTL library is not linked"
    If `MPICH_GPU_SUPPORT_ENABLED` is set to `1` but your application does not link against one of the GTL libraries, you will get an error similar to the following during MPI initialization:
    ```bash
    MPICH ERROR [Rank 0] [job id 410301.1] [Thu Feb 13 12:42:18 2025] [nid005414] - Abort(-1) (rank 0 in comm 0): MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
    (Other MPI error)

    aborting job:
    MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
    ```

    This means that the required GTL library was not linked into the application.
    In supported uenvs, GPU support is enabled by default.
    If you believe a uenv should have GPU support but you are getting the above error, [get in touch with us][ref-get-in-touch] so we can determine whether the problem is with the uenv or with something else in your environment.
    If you are using Cray modules, you must load the corresponding accelerator module, e.g. `craype-accel-nvidia90`, before compiling your application.
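
    For example, a minimal sketch of compiling with the Cray compiler wrappers (the source file name is a placeholder, and the exact module name depends on your system's GPUs):

    ```bash
    # Load the accelerator module so the compiler wrappers link the GTL library
    module load craype-accel-nvidia90
    cc -o myexecutable myprog.c
    ```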

    Alternatively, if you do not wish to use GPU-aware MPI, either unset `MPICH_GPU_SUPPORT_ENABLED` or explicitly set it to `0` in your launch scripts.

## Known issues

This section documents known issues related to Cray MPICH on Alps. Resolved issues are also listed for reference.

### Existing issues

#### Cray MPICH hangs

Cray MPICH may sometimes hang on larger runs.

!!! info "Workaround"

    There are many possible reasons why an application may hang, many of them unrelated to Cray MPICH. However, if you are experiencing hangs, the issue may be worked around by setting:
    ```bash
    export FI_MR_CACHE_MONITOR=disabled
    ```

    Note that performance may be negatively affected by this option.

### Resolved issues

#### `cxil_map: write error` when doing inter-node GPU-aware MPI communication

!!! info
    The issue was caused by a system misconfiguration and was resolved by a system update on 7 October 2024; the workaround below is no longer needed.

When doing inter-node GPU-aware communication with Cray MPICH after the October 2024 update on Alps, applications would fail with:
```bash
cxil_map: write error
```

??? info "Workaround"
    The only workaround was to avoid inter-node GPU-aware MPI.

    For users of CP2K encountering this issue, the use of COSMA, which uses GPU-aware MPI, could be disabled by placing the following in the `&GLOBAL` section of the input file:
    ```
    &FM
      TYPE_OF_MATRIX_MULTIPLICATION SCALAPACK
    &END FM
    ```

    Unless you run RPA calculations, this should have limited impact on performance.

#### `MPI_THREAD_MULTIPLE` does not work

!!! info
    The issue has been resolved in Cray MPICH version 8.1.30.

When using `MPI_THREAD_MULTIPLE` on GH200 systems, Cray MPICH may fail with an assertion that looks similar to:
```bash
Assertion failed [...]: (&MPIR_THREAD_GLOBAL_ALLFUNC_MUTEX)->count == 0
```

or

```bash
Assertion failed [...]: MPIR_THREAD_GLOBAL_ALLFUNC_MUTEX.count == 0
```

??? info "Workaround"
    The issue can be worked around by falling back to a less optimized implementation of `MPI_THREAD_MULTIPLE` by setting `MPICH_OPT_THREAD_SYNC=0`.
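
    For example, in your launch script (a minimal sketch):

    ```bash
    # Fall back to the less optimized thread synchronization implementation
    export MPICH_OPT_THREAD_SYNC=0
    ```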