Commit d6cafbc

Start "Communication Libraries" section
1 parent 8b1aae0 commit d6cafbc

File tree: 8 files changed, +176 -0 lines changed


docs/index.md

Lines changed: 1 addition & 0 deletions

@@ -65,6 +65,7 @@ The Alps Research infrastructure hosts multiple platforms and clusters targeting

 </div>

+[](){#ref-get-in-touch}
 ## Get in Touch

 If you can't find the information that you need in the documentation, help is available.

docs/software/communication/cray-mpich.md

Lines changed: 114 additions & 0 deletions

@@ -0,0 +1,114 @@
[](){#ref-communication-cray-mpich}
# Cray MPICH

Cray MPICH is the recommended MPI implementation on Alps.
It is available through uenvs like [prgenv-gnu][ref-uenv-prgenv-gnu] and [the application-specific uenvs][ref-software-sciapps].

The [Cray MPICH documentation](https://cpe.ext.hpe.com/docs/latest/mpt/mpich/index.html) contains detailed information about Cray MPICH.
On this page we outline the most common workflows and issues that you may encounter on Alps.

## GPU-aware MPI

We recommend using GPU-aware MPI whenever possible, as it almost always provides a significant performance improvement compared to communication through CPU memory.
To use GPU-aware MPI with Cray MPICH, two things are required: the application must be linked against the GTL library, and the `MPICH_GPU_SUPPORT_ENABLED=1` environment variable must be set.
If either of these is missing, the application will fail to communicate GPU buffers.

In supported uenvs, Cray MPICH is built with GPU support (on clusters that have GPUs).
This means that Cray MPICH is automatically linked to the GTL library, which implements GPU support for Cray MPICH.

??? info "Checking that the application links to the GTL library"

    To check whether your application is linked against the required GTL library, run `ldd` on your executable; it should print something similar to:

    ```bash
    $ ldd myexecutable | grep gtl
    libmpi_gtl_cuda.so => /user-environment/linux-sles15-neoverse_v2/gcc-13.2.0/cray-gtl-8.1.30-fptqzc5u6t4nals5mivl75nws2fb5vcq/lib/libmpi_gtl_cuda.so (0x0000ffff82aa0000)
    ```

    The path may differ, but the `libmpi_gtl_cuda.so` library should be printed when using CUDA.
    In ROCm environments the `libmpi_gtl_hsa.so` library should be linked instead.
    If the GTL library is not linked, nothing will be printed.

In addition to linking against the GTL library, Cray MPICH must be configured to be GPU-aware at runtime by setting the `MPICH_GPU_SUPPORT_ENABLED=1` environment variable.
On some CSCS systems this option is set by default.
See [this page][ref-slurm-gh200] for more information on configuring SLURM to use GPUs.
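
For example, a minimal Slurm launch sketch might look like this (the executable name is a placeholder; the node and task counts depend on your job):

```bash
# Enable GPU-aware communication in Cray MPICH at runtime
export MPICH_GPU_SUPPORT_ENABLED=1

# Launch as usual; adjust the node and task counts for your job
srun -N 2 -n 8 ./myexecutable
```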

!!! warning "Segmentation faults when trying to communicate GPU buffers without `MPICH_GPU_SUPPORT_ENABLED=1`"

    If you attempt to communicate GPU buffers through MPI without setting `MPICH_GPU_SUPPORT_ENABLED=1`, the application will crash with segmentation faults, usually without any specific indication that it is the communication that failed.
    Make sure that the option is set if you are communicating GPU buffers through MPI.

!!! warning "Error: `GPU_SUPPORT_ENABLED` is requested, but GTL library is not linked"

    If `MPICH_GPU_SUPPORT_ENABLED` is set to `1` and your application does not link against one of the GTL libraries, you will get an error similar to the following during MPI initialization:

    ```bash
    MPICH ERROR [Rank 0] [job id 410301.1] [Thu Feb 13 12:42:18 2025] [nid005414] - Abort(-1) (rank 0 in comm 0): MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked

    aborting job:
    MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
    (Other MPI error)
    ```

    This means that the required GTL library was not linked to the application.
    In supported uenvs, GPU support is enabled by default.
    If you believe a uenv should have GPU support but you are getting the above error, feel free to [get in touch with us][ref-get-in-touch] to understand whether there is an issue with the uenv or something else in your environment.
    If you are using Cray modules you must load the corresponding accelerator module, e.g. `craype-accel-nvidia90`, before compiling your application (see the sketch after this note).

    Alternatively, if you do not wish to use GPU-aware MPI, either unset `MPICH_GPU_SUPPORT_ENABLED` or explicitly set it to `0` in your launch scripts.
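
If you are building with the Cray modules, a minimal sketch of that workflow might look as follows (the source and executable names are placeholders; `cc` is the Cray compiler wrapper for C):

```bash
# Load the accelerator module so that the GTL library is linked
module load craype-accel-nvidia90

# Compile with the Cray compiler wrapper, which links Cray MPICH (and GTL) automatically
cc -o myexecutable myapp.c

# Verify that the GTL library is linked
ldd myexecutable | grep gtl
```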

## Known issues

This section documents known issues related to Cray MPICH on Alps.
Resolved issues are also listed for reference.

### Existing issues

#### Cray MPICH hangs

Cray MPICH may sometimes hang on larger runs.

!!! info "Workaround"

    There are many possible reasons why an application may hang, many of them unrelated to Cray MPICH.
    However, if you are experiencing hangs, the issue may be worked around by setting:

    ```bash
    export FI_MR_CACHE_MONITOR=disabled
    ```

    Performance may be negatively affected by this option.

### Resolved issues

#### `cxil_map: write error` when doing inter-node GPU-aware MPI communication

!!! info
    The issue was resolved by a system update on 7th October 2024 and the workaround is no longer needed.
    The issue was caused by a system misconfiguration.

When doing inter-node GPU-aware communication with Cray MPICH after the October 2024 update on Alps, applications would fail with:

```bash
cxil_map: write error
```

??? info "Workaround"

    The only workaround was to not use inter-node GPU-aware MPI.

    For users of CP2K encountering this issue, one can disable the use of COSMA, which uses GPU-aware MPI, by placing the following in the `&GLOBAL` section of the input file:

    ```
    &FM
      TYPE_OF_MATRIX_MULTIPLICATION SCALAPACK
    &END FM
    ```

    Unless you run RPA calculations, this should have limited impact on performance.

#### `MPI_THREAD_MULTIPLE` does not work

!!! info
    The issue has been resolved in Cray MPICH version 8.1.30.

When using `MPI_THREAD_MULTIPLE` on GH200 systems, Cray MPICH may fail with an assertion that looks similar to:

```bash
Assertion failed [...]: (&MPIR_THREAD_GLOBAL_ALLFUNC_MUTEX)->count == 0
```

or

```bash
Assertion failed [...]: MPIR_THREAD_GLOBAL_ALLFUNC_MUTEX.count == 0
```

??? info "Workaround"

    The issue can be worked around by falling back to a less optimized implementation of `MPI_THREAD_MULTIPLE` support by setting `MPICH_OPT_THREAD_SYNC=0`.
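
    For example, in a launch script (a minimal sketch; the executable name is a placeholder):

    ```bash
    export MPICH_OPT_THREAD_SYNC=0
    srun ./myexecutable
    ```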

docs/software/communication/index.md

Lines changed: 17 additions & 0 deletions

@@ -0,0 +1,17 @@
[](){#ref-software-communication}
# Communication Libraries

CSCS provides common communication libraries optimized for the [Slingshot 11 network on Alps][ref-alps-hsn].

For most scientific applications relying on MPI, [Cray MPICH][ref-communication-cray-mpich] is recommended.

Most machine learning applications rely on [NCCL][ref-communication-nccl] for high-performance implementations of collectives.
NCCL has to be configured with a plugin that uses [libfabric][ref-communication-libfabric] to make full use of the Slingshot network.

See the individual pages for information on how to use and configure each library:

* [Cray MPICH][ref-communication-cray-mpich]
* [OpenMPI][ref-communication-openmpi]
* [NCCL][ref-communication-nccl]
* [RCCL][ref-communication-rccl]
* [libfabric][ref-communication-libfabric]

docs/software/communication/libfabric.md

Lines changed: 7 additions & 0 deletions

@@ -0,0 +1,7 @@
[](){#ref-communication-libfabric}
# Libfabric

[Libfabric](https://ofiwg.github.io/libfabric/), or Open Fabrics Interfaces (OFI), is a low-level networking library that abstracts away various networking backends.
It is used by Cray MPICH, and can be used together with OpenMPI, NCCL, and RCCL to make use of the [Slingshot network on Alps][ref-alps-hsn].
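
If the libfabric command-line utilities are available in your environment (an assumption; they are not necessarily installed in every uenv), a quick way to check that the Slingshot provider is visible is to query it directly:

```bash
# List the libfabric interfaces exposed by the Slingshot (cxi) provider
fi_info -p cxi
```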
!!! todo

docs/software/communication/nccl.md

Lines changed: 10 additions & 0 deletions

@@ -0,0 +1,10 @@
[](){#ref-communication-nccl}
# NCCL

[NCCL](https://developer.nvidia.com/nccl) is an optimized inter-GPU communication library for NVIDIA GPUs.
It is commonly used in machine learning frameworks, but traditional scientific applications can also benefit from NCCL.
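
As a quick sanity check (a sketch; the executable name is a placeholder and the exact log lines may differ between NCCL versions), NCCL's standard `NCCL_DEBUG` variable can be used to verify which network backend is selected at runtime:

```bash
# Print NCCL's initialization log and filter for the selected network backend
export NCCL_DEBUG=INFO
srun ./my_nccl_app 2>&1 | grep -i "NET/"
```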

!!! todo
    - high level description
    - libfabric/aws-ofi-nccl plugin
    - configuration options

docs/software/communication/openmpi.md

Lines changed: 10 additions & 0 deletions

@@ -0,0 +1,10 @@
[](){#ref-communication-openmpi}
# OpenMPI

[Cray MPICH][ref-communication-cray-mpich] is the recommended MPI implementation on Alps.
However, [OpenMPI](https://www.open-mpi.org/) can be used as an alternative in some cases, with limited support from CSCS.

To use OpenMPI on Alps, it must be built against [libfabric][ref-communication-libfabric] with support for the [Slingshot 11 network][ref-alps-hsn].

!!! todo
    Building OpenMPI for Alps is still a work in progress: https://eth-cscs.github.io/cray-network-stack/.

docs/software/communication/rccl.md

Lines changed: 10 additions & 0 deletions

@@ -0,0 +1,10 @@
[](){#ref-communication-rccl}
# RCCL

[RCCL](https://rocmdocs.amd.com/projects/rccl/en/latest/) is an optimized inter-GPU communication library for AMD GPUs.
It provides equivalent functionality to [NCCL][ref-communication-nccl] for AMD GPUs.

!!! todo
    - high level description
    - libfabric/aws-ofi-rccl plugin
    - configuration options

mkdocs.yml

Lines changed: 7 additions & 0 deletions

@@ -57,6 +57,13 @@ nav:
 - 'prgenv-gnu': software/prgenv/prgenv-gnu.md
 - 'prgenv-nvfortran': software/prgenv/prgenv-nvfortran.md
 - 'linalg': software/prgenv/linalg.md
+- 'Communication Libraries':
+  - software/communication/index.md
+  - 'Cray MPICH': software/communication/cray-mpich.md
+  - 'OpenMPI': software/communication/openmpi.md
+  - 'NCCL': software/communication/nccl.md
+  - 'RCCL': software/communication/rccl.md
+  - 'libfabric': software/communication/libfabric.md
 - 'Tools':
   - software/tools/index.md
   - 'Linaro Forge': software/tools/linaro.md
