docs/software/communication/openmpi.md
Lines changed: 35 additions & 1 deletion
@@ -6,5 +6,39 @@ However, [OpenMPI](https://www.open-mpi.org/) can be used as an alternative in s
 
 To use OpenMPI on Alps, it must be built against [libfabric][ref-communication-libfabric] with support for the [Slingshot 11 network][ref-alps-hsn].
 
+## Using OpenMPI
+
+!!! warning
+    Building and using OpenMPI on Alps is still [work in progress](https://eth-cscs.github.io/cray-network-stack/).
+    The instructions found on this page may be inaccurate, but are a good starting point for using OpenMPI on Alps.
+
+!!! todo
+    Deploy experimental uenv.
+
 !!! todo
-    Building OpenMPI for Alps is still work in progress: https://eth-cscs.github.io/cray-network-stack/.
+    Document OpenMPI uenv next to prgenv-gnu, prgenv-nvfortran, and linalg?
+
+OpenMPI is provided through a [uenv][ref-uenv] similar to [`prgenv-gnu`][ref-uenv-prgenv-gnu].
+Once the uenv is loaded, compiling and linking with OpenMPI and libfabric is transparent.
+At runtime, some additional options must be set to correctly use the Slingshot network.
+
+First, applications launched through Slurm must use [PMIx](https://pmix.github.com).
+This is done with the `--mpi` flag of `srun`:
+```bash
+srun --mpi=pmix ...
+```
+
+Additionally, the following environment variables should be set:
+```bash
+export PMIX_MCA_psec="native" # (1)
+export FI_PROVIDER="lnx" # (2)
+export FI_LNX_PROV_LINKS="shm+cxi" # (3)
+export OMPI_MCA_pml="^ucx" # (4)
+export OMPI_MCA_mtl="ofi" # (5)
+```
+
+1. Ensures PMIx uses the same security domain as Slurm. Otherwise, PMIx prints warnings at startup.
+2. Use the [libfabric LINKx](https://ofiwg.github.io/libfabric/v2.1.0/man/fi_lnx.7.html) provider, which allows different libfabric providers to be used for inter- and intra-node communication.
+3. Use the shared memory provider for intra-node communication and the CXI (Slingshot) provider for inter-node communication.
+4. Use anything except [UCX](https://openucx.org/documentation/) for [point-to-point communication](https://docs.open-mpi.org/en/v5.0.x/mca.html#selecting-which-open-mpi-components-are-used-at-run-time).
+5. Use libfabric for the [Matching Transport Layer](https://docs.open-mpi.org/en/v5.0.x/mca.html#frameworks).
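
The runtime settings introduced by this change could be collected in one job script. The following is a minimal sketch and not part of the change itself: the function names `setup_openmpi_env` and `has_pmix` and the executable `./my_app` are hypothetical placeholders, and it assumes the OpenMPI uenv is already loaded. The `srun` calls are left as comments so that the snippet also runs outside a Slurm allocation.

```bash
#!/bin/bash

# Hypothetical helper: export the runtime settings listed in the change above.
setup_openmpi_env() {
    export PMIX_MCA_psec="native"       # same security domain as Slurm
    export FI_PROVIDER="lnx"            # libfabric LINKx provider
    export FI_LNX_PROV_LINKS="shm+cxi"  # shm within a node, CXI between nodes
    export OMPI_MCA_pml="^ucx"          # anything except UCX for point-to-point
    export OMPI_MCA_mtl="ofi"           # libfabric for the Matching Transport Layer
}

# Hypothetical helper: succeed if the text on stdin mentions a PMIx plugin.
# Intended to be fed the output of `srun --mpi=list`.
has_pmix() {
    grep -qi 'pmix'
}

setup_openmpi_env

# Echo the settings so that they are recorded in the job log.
env | grep -E '^(PMIX_MCA_psec|FI_PROVIDER|FI_LNX_PROV_LINKS|OMPI_MCA_pml|OMPI_MCA_mtl)='

# On a system with Slurm, one might then run (./my_app is a placeholder):
#   srun --mpi=list 2>&1 | has_pmix || echo "warning: no PMIx plugin listed" >&2
#   srun --mpi=pmix ./my_app
```

Keeping the exports in a function makes it easy to source the same settings from several job scripts. Checking `srun --mpi=list` before launching is optional; it only reports early if the Slurm installation does not list a PMIx plugin.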