
Commit 7a871b3

Add a few environment variables for OpenMPI on Alps
1 parent 477e097 commit 7a871b3

1 file changed: +35, −1 lines changed

docs/software/communication/openmpi.md

Lines changed: 35 additions & 1 deletion
@@ -6,5 +6,39 @@ However, [OpenMPI](https://www.open-mpi.org/) can be used as an alternative in s

To use OpenMPI on Alps, it must be built against [libfabric][ref-communication-libfabric] with support for the [Slingshot 11 network][ref-alps-hsn].

## Using OpenMPI

!!! warning
    Building and using OpenMPI on Alps is still a [work in progress](https://eth-cscs.github.io/cray-network-stack/).
    The instructions on this page may be inaccurate, but they are a good starting point for using OpenMPI on Alps.

!!! todo
    Deploy experimental uenv.

!!! todo
    Document the OpenMPI uenv next to prgenv-gnu, prgenv-nvfortran, and linalg?

OpenMPI is provided through a [uenv][ref-uenv] similar to [`prgenv-gnu`][ref-uenv-prgenv-gnu].
Once the uenv is loaded, compiling and linking with OpenMPI and libfabric is transparent.
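
As a sketch of what "transparent" means here (assuming the uenv exposes the standard OpenMPI compiler wrappers), a simple MPI program can be built without any extra include or link flags:

```bash
# Assumes the OpenMPI uenv is loaded and provides the mpicc wrapper.
mpicc -o hello hello.c

# The wrapper's --showme flag prints the underlying compiler invocation,
# which is useful to verify which OpenMPI and libfabric are picked up.
mpicc --showme
```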
At runtime, some additional options must be set to use the Slingshot network correctly.

First, [PMIx](https://pmix.org) must be used to launch applications through Slurm.
This is done with the `--mpi` flag of `srun`:

```bash
srun --mpi=pmix ...
```
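
If it is unclear whether the Slurm installation on a given system provides the PMIx plugin, the available MPI plugin types can be listed:

```bash
# Lists the MPI plugin types supported by this Slurm installation;
# the output should include pmix if PMIx launching is available.
srun --mpi=list
```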

Additionally, the following environment variables should be set:

```bash
export PMIX_MCA_psec="native"      # (1)
export FI_PROVIDER="lnx"           # (2)
export FI_LNX_PROV_LINKS="shm+cxi" # (3)
export OMPI_MCA_pml="^ucx"         # (4)
export OMPI_MCA_mtl="ofi"          # (5)
```

1. Ensures PMIx uses the same security domain as Slurm. Otherwise PMIx will print warnings at startup.
2. Use the [libfabric LINKx](https://ofiwg.github.io/libfabric/v2.1.0/man/fi_lnx.7.html) provider, which allows using different libfabric providers for inter- and intra-node communication.
3. Use the shared memory provider for intra-node communication and the CXI (Slingshot) provider for inter-node communication.
4. Use anything except [UCX](https://openucx.org/documentation/) for [point-to-point communication](https://docs.open-mpi.org/en/v5.0.x/mca.html#selecting-which-open-mpi-components-are-used-at-run-time).
5. Use libfabric for the [Matching Transport Layer](https://docs.open-mpi.org/en/v5.0.x/mca.html#frameworks).
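
Putting the pieces together, a batch job could look like the following sketch. This is not an official template: the job parameters and the application name `./my_mpi_app` are placeholders, and loading the uenv is elided since the experimental image is not yet deployed.

```bash
#!/bin/bash
#SBATCH --job-name=openmpi-test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

# The runtime settings described above.
export PMIX_MCA_psec="native"      # match Slurm's security domain
export FI_PROVIDER="lnx"           # libfabric LINKx provider
export FI_LNX_PROV_LINKS="shm+cxi" # shm intra-node, CXI inter-node
export OMPI_MCA_pml="^ucx"         # exclude UCX for point-to-point
export OMPI_MCA_mtl="ofi"          # libfabric for the Matching Transport Layer

# Launch through Slurm using PMIx.
srun --mpi=pmix ./my_mpi_app
```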
