`docs/software/communication/openmpi.md`

```bash
srun --mpi=pmix ...
```
Additionally, the following environment variables should be set:
```bash
export PMIX_MCA_psec="native" # (1)
export FI_PROVIDER="cxi" # (2)
export OMPI_MCA_pml="^ucx" # (3)
export OMPI_MCA_mtl="ofi" # (4)
```

1. Ensures PMIx uses the same security domain as Slurm. Otherwise PMIx will print warnings at startup.
2. Use the CXI (Slingshot) provider.
3. Use anything except [UCX](https://openucx.org/documentation/) for [point-to-point communication](https://docs.open-mpi.org/en/v5.0.x/mca.html#selecting-which-open-mpi-components-are-used-at-run-time).
4. Use libfabric for the [Matching Transport Layer](https://docs.open-mpi.org/en/v5.0.x/mca.html#frameworks).
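As a worked example, here is a minimal sketch of a Slurm batch script that applies these settings before launching an MPI application. The job name, node and task counts, and the application binary `./my_app` are placeholders for illustration, not values prescribed by this page:

```bash
#!/bin/bash
#SBATCH --job-name=ompi-cxi-example   # placeholder job name
#SBATCH --nodes=2                     # placeholder node count
#SBATCH --ntasks-per-node=4           # placeholder tasks per node

# PMIx and libfabric settings described above
export PMIX_MCA_psec="native"
export FI_PROVIDER="cxi"
export OMPI_MCA_pml="^ucx"
export OMPI_MCA_mtl="ofi"

# Launch through Slurm's PMIx integration
srun --mpi=pmix ./my_app   # ./my_app stands in for your MPI binary
```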
!!! info "CXI provider does all communication through the network interface cards (NICs)"
    When using the libfabric CXI provider, all communication goes through the NICs, including intra-node communication.
    This means that intra-node communication cannot make use of shared-memory optimizations, but the maximum bandwidth will not be severely limited.

    Libfabric has a new [LINKx](https://ofiwg.github.io/libfabric/v2.1.0/man/fi_lnx.7.html) provider, which allows using different libfabric providers for inter- and intra-node communication.
    This provider is not as well tested, but can in theory perform better for intra-node communication, because it can use shared memory.
    To use the LINKx provider, set the following instead of `FI_PROVIDER=cxi`:

    ```bash
    export FI_PROVIDER="lnx" # (1)
    export FI_LNX_PROV_LINKS="shm+cxi" # (2)
    ```

    1. Use the libfabric LINKx provider, to allow using different libfabric providers for inter- and intra-node communication.
    2. Use the shared memory provider for intra-node communication and the CXI (Slingshot) provider for inter-node communication.
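Whichever provider you pick, it can help to confirm what libfabric actually selects at run time. The sketch below relies only on standard libfabric tooling (the `fi_info` utility and the `FI_LOG_LEVEL` variable), neither of which is specific to this page; `./my_app` is again a placeholder binary:

```bash
# List the interfaces offered by specific libfabric providers on a node
fi_info -p cxi   # CXI (Slingshot) provider
fi_info -p shm   # shared-memory provider

# Ask libfabric to log its provider selection when the job starts
export FI_LOG_LEVEL=info
srun --mpi=pmix ./my_app
```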