Expand communication pages #75
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -1,5 +1,5 @@` |
| 1 | 1 | `* @bcumming @msimberg @RMeli` |
| 2 | 2 | `docs/services/firecrest @jpdorsch @ekouts` |
| 3 | | (removed) `docs/software/communication @msimberg` |
| | 3 | (added) `docs/software/communication @biddisco @Madeeks @msimberg` |
| 4 | 4 | `docs/software/prgenv/linalg.md @finkandreas @msimberg` |
| 5 | 5 | `docs/software/sciapps/cp2k.md @abussy @RMeli` |

`@@ -6,5 +6,39 @@ However, [OpenMPI](https://www.open-mpi.org/) can be used as an alternative in s`

To use OpenMPI on Alps, it must be built against [libfabric][ref-communication-libfabric] with support for the [Slingshot 11 network][ref-alps-hsn].
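To check whether the OpenMPI installation found in `PATH` was built against libfabric, one option is to look for the `ofi` component of the MTL framework in the output of `ompi_info`. This is a minimal sketch, assuming `ompi_info` lists components in its usual `MCA <framework>: <component>` form; the function name `ompi_has_ofi_mtl` is hypothetical and not provided by OpenMPI or any uenv.

```bash
# Hypothetical helper (not part of OpenMPI or any uenv): succeed if the
# OpenMPI found in PATH reports the libfabric ("ofi") MTL component.
# Assumes ompi_info prints component lines such as "MCA mtl: ofi (...)".
ompi_has_ofi_mtl() {
    ompi_info 2>/dev/null | grep -q 'MCA mtl: ofi'
}

if ompi_has_ofi_mtl; then
    echo "OpenMPI reports the libfabric (ofi) MTL"
else
    echo "no ofi MTL found; OpenMPI may not be built against libfabric" >&2
fi
```

This only checks that the component was built; whether the CXI (Slingshot) provider is usable at runtime still depends on the libfabric installation.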

## Using OpenMPI

!!! warning
    Building and using OpenMPI on Alps is still [work in progress](https://eth-cscs.github.io/cray-network-stack/).
    The instructions on this page may be inaccurate, but they are a good starting point for using OpenMPI on Alps.

!!! todo
    Deploy experimental uenv.

> **Member:** Will do - we don't need to do this before we deploy these docs

!!! todo
    Building OpenMPI for Alps is still work in progress: https://eth-cscs.github.io/cray-network-stack/.
    Document OpenMPI uenv next to prgenv-gnu, prgenv-nvfortran, and linalg?

OpenMPI is provided through a [uenv][ref-uenv] similar to [`prgenv-gnu`][ref-uenv-prgenv-gnu].
Once the uenv is loaded, compiling and linking with OpenMPI and libfabric is transparent.
At runtime, some additional options must be set to correctly use the Slingshot network.

First, when launching applications through Slurm, [PMIx](https://pmix.github.io) must be used as the process management interface.
This is selected with the `--mpi` flag of `srun`:
```bash
srun --mpi=pmix ...
```
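Whether the Slurm installation provides a PMIx plugin can be checked with `srun --mpi=list`, which lists the available MPI plugin types. A minimal sketch, assuming the plugin is reported with `pmix` in its name (either plain `pmix` or a versioned variant); `slurm_has_pmix` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if `srun --mpi=list` reports a PMIx plugin.
# stderr is merged into the pipe in case the list is printed there
# (assumption: the output stream may differ between Slurm versions).
slurm_has_pmix() {
    srun --mpi=list 2>&1 | grep -q pmix
}

if slurm_has_pmix; then
    echo "srun supports --mpi=pmix"
else
    echo "srun --mpi=list did not report a PMIx plugin" >&2
fi
```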

Additionally, the following environment variables should be set:
```bash
export PMIX_MCA_psec="native" # (1)
export FI_PROVIDER="lnx" # (2)
export FI_LNX_PROV_LINKS="shm+cxi" # (3)
export OMPI_MCA_pml="^ucx" # (4)
export OMPI_MCA_mtl="ofi" # (5)
```

1. Ensures PMIx uses the same security domain as Slurm; otherwise PMIx prints warnings at startup.
2. Use the [libfabric LINKx](https://ofiwg.github.io/libfabric/v2.1.0/man/fi_lnx.7.html) provider, which allows different libfabric providers to be used for inter- and intra-node communication.
3. Use the shared memory provider for intra-node communication and the CXI (Slingshot) provider for inter-node communication.
4. Use any component except [UCX](https://openucx.org/documentation/) for the [point-to-point messaging layer (PML)](https://docs.open-mpi.org/en/v5.0.x/mca.html#selecting-which-open-mpi-components-are-used-at-run-time); the `^` prefix excludes the listed component.
5. Use libfabric (the `ofi` component) for the [Matching Transport Layer (MTL)](https://docs.open-mpi.org/en/v5.0.x/mca.html#frameworks).
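The runtime settings above can be collected in one place, for example in a file sourced from a batch script. The sketch below wraps them in two hypothetical shell functions, `ompi_alps_env` and `ompi_alps_srun`, which are not provided by any uenv; the variable values are exactly those listed above, and the launch uses `srun --mpi=pmix` as described earlier.

```bash
# Hypothetical helpers, not shipped with any uenv: collect the runtime
# settings for OpenMPI on the Slingshot network in one place.
ompi_alps_env() {
    export PMIX_MCA_psec="native"       # PMIx uses the same security domain as Slurm
    export FI_PROVIDER="lnx"            # libfabric LINKx provider
    export FI_LNX_PROV_LINKS="shm+cxi"  # shm intra-node, CXI (Slingshot) inter-node
    export OMPI_MCA_pml="^ucx"          # exclude UCX as point-to-point layer
    export OMPI_MCA_mtl="ofi"           # libfabric for the Matching Transport Layer
}

# Apply the settings, then launch through PMIx, forwarding all arguments to srun.
ompi_alps_srun() {
    ompi_alps_env
    srun --mpi=pmix "$@"
}
```

Inside a job script, `ompi_alps_srun -n 4 ./my_app` (with `./my_app` standing in for a real executable) would run `srun --mpi=pmix -n 4 ./my_app` with the variables exported. Because shell functions run in the calling shell, the exports stay set for later commands in the same script.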