Commit 5876615

Commit message: draft docs
1 parent: ba1a34d

3 files changed (+98, −36 lines)

3 files changed

+98
-36
lines changed

docs/cluster-config.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -3,11 +3,12 @@
 Spack stacks are built on bare-metal clusters using a minimum of dependencies from the underlying system.
 A cluster configuration is a directory with the following structure:
 
+TODO: document layout of the `network.yaml` file
+
 ```
 /path/to/cluster/configuration
-├─ compilers.yaml   # system compiler
 ├─ packages.yaml    # external system packages
-├─ concretiser.yaml
+├─ network.yaml     # configuration options for network libraries
 └─ repos.yaml       # optional reference to additional site packages
 ```
 
@@ -51,9 +52,8 @@ This is designed to make it encourage putting cluster definitions and the site d
 ```
 /path/to/cluster-configs
 ├─ my_cluster
-│  ├─ compilers.yaml
 │  ├─ packages.yaml
-│  ├─ concretiser.yaml
+│  ├─ network.yaml
 │  └─ repos.yaml    # refers to ../site/repo
 └─ site
    └─ repo          # the site wide repo
````
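The `repos.yaml` referenced in the layout above is a standard Spack `repos.yaml`. As a hedged sketch (the exact schema depends on the Spack version in use), the `my_cluster/repos.yaml` from this example could refer to the site-wide repo like this:

```yaml
# my_cluster/repos.yaml (illustrative): register the shared site repo
repos:
- ../site/repo
```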

docs/porting.md

Lines changed: 3 additions & 0 deletions
````diff
@@ -98,8 +98,11 @@ Because of this, the compiler description is greatly streamlined.
 version: "25.1"
 ```
 
+[](){#ref-porting-network}
 ## `environments.yaml`
 
+TODO: document `mpi` -> `network` field.
+
 The main change in `environments.yaml` is how the compiler toolchain is specified.
 The compilers are provided as a list, without version information.
 
````
docs/recipes.md

Lines changed: 91 additions & 32 deletions
````diff
@@ -100,7 +100,7 @@ To provide a single Spack stack that meets the workflow's needs, we would create
 # A GCC-based programming environment
 prgenv-gnu:
   compiler: # ... compiler toolchain
-  mpi: # ... mpi configuration
+  network: # ... network configuration
   deprecated: # ... whether to allow usage of deprecated packages or not
   unify: # ... configure Spack concretizer
   specs: # ... list of packages to install
````
````diff
@@ -146,48 +146,107 @@ For example, in the recipe below, only `netcdf-fortran` will be built with the `
 !!! note
     This approach is typically used to build Fortran applications and packages with one toolchain (e.g. `nvhpc`), and all of the C/C++ dependencies with a different toolchain (e.g. `gcc`).
 
-### MPI
+[](){#ref-recipes-network}
+### MPI and networking
 
-Stackinator can configure cray-mpich (CUDA, ROCM, or non-GPU aware) on a per-environment basis, by setting the `mpi` field in an environment.
+Stackinator can configure MPI (cray-mpich or OpenMPI) and its dependencies (libfabric, cxi, etc.) through the `network` field.
 
-!!! note
-    Future versions of Stackinator will support OpenMPI, MPICH and MVAPICH when (and if) they develop robust support for HPE SlingShot 11 interconnect.
+!!! note ""
+    The `network` field replaces the `mpi` field in Stackinator 6.
+    See the [porting guide][ref-porting-network] for guidance on updating uenv recipes for Spack 1.0.
+
+If the `network` field is not set, or is set to `null`, MPI will not be configured in an environment:
 
-If the `mpi` field is not set, or is set to `null`, MPI will not be configured in an environment:
-```yaml title="environments.yaml: no MPI"
+```yaml title="environments.yaml: no network/MPI stack"
 serial-env:
-  mpi: null
-  # ...
+  network: null
 ```
 
-To configure MPI without GPU support, set the `spec` field with an optional version:
-```yaml title="environments.yaml: MPI without GPU support"
-host-env:
-  mpi:
-    spec: cray-mpich@8.1.23
-  # ...
-```
+The `network` field has separate sub-fields for configuring cray-mpich, OpenMPI, and additional custom package definitions:
 
-GPU-aware MPI can be configured by setting the optional `gpu` field to specify whether to support `cuda` or `rocm` GPUs:
-```yaml title="environments.yaml: GPU aware MPI"
-cuda-env:
-  mpi:
-    spec: cray-mpich
-    gpu: cuda
-  # ...
-rocm-env:
-  mpi:
-    spec: cray-mpich
-    gpu: rocm
-  # ...
+```yaml title="environments.yaml: overview of options"
+<env-name>:
+  network:
+    cray-mpich: # describe cray-mpich (cannot be used with openmpi)
+    openmpi:    # describe openmpi (cannot be used with cray-mpich)
+    specs:      # additional custom specs for dependencies (libfabric, etc.)
 ```
 
+#### Configuring MPI
+
 !!! alps
 
-    As new versions of cray-mpich are released with CPE, they are provided on Alps vClusters, via the Spack package repo in the [CSCS cluster configuration repo](https://github.com/eth-cscs/alps-cluster-config/tree/main/site/spack_repo/alps).
+    The recommended MPI distribution on Alps is `cray-mpich`, as it is the most widely tested MPI for the libfabric/slingshot network.
 
-!!! note
-    The `cray-mpich` spec is added to the list of package specs automatically, and all packages that use the virtual dependency `+mpi` will use this `cray-mpich`.
+    OpenMPI's support for the Slingshot network is improving; however, it may not be optimal for many applications, or may require more effort to fine-tune.
+    As such, it is recommended as an option for applications that have performance issues or bugs with cray-mpich.
+
+
+It is only possible to have one MPI implementation in an environment: choose one of `cray-mpich` or `openmpi`.
+
+Most of the time, you will want to use the "defaults" that are configured in [alps-cluster-config](https://github.com/eth-cscs/alps-cluster-config).
+
+=== "cray-mpich"
+
+    ```yaml
+    network:
+      cray-mpich:
+        gpu: <one of cuda, rocm or null> # default is system specific
+        version: <version string or null>
+    ```
+
+=== "openmpi"
+
+    ```yaml
+    network:
+      openmpi:
+        gpu: <one of cuda, rocm or null> # default is system specific
+        version: <version string or null>
+    ```
+
+??? question "What are the defaults?"
+    The defaults are cluster-specific; for example, on a system with NVIDIA GPUs, cray-mpich and openmpi will probably be configured to enable `gpu=cuda` by default.
+
+    See the `network.yaml` file in the cluster configuration for the default flags, and for the definitions of the `cray-mpich`, `openmpi`, `libfabric`, and `libcxi` Spack packages.
+
+You may want to change which version of MPI to use, or whether to enable GPU support, as shown in the following examples:
+
+!!! example "configure with the defaults"
+    === "cray-mpich"
+
+        Choose the default version of cray-mpich with cuda support enabled.
+
+        ```yaml
+        network:
+          cray-mpich:
+            gpu: cuda
+        ```
+
+    === "openmpi"
+
+        Choose openmpi version 5.0.6 with the default GPU support for the target system.
+
+        ```yaml
+        network:
+          openmpi:
+            version: 5.0.6
+        ```
+
+It is possible to fully customise how MPI is built by providing the full spec, instead of setting individual sub-options.
+This is an advanced option that the majority of uenv authors will not need, because Stackinator aims to simplify MPI deployment on Alps.
+
+!!! example "openmpi with a custom spec"
+    Build OpenMPI with a fully customised spec:
+
+    ```yaml
+    openmpi: openmpi@5.0.7 +cray-xpmem +cuda +internal-pmix fabrics=cma,ofi,xpmem schedulers=slurm
+    specs: ['libfabric@2.2 fabrics=cxi,rxm,tcp +debug']
+    ```
+
+Note that when customising the full spec, you will probably also need to fine-tune the network stack dependencies using `specs`.
+
+#### Customising network dependencies with specs
+
+You can provide additional custom specs for the network stack dependencies (e.g. `libfabric`, `libcxi`) in the `specs` list.
 
 ### Specs
 
````
````diff
@@ -224,7 +283,7 @@
     Use `unify:true` when possible, then `unify:when_possible`, and finally `unify:false`.
 
 !!! warning
-    Don't provide a spec for MPI or Compilers, which are configured in the [`mpi:`](recipes.md#mpi) and [`compilers`](recipes.md#compilers) fields respecively.
+    Don't provide a spec for MPI or Compilers, which are configured in the [`network:`][ref-recipes-network] and [`compilers`](recipes.md#compilers) fields respectively.
 
 !!! warning
     Stackinator does not support "spec matrices", and likely won't, because they use multiple compiler toolchains in a manner that is contrary to the Stackinator "keep it simple" principle.
````
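Pulling the pieces of the updated `recipes.md` together, a minimal environment entry using the `network` field might look like the following sketch (the environment name, compiler list, and package specs are illustrative, not taken from this commit):

```yaml
prgenv-gnu:
  compiler: [gcc]   # toolchains listed without versions (see porting.md)
  network:
    cray-mpich:
      gpu: cuda     # GPU-aware MPI; the default is system specific
  unify: true
  specs:            # do not list MPI or compilers here
  - hdf5 +mpi
```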
