4 changes: 2 additions & 2 deletions docs/clusters/bristen.md
@@ -12,7 +12,7 @@ Bristen consists of 32 A100 nodes [NVIDIA A100 nodes][ref-alps-a100-node]. The n
|-----------|--------| ----------------- | ---------- |
| [a100][ref-alps-a100-node] | 32 | 32 | 128 |

-Nodes are in the [`normal` slurm partition][ref-slurm-partition-normal].
+Nodes are in the [`normal` Slurm partition][ref-slurm-partition-normal].

### Storage and file systems

@@ -48,7 +48,7 @@ Users are encouraged to use containers on Bristen.

Bristen uses [Slurm][ref-slurm] as the workload manager, which is used to launch and monitor distributed workloads, such as training runs.

-There is currently a single slurm partition on the system:
+There is currently a single Slurm partition on the system:

* the `normal` partition is for all production workloads.
+ nodes in this partition are not shared.
4 changes: 2 additions & 2 deletions docs/clusters/clariden.md
@@ -14,7 +14,7 @@ The number of nodes can change when nodes are added or removed from other cluste
|-----------|--------| ----------------- | ---------- |
| [gh200][ref-alps-gh200-node] | 1,200 | 4,800 | 4,800 |

-Most nodes are in the [`normal` slurm partition][ref-slurm-partition-normal], while a few nodes are in the [`debug` partition][ref-slurm-partition-debug].
+Most nodes are in the [`normal` Slurm partition][ref-slurm-partition-normal], while a few nodes are in the [`debug` partition][ref-slurm-partition-debug].

### Storage and file systems

@@ -71,7 +71,7 @@ Alternatively, [uenv][ref-uenv] are also available on Clariden. Currently deploy

Clariden uses [Slurm][ref-slurm] as the workload manager, which is used to launch and monitor distributed workloads, such as training runs.

-There are two slurm partitions on the system:
+There are two Slurm partitions on the system:

* the `normal` partition is for all production workloads.
* the `debug` partition can be used to access a small allocation for up to 30 minutes for debugging and testing purposes.
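
For illustration, a minimal sketch of how an allocation in the `debug` partition might be requested with `srun`; the account name and node count are placeholders, not taken from the documentation:

```bash
# Illustrative only: one node in the debug partition for the 30-minute limit.
# Replace <account> with your project account.
srun --partition=debug --account=<account> --nodes=1 --time=00:30:00 --pty bash
```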
6 changes: 3 additions & 3 deletions docs/running/jobreport.md
@@ -56,7 +56,7 @@ The report is divided into two parts: a general summary and GPU specific values.
| Field | Description |
| ----- | ----------- |
| Job Id | The Slurm job id |
-| Step Id | The slurm step id. A job step in Slurm is a subdivision of a job started with srun |
+| Step Id | The Slurm step id. A job step in Slurm is a subdivision of a job started with srun |
| User | The user account that submitted the job |
| Slurm Account | The project account that will be billed |
| Start Time, End Time, Elapsed Time | The time the job started and ended, and how long it ran |
@@ -77,7 +77,7 @@ The report is divided into two parts: a general summary and GPU specific values.
| SM Utilization % | The percentage of the process's lifetime during which Streaming Multiprocessors (SM) were executing a kernel |
| Memory Utilization % | The percentage of the process's lifetime during which global (device) memory was being read or written |

-## Example with slurm: srun
+## Example with Slurm: srun

The simplest example to test `jobreport` is to run it with the sleep command.
It is important to separate `jobreport` (and its options) and your command with `--`.
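
As a sketch of what such a test could look like (node and task counts are arbitrary):

```bash
# Everything after `--` is the command being monitored by jobreport.
srun --nodes=1 --ntasks=4 jobreport -- sleep 30
```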
@@ -155,7 +155,7 @@ GPU Specific Values
4. Uncheck "Set locale environment variables on startup"
5. Quit and reopen the terminal and try again. This should fix the issue.

-## Example with slurm: batch script
+## Example with Slurm: batch script

The `jobreport` command can be used in a batch script.
The report printing, too, can be included in the script and does not need the `srun` command.
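
A possible shape for such a batch script is sketched below; the `#SBATCH` values, the `jobreport print` invocation, and the `jobreport_<jobid>` directory name are assumptions for illustration, not taken verbatim from the documentation:

```bash
#!/bin/bash
#SBATCH --job-name=jobreport-demo   # illustrative values only
#SBATCH --nodes=1
#SBATCH --time=00:05:00

# Monitor the actual workload with jobreport.
srun jobreport -- sleep 60

# Print the report from within the same script, without srun.
# Assumes jobreport writes its data to a directory named jobreport_<jobid>.
jobreport print jobreport_${SLURM_JOB_ID}
```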
2 changes: 1 addition & 1 deletion docs/software/communication/openmpi.md
@@ -22,7 +22,7 @@ OpenMPI is provided through a [uenv][ref-uenv] similar to [`prgenv-gnu`][ref-uen
Once the uenv is loaded, compiling and linking with OpenMPI and libfabric is transparent.
At runtime, some additional options must be set to correctly use the Slingshot network.

-First, when launching applications through slurm, [PMIx](https://pmix.github.com) must be used for application launching.
+First, when launching applications through Slurm, [PMIx](https://pmix.github.com) must be used for application launching.
This is done with the `--mpi` flag of `srun`:
```bash
srun --mpi=pmix ...
6 changes: 3 additions & 3 deletions docs/software/sciapps/cp2k.md
@@ -65,7 +65,7 @@ On our systems, CP2K is built with the following dependencies:

### Running on the HPC platform

-To start a job, two bash scripts are potentially required: a [slurm] submission script, and a wrapper to start the [CUDA
+To start a job, two bash scripts are potentially required: a [Slurm] submission script, and a wrapper to start the [CUDA
MPS] daemon so that multiple MPI ranks can use the same GPU.

```bash title="run_cp2k.sh"
@@ -138,7 +138,7 @@ sbatch run_cp2k.sh

Each GH200 node has 4 modules, each of them composed of an ARM Grace CPU with 72 cores and an H200 GPU directly
attached to it. Please see [Alps hardware][ref-alps-hardware] for more information.
-It is important that the number of MPI ranks passed to [slurm] with `--ntasks-per-node` is a multiple of 4.
+It is important that the number of MPI ranks passed to [Slurm] with `--ntasks-per-node` is a multiple of 4.
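
For instance (illustrative numbers only), a job spanning a single GH200 node could request:

```bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8   # a multiple of 4: two MPI ranks per Grace-Hopper module
```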

??? note

@@ -524,5 +524,5 @@ As a workaround, you can disable CUDA acceleration for the grid backend:
[OpenBLAS]: http://www.openmathlib.org/OpenBLAS/
[Intel MKL]: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html
[Cray MPICH]: https://docs.nersc.gov/development/programming-models/mpi/cray-mpich/
-[slurm]: https://slurm.schedmd.com/
+[Slurm]: https://slurm.schedmd.com/
[CUDA MPS]: https://docs.nvidia.com/deploy/mps/index.html
2 changes: 1 addition & 1 deletion docs/software/uenv/index.md
@@ -213,7 +213,7 @@ This is very useful for interactive sessions, for example if you want to work in
$ make -j

# run the affinity executable on two nodes - note how the uenv is
-# automatically loaded by slurm on the compute nodes, because CUDA and MPI from
+# automatically loaded by Slurm on the compute nodes, because CUDA and MPI from
# the uenv are required to run.
$ srun -n2 -N2 ./affinity.cuda
GPU affinity test for 2 MPI ranks