From 76fdbe7626d643f29bf3956eeff2626550e21015 Mon Sep 17 00:00:00 2001
From: Andreas Fink
Date: Tue, 1 Jul 2025 16:53:06 +0200
Subject: [PATCH] slurm -> Slurm

---
 docs/clusters/bristen.md               | 4 ++--
 docs/clusters/clariden.md              | 4 ++--
 docs/running/jobreport.md              | 6 +++---
 docs/software/communication/openmpi.md | 2 +-
 docs/software/sciapps/cp2k.md          | 6 +++---
 docs/software/uenv/index.md            | 2 +-
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/clusters/bristen.md b/docs/clusters/bristen.md
index 419a2b71..86ed0f7a 100644
--- a/docs/clusters/bristen.md
+++ b/docs/clusters/bristen.md
@@ -12,7 +12,7 @@ Bristen consists of 32 A100 nodes [NVIDIA A100 nodes][ref-alps-a100-node]. The n
 |-----------|--------| ----------------- | ---------- |
 | [a100][ref-alps-a100-node] | 32 | 32 | 128 |
 
-Nodes are in the [`normal` slurm partition][ref-slurm-partition-normal].
+Nodes are in the [`normal` Slurm partition][ref-slurm-partition-normal].
 
 ### Storage and file systems
 
@@ -48,7 +48,7 @@ Users are encouraged to use containers on Bristen.
 
 Bristen uses [Slurm][ref-slurm] as the workload manager, which is used to launch and monitor distributed workloads, such as training runs.
 
-There is currently a single slurm partition on the system:
+There is currently a single Slurm partition on the system:
 
 * the `normal` partition is for all production workloads.
     + nodes in this partition are not shared.
diff --git a/docs/clusters/clariden.md b/docs/clusters/clariden.md
index 9f725b17..75f67cf7 100644
--- a/docs/clusters/clariden.md
+++ b/docs/clusters/clariden.md
@@ -14,7 +14,7 @@ The number of nodes can change when nodes are added or removed from other cluste
 |-----------|--------| ----------------- | ---------- |
 | [gh200][ref-alps-gh200-node] | 1,200 | 4,800 | 4,800 |
 
-Most nodes are in the [`normal` slurm partition][ref-slurm-partition-normal], while a few nodes are in the [`debug` partition][ref-slurm-partition-debug].
+Most nodes are in the [`normal` Slurm partition][ref-slurm-partition-normal], while a few nodes are in the [`debug` partition][ref-slurm-partition-debug].
 
 ### Storage and file systems
 
@@ -71,7 +71,7 @@ Alternatively, [uenv][ref-uenv] are also available on Clariden. Currently deploy
 
 Clariden uses [Slurm][ref-slurm] as the workload manager, which is used to launch and monitor distributed workloads, such as training runs.
 
-There are two slurm partitions on the system:
+There are two Slurm partitions on the system:
 
 * the `normal` partition is for all production workloads.
 * the `debug` partition can be used to access a small allocation for up to 30 minutes for debugging and testing purposes.
diff --git a/docs/running/jobreport.md b/docs/running/jobreport.md
index 35f43ce7..bbec239e 100644
--- a/docs/running/jobreport.md
+++ b/docs/running/jobreport.md
@@ -56,7 +56,7 @@ The report is divided into two parts: a general summary and GPU specific values.
 | Field | Description |
 | ----- | ----------- |
 | Job Id | The Slurm job id |
-| Step Id | The slurm step id. A job step in Slurm is a subdivision of a job started with srun |
+| Step Id | The Slurm step id. A job step in Slurm is a subdivision of a job started with srun |
 | User | The user account that submitted the job |
 | Slurm Account | The project account that will be billed |
 | Start Time, End Time, Elapsed Time | The time the job started and ended, and how long it ran |
@@ -77,7 +77,7 @@ The report is divided into two parts: a general summary and GPU specific values.
 | SM Utilization % | The percentage of the process's lifetime during which Streaming Multiprocessors (SM) were executing a kernel |
 | Memory Utilization % | The percentage of process's lifetime during which global (device) memory was being read or written |
 
-## Example with slurm: srun
+## Example with Slurm: srun
 
 The simplest example to test `jobreport` is to run it with the sleep command.
 It is important to separate `jobreport` (and its options) and your command with `--`.
@@ -155,7 +155,7 @@ GPU Specific Values
 4. Uncheck "Set locale environment variables on startup"
 5. Quit and reopen the terminal and try again. This should fix the issue.
 
-## Example with slurm: batch script
+## Example with Slurm: batch script
 
 The `jobreport` command can be used in a batch script
 The report printing, too, can be included in the script and does not need the `srun` command.
diff --git a/docs/software/communication/openmpi.md b/docs/software/communication/openmpi.md
index 6a60746b..9c45c0da 100644
--- a/docs/software/communication/openmpi.md
+++ b/docs/software/communication/openmpi.md
@@ -22,7 +22,7 @@ OpenMPI is provided through a [uenv][ref-uenv] similar to [`prgenv-gnu`][ref-uen
 Once the uenv is loaded, compiling and linking with OpenMPI and libfabric is transparent.
 At runtime, some additional options must be set to correctly use the Slingshot network.
 
-First, when launching applications through slurm, [PMIx](https://pmix.github.com) must be used for application launching.
+First, when launching applications through Slurm, [PMIx](https://pmix.github.com) must be used for application launching.
 This is done with the `--mpi` flag of `srun`:
 ```bash
 srun --mpi=pmix ...
diff --git a/docs/software/sciapps/cp2k.md b/docs/software/sciapps/cp2k.md
index 1dd51bf6..b77aa7c4 100644
--- a/docs/software/sciapps/cp2k.md
+++ b/docs/software/sciapps/cp2k.md
@@ -65,7 +65,7 @@ On our systems, CP2K is built with the following dependencies:
 
 ### Running on the HPC platform
 
-To start a job, two bash scripts are potentially required: a [slurm] submission script, and a wrapper to start the [CUDA
+To start a job, two bash scripts are potentially required: a [Slurm] submission script, and a wrapper to start the [CUDA
 MPS] daemon so that multiple MPI ranks can use the same GPU.
 
 ```bash title="run_cp2k.sh"
@@ -138,7 +138,7 @@ sbatch run_cp2k.sh
     Each GH200 node has 4 modules, each of them composed of a ARM Grace CPU with 72 cores and a H200 GPU directly attached to it.
     Please see [Alps hardware][ref-alps-hardware] for more information.
 
-    It is important that the number of MPI ranks passed to [slurm] with `--ntasks-per-node` is a multiple of 4.
+    It is important that the number of MPI ranks passed to [Slurm] with `--ntasks-per-node` is a multiple of 4.
 
 ??? note
@@ -524,5 +524,5 @@ As a workaround, you can disable CUDA acceleration for the grid backend:
 [OpenBLAS]: http://www.openmathlib.org/OpenBLAS/
 [Intel MKL]: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html
 [Cray MPICH]: https://docs.nersc.gov/development/programming-models/mpi/cray-mpich/
-[slurm]: https://slurm.schedmd.com/
+[Slurm]: https://slurm.schedmd.com/
 [CUDA MPS]: https://docs.nvidia.com/deploy/mps/index.html
diff --git a/docs/software/uenv/index.md b/docs/software/uenv/index.md
index 68393d20..205da87c 100644
--- a/docs/software/uenv/index.md
+++ b/docs/software/uenv/index.md
@@ -213,7 +213,7 @@ This is very useful for interactive sessions, for example if you want to work in
     $ make -j
 
     # run the affinity executable on two nodes - note how the uenv is
-    # automatically loaded by slurm on the compute nodes, because CUDA and MPI from
+    # automatically loaded by Slurm on the compute nodes, because CUDA and MPI from
     # the uenv are required to run.
     $ srun -n2 -N2 ./affinity.cuda
     GPU affinity test for 2 MPI ranks
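
As a quick way to exercise the `jobreport` documentation touched by this patch, a minimal batch script along the following lines can be used. This sketch is not part of the patch: the job name, node count, walltime, and sleep duration are illustrative; only the `--` separator between `jobreport` and the wrapped command comes from the documentation itself.

```bash
#!/bin/bash
#SBATCH --job-name=jobreport-test   # illustrative job name
#SBATCH --nodes=1                   # one node is enough for a smoke test
#SBATCH --time=00:05:00             # short walltime; the wrapped command just sleeps

# As the jobreport.md hunk above notes, jobreport (and its options) must be
# separated from the command it wraps with `--`.
srun jobreport -- sleep 60
```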