# Slurm
[//]: #(Need to link to the scheduler README on GitHub)
Slurm is a highly configurable open source workload manager. For an overview, see the [Slurm project site](https://www.schedmd.com/).
> [!NOTE]
> Starting with CycleCloud 8.4.0, the Slurm integration was rewritten to support new features and functionality. For more information, see the [Slurm 3.0](slurm-3.md) documentation.
::: moniker range="=cyclecloud-7"
Slurm can easily be enabled on a CycleCloud cluster by modifying the "run_list" in the configuration section of your cluster definition. A Slurm cluster has two main parts: the master (or scheduler) node, which runs the Slurm software on a shared file system, and the execute nodes, which mount that file system and run the submitted jobs. For example, a simple cluster template snippet may look like:
```ini
[cluster custom-slurm]
# ... (remainder of the template not shown in this excerpt)
```
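
As a rough sketch only (the node names, VM sizes, image, and `run_list` role names below are illustrative assumptions, not the shipped template's values), the scheduler and execute definitions inside such a cluster look something like this:

```ini
[[node master]]
    # Illustrative values; substitute the VM size and image you actually use
    MachineType = Standard_D4s_v3
    ImageName = cycle.image.centos7

        [[[configuration]]]
        # Assumed role name for the Slurm scheduler recipe; check your Slurm project's README
        run_list = role[slurm_master_role]

[[nodearray execute]]
    MachineType = Standard_F2s_v2
    ImageName = cycle.image.centos7
    Autoscale = true

        [[[configuration]]]
        # Assumed role name for the Slurm execute recipe
        run_list = role[slurm_execute_role]
        slurm.autoscale = true
```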
::: moniker-end
## Editing Existing Slurm Clusters
Slurm clusters running in CycleCloud versions 7.8 and later implement an updated version of the autoscaling APIs that allows the clusters to utilize multiple nodearrays and partitions. To facilitate this functionality in Slurm, CycleCloud prepopulates the execute nodes in the cluster. Because of this prepopulation, you need to run a command on the Slurm scheduler node after making any changes to the cluster, such as autoscale limits or VM types.
### Making Cluster Changes
The Slurm cluster deployed in CycleCloud contains a script that facilitates these changes. After making any changes to the cluster, run the following command as root (for example, by running `sudo -i`) on the Slurm scheduler node to rebuild `slurm.conf` and update the nodes in the cluster:
::: moniker range="=cyclecloud-7"
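
A sketch of the rebuild step on CycleCloud 7, assuming the `remove_nodes` and `scale` subcommands and script path described later in this article:

```bash
# For CycleCloud versions earlier than 7.9.10, use
# /opt/cycle/jetpack/system/bootstrap/slurm/cyclecloud_slurm.sh instead (see the note below)
/opt/cycle/slurm/cyclecloud_slurm.sh remove_nodes
/opt/cycle/slurm/cyclecloud_slurm.sh scale
```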
> [!NOTE]
> For CycleCloud versions prior to 7.9.10, the `cyclecloud_slurm.sh` script is located in _/opt/cycle/jetpack/system/bootstrap/slurm_.
> [!IMPORTANT]
> If you make any changes that affect the VMs for nodes in an MPI partition (such as VM size, image, or cloud-init), the nodes **must** all be terminated first.
> The `remove_nodes` command prints a warning in this case, but it doesn't exit with an error.
> If there are running nodes, you get a `This node does not match existing scaleset attribute` error when new nodes are started.
::: moniker-end
> [!NOTE]
> For CycleCloud versions < 8.2, the `cyclecloud_slurm.sh` script is located in _/opt/cycle/jetpack/system/bootstrap/slurm_.
If you make changes that affect the VMs for nodes in an MPI partition (such as VM size, image, or cloud-init), and the nodes are running, you get a `This node does not match existing scaleset attribute` error when new nodes are started. For this reason, the `apply_changes` command makes sure the nodes are terminated, and fails with this error message if not: _The following nodes must be fully terminated before applying changes_.
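
As a sketch, assuming the same script location used by the other commands in this article, the `apply_changes` step is run as root on the scheduler node:

```bash
# apply_changes requires CycleCloud 8.3 or later (see the note below)
/opt/cycle/slurm/cyclecloud_slurm.sh apply_changes
```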
If you're making a change that does NOT affect the VM properties for MPI nodes, you don't need to terminate running nodes first. In this case, you can make the changes by using these two commands:
```bash
/opt/cycle/slurm/cyclecloud_slurm.sh remove_nodes
/opt/cycle/slurm/cyclecloud_slurm.sh scale
```
> [!NOTE]
> The `apply_changes` command only exists in CycleCloud 8.3+, so the only way to make a change in earlier versions is with the `remove_nodes` + `scale` commands shown above. Make sure that the `remove_nodes` command doesn't print a warning about nodes that need to be terminated.
::: moniker-end
### Creating additional partitions
The default template that ships with Azure CycleCloud has two partitions (`hpc` and `htc`), and you can define custom nodearrays that map directly to Slurm partitions. For example, to create a GPU partition, add the following section to your cluster template:
```ini
[[nodearray gpu]]
    # ... (remainder of the nodearray definition not shown in this excerpt)
```
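
As a rough sketch only (the VM size, image, and core count are placeholders rather than the template's defaults), a GPU nodearray that maps to its own Slurm partition generally enables `slurm.autoscale` and, for non-MPI workloads, disables `slurm.hpc`:

```ini
[[nodearray gpu]]
    MachineType = Standard_NC6s_v3   # placeholder GPU VM size
    ImageName = cycle.image.centos7  # placeholder image
    MaxCoreCount = 24

        [[[configuration]]]
        slurm.autoscale = true
        # Set to true only if the GPU nodes run tightly coupled MPI jobs
        slurm.hpc = false
```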
### Memory settings
CycleCloud automatically sets the amount of available memory for Slurm to use for scheduling purposes. Because available memory can vary slightly due to Linux kernel options, and the OS and VM use a small amount of memory, CycleCloud reduces the memory value in the Slurm configuration automatically. By default, CycleCloud holds back 5% of the reported available memory in a VM, but this value can be overridden in the cluster template by setting `slurm.dampen_memory` to the percentage of memory to hold back. For example, to hold back 20% of a VM's memory:
```ini
slurm.dampen_memory=20
```
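
In the cluster template, `slurm.*` options like this one go in a `[[[configuration]]]` section; a minimal sketch (the enclosing node or nodearray is whatever your template already defines):

```ini
    [[[configuration]]]
    # Hold back 20% of the VM's reported memory instead of the default 5%
    slurm.dampen_memory=20
```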
## Disabling autoscale for specific nodes or partitions
While the built-in CycleCloud "KeepAlive" feature doesn't currently work for Slurm clusters, it's possible to disable autoscale for a running Slurm cluster by editing the `slurm.conf` file directly. You can exclude either individual nodes or entire partitions from being autoscaled.
### Excluding a node
To exclude a node or multiple nodes from autoscale, add `SuspendExcNodes=<listofnodes>` to the Slurm configuration file. For example, to exclude nodes 1 and 2 from the `hpc` partition, add the following to `/sched/slurm.conf`:
```bash
SuspendExcNodes=hpc-pg0-[1-2]
```
Then restart the `slurmctld` service for the new configuration to take effect.
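
For example, on a systemd-based image (an assumption; use your OS's service manager if it differs):

```bash
sudo systemctl restart slurmctld
```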
### Excluding a partition
Excluding entire partitions from autoscale is similar to excluding nodes. To exclude the entire `hpc` partition, add the following to `/sched/slurm.conf`:
```bash
SuspendExcParts=hpc
```

Then restart the `slurmctld` service.
## Troubleshooting
### UID conflicts for Slurm and munge users
By default, this project uses a UID and GID of 11100 for the Slurm user and 11101 for the munge user. If this causes a conflict with another user or group, you can override these defaults.
To override the UID and GID, click the edit button for both the `scheduler` node and the `execute` nodearray.
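A sketch of the overridden attributes, using the keys from the configuration reference below (the numeric values are examples only):

```ini
    [[[configuration]]]
    slurm.user.name = slurm
    slurm.user.uid = 11200
    slurm.user.gid = 11200
    munge.user.name = munge
    munge.user.uid = 11201
    munge.user.gid = 11201
```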
### Autoscale
CycleCloud uses Slurm's [Elastic Computing](https://slurm.schedmd.com/elastic_computing.html) feature. To debug autoscale issues, there are a few logs on the scheduler node you can check. The first step is to make sure the power save resume calls are being made by checking `/var/log/slurmctld/slurmctld.log`. You should see lines showing `slurmctld` resuming the nodes it needs.

The other log to check is `/var/log/slurmctld/resume.log`. If the resume step is failing, there's also a `/var/log/slurmctld/resume_fail.log`. If there are messages about unknown or invalid node names, make sure you haven't added nodes to the cluster without following the steps in the "Making Cluster Changes" section above.
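
For example, a quick way to inspect all three logs on the scheduler node:

```bash
sudo tail -n 100 /var/log/slurmctld/slurmctld.log
sudo tail -n 100 /var/log/slurmctld/resume.log
# resume_fail.log only exists if a resume attempt has failed
sudo tail -n 100 /var/log/slurmctld/resume_fail.log
```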
## Slurm Configuration Reference
The following are the Slurm-specific configuration options you can toggle to customize functionality:
| Slurm Specific Configuration Options | Description |
| ------------------------------------ | ----------- |
| slurm.version | Default: '18.08.7-1'. The Slurm version to install and run. This is currently the default and *only* option. More versions of the Slurm software may be supported in the future. |
| slurm.autoscale | Default: 'false'. A per-nodearray setting that controls whether Slurm should automatically stop and start nodes in this nodearray. |
| slurm.hpc | Default: 'true'. A per-nodearray setting that controls whether nodes in the nodearray are placed in the same placement group. Primarily used for nodearrays using VM families with InfiniBand. It only applies when slurm.autoscale is set to 'true'. |
| slurm.default_partition | Default: 'false'. A per-nodearray setting that controls whether the nodearray should be the default partition for jobs that don't request a partition explicitly. |
| slurm.dampen_memory | Default: '5'. The percentage of memory to hold back for OS/VM overhead. |
| slurm.suspend_timeout | Default: '600'. The amount of time (in seconds) between a suspend call and when that node can be used again. |
| slurm.resume_timeout | Default: '1800'. The amount of time (in seconds) to wait for a node to successfully boot. |
| slurm.install | Default: 'true'. Determines if Slurm is installed at node boot ('true'). If Slurm is installed in a custom image, this should be set to 'false' (proj version 2.5.0+). |
| slurm.use_pcpu | Default: 'true'. A per-nodearray setting to control scheduling with hyperthreaded vcpus. Set to 'false' to set CPUs=vcpus in cyclecloud.conf. |
| slurm.user.name | Default: 'slurm'. The username for the Slurm service to use. |
| slurm.user.uid | Default: '11100'. The User ID to use for the Slurm user. |
| slurm.user.gid | Default: '11100'. The Group ID to use for the Slurm user. |
| munge.user.name | Default: 'munge'. The username for the MUNGE authentication service to use. |
| munge.user.uid | Default: '11101'. The User ID to use for the MUNGE user. |
| munge.user.gid | Default: '11101'. The Group ID to use for the MUNGE user. |