# Slurm
[//]: #(Need to link to the scheduler README on GitHub)
Slurm is a highly configurable open source workload manager. For an overview, see the [Slurm project site](https://www.schedmd.com/).
> [!NOTE]
> Starting with CycleCloud 8.4.0, the Slurm integration was rewritten to support new features and functionality. For more information, see the [Slurm 3.0](slurm-3.md) documentation.
::: moniker range="=cyclecloud-7"
Slurm can easily be enabled on a CycleCloud cluster by modifying the "run_list" in the configuration section of your cluster definition. A Slurm cluster has two main parts: the master (or scheduler) node, which runs the Slurm software on a shared file system, and the execute nodes, which mount that file system and run the submitted jobs. For example, a simple cluster template snippet may look like:
```ini
[cluster custom-slurm]
# ... (remainder of the template not shown in this excerpt)
```
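
As a rough sketch only (the node names, VM sizes, image, and `run_list` role names below are illustrative assumptions, not the shipped template's values), the scheduler and execute definitions inside such a cluster look something like this:

```ini
[[node master]]
    # Illustrative values; substitute the VM size and image you actually use
    MachineType = Standard_D4s_v3
    ImageName = cycle.image.centos7

        [[[configuration]]]
        # Assumed role name for the Slurm scheduler recipe; check your Slurm project's README
        run_list = role[slurm_master_role]

[[nodearray execute]]
    MachineType = Standard_F2s_v2
    ImageName = cycle.image.centos7
    Autoscale = true

        [[[configuration]]]
        # Assumed role name for the Slurm execute recipe
        run_list = role[slurm_execute_role]
        slurm.autoscale = true
```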
::: moniker-end
## Editing Existing Slurm Clusters
Slurm clusters running in CycleCloud versions 7.8 and later implement an updated version of the autoscaling APIs that allows the clusters to utilize multiple nodearrays and partitions. To facilitate this functionality in Slurm, CycleCloud prepopulates the execute nodes in the cluster. Because of this prepopulation, you need to run a command on the Slurm scheduler node after making any changes to the cluster, such as autoscale limits or VM types.
### Making Cluster Changes
The Slurm cluster deployed in CycleCloud contains a script that facilitates these changes. After making any changes to the cluster, run the following command as root (for example, by running `sudo -i`) on the Slurm scheduler node to rebuild `slurm.conf` and update the nodes in the cluster:
::: moniker range="=cyclecloud-7"
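
A sketch of the rebuild step on CycleCloud 7, assuming the `remove_nodes` and `scale` subcommands and script path described later in this article:

```bash
# For CycleCloud versions earlier than 7.9.10, use
# /opt/cycle/jetpack/system/bootstrap/slurm/cyclecloud_slurm.sh instead (see the note below)
/opt/cycle/slurm/cyclecloud_slurm.sh remove_nodes
/opt/cycle/slurm/cyclecloud_slurm.sh scale
```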
> [!NOTE]
> For CycleCloud versions prior to 7.9.10, the `cyclecloud_slurm.sh` script is located in _/opt/cycle/jetpack/system/bootstrap/slurm_.
> [!IMPORTANT]
> If you make any changes that affect the VMs for nodes in an MPI partition (such as VM size, image, or cloud-init), the nodes **must** all be terminated first.
> The `remove_nodes` command prints a warning in this case, but it doesn't exit with an error.
> If there are running nodes, you get a `This node does not match existing scaleset attribute` error when new nodes are started.
::: moniker-end
> [!NOTE]
> For CycleCloud versions < 8.2, the `cyclecloud_slurm.sh` script is located in _/opt/cycle/jetpack/system/bootstrap/slurm_.
If you make changes that affect the VMs for nodes in an MPI partition (such as VM size, image, or cloud-init), and the nodes are running, you get a `This node does not match existing scaleset attribute` error when new nodes are started. For this reason, the `apply_changes` command makes sure the nodes are terminated, and fails with this error message if not: _The following nodes must be fully terminated before applying changes_.
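
As a sketch, assuming the same script location used by the other commands in this article, the `apply_changes` step is run as root on the scheduler node:

```bash
# apply_changes requires CycleCloud 8.3 or later (see the note below)
/opt/cycle/slurm/cyclecloud_slurm.sh apply_changes
```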
If you're making a change that does NOT affect the VM properties for MPI nodes, you don't need to terminate running nodes first. In this case, you can make the changes by using these two commands:
```bash
/opt/cycle/slurm/cyclecloud_slurm.sh remove_nodes
/opt/cycle/slurm/cyclecloud_slurm.sh scale
```
> [!NOTE]
> The `apply_changes` command only exists in CycleCloud 8.3+, so the only way to make a change in earlier versions is with the `remove_nodes` + `scale` commands shown above. Make sure that the `remove_nodes` command doesn't print a warning about nodes that need to be terminated.
::: moniker-end
### Creating additional partitions
The default template that ships with Azure CycleCloud has two partitions (`hpc` and `htc`), and you can define custom nodearrays that map directly to Slurm partitions. For example, to create a GPU partition, add the following section to your cluster template:
```ini
[[nodearray gpu]]
    # ... (remainder of the nodearray definition not shown in this excerpt)
```
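
As a rough sketch only (the VM size, image, and core count are placeholders rather than the template's defaults), a GPU nodearray that maps to its own Slurm partition generally enables `slurm.autoscale` and, for non-MPI workloads, disables `slurm.hpc`:

```ini
[[nodearray gpu]]
    MachineType = Standard_NC6s_v3   # placeholder GPU VM size
    ImageName = cycle.image.centos7  # placeholder image
    MaxCoreCount = 24

        [[[configuration]]]
        slurm.autoscale = true
        # Set to true only if the GPU nodes run tightly coupled MPI jobs
        slurm.hpc = false
```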
### Memory settings
CycleCloud automatically sets the amount of available memory for Slurm to use for scheduling purposes. Because available memory can vary slightly due to Linux kernel options, and the OS and VM use a small amount of memory, CycleCloud reduces the memory value in the Slurm configuration automatically. By default, CycleCloud holds back 5% of the reported available memory in a VM, but this value can be overridden in the cluster template by setting `slurm.dampen_memory` to the percentage of memory to hold back. For example, to hold back 20% of a VM's memory:
```ini
slurm.dampen_memory=20
```
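
In the cluster template, `slurm.*` options like this one go in a `[[[configuration]]]` section; a minimal sketch (the enclosing node or nodearray is whatever your template already defines):

```ini
    [[[configuration]]]
    # Hold back 20% of the VM's reported memory instead of the default 5%
    slurm.dampen_memory=20
```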
## Disabling autoscale for specific nodes or partitions
While the built-in CycleCloud "KeepAlive" feature doesn't currently work for Slurm clusters, it's possible to disable autoscale for a running Slurm cluster by editing the `slurm.conf` file directly. You can exclude either individual nodes or entire partitions from being autoscaled.
### Excluding a node
To exclude a node or multiple nodes from autoscale, add `SuspendExcNodes=<listofnodes>` to the Slurm configuration file. For example, to exclude nodes 1 and 2 from the `hpc` partition, add the following to `/sched/slurm.conf`:
```bash
SuspendExcNodes=hpc-pg0-[1-2]
```
Then restart the `slurmctld` service for the new configuration to take effect.
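
For example, on a systemd-based image (an assumption; use your OS's service manager if it differs):

```bash
sudo systemctl restart slurmctld
```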
### Excluding a partition
Excluding entire partitions from autoscale is similar to excluding nodes. To exclude the entire `hpc` partition, add the following to `/sched/slurm.conf`:
```bash
SuspendExcParts=hpc
```

Then restart the `slurmctld` service.
## Troubleshooting
### UID conflicts for Slurm and munge users
By default, this project uses a UID and GID of 11100 for the Slurm user and 11101 for the munge user. If this causes a conflict with another user or group, you can override these defaults.
To override the UID and GID, click the edit button for both the `scheduler` node and the `execute` nodearray.
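A sketch of the overridden attributes, using the keys from the configuration reference below (the numeric values are examples only):

```ini
    [[[configuration]]]
    slurm.user.name = slurm
    slurm.user.uid = 11200
    slurm.user.gid = 11200
    munge.user.name = munge
    munge.user.uid = 11201
    munge.user.gid = 11201
```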
### Autoscale
CycleCloud uses Slurm's [Elastic Computing](https://slurm.schedmd.com/elastic_computing.html) feature. To debug autoscale issues, there are a few logs on the scheduler node you can check. The first step is to make sure the power save resume calls are being made by checking `/var/log/slurmctld/slurmctld.log`. You should see lines showing `slurmctld` resuming the nodes it needs.

The other log to check is `/var/log/slurmctld/resume.log`. If the resume step is failing, there's also a `/var/log/slurmctld/resume_fail.log`. If there are messages about unknown or invalid node names, make sure you haven't added nodes to the cluster without following the steps in the "Making Cluster Changes" section above.
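
For example, a quick way to inspect all three logs on the scheduler node:

```bash
sudo tail -n 100 /var/log/slurmctld/slurmctld.log
sudo tail -n 100 /var/log/slurmctld/resume.log
# resume_fail.log only exists if a resume attempt has failed
sudo tail -n 100 /var/log/slurmctld/resume_fail.log
```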
## Slurm Configuration Reference
The following are the Slurm-specific configuration options you can toggle to customize functionality:
| Slurm Specific Configuration Options | Description |
| ------------------------------------ | ----------- |
| slurm.version | Default: '18.08.7-1'. The Slurm version to install and run. This is currently the default and *only* option. More versions of the Slurm software may be supported in the future. |
| slurm.autoscale | Default: 'false'. A per-nodearray setting that controls whether Slurm should automatically stop and start nodes in this nodearray. |
| slurm.hpc | Default: 'true'. A per-nodearray setting that controls whether nodes in the nodearray are placed in the same placement group. Primarily used for nodearrays using VM families with InfiniBand. It only applies when slurm.autoscale is set to 'true'. |
| slurm.default_partition | Default: 'false'. A per-nodearray setting that controls whether the nodearray should be the default partition for jobs that don't request a partition explicitly. |
| slurm.dampen_memory | Default: '5'. The percentage of memory to hold back for OS/VM overhead. |
| slurm.suspend_timeout | Default: '600'. The amount of time (in seconds) between a suspend call and when that node can be used again. |
| slurm.resume_timeout | Default: '1800'. The amount of time (in seconds) to wait for a node to successfully boot. |
| slurm.install | Default: 'true'. Determines if Slurm is installed at node boot ('true'). If Slurm is installed in a custom image, this should be set to 'false' (proj version 2.5.0+). |
| slurm.use_pcpu | Default: 'true'. A per-nodearray setting to control scheduling with hyperthreaded vcpus. Set to 'false' to set CPUs=vcpus in cyclecloud.conf. |
| slurm.user.name | Default: 'slurm'. The username for the Slurm service to use. |
| slurm.user.uid | Default: '11100'. The User ID to use for the Slurm user. |
| slurm.user.gid | Default: '11100'. The Group ID to use for the Slurm user. |
| munge.user.name | Default: 'munge'. The username for the MUNGE authentication service to use. |
| munge.user.uid | Default: '11101'. The User ID to use for the MUNGE user. |
| munge.user.gid | Default: '11101'. The Group ID to use for the MUNGE user. |