Microsoft manages the end-to-end servicing lifecycle for the Azure CycleCloud software release and update packages. You can download these packages directly from Microsoft. We recommend that Azure CycleCloud operators set maintenance windows when installing releases. Each release updates your Azure CycleCloud installation's version.

There are three types of Azure CycleCloud releases: major, minor, and hotfix.

* **Major release**: These packages include the latest Azure CycleCloud features and functionality. Although we attempt to minimize any breaking changes between major versions, we don't guarantee backwards compatibility between major releases. We make efforts to call out any relevant warnings or details for upgrading.

* **Minor update**: These packages include the latest Azure CycleCloud security updates, bug fixes, and feature updates. We guarantee backwards compatibility within all the minor releases for a given major release.

* **Hotfix**: Occasionally, Microsoft provides Azure CycleCloud hotfixes that address a specific issue or issues that are often preventative or time-sensitive. Microsoft might provide a separate hotfix for each supported version of Azure CycleCloud. Each fix for a specific version is cumulative and includes previous updates for that same version.

For more information about a specific release, see the [product documentation](/azure/cyclecloud/release-notes).

## Azure CycleCloud release cadence

Microsoft plans to release Azure CycleCloud updates every month. However, some months might have multiple updates or no updates.

For information about a specific update, see the release notes for that update in our [product documentation](/azure/cyclecloud/release-notes).

## Supported Azure CycleCloud versions

To continue receiving support, keep your Azure CycleCloud deployment current with either the latest or the previous major release. You can run any minor update version of either major release.

If your Azure CycleCloud installation is behind by more than two major release updates, it's considered out of compliance. You must update it to at least the minimum supported version.

For example, if the previous and current major releases are versions 6 and 7, the three most recent update versions are:

* Version 6: 6.8.0, 6.7.0, 6.6.0
* Version 7: 7.2.0, 7.1.1, 7.0.0

In this example, all versions listed earlier are supported, but versions 5.x.x and earlier are out of support. When version 8.x.x is released, version 6.x.x transitions to legacy/unsupported status.

Azure CycleCloud software update packages are cumulative. If you defer one or more updates, you can install the latest update package to become compliant.

## Keep your system supported

To learn how to upgrade running deployments to a supported version, see the product documentation: [Upgrade Azure CycleCloud](~/articles/cyclecloud/how-to/upgrade-and-migrate.md).
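
As an illustration only (not taken from the upgrade article), an in-place upgrade on an RPM-based installation typically amounts to installing the newer package over the existing one. The package file name below is a placeholder; use the file you downloaded from Microsoft.

```bash
# Illustrative sketch: in-place upgrade on an RPM-based CycleCloud installation.
# Replace the placeholder file name with the release package you downloaded.
sudo yum install ./cyclecloud8-<version>.x86_64.rpm
```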

## Get support

Azure CycleCloud follows the same support process as Azure. Enterprise customers can follow the process described in [How to Create an Azure Support Request](/azure/azure-supportability/how-to-create-azure-support-request). For more information, see the [Azure Support FAQs](https://azure.microsoft.com/support/faq/).

---
title: Slurm Scheduler Integration version 3.0
description: New CycleCloud Slurm 3.0+ functionality.
author: anhoward
ms.date: 07/01/2025
ms.author: anhoward
---
# CycleCloud Slurm 3.0

We rewrote the Slurm scheduler support as part of the CycleCloud 8.4.0 release. Key features include:

* Support for dynamic nodes and dynamic partitions through dynamic node arrays. This feature supports both single and multiple virtual machine (VM) sizes.
* New Slurm versions 23.02 and 22.05.8.
* Cost reporting through the `azslurm` CLI.
* `azslurm` CLI-based autoscaler.
* Ubuntu 20 support.
* Removed need for the topology plugin and any associated submit plugin.

## Slurm clusters in CycleCloud versions earlier than 8.4.0

For more information, see [Transitioning from 2.7 to 3.0](#transitioning-from-27-to-30).

### Making cluster changes

The Slurm cluster that you deploy in CycleCloud includes a CLI called `azslurm` to help you make changes to the cluster. After you make any changes to the cluster, run the following command as root on the Slurm scheduler node to rebuild the `azure.conf` file and update the nodes in the cluster:

```bash
$ sudo -i
# azslurm scale
```

The command creates the partitions with the correct number of nodes, sets up the proper `gres.conf`, and restarts the `slurmctld`.
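
If you want an optional, quick check after the command finishes, standard Slurm and systemd commands work here (the `slurmctld` service name is assumed):

```bash
# Confirm the partitions and node counts that azslurm scale generated
sinfo
# Confirm that slurmctld restarted cleanly (service name assumed)
systemctl status slurmctld
```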

### Nodes aren't precreated anymore

Starting with version 3.0.0 of the CycleCloud Slurm project, nodes aren't precreated. Nodes are created when you invoke `azslurm resume`, or when you manually create them in CycleCloud by using the CLI.
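
For example, you can bring up specific nodes on demand with the resume wrapper script described later in this article (the node names here are illustrative):

```bash
# Create and start the named nodes in CycleCloud; the node names are illustrative.
/opt/azurehpc/slurm/resume_program.sh htc-[1-4]
```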

### Creating extra partitions

The default template that ships with Azure CycleCloud has three partitions (`hpc`, `htc`, and `dynamic`), and you can define custom node arrays that map directly to Slurm partitions. For example, to create a GPU partition, add the following section to your cluster template:

```ini
[[nodearray gpu]]
    # ... remaining nodearray settings truncated in this excerpt ...
    AssociatePublicIpAddress = $ExecuteNodesPublic
```

### Dynamic partitions

Starting with CycleCloud version 3.0.1, the solution supports dynamic partitions. You can make a `nodearray` map to a dynamic partition by adding the following code. The `myfeature` value can be any desired feature description, or more than one feature separated by commas.

```ini
[[[configuration]]]
    # These flags are echoed in the generated slurm.conf snippet that follows.
    slurm.dynamic_config := "-Z --conf \"Feature=myfeature\""
```

The preceding code snippet generates a dynamic partition like the following:

```ini
# Creating dynamic nodeset and partition using slurm.dynamic_config=-Z --conf "Feature=myfeature"
Nodeset=mydynamicns Feature=myfeature
PartitionName=mydynamicpart Nodes=mydynamicns
```

### Using dynamic partitions to autoscale

By default, a dynamic partition doesn't include any nodes. You can start nodes through CycleCloud or by running `azslurm resume` manually. The nodes join the cluster using the name you choose. However, since Slurm isn't aware of these nodes ahead of time, it can't autoscale them.

Instead, you can precreate node records, which allows Slurm to autoscale them.
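
For example, a minimal sketch that uses Slurm's dynamic node support (the node names, CPU count, and feature name are illustrative):

```bash
# Precreate dynamic node records so Slurm knows about the nodes before the VMs exist.
# Node names, CPU count, and feature name are illustrative.
scontrol create NodeName=dyn-[1-4] CPUs=4 Feature=myfeature State=CLOUD
```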

Either way, when you create these nodes with `State=Cloud`, they become available for autoscaling like other nodes.

To support **multiple VM sizes in a CycleCloud node array**, change the template to allow multiple VM sizes by adding `Config.Multiselect = true`.

```ini
[[[parameter DynamicMachineType]]]
    # ... other parameter settings truncated in this excerpt ...
    Config.Multiselect = true
```

### Dynamic scale down

By default, nodes in the dynamic partition scale down just like nodes in the other partitions. To exclude the dynamic partition from automatic scale-down, see [SuspendExcParts](https://slurm.schedmd.com/slurm.conf.html).
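
For example, a sketch of the relevant `slurm.conf` setting (the partition name is illustrative and matches the generated example earlier):

```ini
# Exclude the dynamic partition from automatic suspend (scale-down)
SuspendExcParts=mydynamicpart
```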

### Manual scaling

If `cyclecloud_slurm` detects that autoscale is disabled (`SuspendTime=-1`), it uses the `FUTURE` state to denote nodes that are powered down, instead of relying on the power state in Slurm. When autoscale is enabled, `sinfo` shows powered-down nodes as `idle~`. When autoscale is disabled, `sinfo` doesn't show the inactive nodes. You can still see their definition with `scontrol show nodes --future`.

To start new nodes, run `/opt/azurehpc/slurm/resume_program.sh node_list` (for example, `htc-[1-10]`).

To shut down nodes, run `/opt/azurehpc/slurm/suspend_program.sh node_list` (for example, `htc-[1-10]`).

To start a cluster in this mode, add `SuspendTime=-1` to the supplemental Slurm config in the template.

To switch a cluster to this mode, add `SuspendTime=-1` to the `slurm.conf` file and run `scontrol reconfigure`. Then run `azslurm remove_nodes` and `azslurm scale`, as in the sketch that follows.
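
A sketch of that sequence, run as root on the Slurm scheduler node (the `slurm.conf` location can vary by installation, so edit the copy your deployment uses):

```bash
sudo -i
# 1. Edit slurm.conf and set: SuspendTime=-1
# 2. Tell slurmctld to reread its configuration
scontrol reconfigure
# 3. Rebuild the node records and partition definitions
azslurm remove_nodes
azslurm scale
```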

## Troubleshooting

### Transitioning from 2.7 to 3.0

1. The installation folder changed from `/opt/cycle/slurm` to `/opt/azurehpc/slurm`.

1. Autoscale logs are now in `/opt/azurehpc/slurm/logs` instead of `/var/log/slurmctld`. The `slurmctld.log` file is in this folder.

1. The `cyclecloud_slurm.sh` script is no longer available. A new CLI tool called `azslurm` replaces `cyclecloud_slurm.sh`. You run `azslurm` as root, and it supports autocomplete.

    ```bash
    [root@scheduler ~]# azslurm
    # ... the list of azslurm subcommands is truncated in this excerpt ...
    wait_for_resume - Wait for a set of nodes to converge.
    ```

1. CycleCloud doesn't create nodes ahead of time. It only creates nodes when you need them.

1. All Slurm binaries are inside the `azure-slurm-install-pkg*.tar.gz` file, under `slurm-pkgs`. They're pulled from a specific binary release. The current binary release is [4.0.0](https://github.com/Azure/cyclecloud-slurm/releases/tag/4.0.0).

1. For MPI jobs, the only default network boundary is the partition. Unlike version 2.x, each partition doesn't include multiple "placement groups", so you only have one colocated Virtual Machine Scale Set per partition. There's no need for the topology plugin anymore, so the job submission plugin isn't needed either. Instead, submitting to multiple partitions is the recommended option for use cases that require job submission to multiple placement groups, as in the sketch that follows.
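
A sketch of submitting one job to several partitions so that Slurm places it in whichever partition (placement group) can start it first (the partition names and job script are illustrative):

```bash
# Submit to several partitions; Slurm runs the job in the first one that can start it.
# Partition names and the job script name are illustrative.
sbatch --partition=hpc1,hpc2 --nodes=4 my_mpi_job.sh
```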