
Commit e9f5ce4

Merge pull request #302119 from tfitzmac/0701edit8
copy edit
2 parents 79177bb + a0d4f74 commit e9f5ce4


3 files changed: +107 -105 lines changed


articles/cyclecloud/service-policy.md

Lines changed: 22 additions & 22 deletions
@@ -4,55 +4,55 @@ description: Complete service policy for Azure CycleCloud. See release types, re
services: azure cyclecloud
author: KimliW
ms.topic: conceptual
-ms.date: 06/13/2025
+ms.date: 07/01/2025
ms.author: adjohnso
---

-# Azure CycleCloud Service Policy
+# Azure CycleCloud service policy

-This article describes the servicing policy for Azure CycleCloud, and what you must do to keep your system in a supported state.
+This article describes the servicing policy for Azure CycleCloud and what you need to do to keep your system in a supported state.

-## Lifecycle Policy
+## Lifecycle policy

Azure CycleCloud follows Microsoft's [Modern Lifecycle Policy](https://support.microsoft.com/help/30881/modern-lifecycle-policy).

-## Release Types
+## Release types

-Microsoft is responsible for the end-to-end servicing lifecycle for the Azure CycleCloud software release and update packages, which can be downloaded directly from Microsoft. We recommend Azure CycleCloud Operators set maintenance windows when installing releases. Releases update your Azure CycleCloud installation's version.
+Microsoft manages the end-to-end servicing lifecycle for the Azure CycleCloud software release and update packages. You can download these packages directly from Microsoft. We recommend that Azure CycleCloud operators set maintenance windows when installing releases. Each release updates your Azure CycleCloud installation's version.

-There are three types of Azure CycleCloud release: major, minor, and hotfix.
+There are three types of Azure CycleCloud releases: major, minor, and hotfix.

-* **Major Release**: These packages include the latest Azure CycleCloud features and functionality. Although we attempt to minimize any breaking changes between major versions, we do not guarantee backwards compatibility between major releases. We will make efforts to call out any relevant warnings or details for upgrading.
-* **Minor Update**: These packages can include the latest Azure CycleCloud security updates, bug fixes, and feature updates. We guarantee backwards compatibility within all the minor releases for a given major release.
-* **Hotfix**: Occasionally, Microsoft provides Azure CycleCloud hotfixes that address a specific issue or issues that are often preventative or time-sensitive. A separate hotfix may be provided for each supported version of Azure CycleCloud as appropriate. Each fix for a specific iteration is cumulative and includes the previous updates for that same version.
+* **Major release**: These packages include the latest Azure CycleCloud features and functionality. Although we attempt to minimize any breaking changes between major versions, we don't guarantee backwards compatibility between major releases. We make efforts to call out any relevant warnings or details for upgrading.
+* **Minor update**: These packages include the latest Azure CycleCloud security updates, bug fixes, and feature updates. We guarantee backwards compatibility within all the minor releases for a given major release.
+* **Hotfix**: Occasionally, Microsoft provides Azure CycleCloud hotfixes that address a specific issue or issues that are often preventative or time-sensitive. Microsoft might provide a separate hotfix for each supported version of Azure CycleCloud. Each fix for a specific version is cumulative and includes previous updates for that same version.

For more information about a specific release, see the [product documentation](/azure/cyclecloud/release-notes).

-## Azure CycleCloud Release Cadence
+## Azure CycleCloud release cadence

-Microsoft expects to release Azure CycleCloud updates on a monthly cadence. However, it's possible to have multiple, or no update releases in a month.
+Microsoft plans to release Azure CycleCloud updates every month. However, some months might have multiple updates or no updates.

-For information about a specific update see the release notes for that update in our [product documentation](/azure/cyclecloud/release-notes).
+For information about a specific update, see the release notes for that update in our [product documentation](/azure/cyclecloud/release-notes).

-## Supported Azure CycleCloud Versions
+## Supported Azure CycleCloud versions

-To continue to receive support, you must keep your Azure CycleCloud deployment current to either the previous or latest major releases, running any minor update version for either major release.
+To continue receiving support, keep your Azure CycleCloud deployment current with either the previous or latest major releases. You can run any minor update version for either major release.

-If your Azure CycleCloud installation is behind bymore than two major release updates, it's considered out of compliance and must be updated to at least the minimum supported version.
+If your Azure CycleCloud installation is behind by more than two major release updates, it's considered out of compliance. You must update it to at least the minimum supported version.

-For example, if the previous and current major releases are versions 6 and 7, and the three most recent update versions are:
+For example, if the previous and current major releases are versions 6 and 7, the three most recent update versions are:

* Version 6: 6.8.0, 6.7.0, 6.6.0
* Version 7: 7.2.0, 7.1.1, 7.0.0

-In this example, all versions listed above would be supported, but 5.x.x - and all prior versions - would be out of support. Additionally, when version 8.x.x is released, version 6.x would transition to legacy/unsupported.
+In this example, all versions listed earlier are supported, but versions 5.x.x and earlier versions are out of support. When version 8.x.x is released, version 6.x.x transitions to legacy unsupported.

-Azure CycleCloud software update packages are cumulative. If you decide to defer one or more updates, you may install the latest update package to become compliant.
+Azure CycleCloud software update packages are cumulative. If you defer one or more updates, you can install the latest update package to become compliant.

-## Keeping Your System Supported
+## Keep your system supported

-See product documentation to learn how to upgrade running deployments to a supported version: [Upgrade Azure CycleCloud](~/articles/cyclecloud/how-to/upgrade-and-migrate.md).
+To learn how to upgrade running deployments to a supported version, see the product documentation: [Upgrade Azure CycleCloud](~/articles/cyclecloud/how-to/upgrade-and-migrate.md).

-## Get Support
+## Get support

Azure CycleCloud follows the same support process as Azure. Enterprise customers can follow the process described in [How to Create an Azure Support Request](/azure/azure-supportability/how-to-create-azure-support-request). For more information, see the [Azure Support FAQs](https://azure.microsoft.com/support/faq/).

articles/cyclecloud/slurm-3.md

Lines changed: 41 additions & 41 deletions
@@ -2,43 +2,43 @@
title: Slurm Scheduler Integration version 3.0
description: New CycleCloud Slurm 3.0+ functionality.
author: anhoward
-ms.date: 06/13/2025
+ms.date: 07/01/2025
ms.author: anhoward
---

# CycleCloud Slurm 3.0

-Slurm scheduler support was rewritten as part of the CycleCloud 8.4.0 release. Key features include:
+We rewrote the Slurm scheduler support as part of the CycleCloud 8.4.0 release. Key features include:

-* Support for dynamic nodes, and dynamic partitions via dynamic nodearays, supporting both single and multiple virtual machine (VM) sizes
-* New Slurm versions 23.02 and 22.05.8
-* Cost reporting via `azslurm` CLI
-* `azslurm` cli based autoscaler
-* Ubuntu 20 support
-* Removed need for topology plugin, and therefore also any submit plugin
+* Support for dynamic nodes and dynamic partitions through dynamic node arrays. This feature supports both single and multiple virtual machine (VM) sizes.
+* New Slurm versions 23.02 and 22.05.8.
+* Cost reporting through the `azslurm` CLI.
+* `azslurm` CLI-based autoscaler.
+* Ubuntu 20 support.
+* Removed need for topology plugin and any associated submit plugin.

-## Slurm Clusters in CycleCloud versions < 8.4.0
+## Slurm Clusters in CycleCloud versions earlier than 8.4.0

For more information, see [Transitioning from 2.7 to 3.0](#transitioning-from-27-to-30).

-### Making Cluster Changes
+### Making cluster changes

-The Slurm cluster deployed in CycleCloud contains a cli called `azslurm` to facilitate making changes to the cluster. After making any changes to the cluster, run the following command as root on the Slurm scheduler node to rebuild the `azure.conf` and update the nodes in the cluster:
+The Slurm cluster that you deploy in CycleCloud includes a CLI called `azslurm` to help you make changes to the cluster. After making any changes to the cluster, run the following command as root on the Slurm scheduler node to rebuild the `azure.conf` and update the nodes in the cluster:

```bash
$ sudo -i
# azslurm scale
```

-The command creates the partitions with the correct number of nodes, the proper `gres.conf` and restart the `slurmctld`.
+The command creates the partitions with the correct number of nodes, sets up the proper `gres.conf`, and restarts the `slurmctld`.

-### No longer precreating execute nodes
+### Nodes aren't precreated anymore

-Starting CycleCloud version 3.0.0 Slurm project, the nodes aren't precreating. Nodes are created when `azslurm resume` is invoked, or by manually creating them in CycleCloud using CLI.
+Starting with CycleCloud version 3.0.0 Slurm project, the nodes aren't precreated. You create nodes when you invoke `azslurm resume` or when you manually create them in CycleCloud using the CLI.
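
For example, a minimal sketch of creating a batch of nodes by hand from the scheduler node. The node array name `htc` and the `--node-list` flag are assumptions here, so confirm the exact syntax with `azslurm resume --help`:

```bash
$ sudo -i
# azslurm resume --node-list htc-[1-10]
```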

### Creating extra partitions

-The default template that ships with Azure CycleCloud has three partitions (`hpc`, `htc` and `dynamic`), and you can define custom nodearrays that map directly to Slurm partitions. For example, to create a GPU partition, add the following section to your cluster template:
+The default template that ships with Azure CycleCloud has three partitions (`hpc`, `htc`, and `dynamic`), and you can define custom node arrays that map directly to Slurm partitions. For example, to create a GPU partition, add the following section to your cluster template:

```ini
[[nodearray gpu]]
@@ -58,9 +58,9 @@ The default template that ships with Azure CycleCloud has three partitions (`hpc
AssociatePublicIpAddress = $ExecuteNodesPublic
```

-### Dynamic Partitions
+### Dynamic partitions

-Starting CycleCloud version 3.0.1, we support dynamic partitions. You can make a `nodearray` map to a dynamic partition by adding the following. The `myfeature` could be any desired feature description or more than one feature, separated by a comma.
+Starting with CycleCloud version 3.0.1, the solution supports dynamic partitions. You can make a `nodearray` map to a dynamic partition by adding the following code. The `myfeature` value can be any desired feature description or more than one feature, separated by a comma.

```ini
[[[configuration]]]
@@ -71,37 +71,37 @@ Starting CycleCloud version 3.0.1, we support dynamic partitions. You can make a
slurm.dynamic_config := "-Z --conf \"Feature=myfeature\""
```

-The shared code snip generates a dynamic partition like the following
+The shared code snippet generates a dynamic partition like the following code:

```ini
# Creating dynamic nodeset and partition using slurm.dynamic_config=-Z --conf "Feature=myfeature"
Nodeset=mydynamicns Feature=myfeature
PartitionName=mydynamicpart Nodes=mydynamicns
```

-### Using Dynamic Partitions to Autoscale
+### Using dynamic partitions to autoscale

-By default, dynamic partition doesn't include any nodes. You can start nodes through CycleCloud or by running `azslurm resume` manually, they join the cluster using the name you choose. However, since Slurm isn't aware of these nodes ahead of time, it can't autoscale them up.
+By default, a dynamic partition doesn't include any nodes. You can start nodes through CycleCloud or by running `azslurm resume` manually. The nodes join the cluster using the name you choose. However, since Slurm isn't aware of these nodes ahead of time, it can't autoscale them.

-Instead, you can also precreate node records like so, which allows Slurm to autoscale them up.
+Instead, you can precreate node records like so, which allows Slurm to autoscale them.

```bash
scontrol create nodename=f4-[1-10] Feature=myfeature State=CLOUD
```

-One other advantage of dynamic partitions is that you can support **multiple VM sizes in the same partition**.
-Simply add the VM Size name as a feature, and then `azslurm` can distinguish which VM size you want to use.
+Another advantage of dynamic partitions is that you can support **multiple VM sizes in the same partition**.
+Simply add the VM size name as a feature, and then `azslurm` can distinguish which VM size you want to use.

-**_Note_ The VM Size is added implicitly. You do not need to add it to `slurm.dynamic_config`**
+**_Note_** The VM size is added implicitly. You don't need to add it to `slurm.dynamic_config`.

```bash
scontrol create nodename=f4-[1-10] Feature=myfeature,Standard_F4 State=CLOUD
scontrol create nodename=f8-[1-10] Feature=myfeature,Standard_F8 State=CLOUD
```

-Either way, once you create these nodes in a `State=Cloud` they become available for autoscaling like other nodes.
+Either way, when you create these nodes in a `State=Cloud` state, they become available for autoscaling like other nodes.

-To support **multiple VM sizes in a CycleCloud nodearray**, you can alter the template to allow multiple VM sizes by adding `Config.Mutiselect = true`.
+To support **multiple VM sizes in a CycleCloud node array**, you can change the template to allow multiple VM sizes by adding `Config.Multiselect = true`.

```ini
[[[parameter DynamicMachineType]]]
@@ -112,34 +112,34 @@ To support **multiple VM sizes in a CycleCloud nodearray**, you can alter the te
Config.Multiselect = true
```

-### Dynamic Scale down
+### Dynamic scale down

-By default, all nodes in the dynamic partition scales down just like the other partitions. To disable dynamic partition, see [SuspendExcParts](https://slurm.schedmd.com/slurm.conf.html).
+By default, all nodes in the dynamic partition scale down just like the other partitions. To keep the dynamic partition from scaling down, see [SuspendExcParts](https://slurm.schedmd.com/slurm.conf.html).
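
As a rough sketch, excluding the dynamic partition from scale-down is a one-line `slurm.conf` change followed by a reconfigure. The partition name `mydynamicpart` comes from the earlier example, and the `slurm.conf` path shown is an assumption; use the location for your deployment:

```bash
$ sudo -i
# echo 'SuspendExcParts=mydynamicpart' >> /etc/slurm/slurm.conf
# scontrol reconfigure
```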

### Manual scaling

-If cyclecloud_slurm detects that autoscale is disabled (SuspendTime=-1), it uses the FUTURE state to denote nodes that're powered down instead of relying on the power state in Slurm. That is, when autoscale is enabled, off nodes are denoted as `idle~` in sinfo. When autoscaling is off, the inactive nodes don’t show up in sinfo. You can still see their definition with `scontrol show nodes --future`.
+If cyclecloud_slurm detects that autoscale is disabled (SuspendTime=-1), it uses the FUTURE state to denote nodes that are powered down instead of relying on the power state in Slurm. When autoscale is enabled, `sinfo` shows off nodes as `idle~`. When autoscale is disabled, `sinfo` doesn't show inactive nodes. You can still see their definition with `scontrol show nodes --future`.
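
For example, to list the node definitions that still exist while autoscale is disabled:

```bash
# Powered-down nodes don't appear in sinfo when autoscale is disabled,
# but their definitions are still visible:
scontrol show nodes --future
```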

-To start new nodes, run `/opt/azurehpc/slurm/resume_program.sh node_list` (for example, htc-[1-10]).
+To start new nodes, run `/opt/azurehpc/slurm/resume_program.sh node_list` (for example, `htc-[1-10]`).

-To shutdown nodes, run `/opt/azurehpc/slurm/suspend_program.sh node_list` (for example, htc-[1-10]).
+To shut down nodes, run `/opt/azurehpc/slurm/suspend_program.sh node_list` (for example, `htc-[1-10]`).
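
For instance, using the `htc-[1-10]` node list from those examples:

```bash
# Start nodes htc-1 through htc-10, then shut them down again
/opt/azurehpc/slurm/resume_program.sh htc-[1-10]
/opt/azurehpc/slurm/suspend_program.sh htc-[1-10]
```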

-To start a cluster in this mode, simply add `SuspendTime=-1` to the supplemental Slurm config in the template.
+To start a cluster in this mode, add `SuspendTime=-1` to the supplemental Slurm config in the template.

-To switch a cluster to this mode, add `SuspendTime=-1` to the slurm.conf and run `scontrol reconfigure`. Then run `azslurm remove_nodes && azslurm scale`.
+To switch a cluster to this mode, add `SuspendTime=-1` to the `slurm.conf` file and run `scontrol reconfigure`. Then run `azslurm remove_nodes` and `azslurm scale`.
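
A rough sketch of that switch on the scheduler node, assuming `slurm.conf` is at `/etc/slurm/slurm.conf` (the path can differ between deployments):

```bash
$ sudo -i
# echo 'SuspendTime=-1' >> /etc/slurm/slurm.conf
# scontrol reconfigure
# azslurm remove_nodes
# azslurm scale
```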

## Troubleshooting

### Transitioning from 2.7 to 3.0

-1. The installation folder changed
+1. The installation folder changed from
`/opt/cycle/slurm`
-->
-`/opt/azurehpc/slurm`
+to
+`/opt/azurehpc/slurm`.

-2. Autoscale logs are now in `/opt/azurehpc/slurm/logs` instead of `/var/log/slurmctld`. Note, that `slurmctld.log` is in this folder.
+1. Autoscale logs are now in `/opt/azurehpc/slurm/logs` instead of `/var/log/slurmctld`. The `slurmctld.log` file is in this folder.

-3. The `cyclecloud_slurm.sh` script no longer available. A new CLI tool called `azslurm` replaced `cyclecloud_slurm.sh`, and you can be run as root. `azslurm` also supports autocomplete.
+1. The `cyclecloud_slurm.sh` script is no longer available. A new CLI tool called `azslurm` replaces `cyclecloud_slurm.sh`. You run `azslurm` as root, and it supports autocomplete.

```bash
[root@scheduler ~]# azslurm
@@ -167,10 +167,10 @@ To switch a cluster to this mode, add `SuspendTime=-1` to the slurm.conf and run
wait_for_resume - Wait for a set of nodes to converge.
```

-5. CycleCloud no longer creates nodes ahead of time. It only creates them when they're needed.
+1. CycleCloud doesn't create nodes ahead of time. It only creates nodes when you need them.

-6. All Slurm binaries are inside the `azure-slurm-install-pkg*.tar.gz` file, under `slurm-pkgs`. They're pulled from a specific binary release. The current binary release is [4.0.0](https://github.com/Azure/cyclecloud-slurm/releases/tag/4.0.0)
+6. All Slurm binaries are inside the `azure-slurm-install-pkg*.tar.gz` file, under `slurm-pkgs`. They're pulled from a specific binary release. The current binary release is [4.0.0](https://github.com/Azure/cyclecloud-slurm/releases/tag/4.0.0).

-7. For MPI jobs, the only default network boundary is the partition. Unlike version 2.x, each pertition doesn't include multiple "placement groups". So you only have one colocated VMSS per partition. There's no need for the topology plugin anymore, so the job submission plugin isn't needed either. Instead, submitting to multiple partitions is the recommended option for use cases that require jobs submission to multiple placement groups.
+7. For MPI jobs, the only default network boundary is the partition. Unlike version 2.x, each partition doesn't include multiple "placement groups". So you only have one colocated Virtual Machine Scale Set per partition. There's no need for the topology plugin anymore, so the job submission plugin isn't needed either. Instead, submitting to multiple partitions is the recommended option for use cases that require job submission to multiple placement groups.
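
For example, a job that can run in more than one placement group can be offered to several partitions at submit time, and Slurm starts it in whichever listed partition can schedule it first (the partition names and script name here are hypothetical):

```bash
# Submit an MPI job that may run in either the hpc or hpc2 partition
sbatch --partition=hpc,hpc2 --nodes=4 my_mpi_job.sh
```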

[!INCLUDE [scheduler-integration](~/articles/cyclecloud/includes/scheduler-integration.md)]
