
Commit e008d30

Fixed the Acrolinx score
1 parent 196b8f0 commit e008d30

File tree

1 file changed: +16 −16

articles/cyclecloud/slurm-3.md

Lines changed: 16 additions & 16 deletions
@@ -10,7 +10,7 @@ ms.author: anhoward

Slurm scheduler support was rewritten as part of the CycleCloud 8.4.0 release. Key features include:

-* Support for dynamic nodes, and dynamic partitions via dynamic nodearays, supporting both single and multiple VM sizes
+* Support for dynamic nodes and dynamic partitions via dynamic nodearrays, supporting both single and multiple virtual machine (VM) sizes
* New slurm versions 23.02 and 22.05.8
* Cost reporting via `azslurm` CLI
* `azslurm` cli based autoscaler
@@ -30,11 +30,11 @@ The Slurm cluster deployed in CycleCloud contains a cli called `azslurm` to faci
# azslurm scale
```

-This should create the partitions with the correct number of nodes, the proper `gres.conf` and restart the `slurmctld`.
+This creates the partitions with the correct number of nodes and the proper `gres.conf`, and restarts `slurmctld`.
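
As an aside (not part of the original article), a quick sanity check after `azslurm scale` is to confirm `slurmctld` came back up and that the regenerated partitions are visible to Slurm:

```bash
# Confirm the controller restarted cleanly (assumes slurmctld runs as a systemd service).
systemctl status slurmctld

# List each partition with its node counts by state and advertised features.
sinfo -o "%P %F %f"
```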

### No longer pre-creating execute nodes

-As of version 3.0.0 of the CycleCloud Slurm project, we are no longer pre-creating the nodes in CycleCloud. Nodes are created when `azslurm resume` is invoked, or by manually creating them in CycleCloud via CLI.
+Starting with version 3.0.0 of the CycleCloud Slurm project, nodes are no longer pre-created. Nodes are created when `azslurm resume` is invoked, or by manually creating them in CycleCloud using the CLI.
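
As a hedged illustration (not in the original article), and assuming the `azslurm resume` subcommand accepts a `--node-list` argument as the 3.x CLI does for node operations, manually bringing up a few nodes might look like this:

```bash
# Create and boot the named nodes in CycleCloud; Slurm treats them like any other
# cloud nodes once they register. The node names and count are only examples.
azslurm resume --node-list htc-[1-4]
```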

### Creating additional partitions

@@ -60,8 +60,8 @@ The default template that ships with Azure CycleCloud has three partitions (`hpc

### Dynamic Partitions

-As of `3.0.1`, we support dynamic partitions. You can make a `nodearray` map to a dynamic partition by adding the following.
-Note that `myfeature` could be any desired Feature description. It can also be more than one feature, separated by a comma.
+Starting with version 3.0.1 of the CycleCloud Slurm project, dynamic partitions are supported. You can map a `nodearray` to a dynamic partition by adding the following.
+Note that `myfeature` can be any desired feature description, or more than one feature separated by commas.

```ini
[[[configuration]]]
@@ -72,7 +72,7 @@ Note that `myfeature` could be any desired Feature description. It can also be m
slurm.dynamic_config := "-Z --conf \"Feature=myfeature\""
```

-This will generate a dynamic partition like the following
+This generates a dynamic partition like the following:

```ini
# Creating dynamic nodeset and partition using slurm.dynamic_config=-Z --conf "Feature=myfeature"
@@ -82,7 +82,7 @@ PartitionName=mydynamicpart Nodes=mydynamicns

### Using Dynamic Partitions to Autoscale

-By default, we define no nodes in the dynamic partition. Instead, you can start nodes via CycleCloud or by manually invoking `azslurm resume` and they will join the cluster with whatever name you picked. However, Slurm does not know about these nodes so it can not autoscale them up.
+By default, the dynamic partition doesn't include any nodes. You can start nodes through CycleCloud or by running `azslurm resume` manually, and they'll join the cluster using the name you choose. However, since Slurm isn't aware of these nodes ahead of time, it can't autoscale them up.

Instead, you can also pre-create node records like so, which allows Slurm to autoscale them up.

@@ -100,7 +100,7 @@ scontrol create nodename=f4-[1-10] Feature=myfeature,Standard_F4 State=CLOUD
scontrol create nodename=f8-[1-10] Feature=myfeature,Standard_F8 State=CLOUD
```

-Either way, once you have created these nodes in a `State=Cloud` they are now available to autoscale like other nodes.
+Either way, once you've created these nodes in a `State=CLOUD`, they're available to autoscale like other nodes.

To support **multiple VM sizes in a CycleCloud nodearray**, you can alter the template to allow multiple VM sizes by adding `Config.Multiselect = true`.
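
As a rough sketch only (the full template isn't shown in this diff), a nodearray VM-size parameter with multiselect enabled might look like the following; the parameter name `DynamicVMSize` and the default size are illustrative:

```ini
[[[parameter DynamicVMSize]]]
Label = Dynamic VM Size
Description = VM sizes that the dynamic nodearray may use
ParameterType = Cloud.MachineType
DefaultValue = Standard_F2s_v2
# Allows selecting more than one VM size for this nodearray
Config.Multiselect = true
```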
@@ -115,11 +115,11 @@ To support **multiple VM sizes in a CycleCloud nodearray**, you can alter the te

### Dynamic Scaledown

-By default, all nodes in the dynamic partition will scale down just like the other partitions. To disable this, see [SuspendExcParts](https://slurm.schedmd.com/slurm.conf.html).
+By default, all nodes in the dynamic partition scale down just like the other partitions. To disable this behavior, see [SuspendExcParts](https://slurm.schedmd.com/slurm.conf.html).
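
For illustration (not part of the original article), excluding the dynamic partition from scale-down is a single `slurm.conf` setting; `mydynamicpart` here is the example partition name generated earlier:

```ini
# slurm.conf: partitions listed here are never suspended (scaled down) automatically
SuspendExcParts=mydynamicpart
```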

### Manual scaling

-If cyclecloud_slurm detects that autoscale is disabled (SuspendTime=-1), it will use the FUTURE state to denote nodes that are powered down instead of relying on the power state in Slurm. i.e. When autoscale is enabled, off nodes are denoted as `idle~` in sinfo. When autoscale is disabled, the off nodes will not appear in sinfo at all. You can still see their definition with `scontrol show nodes --future`.
+If cyclecloud_slurm detects that autoscale is disabled (SuspendTime=-1), it uses the FUTURE state to denote nodes that are powered down instead of relying on the power state in Slurm. That is, when autoscale is enabled, off nodes are denoted as `idle~` in sinfo. When autoscale is disabled, the off nodes don't appear in sinfo at all. You can still see their definition with `scontrol show nodes --future`.

To start new nodes, run `/opt/azurehpc/slurm/resume_program.sh node_list` (e.g. htc-[1-10]).
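
As a hedged example (assuming a companion `suspend_program.sh` exists alongside `resume_program.sh`, which this diff doesn't show), a manual power-up and power-down cycle might look like this:

```bash
# Power up two htc nodes, confirm they appear, then shut them back down.
/opt/azurehpc/slurm/resume_program.sh htc-[1-2]
sinfo -p htc
/opt/azurehpc/slurm/suspend_program.sh htc-[1-2]
```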
@@ -138,9 +138,9 @@ To switch a cluster to this mode, add `SuspendTime=-1` to the slurm.conf and run
->
`/opt/azurehpc/slurm`

-2. Autoscale logs are now in `/opt/azurehpc/slurm/logs` instead of `/var/log/slurmctld`. Note, `slurmctld.log` will still be in this folder.
+2. Autoscale logs are now in `/opt/azurehpc/slurm/logs` instead of `/var/log/slurmctld`. Note that `slurmctld.log` will still be in this folder.

-3. `cyclecloud_slurm.sh` no longer exists. Instead there is a new `azslurm` cli, which can be run as root. `azslurm` supports autocomplete.
+3. The `cyclecloud_slurm.sh` script is no longer available. It's been replaced by a new CLI tool called `azslurm`, which you can run as root. `azslurm` also supports autocomplete.

```bash
[root@scheduler ~]# azslurm
@@ -163,15 +163,15 @@ To switch a cluster to this mode, add `SuspendTime=-1` to the slurm.conf and run
resume_fail - Equivalent to SuspendFailProgram, shuts down nodes
retry_failed_nodes - Retries all nodes in a failed state.
scale -
-shell - Interactive python shell with relevant objects in local scope. Use --script to run python scripts
+shell - Interactive python shell with relevant objects in local scope. Use the --script option to run python scripts
suspend - Equivalent to SuspendProgram, shuts down nodes
wait_for_resume - Wait for a set of nodes to converge.
```

-4. Nodes are no longer pre-populated in CycleCloud. They are only created when needed.
+4. CycleCloud no longer creates nodes ahead of time. It only creates them when they're needed.

-5. All slurm binaries are inside the `azure-slurm-install-pkg*.tar.gz` file, under `slurm-pkgs`. They are pulled from a specific binary release. The current binary releaes is [2023-03-13](https://github.com/Azure/cyclecloud-slurm/releases/tag/2023-03-13-bins)
+5. All slurm binaries are inside the `azure-slurm-install-pkg*.tar.gz` file, under `slurm-pkgs`. They're pulled from a specific binary release. The current binary release is [4.0.0](https://github.com/Azure/cyclecloud-slurm/releases/tag/4.0.0)

-6. For MPI jobs, the only network boundary that exists by default is the partition. There are not multiple "placement groups" per partition like 2.x. So you only have one colocated VMSS per partition. There is also no use of the topology plugin, which necessitated the use of a job submission plugin that is also no longer needed. Instead, submitting to multiple partitions is now the recommended option for use cases that require submitting jobs to multiple placement groups.
+6. For MPI jobs, the only default network boundary is the partition. Unlike version 2.x, each partition doesn't include multiple "placement groups", so you only have one colocated VMSS per partition. There's no need for the topology plugin anymore, so the job submission plugin isn't needed either. Instead, submitting to multiple partitions is the recommended option for use cases that require submitting jobs to multiple placement groups.

[!INCLUDE [scheduler-integration](~/articles/cyclecloud/includes/scheduler-integration.md)]
