
Commit 79177bb

Merge pull request #302118 from tfitzmac/0701edit7
copy edit
2 parents: efea867 + 7318542


9 files changed (+213, -223 lines)


articles/cyclecloud/openpbs.md

Lines changed: 29 additions & 29 deletions
@@ -2,7 +2,7 @@
 title: OpenPBS Integration
 description: OpenPBS scheduler configuration in Azure CycleCloud.
 author: adriankjohnson
-ms.date: 06/11/2025
+ms.date: 07/01/2025
 ms.author: adjohnso
 ---
 

@@ -11,7 +11,7 @@ ms.author: adjohnso
 [//]: # (Need to link to the scheduler README on GitHub)
 
 ::: moniker range="=cyclecloud-7"
-[OpenPBS](http://openpbs.org/) can easily be enabled on a CycleCloud cluster by modifying the "run_list", in the configuration section of your cluster definition. A PBS Professional (PBS Pro) cluster has two main parts: the 'master' node, which runs the software on a shared filesystem, and the 'execute' nodes, which mount that filesystem and run the submitted jobs. For example, a simple cluster template snippet may look like:
+You can enable [OpenPBS](http://openpbs.org/) on a CycleCloud cluster by changing the `run_list` in the configuration section of your cluster definition. A PBS Professional (PBS Pro) cluster has two main parts: the **primary** node, which runs the software on a shared filesystem, and the **execute** nodes, which mount that filesystem and run the submitted jobs. For example, a simple cluster template snippet might look like:
 
 ``` ini
 [cluster my-pbspro]
@@ -31,30 +31,30 @@ ms.author: adjohnso
 run_list = role[pbspro_execute_role]
 ```
 
-Importing and starting a cluster with definition in CycleCloud yields a single 'master' node. Execute nodes can be added to the cluster via the `cyclecloud add_node` command. For example, to add 10 more execute nodes:
+When you import and start a cluster with this definition in CycleCloud, you get a single **primary** node. You can add **execute** nodes to the cluster by using the `cyclecloud add_node` command. For example, to add 10 more **execute** nodes, use:
 
 ```azurecli-interactive
 cyclecloud add_node my-pbspro -t execute -c 10
 ```
 
-## PBS Resource-based Autoscaling
+## PBS resource-based autoscaling
 
-Cyclecloud maintains two resources to expand the dynamic provisioning capability. These resources are *nodearray* and *machinetype*.
+CycleCloud maintains two resources to expand the dynamic provisioning capability. These resources are *nodearray* and *machinetype*.
 
-If you submit a job and specify a nodearray resource by `qsub -l nodearray=highmem -- /bin/hostname` then, CycleCloud adds nodes to the nodearray named 'highmem'. If no such nodearray exists, the job stays idle.
+When you submit a job and specify a node array resource with `qsub -l nodearray=highmem -- /bin/hostname`, CycleCloud adds nodes to the node array named `highmem`. If the node array doesn't exist, the job stays idle.
 
-Similarly, if a machinetype resource is specified which a job submission, for example, `qsub -l machinetype:Standard_L32s_v2 my-job.sh`, then CycleCloud autoscales the 'Standard_L32s_v2' in the 'execute' (default) nodearray. If that machine type isnt available in the 'execute' node array, the job stays idle.
+When you specify a machine type resource in a job submission, such as `qsub -l machinetype:Standard_L32s_v2 my-job.sh`, CycleCloud autoscales the `Standard_L32s_v2` machines in the `execute` (default) node array. If the machine type isn't available in the `execute` node array, the job stays idle.
 
-These resources can be used in combination as:
+You can use these resources together as:
 
 ```bash
 qsub -l nodes=8:ppn=16:nodearray=hpc:machinetype=Standard_HB60rs my-simulation.sh
 ```
-Which autoscales only if the 'Standard_HB60rs' machines are specified in the 'hpc' node array.
+Autoscales only if you specify the `Standard_HB60rs` machines in the `hpc` node array.
 
-## Adding extra queues assigned to nodearrays
+## Adding extra queues assigned to node arrays
 
-On clusters with multiple nodearrays, it's common to create separate queues to automatically route jobs to the appropriate VM type. In this example, we assume the following "gpu" nodearray is defined in your cluster template:
+On clusters with multiple node arrays, create separate queues to automatically route jobs to the appropriate VM type. In this example, assume the following `gpu` node array is defined in your cluster template:
 
 ```bash
 [[nodearray gpu]]
@@ -65,7 +65,7 @@ On clusters with multiple nodearrays, it's common to create separate queues to a
 pbspro.slot_type = gpu
 ```
 
-After importing the cluster template and starting the cluster, the following commands can be ran on the server node to create the "gpu" queue:
+After you import the cluster template and start the cluster, run the following commands on the server node to create the `gpu` queue:
 
 ```bash
 /opt/pbs/bin/qmgr -c "create queue gpu"
@@ -80,25 +80,25 @@ After importing the cluster template and starting the cluster, the following com
 ```
 
 > [!NOTE]
-> As shown in the example, queue definition packs all VMs in the queue into a single VM scale set to support MPI jobs. To define the queue for serial jobs and allow multiple VM Scalesets, set `ungrouped = true` for both `resources_default` and `default_chunk`. You can also set `resources_default.place = pack` if you want the scheduler to pack jobs onto VMs instead of round-robin allocation of jobs. For more information on PBS job packing, see the official [PBS Professional OSS documentation](https://www.altair.com/pbs-works-documentation/).
+> As shown in the example, the queue definition packs all VMs in the queue into a single virtual machine scale set to support MPI jobs. To define the queue for serial jobs and allow multiple virtual machine scale sets, set `ungrouped = true` for both `resources_default` and `default_chunk`. Set `resources_default.place = pack` if you want the scheduler to pack jobs onto VMs instead of round-robin allocation of jobs. For more information on PBS job packing, see the official [PBS Professional OSS documentation](https://www.altair.com/pbs-works-documentation/).
 
-## PBS Professional Configuration Reference
+## PBS Professional configuration reference
 
-The following are the PBS Professional(PBS Pro) specific configuration options you can toggle to customize functionality:
+The following table describes the PBS Professional (PBS Pro) specific configuration options you can toggle to customize functionality:
 
 | PBS Pro Options | Description |
 | --------------- | ----------- |
-| pbspro.slots | The number of slots for a given node to report to PBS Pro. The number of slots is the number of concurrent jobs a node can execute, this value defaults to the number of CPUs on a given machine. You can override this value in cases where you don't run jobs based on CPU but on memory, GPUs, etc. |
-| pbspro.slot_type | The name of type of 'slot' a node provides. The default is 'execute'. When a job is tagged with the hard resource `slot_type=<type>`, that job runs *only* on the machine of the same slot type. It allows you to create a different software and hardware configurations per node and ensure an appropriate job is always scheduled on the correct type of node. |
-| pbspro.version | Default: '18.1.3-0'. This is currently the default version and *only* option to install and run. This is currently the default version and *only* option. In the future more versions of the PBS Pro software may be supported. |
+| pbspro.slots | The number of slots for a given node to report to PBS Pro. The number of slots is the number of concurrent jobs a node can execute. This value defaults to the number of CPUs on a given machine. You can override this value in cases where you don't run jobs based on CPU but on memory, GPUs, and other resources. |
+| pbspro.slot_type | The name of the type of 'slot' a node provides. The default is 'execute'. When you tag a job with the hard resource `slot_type=<type>`, the job runs *only* on the machines with the same slot type. This setting lets you create different software and hardware configurations for each node and ensures that the right job is always scheduled on the correct type of node. |
+| pbspro.version | Default: '18.1.3-0'. This version is currently the default and *only* option to install and run. In the future, more versions of the PBS Pro software might be supported. |
 
 ::: moniker-end
 
 ::: moniker range=">=cyclecloud-8"
 
 ## Connect PBS with CycleCloud
 
-CycleCloud manages [OpenPBS](http://openpbs.org/) clusters through an installable agent called [`azpbs`](https://github.com/Azure/cyclecloud-pbspro). This agent connects to CycleCloud to read cluster and VM configurations and also integrates with OpenPBS to effectively process the job and host information. All `azpbs` configurations are found in the `autoscale.json` file, normally `/opt/cycle/pbspro/autoscale.json`.
+CycleCloud manages [OpenPBS](http://openpbs.org/) clusters through an installable agent called [`azpbs`](https://github.com/Azure/cyclecloud-pbspro). This agent connects to CycleCloud to read cluster and VM configurations. It also integrates with OpenPBS to process the job and host information. You can find all `azpbs` configurations in the `autoscale.json` file, usually located at `/opt/cycle/pbspro/autoscale.json`.
 
 ```
 "password": "260D39rWX13X",
@@ -110,24 +110,24 @@ CycleCloud manages [OpenPBS](http://openpbs.org/) clusters through an installab
 "cluster_name": "mechanical_grid",
 ```
 
-### Important Files
+### Important files
 
-The `azpbs` agent parses the PBS configuration each time it's called - jobs, queues, resources. Information is provided in the stderr and stdout of the command and to a log file, both at configurable levels. All PBS management commands (`qcmd`) with arguments are logged to file as well.
+The `azpbs` agent parses the PBS configuration each time it's called - jobs, queues, resources. The agent provides this information in the stderr and stdout of the command and to a log file, both at configurable levels. The agent also logs all PBS management commands (`qcmd`) with arguments to a file.
 
-All these files can be found in the _/opt/cycle/pbspro/_ directory where the agent is installed.
+You can find all these files in the _/opt/cycle/pbspro/_ directory where you install the agent.
 
 | File | Location | Description |
 |---|---|---|
 | Autoscale Config | autoscale.json | Configuration for Autoscale, Resource Map, CycleCloud access information |
 | Autoscale Log | autoscale.log | Agent main thread logging including CycleCloud host management |
 | Demand Log | demand.log | Detailed log for resource matching |
-| qcmd Trace Log | qcmd.log | Logging the agent `qcmd` calls |
+| qcmd Trace Log | qcmd.log | Logging the agent `qcmd` calls |
 | Logging Config | logging.conf | Configurations for logging masks and file locations |
 
 
 ### Defining OpenPBS Resources
-This project allows general association of OpenPBS resources with Azure VM resources via the cyclecloud-pbspro (azpbs) project. This resource relationship defined in `autoscale.json`.
-The default resources defined with the cluster template we ship with are
+This project enables you to associate OpenPBS resources with Azure VM resources through the cyclecloud-pbspro (azpbs) project. You define this resource relationship in `autoscale.json`.
+The cluster template includes the following default resources:
 
 ```json
 {"default_resources": [
@@ -164,9 +164,9 @@ The default resources defined with the cluster template we ship with are
 }
 ```
 
-The OpenPBS resource named `mem` is equated to a node attribute named `node.memory`, which is the total memory of any virtual machine. This configuration allows `azpbs` to process a resource request such as `-l mem=4gb` by comparing the value of the job resource requirements to node resources.
+The OpenPBS resource named `mem` corresponds to a node attribute named `node.memory`, which represents the total memory of any virtual machine. This configuration lets `azpbs` handle a resource request like `-l mem=4gb` by comparing the value of the job resource requirements to node resources.
 
-Currently, disk size is hardcoded to `size::20g`. Here's an example of handling VM Size specific disk size
+Currently, the disk size is set to `size::20g`. Here's an example of how to handle VM Size specific disk size:
 ```json
 {
 "select": {"node.vm_size": "Standard_F2"},
@@ -180,7 +180,7 @@ Currently, disk size is hardcoded to `size::20g`. Here's an example of handling
 }
 ```
 
-### Autoscale and Scalesets
+### Autoscale and scale sets
 
 CycleCloud treats spanning and serial jobs differently in OpenPBS clusters. Spanning jobs land on nodes that are part of the same placement group. The placement group has a particular platform meaning (VirtualMachineScaleSet with SinglePlacementGroup=true) and CycleCloud manages a named placement group for each spanned node set. Use the PBS resource `group_id` for this placement group name.
 
@@ -189,7 +189,7 @@ The `hpc` queue appends the equivalent of `-l place=scatter:group=group_id` by u
 
 ### Installing the CycleCloud OpenPBS Agent `azpbs`
 
-The OpenPBS CycleCloud cluster manages the installation and configuration of the agent on the server node. The preparation includes setting PBS resources, queues, and hooks. A scripted install can be done outside of CycleCloud as well.
+The OpenPBS CycleCloud cluster manages the installation and configuration of the agent on the server node. The preparation steps include setting PBS resources, queues, and hooks. You can also perform a scripted installation outside of CycleCloud.
 
 ```bash
 # Prerequisite: python3, 3.6 or newer, must be installed and in the PATH
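# A sketch of one way to verify this prerequisite before running the
# scripted install; it assumes only what the comment above states
# (python3, 3.6 or newer, on the PATH).
command -v python3 > /dev/null || { echo "python3 not found in PATH" >&2; exit 1; }
python3 -c 'import sys; assert sys.version_info >= (3, 6), "python3 3.6 or newer is required"'
echo "python3 prerequisite satisfied"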

0 commit comments
