
Commit 0dd0229

Merge pull request #301222 from Padmalathas/Cyclecloud_Content_Fixes_Part2
CycleCloud_Content_Quality_Improvements_Part2
2 parents 39b9f2a + b6987c3 commit 0dd0229

File tree: 7 files changed (+141 −166 lines)


articles/cyclecloud/openpbs.md

Lines changed: 23 additions & 45 deletions
@@ -2,16 +2,16 @@
 title: OpenPBS Integration
 description: OpenPBS scheduler configuration in Azure CycleCloud.
 author: adriankjohnson
-ms.date: 07/29/2021
+ms.date: 06/11/2025
 ms.author: adjohnso
 ---
 
 # OpenPBS
 
-[//]: # (Need to link to the scheduler README on Github)
+[//]: # (Need to link to the scheduler README on GitHub)
 
 ::: moniker range="=cyclecloud-7"
-[OpenPBS](http://openpbs.org/) can easily be enabled on a CycleCloud cluster by modifying the "run_list" in the configuration section of your cluster definition. The two basic components of a PBS Professional cluster are the 'master' node which provides a shared filesystem on which the PBS Professional software runs, and the 'execute' nodes which are the hosts that mount the shared filesystem and execute the jobs submitted. For example, a simple cluster template snippet may look like:
+[OpenPBS](http://openpbs.org/) can easily be enabled on a CycleCloud cluster by modifying the "run_list" in the configuration section of your cluster definition. A PBS Professional (PBS Pro) cluster has two main parts: the 'master' node, which runs the software on a shared filesystem, and the 'execute' nodes, which mount that filesystem and run the submitted jobs. For example, a simple cluster template snippet may look like:
 
 ``` ini
 [cluster my-pbspro]
@@ -31,7 +31,7 @@ ms.author: adjohnso
 run_list = role[pbspro_execute_role]
 ```
 
-Importing and starting a cluster with definition in CycleCloud will yield a single 'master' node. Execute nodes can be added to the cluster via the `cyclecloud add_node` command. For example, to add 10 more execute nodes:
+Importing and starting a cluster with this definition in CycleCloud yields a single 'master' node. Execute nodes can be added to the cluster via the `cyclecloud add_node` command. For example, to add 10 more execute nodes:
 
 ```azurecli-interactive
 cyclecloud add_node my-pbspro -t execute -c 10
@@ -41,22 +41,20 @@ cyclecloud add_node my-pbspro -t execute -c 10
 
 Cyclecloud maintains two resources to expand the dynamic provisioning capability. These resources are *nodearray* and *machinetype*.
 
-If you submit a job and specify a nodearray resource by `qsub -l nodearray=highmem -- /bin/hostname`
-then CycleCloud will add nodes to the nodearray named 'highmem'. If there is no such nodearray then the job will remain idle.
+If you submit a job and specify a nodearray resource by `qsub -l nodearray=highmem -- /bin/hostname`, then CycleCloud adds nodes to the nodearray named 'highmem'. If no such nodearray exists, the job stays idle.
 
-Similarly if a machinetype resource is specified which a job submission, e.g. `qsub -l machinetype:Standard_L32s_v2 my-job.sh`, then CycleCloud autoscales the 'Standard_L32s_v2' in the 'execute' (default) nodearray. If that machine type is not available in the 'execute' node array then the job will remain idle.
+Similarly, if a machinetype resource is specified with a job submission, for example, `qsub -l machinetype:Standard_L32s_v2 my-job.sh`, then CycleCloud autoscales the 'Standard_L32s_v2' in the 'execute' (default) nodearray. If that machine type isn't available in the 'execute' node array, the job stays idle.
 
 These resources can be used in combination as:
 
 ```bash
 qsub -l nodes=8:ppn=16:nodearray=hpc:machinetype=Standard_HB60rs my-simulation.sh
 ```
+This autoscales only if the 'Standard_HB60rs' machines are specified in the 'hpc' node array.
 
-which will autoscale only if the 'Standard_HB60rs' machines are specified an the 'hpc' node array.
+## Adding extra queues assigned to nodearrays
 
-## Adding additional queues assigned to nodearrays
-
-On clusters with multiple nodearrays, it's common to create separate queues to automatically route jobs to the appropriate VM type. In this example, we'll assume the following "gpu" nodearray has been defined in your cluster template:
+On clusters with multiple nodearrays, it's common to create separate queues to automatically route jobs to the appropriate VM type. In this example, we assume the following "gpu" nodearray is defined in your cluster template:
 
 ```bash
 [[nodearray gpu]]
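The nodearray/machinetype matching behavior described in the hunk above (autoscale on a match, idle otherwise) can be illustrated with a toy Python sketch. The nodearray names and machine-type lists here are hypothetical illustrations of mine, not CycleCloud or azpbs code:

```python
# Toy illustration (not CycleCloud code): decide whether a job's
# nodearray/machinetype request can trigger autoscaling or leaves it idle.

# Hypothetical view of which machine types each nodearray offers.
NODEARRAYS = {
    "execute": ["Standard_F2", "Standard_L32s_v2"],
    "highmem": ["Standard_E64s_v3"],
    "hpc": ["Standard_HB60rs"],
}

def autoscale_decision(nodearray="execute", machinetype=None):
    """Return 'autoscale' if the request is satisfiable, else 'idle'."""
    if nodearray not in NODEARRAYS:
        return "idle"  # no such nodearray: the job stays idle
    if machinetype is not None and machinetype not in NODEARRAYS[nodearray]:
        return "idle"  # machine type not offered by that nodearray
    return "autoscale"

print(autoscale_decision(nodearray="highmem"))                             # autoscale
print(autoscale_decision(nodearray="hpc", machinetype="Standard_HB60rs"))  # autoscale
print(autoscale_decision(nodearray="missing"))                             # idle
```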
@@ -82,29 +80,25 @@ After importing the cluster template and starting the cluster, the following com
 ```
 
 > [!NOTE]
-> The above queue definition will pack all VMs in the queue into a single VM scale set to support MPI jobs. To define the queue for serial jobs and allow multiple VM Scalesets, set `ungrouped = true` for both `resources_default` and `default_chunk`. You can also set `resources_default.place = pack` if you want the scheduler to pack jobs onto VMs instead of round-robin allocation of jobs. For more information on PBS job packing, see the official [PBS Professional OSS documentation](https://www.altair.com/pbs-works-documentation/).
+> As shown in the example, the queue definition packs all VMs in the queue into a single VM scale set to support MPI jobs. To define the queue for serial jobs and allow multiple VM scale sets, set `ungrouped = true` for both `resources_default` and `default_chunk`. You can also set `resources_default.place = pack` if you want the scheduler to pack jobs onto VMs instead of round-robin allocation of jobs. For more information on PBS job packing, see the official [PBS Professional OSS documentation](https://www.altair.com/pbs-works-documentation/).
 
 ## PBS Professional Configuration Reference
 
-The following are the PBS Professional specific configuration options you can toggle to customize functionality:
+The following are the PBS Professional (PBS Pro) specific configuration options you can toggle to customize functionality:
 
 | PBS Pro Options | Description |
 | --------------- | ----------- |
 | pbspro.slots | The number of slots for a given node to report to PBS Pro. The number of slots is the number of concurrent jobs a node can execute, this value defaults to the number of CPUs on a given machine. You can override this value in cases where you don't run jobs based on CPU but on memory, GPUs, etc. |
-| pbspro.slot_type | The name of type of 'slot' a node provides. The default is 'execute'. When a job is tagged with the hard resource `slot_type=<type>`, that job will *only* run on a machine of the same slot type. This allows you to create different software and hardware configurations per node and ensure an appropriate job is always scheduled on the correct type of node. |
-| pbspro.version | Default: '18.1.3-0'. This is the PBS Professional version to install and run. This is currently the default and *only* option. In the future additional versions of the PBS Professional software may be supported. |
+| pbspro.slot_type | The name of the type of 'slot' a node provides. The default is 'execute'. When a job is tagged with the hard resource `slot_type=<type>`, that job runs *only* on a machine of the same slot type. This lets you create different software and hardware configurations per node and ensure an appropriate job is always scheduled on the correct type of node. |
+| pbspro.version | Default: '18.1.3-0'. This is the PBS Pro version to install and run, and it's currently the *only* option. In the future, more versions of the PBS Pro software may be supported. |
 
 ::: moniker-end
 
 ::: moniker range=">=cyclecloud-8"
 
 ## Connect PBS with CycleCloud
 
-CycleCloud manages [OpenPBS](http://openpbs.org/) clusters through an installable agent called
-[`azpbs`](https://github.com/Azure/cyclecloud-pbspro). This agent connect to
-CycleCloud to read cluster and VM configurations and also integrates with OpenPBS
-to effectively process the job and host information. All `azpbs` configurations
-are found in the `autoscale.json` file, normally `/opt/cycle/pbspro/autoscale.json`.
+CycleCloud manages [OpenPBS](http://openpbs.org/) clusters through an installable agent called [`azpbs`](https://github.com/Azure/cyclecloud-pbspro). This agent connects to CycleCloud to read cluster and VM configurations and also integrates with OpenPBS to effectively process the job and host information. All `azpbs` configurations are found in the `autoscale.json` file, normally `/opt/cycle/pbspro/autoscale.json`.
 
 ```
 "password": "260D39rWX13X",
@@ -118,9 +112,7 @@ are found in the `autoscale.json` file, normally `/opt/cycle/pbspro/autoscale.js
 
 ### Important Files
 
-The `azpbs` agent parses the PBS configuration each time it's called - jobs, queues, resources.
-Information is provided in the stderr and stdout of the command as well as to a log file, both
-at configurable levels. All PBS management commands (`qcmd`) with arguments are logged to file as well.
+The `azpbs` agent parses the PBS configuration each time it's called - jobs, queues, resources. Information is provided in the stderr and stdout of the command and to a log file, both at configurable levels. All PBS management commands (`qcmd`) with arguments are logged to file as well.
 
 All these files can be found in the _/opt/cycle/pbspro/_ directory where the agent is installed.
 
@@ -134,10 +126,7 @@ All these files can be found in the _/opt/cycle/pbspro/_ directory where the age
 
 
 ### Defining OpenPBS Resources
-This project allows for a generally association of OpenPBS resources with Azure
-VM resources via the cyclecloud-pbspro (azpbs) project. This resource relationship
-defined in `autoscale.json`.
-
+This project allows a general association of OpenPBS resources with Azure VM resources via the cyclecloud-pbspro (azpbs) project. This resource relationship is defined in `autoscale.json`.
 The default resources defined with the cluster template we ship with are
 
 ```json
@@ -175,13 +164,9 @@ The default resources defined with the cluster template we ship with are
 }
 ```
 
-The OpenPBS resource named `mem` is equated to a node attribute named `node.memory`,
-which is the total memory of any virtual machine. This configuration allows `azpbs`
-to process a resource request such as `-l mem=4gb` by comparing the value of the
-job resource requirements to node resources.
+The OpenPBS resource named `mem` is equated to a node attribute named `node.memory`, which is the total memory of any virtual machine. This configuration allows `azpbs` to process a resource request such as `-l mem=4gb` by comparing the value of the job resource requirements to node resources.
 
-Note that disk is currently hardcoded to `size::20g`.
-Here is an example of handling VM Size specific disk size
+Currently, disk size is hardcoded to `size::20g`. Here's an example of handling VM size specific disk size:
 ```json
 {
 "select": {"node.vm_size": "Standard_F2"},
@@ -197,21 +182,14 @@ Here is an example of handling VM Size specific disk size
 
 ### Autoscale and Scalesets
 
-CycleCloud treats spanning and serial jobs differently in OpenPBS clusters.
-Spanning jobs will land on nodes that are part of the same placement group. The
-placement group has a particular platform meaning (VirtualMachineScaleSet with
-SinglePlacementGroup=true) and CC will managed a named placement group for each
-spanned node set. Use the PBS resource `group_id` for this placement group name.
+CycleCloud treats spanning and serial jobs differently in OpenPBS clusters. Spanning jobs land on nodes that are part of the same placement group. The placement group has a particular platform meaning (VirtualMachineScaleSet with SinglePlacementGroup=true) and CycleCloud manages a named placement group for each spanned node set. Use the PBS resource `group_id` for this placement group name.
 
-The `hpc` queue appends
-the equivalent of `-l place=scatter:group=group_id` by using native queue defaults.
+The `hpc` queue appends the equivalent of `-l place=scatter:group=group_id` by using native queue defaults.
 
 
 ### Installing the CycleCloud OpenPBS Agent `azpbs`
 
-The OpenPBS CycleCloud cluster will manage the installation and configuration of
-the agent on the server node. The preparation includes setting PBS resources,
-queues, and hooks. A scripted install can be done outside of CycleCloud as well.
+The OpenPBS CycleCloud cluster manages the installation and configuration of the agent on the server node. The preparation includes setting PBS resources, queues, and hooks. A scripted install can be done outside of CycleCloud as well.
 
 ```bash
 # Prerequisite: python3, 3.6 or newer, must be installed and in the PATH
@@ -243,7 +221,7 @@ azpbs validate
 [!INCLUDE [scheduler-integration](~/articles/cyclecloud/includes/scheduler-integration.md)]
 
 > [!NOTE]
-> CycleCloud does not support the bursting configuration with Open PBS.
+> CycleCloud doesn't support the bursting configuration with Open PBS.
 
 > [!NOTE]
-> Even though Windows is an officially supported Open PBS platform, CycleCloud does not support running Open PBS on Windows at this time.
+> Even though Windows is an officially supported Open PBS platform, CycleCloud doesn't support running Open PBS on Windows at this time.
