@@ -113,16 +113,6 @@ You can get the value of these service-defined variables to make adjustments tha
113
113
| Variable | Description |
114
114
| --- | --- |
115
115
| $CPUPercent |The average percentage of CPU usage. |
116
-
| $WallClockSeconds |The number of seconds consumed. Retiring after 2024-Mar-31. |
117
-
| $MemoryBytes |The average number of megabytes used. Retiring after 2024-Mar-31. |
118
-
| $DiskBytes |The average number of gigabytes used on the local disks. Retiring after 2024-Mar-31. |
119
-
| $DiskReadBytes |The number of bytes read. Retiring after 2024-Mar-31. |
120
-
| $DiskWriteBytes |The number of bytes written. Retiring after 2024-Mar-31. |
121
-
| $DiskReadOps |The count of read disk operations performed. Retiring after 2024-Mar-31. |
122
-
| $DiskWriteOps |The count of write disk operations performed. Retiring after 2024-Mar-31. |
123
-
| $NetworkInBytes |The number of inbound bytes. Retiring after 2024-Mar-31. |
124
-
| $NetworkOutBytes |The number of outbound bytes. Retiring after 2024-Mar-31. |
125
-
| $SampleNodeCount |The count of compute nodes. Retiring after 2024-Mar-31. |
126
116
| $ActiveTasks |The number of tasks that are ready to execute but aren't yet executing. This includes all tasks that are in the active state and whose dependencies have been satisfied. Any tasks that are in the active state but whose dependencies haven't been satisfied are excluded from the `$ActiveTasks` count. For a multi-instance task, `$ActiveTasks` includes the number of instances set on the task.|
127
117
| $RunningTasks |The number of tasks in a running state. |
128
118
| $PendingTasks |The sum of `$ActiveTasks` and `$RunningTasks`. |
@@ -239,7 +229,7 @@ You can use both resource and task metrics when you define a formula. You adjust
239
229
240
230
| Metric | Description |
241
231
|----------|--------------|
242
-
| Resource | Resource metrics are based on the CPU, the bandwidth, the memory usage of compute nodes, and the number of nodes.<br><br>These service-defined variables are useful for making adjustments based on node count:<br>- $TargetDedicatedNodes <br>- $TargetLowPriorityNodes <br>- $CurrentDedicatedNodes <br>- $CurrentLowPriorityNodes <br>- $PreemptedNodeCount <br>- $UsableNodeCount <br><br>These service-defined variables are useful for making adjustments based on node resource usage: <br>- $CPUPercent <br>- $WallClockSeconds <br>- $MemoryBytes <br>- $DiskBytes <br>- $DiskReadBytes <br>- $DiskWriteBytes <br>- $DiskReadOps <br>- $DiskWriteOps <br>- $NetworkInBytes <br>- $NetworkOutBytes |
232
+
| Resource | Resource metrics are based on the CPU, the bandwidth, the memory usage of compute nodes, and the number of nodes.<br><br>These service-defined variables are useful for making adjustments based on node count:<br>- $TargetDedicatedNodes <br>- $TargetLowPriorityNodes <br>- $CurrentDedicatedNodes <br>- $CurrentLowPriorityNodes <br>- $PreemptedNodeCount <br>- $UsableNodeCount <br><br>These service-defined variables are useful for making adjustments based on node resource usage: <br>- $CPUPercent |
243
233
| Task | Task metrics are based on the status of tasks, such as Active, Pending, and Completed. The following service-defined variables are useful for making pool-size adjustments based on task metrics: <br>- $ActiveTasks <br>- $RunningTasks <br>- $PendingTasks <br>- $SucceededTasks <br>- $FailedTasks |
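
Because this change drops the resource-usage variables that were marked "Retiring after 2024-Mar-31", existing autoscale formulas may still reference them. The following is a minimal sketch, not part of the Batch SDK, of how one might scan a formula string for those names. The helper `find_retired_variables` and the sample formula are hypothetical; the variable list is taken from the rows removed in the table above.

```python
import re
from typing import List

# Variables marked "Retiring after 2024-Mar-31" in the removed table rows.
RETIRED_VARIABLES = frozenset({
    "$WallClockSeconds", "$MemoryBytes", "$DiskBytes", "$DiskReadBytes",
    "$DiskWriteBytes", "$DiskReadOps", "$DiskWriteOps", "$NetworkInBytes",
    "$NetworkOutBytes", "$SampleNodeCount",
})

# Hypothetical formula that relies only on variables kept by this change.
EXAMPLE_FORMULA = """
$tasks = max(0, $PendingTasks.GetSample(1));
$TargetDedicatedNodes = min(10, $tasks);
"""


def find_retired_variables(formula: str) -> List[str]:
    """Return the retired variables referenced in a formula, sorted by name."""
    # Match whole $Identifier tokens so $DiskBytes is not confused
    # with $DiskReadBytes.
    names = set(re.findall(r"\$[A-Za-z_][A-Za-z0-9_]*", formula))
    return sorted(names & RETIRED_VARIABLES)


if __name__ == "__main__":
    print(find_retired_variables(EXAMPLE_FORMULA))
```

A check like this could run before a formula is submitted, so that formulas depending on retired variables are flagged early rather than failing at evaluation time.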
articles/batch/batch-docker-container-workloads.md
+1 -7 (1 addition & 7 deletions)
@@ -2,16 +2,13 @@
2
2
title: Container workloads on Azure Batch
3
3
description: Learn how to run and scale apps from container images on Azure Batch. Create a pool of compute nodes that support running container tasks.
> This article references CentOS, a Linux distribution that is nearing End Of Life (EOL) status. Please consider your use and planning accordingly. For more information, see the [CentOS End Of Life guidance](~/articles/virtual-machines/workloads/centos/centos-end-of-life.md).
14
-
15
12
Azure Batch lets you run and scale large numbers of batch computing jobs on Azure. Batch tasks can run directly on virtual machines (nodes) in a Batch pool, but you can also set up a Batch pool to run tasks in Docker-compatible containers on the nodes. This article shows you how to create a pool of compute nodes that support running container tasks, and then run container tasks on the pool.
16
13
17
14
The code examples here use the Batch .NET and Python SDKs. You can also use other Batch SDKs and tools, including the Azure portal, to create container-enabled Batch pools and to run container tasks.
@@ -86,8 +83,6 @@ without the need for a custom image.
86
83
Currently there are other images published by `microsoft-azure-batch` that support container workloads:
87
84
88
85
- Publisher: `microsoft-azure-batch`
89
-
- Offer: `centos-container`
90
-
- Offer: `centos-container-rdma` (For use exclusively on VM SKUs with Infiniband)
91
86
- Offer: `ubuntu-server-container`
92
87
- Offer: `ubuntu-server-container-rdma` (For use exclusively on VM SKUs with Infiniband)
93
88
@@ -97,7 +92,6 @@ Currently there are other images published by `microsoft-azure-batch` that suppo
97
92
98
93
#### Notes
99
94
The docker data root of the above images lies in different places:
100
-
- For the Azure Batch published `microsoft-azure-batch` images (Offer: `centos-container-rdma`, etc.), the docker data root is mapped to _/mnt/batch/docker_, which is located on the temporary disk.
101
95
- For the HPC image, or `microsoft-dsvm` (Offer: `ubuntu-hpc`, etc.), the docker data root is unchanged from the Docker default, which is _/var/lib/docker_ on Linux and _C:\ProgramData\Docker_ on Windows. These folders are located on the OS disk.
102
96
103
97
For non-Batch published images, the OS disk has the potential risk of being filled up quickly as container images are downloaded.
articles/batch/batch-pool-compute-intensive-sizes.md
+11 -33 (11 additions & 33 deletions)
@@ -3,16 +3,14 @@ title: Use compute-intensive Azure VMs with Batch
3
3
description: How to take advantage of HPC and GPU virtual machine sizes in Azure Batch pools. Learn about OS dependencies and see several scenario examples.
4
4
ms.topic: how-to
5
5
ms.custom: linux-related-content
6
-
ms.date: 05/01/2023
6
+
ms.date: 06/07/2024
7
7
---
8
8
# Use RDMA or GPU instances in Batch pools
9
9
10
-
> [!CAUTION]
11
-
> This article references CentOS, a Linux distribution that is nearing End Of Life (EOL) status. Please consider your use and planning accordingly. For more information, see the [CentOS End Of Life guidance](~/articles/virtual-machines/workloads/centos/centos-end-of-life.md).
12
10
13
11
To run certain Batch jobs, you can take advantage of Azure VM sizes designed for large-scale computation. For example:
14
12
15
-
* To run multi-instance [MPI workloads](batch-mpi.md), choose H-series or other sizes that have a network interface for Remote Direct Memory Access (RDMA). These sizes connect to an InfiniBand network for inter-node communication, which can accelerate MPI applications.
13
+
* To run multi-instance [MPI workloads](batch-mpi.md), choose HB, HC, NC, or ND series or other sizes that have a network interface for Remote Direct Memory Access (RDMA). These sizes connect to an InfiniBand network for inter-node communication, which can accelerate MPI applications.
16
14
17
15
* For CUDA applications, choose N-series sizes that include NVIDIA Tesla graphics processing unit (GPU) cards.
18
16
@@ -27,15 +25,15 @@ This article provides guidance and examples to use some of Azure's specialized s
27
25
28
26
## Dependencies
29
27
30
-
The RDMA or GPU capabilities of compute-intensive sizes in Batch are supported only in certain operating systems. (The list of supported operating systems is a subset of those supported for virtual machines created in these sizes.) Depending on how you create your Batch pool, you might need to install or configure additional driver or other software on the nodes. The following tables summarize these dependencies. See linked articles for details. For options to configure Batch pools, see later in this article.
28
+
The RDMA or GPU capabilities of compute-intensive sizes in Batch are supported only in certain operating systems. The supported operating systems for these VM sizes include only a subset of those available for virtual machine creation. Depending on how you create your Batch pool, you might need to install or configure extra drivers or other software on the nodes. The following tables summarize these dependencies. See linked articles for details. For options to configure Batch pools, see later in this article.
31
29
32
30
### Linux pools - Virtual machine configuration
33
31
34
32
| Size | Capability | Operating systems | Required software | Pool settings |
<sup>*</sup>RDMA-capable N-series sizes also include NVIDIA Tesla GPUs
41
39
@@ -71,19 +69,15 @@ To configure a specialized VM size for your Batch pool, you have several options
71
69
72
70
* For pools in the virtual machine configuration, choose a preconfigured [Azure Marketplace](https://azuremarketplace.microsoft.com/marketplace/) VM image that has drivers and software preinstalled. Examples:
73
71
74
-
* [CentOS-based 8.1 HPC](https://azuremarketplace.microsoft.com/marketplace/apps/openlogic.centos-hpc?tab=Overview) - includes RDMA drivers and Intel MPI 5.1
72
+
* [Data Science Virtual Machine](../machine-learning/data-science-virtual-machine/overview.md) for Linux or Windows - includes NVIDIA CUDA drivers
75
73
76
-
* [Data Science Virtual Machine](../machine-learning/data-science-virtual-machine/overview.md) for Linux or Windows - includes NVIDIA CUDA drivers
74
+
* Linux images for Batch container workloads that also include GPU and RDMA drivers:
77
75
78
-
* Linux images for Batch container workloads that also include GPU and RDMA drivers:
76
+
* [Ubuntu Server (with GPU and RDMA drivers) for Azure Batch container pools](https://azuremarketplace.microsoft.com/marketplace/apps/microsoft-azure-batch.ubuntu-server-container-rdma?tab=Overview)
79
77
80
-
* [CentOS (with GPU and RDMA drivers) for Azure Batch container pools](https://azuremarketplace.microsoft.com/marketplace/apps/microsoft-azure-batch.centos-container-rdma?tab=Overview)
78
+
* Create a [custom Windows or Linux VM image](batch-sig-images.md) with installed drivers, software, or other settings required for the VM size.
81
79
82
-
* [Ubuntu Server (with GPU and RDMA drivers) for Azure Batch container pools](https://azuremarketplace.microsoft.com/marketplace/apps/microsoft-azure-batch.ubuntu-server-container-rdma?tab=Overview)
83
-
84
-
* Create a [custom Windows or Linux VM image](batch-sig-images.md) on which you have installed drivers, software, or other settings required for the VM size.
85
-
86
-
* Create a Batch [application package](batch-application-packages.md) from a zipped driver or application installer, and configure Batch to deploy the package to pool nodes and install once when each node is created. For example, if the application package is an installer, create a [start task](jobs-and-tasks.md#start-task) command line to silently install the app on all pool nodes. Consider using an application package and a pool start task if your workload depends on a particular driver version.
80
+
* Create a Batch [application package](batch-application-packages.md) from a zipped driver or application installer. Then, configure Batch to deploy this package to pool nodes and install once when each node is created. For example, if the application package is an installer, create a [start task](jobs-and-tasks.md#start-task) command line to silently install the app on all pool nodes. Consider using an application package and a pool start task if your workload depends on a particular driver version.
87
81
88
82
> [!NOTE]
89
83
> The start task must run with elevated (admin) permissions, and it must wait for success. Long-running tasks will increase the time to provision a Batch pool.
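
The start task requirements in the note above (elevated permissions, wait for success) can be sketched as a request-body fragment. This is a minimal illustration that assumes the `startTask` field names used by the Batch REST API (`commandLine`, `userIdentity.autoUser`, `waitForSuccess`); the helper `build_start_task` and the installer command line are hypothetical and should be checked against the API version you target.

```python
import json


def build_start_task(command_line: str) -> dict:
    """Build a start task fragment that runs elevated and blocks node readiness."""
    return {
        "commandLine": command_line,
        # Run as an auto-user with admin elevation so a driver install can succeed.
        "userIdentity": {
            "autoUser": {"scope": "pool", "elevationLevel": "admin"}
        },
        # The node is not scheduled for tasks until the start task succeeds.
        "waitForSuccess": True,
    }


if __name__ == "__main__":
    # Hypothetical silent-install command; replace with your own installer.
    start_task = build_start_task("cmd /c install-driver.exe /quiet")
    print(json.dumps({"startTask": start_task}, indent=2))
```

Keeping the install command short matters here: because the pool waits on the start task, a long-running installer directly extends pool provisioning time.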
@@ -145,22 +139,6 @@ To run Windows MPI applications on a pool of Azure H16r VM nodes, you need to co
145
139
|**Internode communication enabled**| True |
146
140
|**Max tasks per node**| 1 |
147
141
148
-
## Example: Intel MPI on a Linux H16r VM pool
149
-
150
-
To run MPI applications on a pool of Linux HB-series nodes, one option is to use the [CentOS-based 8.1 HPC](https://azuremarketplace.microsoft.com/marketplace/apps/openlogic.centos-hpc?tab=Overview) image from the Azure Marketplace. Linux RDMA drivers and Intel MPI are preinstalled. This image also supports Docker container workloads.
151
-
152
-
Using the Batch APIs or Azure portal, create a pool using this image and with the desired number of nodes and scale. The following table shows sample pool settings:
153
-
154
-
| Setting | Value |
155
-
| ---- | ---- |
156
-
|**Image Type**| Marketplace (Linux/Windows) |
157
-
|**Publisher**| OpenLogic |
158
-
|**Offer**| CentOS-HPC |
159
-
|**Sku**| 8.1 |
160
-
|**Node size**| H16r Standard |
161
-
|**Internode communication enabled**| True |
162
-
|**Max tasks per node**| 1 |
163
-
164
142
## Next steps
165
143
166
144
* To run MPI jobs on an Azure Batch pool, see the [Windows](batch-mpi.md) or [Linux](/archive/blogs/windowshpc/introducing-mpi-support-for-linux-on-azure-batch) examples.
articles/batch/batch-pool-node-error-checking.md
+2 -5 (2 additions & 5 deletions)
@@ -1,15 +1,12 @@
1
1
---
2
2
title: Pool and node errors
3
3
description: Learn about background operations, errors to check for, and how to avoid errors when you create Azure Batch pools and nodes.
4
-
ms.date: 04/11/2023
4
+
ms.date: 06/10/2024
5
5
ms.topic: how-to
6
6
---
7
7
8
8
# Azure Batch pool and node errors
9
9
10
-
> [!CAUTION]
11
-
> This article references CentOS, a Linux distribution that is nearing End Of Life (EOL) status. Please consider your use and planning accordingly. For more information, see the [CentOS End Of Life guidance](~/articles/virtual-machines/workloads/centos/centos-end-of-life.md).
12
-
13
10
Some Azure Batch pool creation and management operations happen immediately. Detecting failures for these operations is straightforward, because errors usually return immediately from the API, command line, or user interface. However, some operations are asynchronous, run in the background, and take several minutes to complete. This article describes ways to detect and avoid failures that can occur in the background operations for pools and nodes.
14
11
15
12
Make sure to set your applications to implement comprehensive error checking, especially for asynchronous operations. Comprehensive error checking can help you promptly identify and diagnose issues.
@@ -106,7 +103,7 @@ Other reasons for `unusable` nodes might include the following causes:
106
103
107
104
- A custom VM image is invalid. For example, the image isn't properly prepared.
108
105
- A VM is moved because of an infrastructure failure or a low-level upgrade. Batch recovers the node.
109
-
- A VM image has been deployed on hardware that doesn't support it. For example, a CentOS HPC image is deployed on a [Standard_D1_v2](/azure/virtual-machines/dv2-dsv2-series) VM.
106
+
- A VM image has been deployed on hardware that doesn't support it.
110
107
- The VMs are in an [Azure virtual network](batch-virtual-network.md), and traffic has been blocked to key ports.
111
108
- The VMs are in a virtual network, but outbound traffic to Azure Storage is blocked.
112
109
- The VMs are in a virtual network with a custom DNS configuration, and the DNS server can't resolve Azure storage.
articles/batch/batch-rendering-functionality.md
+1 -4 (1 addition & 4 deletions)
@@ -1,15 +1,12 @@
1
1
---
2
2
title: Rendering capabilities
3
3
description: Standard Azure Batch capabilities are used to run rendering workloads and apps. Batch includes specific features to support rendering workloads.
4
-
ms.date: 02/28/2024
4
+
ms.date: 06/10/2024
5
5
ms.topic: how-to
6
6
---
7
7
8
8
# Azure Batch rendering capabilities
9
9
10
-
> [!CAUTION]
11
-
> This article references CentOS, a Linux distribution that is nearing End Of Life (EOL) status. Please consider your use and planning accordingly. For more information, see the [CentOS End Of Life guidance](~/articles/virtual-machines/workloads/centos/centos-end-of-life.md).
12
-
13
10
Standard Azure Batch capabilities are used to run rendering workloads and applications. Batch also includes specific features to support rendering workloads.
14
11
15
12
For an overview of Batch concepts, including pools, jobs, and tasks, see [this article](./batch-service-workflow-features.md).