Commit d6ba764

Merge pull request #107162 from LauraBrenner/laura-batch-pool-node
Laura batch pool node
2 parents 00bc5da + b6d40ac commit d6ba764

2 files changed

+75
-2
lines changed

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
1+
---
2+
title: Manage your account - Azure Batch | Microsoft Docs
3+
description: Learn what comprises an Azure Batch account
4+
services: batch
5+
documentationcenter: ''
6+
author: LauraBrenner
7+
manager: evansma
8+
editor: ''
9+
10+
ms.assetid: 3fbae545-245f-4c66-aee2-e25d7d5d36db
11+
ms.service: batch
12+
ms.workload: big-compute
13+
ms.tgt_pltfrm: na
14+
ms.topic: conceptual
15+
ms.date: 03/05/2020
16+
ms.author: labrenne
17+
ms.custom: H1Hack27Feb2017
18+
19+
---
20+
21+
# Manage your Batch account
22+
23+
A Batch account is a uniquely identified entity within the Batch service. All processing is associated with a Batch account.
24+
25+
You can create an Azure Batch account using the [Azure portal](batch-account-create-portal.md) or programmatically, such as with the [Batch Management .NET library](batch-management-dotnet.md). When creating the account, you can associate an Azure storage account for storing job-related input and output data or applications.
26+
27+
You can run multiple Batch workloads in a single Batch account, or distribute your workloads among Batch accounts that are in the same subscription, but in different Azure regions.
28+
29+
## Components of the Batch account
30+
31+
The Batch account enables you to run large-scale parallel and high-performance computing (HPC) batch jobs efficiently in Azure. Within the account you manage:
32+
33+
- the applications you are running
34+
35+
- the allocation of pools and nodes within pools
36+
37+
- the number and types of tasks
38+
39+
- the input and output of data. You don't need to install additional software to manage tasks.
40+
41+
- When you create the Batch account, you are asked to assign a name to it. This name is the account's ID and cannot be changed once assigned.
42+
43+
- To change the name of an account, you need to delete it and create a new Batch account.
44+
45+
- The account is created within the subscription you want to use.
46+
47+
- Use the account to identify and retrieve primary and secondary account keys from any Batch account within your subscription.
48+
49+
- The account maintains information about pool allocation and core quotas.
50+
51+
- The account contains location information.
52+
53+
- The account identifies your storage account.
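Because the account name cannot be changed after creation, it can help to check a candidate name before creating the account. The following is a minimal sketch, not part of this article's guidance. It assumes the naming rule of 3 to 24 characters using only lowercase letters and digits; the function name `is_valid_batch_account_name` is hypothetical.

```python
import re

# Assumption (not stated in this article): Batch account names are 3-24
# characters long and contain only lowercase letters and digits.
_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_batch_account_name(name: str) -> bool:
    """Return True if the candidate name matches the assumed naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("mybatch01", "My-Batch", "ab"):
        print(candidate, is_valid_batch_account_name(candidate))
```

A local check like this does not confirm that the name is still available in the target region; it only catches names that would be rejected for their format.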
54+
55+
## Next steps
56+
57+
- Create a Batch account using the [Azure portal](batch-account-create-portal.md).
58+
- Create a Batch account programmatically, such as with the [Batch Management .NET library](batch-management-dotnet.md).
59+
- [Configure or disable remote access to compute nodes in an Azure Batch pool](pool-endpoint-configuration.md).
60+
- [Run job preparation and job release tasks on Batch compute nodes](batch-job-prep-release.md).

articles/batch/batch-pool-node-error-checking.md

Lines changed: 15 additions & 2 deletions
@@ -99,7 +99,7 @@ Additional examples of causes for **unusable** nodes include:
9999

100100
- A VM is moved because of an infrastructure failure or a low-level upgrade. Batch recovers the node.
101101

102-
- A VM image has been deployed on hardware that doesnt support it. For example, trying to run a CentOS HPC image on a [Standard_D1_v2](../virtual-machines/dv2-dsv2-series.md) VM.
102+
- A VM image has been deployed on hardware that doesn't support it. For example, trying to run a CentOS HPC image on a [Standard_D1_v2](../virtual-machines/dv2-dsv2-series.md) VM.
103103

104104
- The VMs are in an [Azure virtual network](batch-virtual-network.md), and traffic has been blocked to key ports.
105105

@@ -132,8 +132,21 @@ The size of the temporary drive depends on the VM size. One consideration when p
132132

133133
For files written out by each task, a retention time can be specified for each task that determines how long the task files are kept before being automatically cleaned up. The retention time can be reduced to lower the storage requirements.
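As an illustration of the retention time, the sketch below builds the `constraints` fragment of a task definition with a shortened `retentionTime`. It assumes the Batch REST API expects an ISO 8601 duration string for this property; the helper names `to_iso8601_duration` and `task_constraints` are hypothetical.

```python
from datetime import timedelta


def to_iso8601_duration(delta: timedelta) -> str:
    """Format a positive timedelta as an ISO 8601 duration such as 'P1D' or 'PT2H30M'."""
    total = int(delta.total_seconds())
    if total <= 0:
        raise ValueError("retention time must be positive")
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    return "P" + date_part + ("T" + time_part if time_part else "")


def task_constraints(retention: timedelta) -> dict:
    """Build the constraints fragment of a task body (assumed property name: retentionTime)."""
    return {"constraints": {"retentionTime": to_iso8601_duration(retention)}}


if __name__ == "__main__":
    # Keep task files for one day.
    print(task_constraints(timedelta(days=1)))
```

A shorter retention time frees disk space sooner, so any output that is still needed should be copied off the node before the period elapses.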
134134

135-
If temporary disk space does fill, then currently the node will stop running tasks. In the future, a [node error](https://docs.microsoft.com/rest/api/batchservice/computenode/get#computenodeerror) will be reported.
135+
If the temporary disk runs out of space (or is very close to running out of space), the node moves to the [Unusable](https://docs.microsoft.com/rest/api/batchservice/computenode/get#computenodestate) state, and a [node error](https://docs.microsoft.com/rest/api/batchservice/computenode/get#computenodeerror) is reported saying that the disk is full.
136136

137+
### What to do when a disk is full
138+
139+
Determine why the disk is full. If you're not sure what is taking up space on the node, connect to the node remotely and investigate where the space has gone. You can also use the [Batch List Files API](https://docs.microsoft.com/rest/api/batchservice/file/listfromcomputenode) to examine files in Batch-managed folders (for example, task outputs). This API lists only files in the Batch-managed directories; files that your tasks created elsewhere won't appear.
140+
141+
Make sure that any data you need has been retrieved from the node or uploaded to a durable store. Every mitigation for the disk-full issue involves deleting data to free up space.
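One way to see where the space has gone is to total the file sizes returned by the List Files API per top-level folder. The sketch below is illustrative only. It assumes each entry is a dictionary with `name`, `isDirectory`, and `properties.contentLength` fields, as in the REST response, and the function name `usage_by_top_level_folder` is hypothetical. Fetching the listing itself is left out.

```python
from collections import defaultdict


def usage_by_top_level_folder(files):
    """Sum file sizes (bytes) per top-level folder, largest first.

    `files` is assumed to be the `value` array of a List Files response:
    dicts with `name`, `isDirectory`, and `properties.contentLength`.
    """
    totals = defaultdict(int)
    for entry in files:
        if entry.get("isDirectory"):
            continue  # directories carry no size of their own
        size = entry.get("properties", {}).get("contentLength", 0)
        # Normalize Windows-style separators before taking the first segment.
        top = entry["name"].replace("\\", "/").split("/", 1)[0]
        totals[top] += size
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data for illustration.
    sample = [
        {"name": "workitems", "isDirectory": True},
        {"name": "workitems/job1/task1/out.bin", "isDirectory": False,
         "properties": {"contentLength": 5000}},
        {"name": "startup/stdout.txt", "isDirectory": False,
         "properties": {"contentLength": 120}},
    ]
    for folder, size in usage_by_top_level_folder(sample):
        print(folder, size)
```

Because the API covers only Batch-managed directories, a small total here alongside a full disk suggests that tasks wrote files elsewhere on the node.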
142+
143+
### Recovering the node
144+
145+
1. If your pool is a [CloudServiceConfiguration](https://docs.microsoft.com/rest/api/batchservice/pool/add#cloudserviceconfiguration) pool, you can re-image the node via the [Batch re-image API](https://docs.microsoft.com/rest/api/batchservice/computenode/reimage). This will clean the entire disk. Re-image is not currently supported for [VirtualMachineConfiguration](https://docs.microsoft.com/rest/api/batchservice/pool/add#virtualmachineconfiguration) pools.
146+
147+
2. If your pool is a [VirtualMachineConfiguration](https://docs.microsoft.com/rest/api/batchservice/pool/add#virtualmachineconfiguration) pool, you can remove the node from the pool using the [remove nodes API](https://docs.microsoft.com/rest/api/batchservice/pool/removenodes). Then, you can grow the pool again to replace the bad node with a fresh one.
148+
149+
3. Delete old completed jobs or old completed tasks whose task data is still on the nodes. For a hint at which job or task data is on the nodes, you can look in the [RecentTasks collection](https://docs.microsoft.com/rest/api/batchservice/computenode/get#taskinformation) on the node, or at the [files on the node](https://docs.microsoft.com/rest/api/batchservice/file/listfromcomputenode). Deleting a job deletes all the tasks in the job, and deleting tasks causes the data in their task directories on the node to be deleted, which frees up space. Once you've freed up enough space, reboot the node and it should move out of the "Unusable" state and into "Idle" again.
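The choice between the first two options depends on the pool configuration, and it can be sketched as a small helper that describes the REST request to send. This is a hedged illustration rather than an official client: the function names, the account URL, the node ID, and the `api_version` value are placeholders, and authentication and the actual HTTP call are omitted.

```python
def recovery_request(account_url, pool_id, node_id, pool_configuration, api_version):
    """Describe the REST request that recovers a disk-full node.

    `pool_configuration` is assumed to be either 'cloudServiceConfiguration'
    or 'virtualMachineConfiguration'. Nothing is sent; a dict is returned.
    """
    base = f"{account_url.rstrip('/')}/pools/{pool_id}"
    query = f"?api-version={api_version}"
    if pool_configuration == "cloudServiceConfiguration":
        # Re-image wipes the whole disk on the node.
        return {"method": "POST", "url": f"{base}/nodes/{node_id}/reimage{query}", "body": None}
    if pool_configuration == "virtualMachineConfiguration":
        # Remove the node; resize the pool afterwards to get a fresh one.
        return {"method": "POST", "url": f"{base}/removenodes{query}",
                "body": {"nodeList": [node_id]}}
    raise ValueError(f"unknown pool configuration: {pool_configuration}")


def reboot_request(account_url, pool_id, node_id, api_version):
    """Describe the reboot request used after freeing space by deleting jobs or tasks."""
    base = f"{account_url.rstrip('/')}/pools/{pool_id}"
    return {"method": "POST",
            "url": f"{base}/nodes/{node_id}/reboot?api-version={api_version}",
            "body": None}


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(recovery_request("https://myaccount.westus.batch.azure.com", "mypool",
                           "tvm-example_1", "virtualMachineConfiguration",
                           "2020-03-01.11.0"))
```

Returning a description instead of sending the request keeps the decision logic easy to test; a real caller would pass the result to an authenticated HTTP client.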
137150

138151
## Next steps
139152
