You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/batch/batch-pool-node-error-checking.md
+13-12Lines changed: 13 additions & 12 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -59,17 +59,17 @@ Batch sets the [pool state](https://docs.microsoft.com/rest/api/batchservice/poo
59
59
60
60
## Pool compute node errors
61
61
62
-
Even when Batch successfully allocates nodes in a pool, various issues can cause some of the nodes to be unhealthy and unable to run tasks. These nodes still incur charges, so it's important to detect problems to avoid paying for nodes that can't be used. In addition to common node errors, knowing the current [job state](https://docs.microsoft.com/rest/api/batchservice/job/get#jobstate) is useful for troubleshooting.
62
+
Even when Batch successfully allocates nodes in a pool, various issues can cause some of the nodes to be unhealthy and unable to run tasks. These nodes still incur charges, so it's important to detect problems to avoid paying for nodes that can't be used. In addition to common node errors, knowing the current [job state](/rest/api/batchservice/job/get#jobstate) is useful for troubleshooting.
63
63
64
64
### Start task failures
65
65
66
-
You might want to specify an optional [start task](https://docs.microsoft.com/rest/api/batchservice/pool/add#starttask) for a pool. As with any task, you can use a command line and resource files to download from storage. The start task is run for each node after it's been started. The **waitForSuccess** property specifies whether Batch waits until the start task completes successfully before it schedules any tasks to a node.
66
+
You might want to specify an optional [start task](/rest/api/batchservice/pool/add#starttask) for a pool. As with any task, you can use a command line and resource files to download from storage. The start task is run for each node after it's been started. The **waitForSuccess** property specifies whether Batch waits until the start task completes successfully before it schedules any tasks to a node.
67
67
68
68
What if you've configured the node to wait for successful start task completion, but the start task fails? In that case, the node will not be usable, but will still incur charges.
69
69
70
-
You can detect start task failures by using the [result](https://docs.microsoft.com/rest/api/batchservice/computenode/get#taskexecutionresult) and [failureInfo](https://docs.microsoft.com/rest/api/batchservice/computenode/get#taskfailureinformation) properties of the top-level [startTaskInfo](https://docs.microsoft.com/rest/api/batchservice/computenode/get#starttaskinformation) node property.
70
+
You can detect start task failures by using the [result](/rest/api/batchservice/computenode/get#taskexecutionresult) and [failureInfo](/rest/api/batchservice/computenode/get#taskfailureinformation) properties of the top-level [startTaskInfo](/rest/api/batchservice/computenode/get#starttaskinformation) node property.
71
71
72
-
A failed start task also causes Batch to set the node [state](https://docs.microsoft.com/rest/api/batchservice/computenode/get#computenodestate) to **starttaskfailed** if **waitForSuccess** was set to **true**.
72
+
A failed start task also causes Batch to set the node [state](/rest/api/batchservice/computenode/get#computenodestate) to **starttaskfailed** if **waitForSuccess** was set to **true**.
73
73
74
74
As with any task, there can be many causes for the start task failing. To troubleshoot, check the stdout, stderr, and any further task-specific log files.
75
75
@@ -79,19 +79,19 @@ Start tasks must be re-entrant, as it is possible the start task is run multiple
79
79
80
80
You can specify one or more application packages for a pool. Batch downloads the specified package files to each node and uncompresses the files after the node has started, but before tasks are scheduled. It's common to use a start task command line in conjunction with application packages. For example, to copy files to a different location or to run setup.
81
81
82
-
The node [errors](https://docs.microsoft.com/rest/api/batchservice/computenode/get#computenodeerror) property reports a failure to download and un-compress an application package; the node state is set to **unusable**.
82
+
The node [errors](/rest/api/batchservice/computenode/get#computenodeerror) property reports a failure to download and un-compress an application package; the node state is set to **unusable**.
83
83
84
84
### Container download failure
85
85
86
-
You can specify one or more container references on a pool. Batch downloads the specified containers to each node. The node [errors](https://docs.microsoft.com/rest/api/batchservice/computenode/get#computenodeerror) property reports a failure to download a container and sets the node state to **unusable**.
86
+
You can specify one or more container references on a pool. Batch downloads the specified containers to each node. The node [errors](/rest/api/batchservice/computenode/get#computenodeerror) property reports a failure to download a container and sets the node state to **unusable**.
87
87
88
88
### Node in unusable state
89
89
90
-
Azure Batch might set the [node state](https://docs.microsoft.com/rest/api/batchservice/computenode/get#computenodestate) to **unusable** for many reasons. With the node state set to **unusable**, tasks can't be scheduled to the node, but it still incurs charges.
90
+
Azure Batch might set the [node state](/rest/api/batchservice/computenode/get#computenodestate) to **unusable** for many reasons. With the node state set to **unusable**, tasks can't be scheduled to the node, but it still incurs charges.
91
91
92
-
Nodes in an **unusable** state, but without [errors](https://docs.microsoft.com/rest/api/batchservice/computenode/get#computenodeerror) means that Batch is unable to communicate with the VM. In this case, Batch always tries to recover the VM. Batch will not automatically attempt to recover VMs that failed to install application packages or containers even though their state is **unusable**.
92
+
Nodes in an **unusable** state, but without [errors](/rest/api/batchservice/computenode/get#computenodeerror) means that Batch is unable to communicate with the VM. In this case, Batch always tries to recover the VM. Batch will not automatically attempt to recover VMs that failed to install application packages or containers even though their state is **unusable**.
93
93
94
-
If Batch can determine the cause, the node [errors](https://docs.microsoft.com/rest/api/batchservice/computenode/get#computenodeerror) property reports it.
94
+
If Batch can determine the cause, the node [errors](/rest/api/batchservice/computenode/get#computenodeerror) property reports it.
95
95
96
96
Additional examples of causes for **unusable** nodes include:
97
97
@@ -109,7 +109,7 @@ Additional examples of causes for **unusable** nodes include:
109
109
110
110
### Node agent log files
111
111
112
-
The Batch agent process that runs on each pool node can provide log files that might be helpful if you need to contact support about a pool node issue. Log files for a node can be uploaded via the Azure portal, Batch Explorer, or an [API](https://docs.microsoft.com/rest/api/batchservice/computenode/uploadbatchservicelogs). It's useful to upload and save the log files. Afterward, you can delete the node or pool to save the cost of the running nodes.
112
+
The Batch agent process that runs on each pool node can provide log files that might be helpful if you need to contact support about a pool node issue. Log files for a node can be uploaded via the Azure portal, Batch Explorer, or an [API](/rest/api/batchservice/computenode/uploadbatchservicelogs). It's useful to upload and save the log files. Afterward, you can delete the node or pool to save the cost of the running nodes.
113
113
114
114
### Node disk full
115
115
@@ -128,11 +128,12 @@ Other files are written out for each task that is run on a node, such as stdout
128
128
The size of the temporary drive depends on the VM size. One consideration when picking a VM size is to ensure the temporary drive has enough space.
129
129
130
130
- In the Azure portal when adding a pool, the full list of VM sizes can be displayed and there is a 'Resource Disk Size' column.
131
-
- The articles describing all VM sizes have tables with a 'Temp Storage' column; for example [Compute Optimized VM sizes](https://docs.microsoft.com/azure/virtual-machines/windows/sizes-compute)
131
+
- The articles describing all VM sizes have tables with a 'Temp Storage' column; for example [Compute Optimized VM sizes](/azure/virtual-machines/windows/sizes-compute)
132
132
133
133
For files written out by each task, a retention time can be specified for each task that determines how long the task files are kept before being automatically cleaned up. The retention time can be reduced to lower the storage requirements.
134
134
135
-
If the temporary disk runs out of space (or is very close to running out of space), the node will move to [Unusable](https://docs.microsoft.com/rest/api/batchservice/computenode/get#computenodestate) state and a node error (use the link already there) will be reported saying that the disk is full.
135
+
136
+
If the temporary disk runs out of space (or is very close to running out of space), the node will move to [Unusable](/rest/api/batchservice/computenode/get#computenodestate) state and a node error will be reported saying that the disk is full.
0 commit comments