Commit 176cd3d

Merge pull request #234133 from v-thepet/batch4
Freshness Pass for User Story: 79612 (5)
2 parents 27de991 + 75a61b3 commit 176cd3d

5 files changed

+63
-50
lines changed

articles/batch/batch-job-prep-release.md

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -2,7 +2,7 @@
22
title: Job preparation and release tasks on Batch compute nodes
33
description: Use job-level preparation tasks to minimize data transfer to Azure Batch compute nodes, and release tasks for node cleanup at job completion.
44
ms.topic: how-to
5-
ms.date: 04/06/2023
5+
ms.date: 04/11/2023
66
ms.devlang: csharp
77
ms.custom: "seodec18, devx-track-csharp"
88

@@ -12,7 +12,7 @@ ms.custom: "seodec18, devx-track-csharp"
1212
An Azure Batch job often requires setup before its tasks are executed, and post-job maintenance when its tasks are completed. For example, you might need to download common task input data to your compute nodes, or upload task output data to Azure Storage after the job completes. You can use *job preparation* and *job release* tasks for these operations.
1313

1414
- A job preparation task runs before a job's tasks, on all compute nodes scheduled to run at least one task.
15-
- A job release task runs once the job is completed, on each node in the pool that executed at least one task.
15+
- A job release task runs once the job is completed, on each node in the pool that ran a job preparation task.
1616

1717
As with other Batch tasks, you can specify a command line to invoke when a job preparation or release task runs. Job preparation and release tasks offer familiar Batch task features such as:
1818

@@ -53,7 +53,7 @@ The job preparation task runs only on nodes that are scheduled to run a task. Th
5353
5454
## Job release task
5555

56-
Once you mark a job as completed, the job release task runs on each node in the pool that ran at least one task. You mark a job as completed by issuing a terminate request. This request sets the job state to *terminating*, terminates any active or running tasks associated with the job, and runs the job release task. The job then moves to the *completed* state.
56+
Once you mark a job as completed, the job release task runs on each node in the pool that ran a job preparation task. You mark a job as completed by issuing a terminate request. This request sets the job state to *terminating*, terminates any active or running tasks associated with the job, and runs the job release task. The job then moves to the *completed* state.
5757

5858
> [!NOTE]
5959
> Deleting a job also executes the job release task. However, if a job is already terminated, the release task doesn't run a second time if the job is later deleted.
Lines changed: 60 additions & 47 deletions
Original file line number | Diff line number | Diff line change
@@ -1,88 +1,101 @@
11
---
22
title: Check for job and task errors
3-
description: Learn about errors to check for and how to troubleshoot jobs and tasks.
3+
description: Learn how to check for and handle errors that occur after Azure Batch jobs and tasks are submitted.
44
ms.topic: how-to
5-
ms.date: 09/08/2021
5+
ms.date: 04/11/2023
66
---
77

8-
# Job and task error checking
8+
# Azure Batch job and task errors
99

10-
There are various errors that can occur when adding jobs and tasks. Detecting failures for these operations is straightforward because any failures are returned immediately by the API, CLI, or UI. However, there are also failures that can happen later, when jobs and tasks are scheduled and run.
10+
Various errors can happen when you add, schedule, or run Azure Batch jobs and tasks. It's straightforward to detect errors that occur when you add jobs and tasks. The API, command line, or user interface usually returns any failures immediately. This article covers how to check for and handle errors that occur after jobs and tasks are submitted.
1111

12-
This article covers the errors that can occur after jobs and tasks are submitted and how to check for and handle them.
12+
## Job failures
1313

14-
## Jobs
14+
A job is a group of one or more tasks, which specify command lines to run. You can specify the following optional parameters when you add a job. These parameters influence how the job can fail.
1515

16-
A job is a grouping of one or more tasks, with the tasks actually specifying the command lines to be run.
16+
- [JobConstraints](/rest/api/batchservice/job/add#jobconstraints). You can optionally use the `maxWallClockTime` property to set the maximum amount of time a job can be active or running. If the job exceeds the `maxWallClockTime`, the job terminates with the `terminateReason` property set to `MaxWallClockTimeExpiry` in the [JobExecutionInformation](/rest/api/batchservice/job/get#jobexecutioninformation).
1717

18-
When adding a job, the following parameters can be specified which can influence how the job can fail:
18+
- [JobPreparationTask](/rest/api/batchservice/job/add#jobpreparationtask). You can optionally specify a job preparation task to run on each compute node scheduled to run a job task. The node runs the job preparation task before the first time it runs a task for the job. If the job preparation task fails, the task doesn't run and the job doesn't complete.
1919

20-
- [Job Constraints](/rest/api/batchservice/job/add#jobconstraints)
21-
- The `maxWallClockTime` property can optionally be specified to set the maximum amount of time a job can be active or running. If exceeded, the job will be terminated with the `terminateReason` property set in the [executionInfo](/rest/api/batchservice/job/get#jobexecutioninformation) for the job.
22-
- [Job Preparation Task](/rest/api/batchservice/job/add#jobpreparationtask)
23-
- If specified, a job preparation task is run the first time a task is run for a job on a node. The job preparation task can fail, which will lead to the task not being run and the job not completing.
24-
- [Job Release Task](/rest/api/batchservice/job/add#jobreleasetask)
25-
- A job release task can only be specified if a job preparation task is configured. When a job is being terminated, the job release task is run on the each of pool nodes where a job preparation task was run. A job release task can fail, but the job will still move to a `completed` state.
20+
- [JobReleaseTask](/rest/api/batchservice/job/add#jobreleasetask). You can optionally specify a job release task for jobs that have a job preparation task. When a job is being terminated, the job release task runs on each pool node that ran a job preparation task. If a job release task fails, the job still moves to a `completed` state.
21+
22+
In the Azure portal, you can set these parameters in the **Job manager, preparation and release tasks** and **Advanced** sections of the Batch **Add job** screen.
2623

2724
### Job properties
2825

29-
The following job properties should be checked for errors:
26+
Check the following job properties in the [JobExecutionInformation](/rest/api/batchservice/job/get#jobexecutioninformation) for errors:
27+
28+
- The `terminateReason` property indicates `MaxWallClockTimeExpiry` if the job exceeded the `maxWallClockTime` specified in the job constraints and therefore the job terminated. This property can also be set to `taskFailed` if the job's `onTaskFailure` attribute is set to `performExitOptionsJobAction`, and a task fails with an exit condition that specifies a `jobAction` of `terminatejob`.
3029

31-
- '[executionInfo](/rest/api/batchservice/job/get#jobexecutioninformation)':
32-
- The `terminateReason` property can have values to indicate that the `maxWallClockTime`, specified in the job constraints, was exceeded and therefore the job was terminated. It can also be set to indicate a task failed if the job `onTaskFailure` property was set appropriately.
33-
- The [schedulingError](/rest/api/batchservice/job/get#jobschedulingerror) property is set if there has been a scheduling error.
30+
- The [JobSchedulingError](/rest/api/batchservice/job/get#jobschedulingerror) property is set if there has been a scheduling error.
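
The job property checks above can be sketched in Python as follows. This is a minimal illustration, not part of the commit: it assumes the job has already been fetched and parsed into a dictionary shaped like the REST response, with `executionInfo`, `terminateReason`, and `schedulingError` keys as named in the article. The `code` and `message` fields on the scheduling error and the helper name `summarize_job_errors` are assumptions made for illustration.

```python
from typing import Any, Dict, List


def summarize_job_errors(job: Dict[str, Any]) -> List[str]:
    """Collect human-readable notes about job-level errors.

    `job` is assumed to be a dict parsed from a job resource;
    the exact response shape is an assumption of this sketch.
    """
    findings: List[str] = []
    info = job.get("executionInfo") or {}

    reason = info.get("terminateReason")
    # Compare case-insensitively because the casing of these values
    # differs between places in the text.
    if reason and reason.lower() in {"maxwallclocktimeexpiry", "taskfailed"}:
        findings.append(f"job terminated: {reason}")

    error = info.get("schedulingError")
    if error:
        # 'code' and 'message' are assumed field names.
        findings.append(
            f"scheduling error {error.get('code')}: {error.get('message')}"
        )
    return findings


# Hypothetical sample data for illustration only.
sample_job = {
    "id": "example-job",
    "executionInfo": {"terminateReason": "MaxWallClockTimeExpiry"},
}
print(summarize_job_errors(sample_job))
```

An empty list means neither property signals a problem; a normal termination reason is deliberately not reported.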
3431

3532
### Job preparation tasks
3633

37-
If a [job preparation task](batch-job-prep-release.md#job-preparation-task) is specified for a job, then an instance of that task will be run the first time a task for the job is run on a node. The job preparation task configured on the job can be thought of as a task template, with multiple job preparation task instances being run, up to the number of nodes in a pool.
34+
An instance of a [job preparation task](batch-job-prep-release.md#job-preparation-task) runs on each compute node the first time the node runs a task for the job. You can think of the job preparation task as a task template, with multiple instances being run, up to the number of nodes in a pool. Check the job preparation task instances to determine if there were errors.
35+
36+
You can use the [Job - List Preparation and Release Task Status](/rest/api/batchservice/job/listpreparationandreleasetaskstatus) API to list the execution status of all instances of job preparation and release tasks for a specified job. As with other tasks, [JobPreparationTaskExecutionInformation](/rest/api/batchservice/job/listpreparationandreleasetaskstatus#jobpreparationtaskexecutioninformation) is available with properties such as `failureInfo`, `exitCode`, and `result`.
3837

39-
The job preparation task instances should be checked to determine if there were errors:
38+
When a job preparation task runs, the task that triggered the job preparation task moves to a [taskState](/rest/api/batchservice/task/get#taskstate) of `preparing`. If the job preparation task fails, the triggering task reverts to the `active` state and doesn't run.
4039

41-
- When a job preparation task is run, then the task that triggered the job preparation task will move to a [state](/rest/api/batchservice/task/get#taskstate) of `preparing`; if the job preparation task then fails, the triggering task will revert to the `active` state and will not be run.
42-
- All the instances of the job preparation task that have been run can be obtained from the job using the [List Preparation and Release Task Status](/rest/api/batchservice/job/listpreparationandreleasetaskstatus) API. As with any task, there is [execution information](/rest/api/batchservice/job/listpreparationandreleasetaskstatus#jobpreparationandreleasetaskexecutioninformation) available with properties such as `failureInfo`, `exitCode`, and `result`.
43-
- If job preparation tasks fail, then the triggering job tasks will not be run, the job will not complete and will be stuck. The pool may go unutilized if there are no other jobs with tasks that can be scheduled.
40+
If a job preparation task fails, the triggering job task doesn't run. The job doesn't complete and is stuck. If there are no other jobs with tasks that can be scheduled, the pool might not be used.
4441

4542
### Job release tasks
4643

47-
If a [job release task](batch-job-prep-release.md#job-release-task) is specified for a job, then when a job is being terminated, an instance of the job release task is run on each pool node where a job preparation task was run. The job release task instances should be checked to determine if there were errors:
44+
An instance of a [job release task](batch-job-prep-release.md#job-release-task) runs when the job is being terminated on each node that ran a job preparation task. Check the job release task instances to determine if there were errors.
45+
46+
You can use the [Job - List Preparation and Release Task Status](/rest/api/batchservice/job/listpreparationandreleasetaskstatus) API to list the execution status of all instances of job preparation and release tasks for a specified job. As with other tasks, [JobReleaseTaskExecutionInformation](/rest/api/batchservice/job/listpreparationandreleasetaskstatus#jobreleasetaskexecutioninformation) is available with properties such as `failureInfo`, `exitCode`, and `result`.
47+
48+
If one or more job release tasks fail, the job is still terminated and moves to a `completed` state.
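
As a rough sketch of how the preparation and release task instances might be inspected, the Python helper below filters a list of per-node status entries for failures. It assumes each entry is a dictionary with `nodeId`, `jobPreparationTaskExecutionInfo`, and `jobReleaseTaskExecutionInfo` keys, and that a failed instance has `result` set to `failure`; those names and values are assumptions about the response shape, and `failed_instances` is a hypothetical helper. Only `failureInfo`, `exitCode`, and `result` are named in the article.

```python
from typing import Any, Dict, List

# Assumed key names for the two kinds of execution information.
_INFO_KEYS = {
    "preparation": "jobPreparationTaskExecutionInfo",
    "release": "jobReleaseTaskExecutionInfo",
}


def failed_instances(
    statuses: List[Dict[str, Any]], kind: str
) -> List[Dict[str, Any]]:
    """Return one summary dict per node whose task instance failed.

    `kind` is either "preparation" or "release".
    """
    key = _INFO_KEYS[kind]
    failures = []
    for status in statuses:
        info = status.get(key)
        if not info:
            continue  # no instance of this kind ran on the node
        if info.get("result") == "failure" or info.get("failureInfo"):
            failures.append(
                {
                    "nodeId": status.get("nodeId"),
                    "exitCode": info.get("exitCode"),
                    "message": (info.get("failureInfo") or {}).get("message"),
                }
            )
    return failures


# Hypothetical sample data for illustration only.
sample_statuses = [
    {"nodeId": "node-1",
     "jobPreparationTaskExecutionInfo": {"result": "success", "exitCode": 0}},
    {"nodeId": "node-2",
     "jobPreparationTaskExecutionInfo": {
         "result": "failure", "exitCode": 1,
         "failureInfo": {"message": "download failed"}}},
]
print(failed_instances(sample_statuses, "preparation"))
```

Because a failed preparation task leaves the job stuck while a failed release task doesn't, a caller would likely treat the two kinds differently, for example by alerting on the first and only logging the second.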
49+
50+
## Task failures
51+
52+
Job tasks can fail for the following reasons:
53+
54+
- The task command line fails and returns with a nonzero exit code.
55+
- One or more `resourceFiles` specified for a task don't download.
56+
- One or more `outputFiles` specified for a task don't upload.
57+
- The elapsed time for the task exceeds the `maxWallClockTime` property specified in the [TaskConstraints](/rest/api/batchservice/task/add#taskconstraints).
58+
59+
In all cases, check the following properties for errors and information about the errors:
60+
61+
- The [TaskExecutionInformation](/rest/api/batchservice/task/get#taskexecutioninformation) property has multiple properties that provide information about an error. The [taskExecutionResult](/rest/api/batchservice/task/get#taskexecutionresult) indicates if the task failed for any reason, and `exitCode` and `failureInfo` provide more information about the failure.
62+
63+
- The task always moves to the `completed` [TaskState](/rest/api/batchservice/task/get#taskstate), whether it succeeded or failed.
64+
65+
Consider the impact of task failures on the job and on any task dependencies. You can specify [ExitConditions](/rest/api/batchservice/task/add#exitconditions) to configure actions for dependencies and for the job.
4866

49-
- All the instances of the job release task being run can be obtained from the job using the API [List Preparation and Release Task Status](/rest/api/batchservice/job/listpreparationandreleasetaskstatus). As with any task, there is [execution information](/rest/api/batchservice/job/listpreparationandreleasetaskstatus#jobpreparationandreleasetaskexecutioninformation) available with properties such as `failureInfo`, `exitCode`, and `result`.
50-
- If one or more job release tasks fail, then the job will still be terminated and move to a `completed` state.
67+
- [DependencyAction](/rest/api/batchservice/task/add#dependencyaction) controls whether to block or run tasks that depend on the failed task.
68+
- [JobAction](/rest/api/batchservice/task/add#jobaction) controls whether the failed task causes the job to be disabled, terminated, or unchanged.
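
The task checks described in this section could look roughly like the following Python sketch. It assumes a task is available as a dictionary with `state` and `executionInfo` keys, and that `result` is `failure` for a failed task; the `category` and `message` fields inside `failureInfo` and the helper name `describe_task_outcome` are illustrative assumptions. The point it demonstrates is that the `completed` state alone doesn't say whether the task succeeded.

```python
from typing import Any, Dict


def describe_task_outcome(task: Dict[str, Any]) -> str:
    """Summarize whether a task finished and, if so, whether it failed."""
    if task.get("state") != "completed":
        return "not finished"

    info = task.get("executionInfo") or {}
    # A completed task can still have failed, so check the result.
    if info.get("result") != "failure":
        return "succeeded"

    failure = info.get("failureInfo") or {}
    summary = f"failed (exit code {info.get('exitCode')})"
    # 'category' and 'message' are assumed field names.
    if failure.get("message"):
        summary += f": {failure.get('category')}: {failure.get('message')}"
    return summary


# Hypothetical sample data for illustration only.
sample_task = {
    "state": "completed",
    "executionInfo": {
        "result": "failure",
        "exitCode": 3,
        "failureInfo": {"category": "UserError", "message": "command failed"},
    },
}
print(describe_task_outcome(sample_task))
```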
5169

52-
## Tasks
70+
### Task command lines
5371

54-
Job tasks can fail for multiple reasons:
72+
Task command lines don't run under a shell on compute nodes, so they can't natively use shell features such as environment variable expansion. To take advantage of such features, you must invoke the shell in the command line. For more information, see [Command-line expansion of environment variables](batch-compute-node-environment-variables.md#command-line-expansion-of-environment-variables).
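
To make the shell invocation concrete, here is a small Python sketch that wraps a command so that a shell performs environment variable expansion. The helper name `wrap_in_shell` is hypothetical, and the quoting is deliberately simple: it assumes `/bin/sh -c "..."` on Linux nodes and `cmd /c ...` on Windows nodes, and it only escapes backslashes and double quotes for the Linux form.

```python
def wrap_in_shell(command: str, windows: bool = False) -> str:
    """Prefix a task command line with an explicit shell invocation.

    This is a simplified sketch; commands with complex quoting
    may need more careful escaping than is done here.
    """
    if windows:
        # cmd expands %VAR% references in the rest of the line.
        return "cmd /c " + command
    # Escape characters that would end or alter the double-quoted string.
    escaped = command.replace("\\", "\\\\").replace('"', '\\"')
    return f'/bin/sh -c "{escaped}"'


print(wrap_in_shell("echo $AZ_BATCH_TASK_ID"))
print(wrap_in_shell("echo %AZ_BATCH_TASK_ID%", windows=True))
```

Without the wrapper, a command line such as `echo $AZ_BATCH_TASK_ID` would be run directly, with no shell to substitute the variable.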
5573

56-
- The task command line fails, returning with a non-zero exit code.
57-
- There are `resourceFiles` specified for a task, but there was a failure that meant one or more files didn't download.
58-
- There are `outputFiles` specified for a task, but there was a failure that meant one or more files didn't upload.
59-
- The elapsed time for the task, specified by the `maxWallClockTime` property in the task [constraints](/rest/api/batchservice/task/add#taskconstraints), was exceeded.
74+
Task command line output writes to *stderr.txt* and *stdout.txt* files. Your application might also write to application-specific log files. Make sure to implement comprehensive error checking for your application to promptly detect and diagnose issues.
6075

61-
In all cases the following properties must be checked for errors and information about the errors:
76+
### Task logs
6277

63-
- The tasks [executionInfo](/rest/api/batchservice/task/get#taskexecutioninformation) property contains multiple properties that provide information about an error. [result](/rest/api/batchservice/task/get#taskexecutionresult) indicates if the task failed for any reason, with `exitCode` and `failureInfo` providing more information about the failure.
64-
- The task will always move to the `completed` [state](/rest/api/batchservice/task/get#taskstate), independent of whether it succeeded or failed.
78+
If the pool node that ran a task still exists, you can get and view the task log files. Several APIs allow listing and getting task files, such as [File - Get From Task](/rest/api/batchservice/file/getfromtask). You can also list and view log files for a task or node by using the [Azure portal](https://portal.azure.com).
6579

66-
The impact of task failures on the job and any task dependencies must be considered. The [exitConditions](/rest/api/batchservice/task/add#exitconditions) property can be specified for a task to configure an action for dependencies and for the job.
80+
1. At the top of the **Overview** page for a node, select **Upload batch logs**.
6781

68-
- For dependencies, [DependencyAction](/rest/api/batchservice/task/add#dependencyaction) controls whether the tasks dependent on the failed task are blocked or are run.
69-
- For the job, [JobAction](/rest/api/batchservice/task/add#jobaction) controls whether the failed task leads to the job being disabled, terminated, or left unchanged.
82+
![Screenshot of a node overview page with Upload batch logs highlighted.](media/batch-job-task-error-checking/node-page.png)
7083

71-
### Task command line failures
84+
1. On the **Upload Batch logs** page, select **Pick storage container**, select an Azure Storage container to upload to, and then select **Start upload**.
7285

73-
When the task command line is run, output is written to `stderr.txt` and `stdout.txt`. In addition, the application may write to application-specific log files.
86+
![Screenshot of the Upload batch logs page.](media/batch-job-task-error-checking/upload-batch-logs.png)
7487

75-
If the pool node on which a task has run still exists, then the log files can be obtained and viewed. For example, the Azure portal lists and can view log files for a task or a pool node. Multiple APIs also allow task files to be listed and obtained, such as [Get From Task](/rest/api/batchservice/file/getfromtask).
88+
1. You can view, open, or download the logs from the storage container page.
7689

77-
Since pools and pool nodes are frequently ephemeral, with nodes being continuously added and deleted, we recommend saving log files. [Task output files](./batch-task-output-files.md) are a convenient way to save log files to Azure Storage.
90+
![Screenshot of task logs in a storage container.](media/batch-job-task-error-checking/task-logs.png)
7891

79-
The command lines executed by tasks on compute nodes do not run under a shell, so they can't natively take advantage of shell features such as environment variable expansion. To take advantage of such features, you must [invoke the shell in the command line](batch-compute-node-environment-variables.md#command-line-expansion-of-environment-variables).
92+
### Output files
8093

81-
### Output file failures
94+
Because Batch pools and pool nodes are often ephemeral, with nodes being continuously added and deleted, it's best to save the log files when the job runs. Task output files are a convenient way to save log files to Azure Storage. For more information, see [Persist task data to Azure Storage with the Batch service API](batch-task-output-files.md).
8295

83-
On every file upload, Batch writes two log files to the compute node, `fileuploadout.txt` and `fileuploaderr.txt`. You can examine these log files to learn more about a specific failure. In cases where the file upload was never attempted, for example because the task itself couldn't run, then these log files will not exist.
96+
On every file upload, Batch writes two log files to the compute node, *fileuploadout.txt* and *fileuploaderr.txt*. You can examine these log files to learn more about a specific failure. If the file upload wasn't attempted, for example because the task itself couldn't run, these log files don't exist.
8497

8598
## Next steps
8699

87-
- Check that your application implements comprehensive error checking; it can be critical to promptly detect and diagnose issues.
88-
- Learn more about [jobs and tasks](jobs-and-tasks.md) and [job preparation and release tasks](batch-job-prep-release.md).
100+
- Learn more about [Batch jobs and tasks](jobs-and-tasks.md) and [job preparation and release tasks](batch-job-prep-release.md).
101+
- Learn about [Batch pool and node errors](batch-pool-node-error-checking.md).