---
title: Check for job and task errors - Azure Batch | Microsoft Docs
description: Errors to check for and how to troubleshoot jobs and tasks
services: batch
author: mscurrell

ms.service: batch
ms.topic: article
ms.date: 12/01/2019
ms.author: markscu
---

# Job and task error checking

Various errors can occur when adding jobs and tasks. Detecting failures for these operations is straightforward, because the API, CLI, or UI returns any failure immediately. However, other failures can happen later, when jobs and tasks are scheduled and run.

This article covers the errors that can occur after jobs and tasks are submitted. It lists and explains the errors that need to be checked and handled.

## Jobs

A job groups one or more tasks; the tasks specify the command lines to run.

When adding a job, you can specify the following parameters, which influence how the job can fail:

- [Job Constraints](https://docs.microsoft.com/rest/api/batchservice/job/add#jobconstraints)
  - The optional `maxWallClockTime` property sets the maximum amount of time a job can be active or running. If it is exceeded, the job is terminated and the `terminateReason` property is set in the job's [executionInfo](https://docs.microsoft.com/rest/api/batchservice/job/get#cloudjob).
- [Job Preparation Task](https://docs.microsoft.com/rest/api/batchservice/job/add#jobpreparationtask)
  - If specified, a job preparation task runs the first time a task for the job runs on a node. The job preparation task can fail, in which case the task is not run and the job does not complete.
- [Job Release Task](https://docs.microsoft.com/rest/api/batchservice/job/add#jobreleasetask)
  - A job release task can only be specified if a job preparation task is also configured. When a job is being terminated, the job release task runs on each of the pool nodes where a job preparation task ran. A job release task can fail, but the job still moves to the `completed` state.
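
The three parameters above all belong to the Job Add request body. The following is a minimal Python sketch, assuming you assemble that REST body as a plain dict before sending it with whatever HTTP client or SDK you use. `build_job_body` and `iso8601_duration` are hypothetical helpers, not part of any Batch SDK; the field names (`poolInfo`, `constraints`, `jobPreparationTask`, `jobReleaseTask`, `commandLine`) follow the Job Add REST reference linked above. The helper also enforces the rule that a job release task needs a job preparation task.

```python
from datetime import timedelta
from typing import Any, Optional


def iso8601_duration(td: timedelta) -> str:
    """Format a timedelta as an ISO 8601 duration in whole seconds, e.g. 'PT3600S'."""
    return f"PT{int(td.total_seconds())}S"


def build_job_body(
    job_id: str,
    pool_id: str,
    max_wall_clock: Optional[timedelta] = None,
    prep_command: Optional[str] = None,
    release_command: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a Job Add request body as a dict."""
    # A job release task can only be specified together with a job
    # preparation task, so fail early instead of waiting for the API error.
    if release_command is not None and prep_command is None:
        raise ValueError("a job release task requires a job preparation task")

    body: dict[str, Any] = {"id": job_id, "poolInfo": {"poolId": pool_id}}
    if max_wall_clock is not None:
        # If exceeded, the job is terminated and executionInfo.terminateReason is set.
        body["constraints"] = {"maxWallClockTime": iso8601_duration(max_wall_clock)}
    if prep_command is not None:
        # Runs the first time a task for this job runs on each node.
        body["jobPreparationTask"] = {"commandLine": prep_command}
    if release_command is not None:
        # Runs on each node that ran the preparation task when the job terminates.
        body["jobReleaseTask"] = {"commandLine": release_command}
    return body
```

For example, `build_job_body("job-1", "pool-1", timedelta(hours=2), "setup.sh", "cleanup.sh")` sets `maxWallClockTime` to `"PT7200S"`. Whole seconds keep the formatting simple; `PT2H` expresses the same ISO 8601 duration.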

### Job properties

Check the following job properties for errors:

- [executionInfo](https://docs.microsoft.com/rest/api/batchservice/job/get#jobexecutioninformation):
  - The `terminateReason` property can indicate that the job was terminated because the `maxWallClockTime` specified in the job constraints was exceeded. It can also indicate that a task failed, if the job's `onTaskFailure` property was set to act on task failures.
  - The [schedulingError](https://docs.microsoft.com/rest/api/batchservice/job/get#jobschedulingerror) property is set if a scheduling error occurred.
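
A check over these two properties might look like the following sketch. It assumes `job` is the parsed JSON returned by the Get Job REST API; `job_errors` is a hypothetical helper. The `terminateReason` strings `MaxWallClockTimeExpiry` and `TaskFailed` are assumptions based on the REST reference for job execution information, so confirm them against the API version you use. Any other termination reason is treated as a non-error here.

```python
from typing import Any

# Assumed terminateReason values that signal a failure; verify them against
# the JobExecutionInformation reference for your Batch API version.
FAILURE_TERMINATE_REASONS = {"MaxWallClockTimeExpiry", "TaskFailed"}


def job_errors(job: dict[str, Any]) -> list[str]:
    """Return readable error descriptions for a Get Job JSON response."""
    errors: list[str] = []
    info = job.get("executionInfo") or {}

    # Termination caused by the wall-clock constraint or by a failed task.
    reason = info.get("terminateReason")
    if reason in FAILURE_TERMINATE_REASONS:
        errors.append(f"job terminated: {reason}")

    # Present only if the service hit an error while scheduling the job.
    sched = info.get("schedulingError")
    if sched:
        errors.append(
            f"scheduling error ({sched.get('category')}): "
            f"{sched.get('code')}: {sched.get('message')}"
        )
    return errors
```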

### Job preparation tasks

If a job preparation task is specified for a job, an instance of that task runs the first time a task for the job runs on a node. Think of the job preparation task configured on the job as a template: multiple job preparation task instances run, up to one per node in the pool.

Check the job preparation task instances for errors:

- When a job preparation task runs, the task that triggered it moves to the `preparing` [state](https://docs.microsoft.com/rest/api/batchservice/task/get#taskstate); if the job preparation task then fails, the triggering task reverts to the `active` state and is not run.
- All the job preparation task instances that have run can be obtained from the job using the [List Preparation and Release Task Status](https://docs.microsoft.com/rest/api/batchservice/job/listpreparationandreleasetaskstatus) API. As with any task, [execution information](https://docs.microsoft.com/rest/api/batchservice/job/listpreparationandreleasetaskstatus#jobpreparationandreleasetaskexecutioninformation) is available, with properties such as `failureInfo`, `exitCode`, and `result`.
- If job preparation tasks fail, the triggering job tasks are not run, and the job never completes and remains stuck. The pool can sit idle if no other jobs have tasks that can be scheduled.
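
The following sketch scans the preparation-task side of a List Preparation and Release Task Status response. It assumes `status_items` is the response's `value` array, with one entry per node carrying `nodeId` and `jobPreparationTaskExecutionInfo`; `failed_preparation_nodes` is a hypothetical helper.

```python
from typing import Any


def failed_preparation_nodes(status_items: list[dict[str, Any]]) -> dict[str, str]:
    """Map node ID to a failure description for each job preparation task
    instance whose result is 'failure'.

    `status_items` is assumed to be the `value` array of a List Preparation
    and Release Task Status response (one entry per node).
    """
    failures: dict[str, str] = {}
    for item in status_items:
        prep = item.get("jobPreparationTaskExecutionInfo")
        if not prep or prep.get("result") != "failure":
            continue  # not run on this node, still running, or succeeded
        # Prefer the service's failure message; fall back to the exit code.
        failure = prep.get("failureInfo") or {}
        detail = failure.get("message") or f"exit code {prep.get('exitCode')}"
        failures[item.get("nodeId", "<unknown>")] = detail
    return failures
```

A non-empty result is a strong hint that the job is stuck: the tasks that triggered these preparation tasks reverted to `active` without running.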

### Job release tasks

If a job release task is specified for a job, then when the job is being terminated, an instance of the job release task runs on each pool node where a job preparation task ran. Check the job release task instances for errors:

- All the job release task instances that have run can be obtained from the job using the [List Preparation and Release Task Status](https://docs.microsoft.com/rest/api/batchservice/job/listpreparationandreleasetaskstatus) API. As with any task, [execution information](https://docs.microsoft.com/rest/api/batchservice/job/listpreparationandreleasetaskstatus#jobpreparationandreleasetaskexecutioninformation) is available, with properties such as `failureInfo`, `exitCode`, and `result`.
- If one or more job release tasks fail, the job is still terminated and moves to the `completed` state.
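
Because a failed job release task does not stop the job from reaching `completed`, the job state alone hides cleanup failures. This sketch assumes the same response shape as for preparation tasks, but reads `jobReleaseTaskExecutionInfo`, and reports the nodes whose release task failed once the job has completed. `release_failures_after_completion` is a hypothetical helper.

```python
from typing import Any


def release_failures_after_completion(
    job: dict[str, Any], status_items: list[dict[str, Any]]
) -> list[str]:
    """Return the IDs of nodes whose job release task failed.

    `job` is a Get Job JSON response; `status_items` is the `value` array of
    a List Preparation and Release Task Status response.
    """
    if job.get("state") != "completed":
        return []  # release tasks may still be running while the job terminates
    return [
        item.get("nodeId", "<unknown>")
        for item in status_items
        if (item.get("jobReleaseTaskExecutionInfo") or {}).get("result") == "failure"
    ]
```

Waiting for `completed` avoids reporting a partial list while release tasks are still running during termination.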

## Tasks

Job tasks can fail for multiple reasons:

- The task command line fails, returning a non-zero exit code.
- `resourceFiles` are specified for the task, but one or more files failed to download.
- `outputFiles` are specified for the task, but one or more files failed to upload.
- The task's elapsed time exceeded the limit set by the `maxWallClockTime` property in the task [constraints](https://docs.microsoft.com/rest/api/batchservice/task/add#taskconstraints).
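
Each failure source above corresponds to a field in the Task Add request body. The following sketch, assuming you build that body as a Python dict, shows where each one is configured. `build_task_body` is a hypothetical helper; the `input.dat` file path and the `../std*.txt` pattern (which targets the task's stdout and stderr files) are illustrative choices, and `output_container_url` is assumed to be a container URL with write access, such as a SAS URL.

```python
from datetime import timedelta
from typing import Any


def build_task_body(
    task_id: str,
    command_line: str,
    input_url: str,
    output_container_url: str,
    max_wall_clock: timedelta,
) -> dict[str, Any]:
    """Hypothetical helper: a Task Add body that configures each of the
    four failure sources. URLs are placeholders supplied by the caller."""
    return {
        "id": task_id,
        # 1. A non-zero exit code from this command line fails the task.
        "commandLine": command_line,
        # 2. A failed download of any resource file fails the task.
        "resourceFiles": [{"httpUrl": input_url, "filePath": "input.dat"}],
        # 3. A failed upload of any output file fails the task.
        "outputFiles": [
            {
                "filePattern": "../std*.txt",
                "destination": {"container": {"containerUrl": output_container_url}},
                # Upload whether the task succeeds or fails.
                "uploadOptions": {"uploadCondition": "taskcompletion"},
            }
        ],
        # 4. Exceeding this elapsed-time limit (ISO 8601 duration) fails the task.
        "constraints": {"maxWallClockTime": f"PT{int(max_wall_clock.total_seconds())}S"},
    }
```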

In all cases, check the following properties for errors and for information about them:

- The task's [executionInfo](https://docs.microsoft.com/rest/api/batchservice/task/get#taskexecutioninformation) property contains several properties that describe an error. [result](https://docs.microsoft.com/rest/api/batchservice/task/get#taskexecutionresult) indicates whether the task failed for any reason, and `exitCode` and `failureInfo` provide more information about the failure.
- The task always moves to the `completed` [state](https://docs.microsoft.com/rest/api/batchservice/task/get#taskstate), regardless of whether it succeeded or failed.
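
Because every task ends in `completed`, a failure check has to read `executionInfo.result` rather than the state. A minimal sketch, assuming `task` is the parsed JSON from the Get Task REST API; `task_failure` is a hypothetical helper that returns a short description of the failure, or `None` for tasks that have not completed or that succeeded.

```python
from typing import Any, Optional


def task_failure(task: dict[str, Any]) -> Optional[str]:
    """Describe why a task failed, or return None if it has not failed."""
    # result, exitCode and failureInfo are only final once the task completes.
    if task.get("state") != "completed":
        return None
    info = task.get("executionInfo") or {}
    if info.get("result") != "failure":
        return None

    parts: list[str] = []
    failure = info.get("failureInfo")
    if failure:
        # Service-reported failure, such as a file transfer problem or timeout.
        parts.append(f"{failure.get('category')}/{failure.get('code')}: {failure.get('message')}")
    if info.get("exitCode") is not None:
        parts.append(f"exit code {info['exitCode']}")
    return "; ".join(parts) or "failed (no further details)"
```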

Consider the impact of task failures on the job and on any task dependencies. Specify the [exitConditions](https://docs.microsoft.com/rest/api/batchservice/task/add#exitconditions) property on a task to configure an action for its dependencies and for the job.

- For dependencies, [DependencyAction](https://docs.microsoft.com/rest/api/batchservice/task/add#dependencyaction) controls whether the tasks that depend on the failed task are blocked or run.
- For the job, [JobAction](https://docs.microsoft.com/rest/api/batchservice/task/add#jobaction) controls whether the failed task causes the job to be disabled, be terminated, or remain unchanged.
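
The following sketch shows a hypothetical `exitConditions` value for a Task Add body, plus a helper that picks which exit options would apply to a given non-zero exit code. The codes and actions are illustrative. The lookup order in `resolve_exit_options` (exact `exitCodes` match, then `exitCodeRanges`, then `default`) is an assumption made for this sketch, not a documented guarantee. Note also that a `jobAction` only takes effect when the job's `onTaskFailure` property is set to `performExitOptionsJobAction`.

```python
from typing import Any

# Hypothetical exitConditions: exit code 3 is tolerated, codes 1-9 otherwise
# terminate the job and block dependents, and any other failure disables the job.
EXIT_CONDITIONS: dict[str, Any] = {
    "exitCodes": [
        {"code": 3, "exitOptions": {"jobAction": "none", "dependencyAction": "satisfy"}}
    ],
    "exitCodeRanges": [
        {"start": 1, "end": 9, "exitOptions": {"jobAction": "terminate", "dependencyAction": "block"}}
    ],
    "default": {"jobAction": "disable", "dependencyAction": "block"},
}


def resolve_exit_options(conditions: dict[str, Any], exit_code: int) -> dict[str, str]:
    """Pick the exitOptions for a non-zero exit code.

    Assumed precedence (for illustration only): exact exitCodes match,
    then exitCodeRanges (inclusive bounds), then default.
    """
    for entry in conditions.get("exitCodes", []):
        if entry["code"] == exit_code:
            return entry["exitOptions"]
    for entry in conditions.get("exitCodeRanges", []):
        if entry["start"] <= exit_code <= entry["end"]:
            return entry["exitOptions"]
    return conditions.get("default", {})
```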

## Next steps

Check that your application implements comprehensive error checking; detecting and diagnosing issues promptly can be critical.