articles/batch/large-number-tasks.md
16 additions & 16 deletions
@@ -13,20 +13,20 @@ ms.devlang: multiple
13
13
ms.topic: article
14
14
ms.tgt_pltfrm:
15
15
ms.workload: big-compute
16
-
ms.date: 08/23/2018
16
+
ms.date: 08/24/2018
17
17
ms.author: danlep
18
18
ms.custom:
19
19
20
20
---
21
-
# Submit a large number of tasks in a Batch job
21
+
# Submit a large number of tasks to a Batch job
22
22
23
-
When you run large-scale Azure Batch workloads, you might want to submit tens of thousands, hundreds of thousands, or even more tasks in a single job.
23
+
When you run large-scale Azure Batch workloads, you might want to submit tens of thousands, hundreds of thousands, or even more tasks to a single job.
24
24
25
-
This article gives guidance and some code examples to efficiently submit large numbers of tasks to a single Batch job. By submitting tasks in this way, your Batch script or client app can run without waiting a long time for task submission to complete. After tasks are submitted, they enter the Batch queue for processing on the pool you specify for the job.
25
+
This article gives guidance and some code examples to submit large numbers of tasks with substantially increased throughput to a single Batch job. After tasks are submitted, they enter the Batch queue for processing on the pool you specify for the job.
26
26
27
27
## Use task collections
28
28
29
-
The Batch APIs provide efficient methods to add tasks to a job as a *collection*, in addition to one at a time. When adding a large number of tasks, you should use the appropriate methods or overloads to add tasks as a collection. Generally, you construct a task collection by defining tasks as you iterate over a set of input files or parameters for your job.
29
+
The Batch APIs provide methods to efficiently add tasks to a job as a *collection*, in addition to one at a time. When adding a large number of tasks, you should use the appropriate methods or overloads to add tasks as a collection. Generally, you construct a task collection by defining tasks as you iterate over a set of input files or parameters for your job.
30
30
31
31
The maximum size of the task collection that you can add in a single call depends on the Batch API you use:
32
32
@@ -36,26 +36,26 @@ The maximum size of the task collection that you can add in a single call depend
When using these APIs, you need to provide logic to divide the number of tasks to meet the collection limit, and to handle errors and retries in case addition of tasks fails. If a task collection is too large to add, the request fails with code `RequestBodyTooLarge` and should be retried again with fewer tasks.
39
+
When using these APIs, you need to provide logic to divide the number of tasks to meet the collection limit, and to handle errors and retries in case addition of tasks fails. If a task collection is too large to add, the request generates an error and should be retried again with fewer tasks.
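The divide-and-retry logic described here can be sketched in plain Python. This is only an illustration, not the Batch SDK: `submit` is a hypothetical callable standing in for whichever add-collection call you use, `RequestTooLargeError` is a hypothetical stand-in for the error the service returns when a collection is too large, and the default `chunk_size` of 100 is an assumption that you should replace with the limit for your API.

```python
from typing import Callable, Sequence


class RequestTooLargeError(Exception):
    """Hypothetical stand-in for the 'collection too large' error from the service."""


def add_in_chunks(
    tasks: Sequence,
    submit: Callable[[Sequence], None],  # hypothetical: wraps your add-collection call
    chunk_size: int = 100,               # assumed limit; adjust for your API
) -> int:
    """Submit tasks chunk by chunk; halve the chunk size when a chunk is rejected.

    Returns the number of successful submit calls.
    """
    calls = 0
    start = 0
    size = chunk_size
    while start < len(tasks):
        chunk = tasks[start:start + size]
        try:
            submit(chunk)
        except RequestTooLargeError:
            if size == 1:
                raise  # a single task is too large; retrying cannot help
            size = max(1, size // 2)  # retry the same position with fewer tasks
            continue
        calls += 1
        start += len(chunk)
    return calls
```

The sketch keeps the reduced chunk size for the rest of the run rather than growing it back, which favors simplicity over the smallest possible number of calls.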
40
40
41
-
* The following APIs support much larger task collections - up to hundreds of thousands of tasks in a single call. These APIs transparently handle dividing the task collection into "chunks" for the lower-level APIs and retries if addition of tasks fails.
41
+
* The following APIs support much larger task collections - limited only by RAM availability on the submitting client. These APIs transparently handle dividing the task collection into "chunks" for the lower-level APIs and retries if addition of tasks fails.
It can take some time to add a large collection of tasks to a job - for example, up to 1 minute to add 20,000 tasks via the .NET API. Depending on the Batch API and your workload, you can improve the task throughput by modifying one or more of the following:
51
51
52
-
* **Task size** - Adding large tasks takes longer than adding smaller ones. To reduce the size of each task in a collection, you can simplify the task command line, reduce the number of environment variables, or handle task dependencies more efficiently. For example, instead of using a large number of resource files, install task dependencies using a [start task](batch-api-basics.md#start-task) on the pool or use an [application package](batch-application-packages.md) or [Docker container](batch-docker-container-workloads.md).
52
+
* **Task size** - Adding large tasks takes longer than adding smaller ones. To reduce the size of each task in a collection, you can simplify the task command line, reduce the number of environment variables, or handle requirements for task execution more efficiently. For example, instead of using a large number of resource files, install task dependencies using a [start task](batch-api-basics.md#start-task) on the pool or use an [application package](batch-application-packages.md) or [Docker container](batch-docker-container-workloads.md).
53
53
54
54
* **Number of parallel operations** - Depending on the Batch API, increase throughput by increasing the maximum number of concurrent operations by the Batch client. Configure this setting using the [BatchClientParallelOptions.MaxDegreeOfParallelism](/dotnet/api/microsoft.azure.batch.batchclientparalleloptions.maxdegreeofparallelism) property in the .NET API, or the `threads` parameter of methods such as [TaskOperations.add_collection](/python/api/azure-batch/azure.batch.operations.TaskOperations?view=azure-python#add-collection) in the Batch Python SDK extension. (This property is not available in the native Batch Python SDK.) By default, this property is set to 1, but set it higher to improve throughput of operations. You trade off increased throughput by consuming network bandwidth and some CPU performance. Task throughput increases by up to 100 times the `MaxDegreeOfParallelism` or `threads`. In practice, you should set the number of concurrent operations below 100.
55
55
56
-
The Azure CLI with Batch CLI templates increases the number of concurrent operations automatically based on the pool configuration, but this property is not configurable in the CLI.
56
+
The Azure Batch CLI extension with Batch templates increases the number of concurrent operations automatically based on the number of available cores, but this property is not configurable in the CLI.
57
57
58
-
* **HTTP connection limits** - The number of concurrent HTTP connections can throttle the performance of the Batch client when it is adding large numbers of tasks. The number of HTTP connections is limited with certain APIs. When developing with the .NET API, for example, the [ServicePointManager.DefaultConnectionLimit](/dotnet/api/system.net.servicepointmanager.defaultconnectionlimit) property is set to 2 by default, but you can increase the value.
58
+
* **HTTP connection limits** - The number of concurrent HTTP connections can throttle the performance of the Batch client when it is adding large numbers of tasks. The number of HTTP connections is limited with certain APIs. When developing with the .NET API, for example, the [ServicePointManager.DefaultConnectionLimit](/dotnet/api/system.net.servicepointmanager.defaultconnectionlimit) property is set to 2 by default. We recommend that you increase the value to a number close to or greater than the number of parallel operations.
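To illustrate how the number of parallel operations relates to chunked submission, here is a minimal plain-Python sketch using only the standard library. It is not how the Batch SDKs implement this: `submit` is a hypothetical callable wrapping your add-collection call, the `chunk_size` of 100 is an assumed limit, and `threads` simply mirrors the name of the parameter mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def submit_concurrently(
    tasks: Sequence,
    submit: Callable[[Sequence], None],  # hypothetical: wraps your add-collection call
    chunk_size: int = 100,               # assumed per-call limit
    threads: int = 4,                    # maximum number of concurrent submit calls
) -> int:
    """Split tasks into chunks and submit up to `threads` chunks at a time.

    Returns the number of chunks submitted.
    """
    chunks = [tasks[i:i + chunk_size] for i in range(0, len(tasks), chunk_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for every chunk and re-raises the first worker exception.
        list(pool.map(submit, chunks))
    return len(chunks)
```

Raising `threads` only helps if the underlying HTTP client allows that many simultaneous connections, which is why the connection limit and the degree of parallelism are best tuned together.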
59
59
60
60
## Example: Batch .NET
61
61
@@ -77,13 +77,13 @@ Add a task collection to the job using the appropriate overload of the [AddTaskA
77
77
// Add a list of tasks as a collection
78
78
List<CloudTask> tasksToAdd = new List<CloudTask>(); // Populate with your tasks
Using the Azure CLI with [Batch CLI templates](batch-cli-templates.md), create a job template JSON file that includes a [task factory](https://github.com/Azure/azure-batch-cli-extensions/blob/master/doc/taskFactories.md). The task factory configures a collection of related tasks for a job from a single task definition.
86
+
Using the Azure Batch CLI extensions with [Batch CLI templates](batch-cli-templates.md), create a job template JSON file that includes a [task factory](https://github.com/Azure/azure-batch-cli-extensions/blob/master/doc/taskFactories.md). The task factory configures a collection of related tasks for a job from a single task definition.
87
87
88
88
The following is a sample job template for a one-dimensional parametric sweep job with a large number of tasks - in this case, 250,000. The task command line is a simple `echo` command.
89
89
@@ -208,5 +208,5 @@ except Exception as e:
208
208
209
209
## Next steps
210
210
211
-
* Learn more about using the the Azure CLI with [Batch CLI templates](batch-cli-templates.md).
211
+
* Learn more about using the Azure Batch CLI extension with [Batch CLI templates](batch-cli-templates.md).
212
212
* Learn more about the [Batch Python SDK extension](https://pypi.org/project/azure-batch-extensions/).