articles/batch/large-number-tasks.md
Lines changed: 15 additions & 8 deletions
@@ -13,7 +13,7 @@ ms.devlang: multiple
ms.topic: article
ms.tgt_pltfrm:
ms.workload: big-compute
-ms.date: 08/21/2018
+ms.date: 08/23/2018
ms.author: danlep
ms.custom:
@@ -26,22 +26,30 @@ This article gives guidance and some code examples to efficiently submit large n
## Use task collections
-If you need to create multiple tasks, Batch provides efficient methods to add tasks as a *collection*, as opposed to singly, using the [REST API](/rest/api/batchservice/task/addcollection) or the Batch [.NET](/dotnet/api/microsoft.azure.batch.joboperations.addtaskasync)
-, [Python](/python/api/azure-batch/azure.batch.operations.TaskOperations?view=azure-python#azure_batch_operations_TaskOperations_add_collection), and other APIs. Generally, you construct a task collection by defining tasks as you iterate over some set of input files or parameters for your job.
+The Batch APIs provide efficient methods to add tasks to a job as a *collection*, in addition to one at a time. When adding a large number of tasks, you should use the appropriate methods or overloads to add tasks as a collection. Generally, you construct a task collection by defining tasks as you iterate over a set of input files or parameters for your job.
The maximum size of the task collection that you can add in a single call depends on the Batch API you use:
-* The Batch **REST API** and closely related APIs such as the Python API and Node.js API limit the collection to **100 tasks**. The maximum size of the task collection could be smaller depending on the size of the tasks - for example, the number of resource files and environment variables used in each task.
+* The following Batch APIs limit the collection to **100 tasks**. The limit could be smaller depending on the size of the tasks - for example, if the tasks have a large number of resource files or environment variables.
When using these APIs, you need to provide logic to divide the tasks into collections that meet the limit, and to handle errors and retries in case adding tasks fails. If a task collection is too large to add, the request fails with code `RequestBodyTooLarge` and should be retried with fewer tasks.
-* The Batch .NET and Java APIs, the Azure CLI with [Batch CLI templates](batch-cli-templates.md), and the [Batch Python SDK extension](https://pypi.org/project/azure-batch-extensions/) support much larger task collections. Testing shows that these APIs support adding hundreds of thousands of tasks to a job in a single call. These APIs transparently handle dividing the task collection into "chunks" for the lower-level APIs and retries if addition of tasks fails.
+* The following APIs support much larger task collections - up to hundreds of thousands of tasks in a single call. These APIs transparently handle dividing the task collection into "chunks" for the lower-level APIs and retries if addition of tasks fails.
It can take some time to add a large collection of tasks to a job - for example, up to 1 minute to add 20,000 tasks via the .NET API. Depending on the Batch API and your workload, you can improve the task throughput by modifying one or more of the following:
-* **Task size** - Adding large tasks takes longer than adding smaller ones. To reduce the size of each task in a collection, you can simplify the task command line, reduce the number of environment variables, or handle task dependencies more efficiently. For example, install task dependencies using a [start task](batch-api-basics.md#start-task) on the pool or using an [application package](batch-application-packages.md) or a [Docker container](batch-docker-container-workloads.md), rather than resource files.
+* **Task size** - Adding large tasks takes longer than adding smaller ones. To reduce the size of each task in a collection, you can simplify the task command line, reduce the number of environment variables, or handle task dependencies more efficiently. For example, instead of using a large number of resource files, install task dependencies using a [start task](batch-api-basics.md#start-task) on the pool or use an [application package](batch-application-packages.md) or [Docker container](batch-docker-container-workloads.md).
* **Number of parallel operations** - Depending on the Batch API, you can increase throughput by increasing the maximum number of concurrent operations performed by the Batch client. Configure this setting using the [BatchClientParallelOptions.MaxDegreeOfParallelism](/dotnet/api/microsoft.azure.batch.batchclientparalleloptions.maxdegreeofparallelism) property in the .NET API, or the `threads` parameter of methods such as [TaskOperations.add_collection](/python/api/azure-batch/azure.batch.operations.TaskOperations?view=azure-python#add-collection) in the Batch Python SDK extension. (This property is not available in the native Batch Python SDK.) By default, this property is set to 1; set it higher to improve the throughput of operations. The tradeoff for increased throughput is that the client consumes more network bandwidth and some CPU. Task throughput can increase by up to 100 times the value of `MaxDegreeOfParallelism` or `threads`. In practice, you should set the number of concurrent operations below 100.
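The collection limit and the `RequestBodyTooLarge` retry described for the 100-task APIs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Batch SDK's own chunking logic: `add_collection` is any callable you supply that adds one list of tasks to a job (for example, a small wrapper around the SDK's `add_collection` method with the job ID bound), and `RequestBodyTooLargeError` is a hypothetical exception your wrapper would raise after detecting the `RequestBodyTooLarge` error code. The sketch starts with collections of 100 tasks and splits any rejected collection in half until it is accepted.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Collection limit from the text; the effective limit can be lower
# when tasks carry many resource files or environment variables.
MAX_TASKS_PER_CALL = 100


class RequestBodyTooLargeError(Exception):
    """Hypothetical exception: raise it from your add_collection wrapper when
    the service rejects a request with the RequestBodyTooLarge error code."""


def chunk(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive lists of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def add_tasks(tasks: Sequence[T],
              add_collection: Callable[[List[T]], None],
              max_per_call: int = MAX_TASKS_PER_CALL) -> int:
    """Add tasks in collections of at most `max_per_call` tasks.

    A collection rejected as too large is split in half and both halves are
    retried, preserving task order. Returns the number of successful calls.
    """
    calls = 0
    pending = chunk(tasks, max_per_call)
    while pending:
        batch = pending.pop(0)
        try:
            add_collection(batch)
            calls += 1
        except RequestBodyTooLargeError:
            if len(batch) == 1:
                # A single task that is still too large cannot be split further.
                raise
            mid = len(batch) // 2
            # Put both halves at the front so they are retried next, in order.
            pending[0:0] = [batch[:mid], batch[mid:]]
    return calls
```

Halving keeps the number of extra calls small when only some collections are oversized; retrying with a fixed smaller size would also work. Other failures, such as throttling or transient network errors, are not handled here and would need their own retry policy.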
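Raising `MaxDegreeOfParallelism` or `threads`, as described in the parallel-operations bullet, keeps several add-collection requests in flight at once. A rough Python sketch of that idea follows. It uses a standard-library thread pool rather than any Batch SDK setting; `add_collection` is a hypothetical callable you supply, and `max_workers` stands in for the concurrency setting (default 1, as in the text, and kept well below 100 in practice).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def add_tasks_in_parallel(tasks: Sequence[T],
                          add_collection: Callable[[List[T]], None],
                          max_workers: int = 1,
                          max_per_call: int = 100) -> int:
    """Submit task collections with at most `max_workers` add_collection
    calls in flight (hypothetical callable). Returns the collection count."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    # Divide the tasks into collections that respect the per-call limit.
    batches = [list(tasks[i:i + max_per_call])
               for i in range(0, len(tasks), max_per_call)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces iteration so that an exception raised by
        # add_collection in a worker thread is re-raised here.
        list(pool.map(add_collection, batches))
    return len(batches)
```

Because calls run on separate threads, `add_collection` must be safe to call concurrently. The sketch does not handle `RequestBodyTooLarge` or other failures; the first exception raised by any call propagates to the caller after in-flight calls finish.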
@@ -201,5 +209,4 @@ except Exception as e:
## Next steps
* Learn more about using the Azure CLI with [Batch CLI templates](batch-cli-templates.md).
-* Learn more about the [Batch Python SDK extension](https://pypi.org/project/azure-batch-extensions/).
-
+* Learn more about the [Batch Python SDK extension](https://pypi.org/project/azure-batch-extensions/).