
Commit 4f7e6d2

edit

1 parent b701461 commit 4f7e6d2

File tree

1 file changed: +15 -8 lines changed

articles/batch/large-number-tasks.md

Lines changed: 15 additions & 8 deletions
@@ -13,7 +13,7 @@ ms.devlang: multiple
ms.topic: article
ms.tgt_pltfrm:
ms.workload: big-compute
-ms.date: 08/21/2018
+ms.date: 08/23/2018
ms.author: danlep
ms.custom:

@@ -26,22 +26,30 @@ This article gives guidance and some code examples to efficiently submit large n

## Use task collections

-If you need to create multiple tasks, Batch provides efficient methods to add tasks as a *collection*, as opposed to singly, using the [REST API](/rest/api/batchservice/task/addcollection) or the Batch [.NET](/dotnet/api/microsoft.azure.batch.joboperations.addtaskasync)
-, [Python](/python/api/azure-batch/azure.batch.operations.TaskOperations?view=azure-python#azure_batch_operations_TaskOperations_add_collection), and other APIs. Generally, you construct a task collection by defining tasks as you iterate over some set of input files or parameters for your job.
+The Batch APIs provide efficient methods to add tasks to a job as a *collection*, in addition to one at a time. When adding a large number of tasks, you should use the appropriate methods or overloads to add tasks as a collection. Generally, you construct a task collection by defining tasks as you iterate over a set of input files or parameters for your job.

The maximum size of the task collection that you can add in a single call depends on the Batch API you use:

-* The Batch **REST API** and closely related APIs such as the Python API and Node.js API limit the collection to **100 tasks**. The maximum size of the task collection could be smaller depending on the size of the tasks - for example, the number of resource files and environment variables used in each task.
+* The following Batch APIs limit the collection to **100 tasks**. The limit could be smaller depending on the size of the tasks - for example, if the tasks have a large number of resource files or environment variables.
+
+  * [REST API](/rest/api/batchservice/task/addcollection)
+  * [Python API](/python/api/azure-batch/azure.batch.operations.TaskOperations?view=azure-python#azure_batch_operations_TaskOperations_add_collection)
+  * [Node.js API](/javascript/api/azure-batch/task?view=azure-node-latest#addcollection)

When using these APIs, you need to provide logic to divide the number of tasks to meet the collection limit, and to handle errors and retries in case addition of tasks fails. If a task collection is too large to add, the request fails with code `RequestBodyTooLarge` and should be retried with fewer tasks.

-* The Batch .NET and Java APIs, the Azure CLI with [Batch CLI templates](batch-cli-templates.md), and the [Batch Python SDK extension](https://pypi.org/project/azure-batch-extensions/) support much larger task collections. Testing shows that these APIs support adding hundreds of thousands of tasks to a job in a single call. These APIs transparently handle dividing the task collection into "chunks" for the lower-level APIs and retries if addition of tasks fails.
+* The following APIs support much larger task collections - up to hundreds of thousands of tasks in a single call. These APIs transparently divide the task collection into "chunks" for the lower-level APIs and retry if addition of tasks fails.
+
+  * [.NET API](/dotnet/api/microsoft.azure.batch.cloudjob.addtaskasync?view=azure-dotnet)
+  * [Java API](/java/api/com.microsoft.azure.batch.protocol._tasks.addcollectionasync?view=azure-java-stable)
+  * [Azure CLI with Batch CLI templates](batch-cli-templates.md)
+  * [Python SDK extension](https://pypi.org/project/azure-batch-extensions/)

## Increase task throughput

It can take some time to add a large collection of tasks to a job - for example, up to 1 minute to add 20,000 tasks via the .NET API. Depending on the Batch API and your workload, you can improve the task throughput by modifying one or more of the following:

-* **Task size** - Adding large tasks takes longer than adding smaller ones. To reduce the size of each task in a collection, you can simplify the task command line, reduce the number of environment variables, or handle task dependencies more efficiently. For example, install task dependencies using a [start task](batch-api-basics.md#start-task) on the pool or using an [application package](batch-application-packages.md) or a [Docker container](batch-docker-container-workloads.md), rather than resource files.
+* **Task size** - Adding large tasks takes longer than adding smaller ones. To reduce the size of each task in a collection, you can simplify the task command line, reduce the number of environment variables, or handle task dependencies more efficiently. For example, instead of using a large number of resource files, install task dependencies using a [start task](batch-api-basics.md#start-task) on the pool or use an [application package](batch-application-packages.md) or [Docker container](batch-docker-container-workloads.md).

* **Number of parallel operations** - Depending on the Batch API, you can increase throughput by raising the maximum number of concurrent operations made by the Batch client. Configure this setting using the [BatchClientParallelOptions.MaxDegreeOfParallelism](/dotnet/api/microsoft.azure.batch.batchclientparalleloptions.maxdegreeofparallelism) property in the .NET API, or the `threads` parameter of methods such as [TaskOperations.add_collection](/python/api/azure-batch/azure.batch.operations.TaskOperations?view=azure-python#add-collection) in the Batch Python SDK extension. (This parameter is not available in the native Batch Python SDK.) By default, the value is 1; set it higher to improve operation throughput. The trade-off is increased consumption of network bandwidth and some CPU performance. Task throughput increases by up to 100 times the `MaxDegreeOfParallelism` or `threads` value. In practice, set the number of concurrent operations below 100.
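
The first bullet list above caps a single call at 100 tasks and notes that callers must split larger collections and retry on `RequestBodyTooLarge`. Below is a minimal sketch of that pattern with the native Batch Python SDK (`azure-batch`); the account URL, key, and job ID are placeholders, and the client constructor keyword (`batch_url` here, `base_url` in older releases) depends on the installed SDK version.

```python
import azure.batch.models as batchmodels
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials

# Placeholder account settings - replace with your own values.
_BATCH_ACCOUNT_URL = "https://<account>.<region>.batch.azure.com"
_BATCH_ACCOUNT_NAME = "<account>"
_BATCH_ACCOUNT_KEY = "<key>"
_JOB_ID = "my-large-job"

MAX_TASKS_PER_REQUEST = 100  # collection limit for the REST API and native Python SDK


def add_tasks_in_chunks(batch_client, job_id, tasks, chunk_size=MAX_TASKS_PER_REQUEST):
    """Add tasks to a job in chunks no larger than the collection limit.

    If a chunk is still too large for one request (RequestBodyTooLarge),
    halve the chunk size and retry that chunk with fewer tasks.
    """
    i = 0
    while i < len(tasks):
        chunk = tasks[i:i + chunk_size]
        try:
            batch_client.task.add_collection(job_id, chunk)
            i += len(chunk)
        except batchmodels.BatchErrorException as err:
            if err.error.code == "RequestBodyTooLarge" and chunk_size > 1:
                # The tasks carry too much data (resource files, environment
                # variables, ...) - retry this chunk with a smaller size.
                chunk_size = max(1, chunk_size // 2)
            else:
                raise


if __name__ == "__main__":
    credentials = SharedKeyCredentials(_BATCH_ACCOUNT_NAME, _BATCH_ACCOUNT_KEY)
    # Older azure-batch releases use base_url= instead of batch_url=.
    batch_client = BatchServiceClient(credentials, batch_url=_BATCH_ACCOUNT_URL)

    # Build the task collection by iterating over input parameters for the job.
    tasks = [
        batchmodels.TaskAddParameter(
            id="task{}".format(n),
            command_line="/bin/bash -c 'echo processing input {}'".format(n),
        )
        for n in range(10000)
    ]

    add_tasks_in_chunks(batch_client, _JOB_ID, tasks)
```

Halving the chunk size on `RequestBodyTooLarge` is one simple recovery strategy; the higher-level APIs in the second bullet list do this kind of chunking and retrying for you.
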
@@ -201,5 +209,4 @@ except Exception as e:
## Next steps

* Learn more about using the Azure CLI with [Batch CLI templates](batch-cli-templates.md).
-* Learn more about the [Batch Python SDK extension](https://pypi.org/project/azure-batch-extensions/).
-
+* Learn more about the [Batch Python SDK extension](https://pypi.org/project/azure-batch-extensions/).
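
For the higher-level route in the second bullet list, and the `threads` parameter described under **Number of parallel operations**, the [Batch Python SDK extension](https://pypi.org/project/azure-batch-extensions/) accepts the whole collection in a single call and parallelizes the underlying requests. The sketch below assumes a `BatchExtensionsClient` entry point with the constructor arguments shown; the class name, import path, and arguments may differ between versions of the `azure-batch-extensions` package, so treat this as an outline rather than exact API.

```python
import azure.batch.models as batchmodels
# Import path as published by the azure-batch-extensions package; verify
# against the version you install.
import azure.batch_extensions as batch
from azure.batch.batch_auth import SharedKeyCredentials

# Placeholder account settings - replace with your own values.
_BATCH_ACCOUNT_URL = "https://<account>.<region>.batch.azure.com"
_BATCH_ACCOUNT_NAME = "<account>"
_BATCH_ACCOUNT_KEY = "<key>"
_JOB_ID = "my-large-job"

credentials = SharedKeyCredentials(_BATCH_ACCOUNT_NAME, _BATCH_ACCOUNT_KEY)
# Assumed constructor arguments; check the extension's documentation.
batch_client = batch.BatchExtensionsClient(
    credentials=credentials, batch_url=_BATCH_ACCOUNT_URL)

# The whole collection goes in a single call; the extension splits it into
# chunks for the lower-level API and retries failed additions internally.
tasks = [
    batchmodels.TaskAddParameter(
        id="task{}".format(n),
        command_line="/bin/bash -c 'echo processing input {}'".format(n),
    )
    for n in range(50000)
]

# `threads` raises the number of concurrent add operations (default 1).
# Keep it below 100; higher concurrency costs network bandwidth and some CPU.
batch_client.task.add_collection(job_id=_JOB_ID, value=tasks, threads=50)
```

The .NET API exposes the equivalent knob through `BatchClientParallelOptions.MaxDegreeOfParallelism`; in either case, keep the number of concurrent operations below 100.
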
