Commit 091d025

Incorp'd review comments
1 parent 5a27b7c commit 091d025

2 files changed: 17 additions and 17 deletions

articles/batch/TOC.yml

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@
 href: batch-task-dependencies.md
 - name: User accounts for running tasks
 href: batch-user-accounts.md
-- name: Submit large numbers of tasks
+- name: Submit a large number of tasks
 href: large-number-tasks.md
 - name: Persist job and task output
 href: batch-task-output.md

articles/batch/large-number-tasks.md

Lines changed: 16 additions & 16 deletions
@@ -13,20 +13,20 @@ ms.devlang: multiple
 ms.topic: article
 ms.tgt_pltfrm:
 ms.workload: big-compute
-ms.date: 08/23/2018
+ms.date: 08/24/2018
 ms.author: danlep
 ms.custom:
 
 ---
-# Submit a large number of tasks in a Batch job
+# Submit a large number of tasks to a Batch job
 
-When you run large-scale Azure Batch workloads, you might want to submit tens of thousands, hundreds of thousands, or even more tasks in a single job.
+When you run large-scale Azure Batch workloads, you might want to submit tens of thousands, hundreds of thousands, or even more tasks to a single job.
 
-This article gives guidance and some code examples to efficiently submit large numbers of tasks to a single Batch job. By submitting tasks in this way, your Batch script or client app can run without waiting a long time for task submission to complete. After tasks are submitted, they enter the Batch queue for processing on the pool you specify for the job.
+This article gives guidance and some code examples to submit large numbers of tasks with substantially increased throughput to a single Batch job. After tasks are submitted, they enter the Batch queue for processing on the pool you specify for the job.
 
 ## Use task collections
 
-The Batch APIs provide efficient methods to add tasks to a job as a *collection*, in addition to one at a time. When adding a large number of tasks, you should use the appropriate methods or overloads to add tasks as a collection. Generally, you construct a task collection by defining tasks as you iterate over a set of input files or parameters for your job.
+The Batch APIs provide methods to efficiently add tasks to a job as a *collection*, in addition to one at a time. When adding a large number of tasks, you should use the appropriate methods or overloads to add tasks as a collection. Generally, you construct a task collection by defining tasks as you iterate over a set of input files or parameters for your job.
 
 The maximum size of the task collection that you can add in a single call depends on the Batch API you use:
 
@@ -36,26 +36,26 @@ The maximum size of the task collection that you can add in a single call depend
 * [Python API](/python/api/azure-batch/azure.batch.operations.TaskOperations?view=azure-python#azure_batch_operations_TaskOperations_add_collection)
 * [Node.js API](/javascript/api/azure-batch/task?view=azure-node-latest#addcollection)
 
-When using these APIs, you need to provide logic to divide the number of tasks to meet the collection limit, and to handle errors and retries in case addition of tasks fails. If a task collection is too large to add, the request fails with code `RequestBodyTooLarge` and should be retried again with fewer tasks.
+When using these APIs, you need to provide logic to divide the number of tasks to meet the collection limit, and to handle errors and retries in case addition of tasks fails. If a task collection is too large to add, the request generates an error and should be retried again with fewer tasks.
 
-* The following APIs support much larger task collections - up to hundreds of thousands of tasks in a single call. These APIs transparently handle dividing the task collection into "chunks" for the lower-level APIs and retries if addition of tasks fails.
+* The following APIs support much larger task collections - limited only by RAM availability on the submitting client. These APIs transparently handle dividing the task collection into "chunks" for the lower-level APIs and retries if addition of tasks fails.
 
 * [.NET API](/dotnet/api/microsoft.azure.batch.cloudjob.addtaskasync?view=azure-dotnet)
 * [Java API](/java/api/com.microsoft.azure.batch.protocol._tasks.addcollectionasync?view=azure-java-stable)
-* [Azure CLI with Batch CLI templates](batch-cli-templates.md)
+* [Azure Batch CLI extension](batch-cli-templates.md) with Batch CLI templates
 * [Python SDK extension](https://pypi.org/project/azure-batch-extensions/)
 
-## Increase task throughput
+## Increase throughput of task submission
 
 It can take some time to add a large collection of tasks to a job - for example, up to 1 minute to add 20,000 tasks via the .NET API. Depending on the Batch API and your workload, you can improve the task throughput by modifying one or more of the following:
 
-* **Task size** - Adding large tasks takes longer than adding smaller ones. To reduce the size of each task in a collection, you can simplify the task command line, reduce the number of environment variables, or handle task dependencies more efficiently. For example, instead of using a large number of resource files, install task dependencies using a [start task](batch-api-basics.md#start-task) on the pool or use an [application package](batch-application-packages.md) or [Docker container](batch-docker-container-workloads.md).
+* **Task size** - Adding large tasks takes longer than adding smaller ones. To reduce the size of each task in a collection, you can simplify the task command line, reduce the number of environment variables, or handle requirements for task execution more efficiently. For example, instead of using a large number of resource files, install task dependencies using a [start task](batch-api-basics.md#start-task) on the pool or use an [application package](batch-application-packages.md) or [Docker container](batch-docker-container-workloads.md).
 
 * **Number of parallel operations** - Depending on the Batch API, increase throughput by increasing the maximum number of concurrent operations by the Batch client. Configure this setting using the [BatchClientParallelOptions.MaxDegreeOfParallelism](/dotnet/api/microsoft.azure.batch.batchclientparalleloptions.maxdegreeofparallelism) property in the .NET API, or the `threads` parameter of methods such as [TaskOperations.add_collection](/python/api/azure-batch/azure.batch.operations.TaskOperations?view=azure-python#add-collection) in the Batch Python SDK extension. (This property is not available in the native Batch Python SDK.) By default, this property is set to 1, but set it higher to improve throughput of operations. You trade off increased throughput by consuming network bandwidth and some CPU performance. Task throughput increases by up to 100 times the `MaxDegreeOfParallelism` or `threads`. In practice, you should set the number of concurrent operations below 100.
 
-The Azure CLI with Batch CLI templates increases the number of concurrent operations automatically based on the pool configuration, but this property is not configurable in the CLI.
+The Azure Batch CLI extension with Batch templates increases the number of concurrent operations automatically based on the number of available cores, but this property is not configurable in the CLI.
 
-* **HTTP connection limits** - The number of concurrent HTTP connections can throttle the performance of the Batch client when it is adding large numbers of tasks. The number of HTTP connections is limited with certain APIs. When developing with the .NET API, for example, the [ServicePointManager.DefaultConnectionLimit](/dotnet/api/system.net.servicepointmanager.defaultconnectionlimit) property is set to 2 by default, but you can increase the value.
+* **HTTP connection limits** - The number of concurrent HTTP connections can throttle the performance of the Batch client when it is adding large numbers of tasks. The number of HTTP connections is limited with certain APIs. When developing with the .NET API, for example, the [ServicePointManager.DefaultConnectionLimit](/dotnet/api/system.net.servicepointmanager.defaultconnectionlimit) property is set to 2 by default. We recommend that you increase the value to a number close to or greater than the number of parallel operations.
 
 ## Example: Batch .NET
 
@@ -77,13 +77,13 @@ Add a task collection to the job using the appropriate overload of the [AddTaskA
 // Add a list of tasks as a collection
 List<CloudTask> tasksToAdd = new List<CloudTask>(); // Populate with your tasks
 ...
-await batchClient.JobOperations.AddTaskAsync(jobId, tasksToAdd);
+await batchClient.JobOperations.AddTaskAsync(jobId, tasksToAdd, parallelOptions);
 ```
 
 
-## Example: Batch CLI template
+## Example: Batch CLI extension
 
-Using the Azure CLI with [Batch CLI templates](batch-cli-templates.md), create a job template JSON file that includes a [task factory](https://github.com/Azure/azure-batch-cli-extensions/blob/master/doc/taskFactories.md). The task factory configures a collection of related tasks for a job from a single task definition.
+Using the Azure Batch CLI extensions with [Batch CLI templates](batch-cli-templates.md), create a job template JSON file that includes a [task factory](https://github.com/Azure/azure-batch-cli-extensions/blob/master/doc/taskFactories.md). The task factory configures a collection of related tasks for a job from a single task definition.
 
 The following is a sample job template for a one-dimensional parametric sweep job with a large number of tasks - in this case, 250,000. The task command line is a simple `echo` command.
 
@@ -208,5 +208,5 @@ except Exception as e:
 
 ## Next steps
 
-* Learn more about using the the Azure CLI with [Batch CLI templates](batch-cli-templates.md).
+* Learn more about using the Azure Batch CLI extension with [Batch CLI templates](batch-cli-templates.md).
 * Learn more about the [Batch Python SDK extension](https://pypi.org/project/azure-batch-extensions/).
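
Note on the changed paragraph about the lower-level collection APIs: it asks callers to divide tasks to meet the collection limit and to retry with fewer tasks when a request is rejected as too large. A minimal Python sketch of that logic follows. It does not call the Batch SDK. The `submit` callable, the `CollectionTooLargeError` exception, the `demo_submit` stub, and the default `max_chunk` of 100 are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Sequence


class CollectionTooLargeError(Exception):
    """Hypothetical stand-in for the service rejecting an oversized request."""


def add_tasks_in_chunks(
    tasks: Sequence[str],
    submit: Callable[[Sequence[str]], None],
    max_chunk: int = 100,  # assumed collection limit, not taken from the diff
) -> List[int]:
    """Submit tasks in chunks, halving the chunk size when one is rejected.

    Returns the sizes of the chunks that were accepted, in order.
    """
    if max_chunk < 1:
        raise ValueError("max_chunk must be at least 1")
    accepted: List[int] = []
    start = 0
    chunk = max_chunk
    while start < len(tasks):
        batch = tasks[start:start + chunk]
        try:
            submit(batch)
        except CollectionTooLargeError:
            if len(batch) == 1:
                raise  # a single task is too large; splitting cannot help
            chunk = max(1, len(batch) // 2)  # retry the same range with fewer tasks
            continue
        accepted.append(len(batch))
        start += len(batch)
    return accepted


def demo_submit(batch: Sequence[str]) -> None:
    """Hypothetical service stub that rejects requests of more than 40 tasks."""
    if len(batch) > 40:
        raise CollectionTooLargeError(f"{len(batch)} tasks is too many")


if __name__ == "__main__":
    sizes = add_tasks_in_chunks([f"task-{i}" for i in range(250)], demo_submit)
    print(sizes)
```

In real code, `submit` would wrap whichever add-collection call the chosen SDK provides, and the exception would be mapped from the error that call returns. The sketch keeps the reduced chunk size for the rest of the run instead of growing it again, which is a simplification.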
