doc/taskFactories.md
]
```

A `mergeTask` may also be specified to run after all tasks in the parametric sweep factory. This allows for map-reduce patterns, where a single task parses or combines the output of all tasks in the task factory once they have completed. As with the `repeatTask`, the `id` and `dependsOn` properties are not allowed: the `id` is auto-populated with the value `merge`, and the dependencies on the factory tasks are added automatically.

**Note:** It is not advised to use file groups as inputs to a `mergeTask`. File groups are expanded at the time the task is added to the job, and as such will not contain output generated by the dependent tasks. To pass output from dependent tasks to the `mergeTask`, use the `autoStorageContainerName` or `containerUrl` REST API properties instead.

A basic example of using `mergeTask`:
```json
"job": {
    "type": "Microsoft.Batch/batchAccounts/jobs",
    "apiVersion": "2018-12-01",
    "properties": {
        "id": "mergetask",
        "poolInfo": {
            "poolId": "my-mergetask-pool"
        },
        "taskFactory": {
            "type": "parametricSweep",
            "parameterSets": [
                {
                    "start": 1,
                    "end": 500,
                    "step": 1
                }
            ],
            "repeatTask": {
                "commandLine": "/bin/bash -c 'echo {0}'",
                "outputFiles": [
                    {
                        "filePattern": "**/stdout.txt",
                        "destination": {
                            "autoStorage": {
                                "path": "output-{0}",
                                "fileGroup": "outputData"
                            }
                        },
                        "uploadOptions": {
                            "uploadCondition": "TaskSuccess"
                        }
                    }
                ]
            },
            "mergeTask": {
                "displayName": "myMergeTask",
                "commandLine": "/bin/bash -c 'ls'",
                "resourceFiles": [
                    {
                        "autoStorageContainerName": "fgrp-outputData"
                    }
                ]
            }
        }
    }
}
```
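For illustration, the auto-generated merge task produced by the factory above would look roughly like the following. This is a sketch: the task IDs in `dependsOn` are hypothetical placeholders, as the service populates the dependency list with the IDs of all factory tasks automatically.

```json
{
    "id": "merge",
    "displayName": "myMergeTask",
    "commandLine": "/bin/bash -c 'ls'",
    "resourceFiles": [
        {
            "autoStorageContainerName": "fgrp-outputData"
        }
    ],
    "dependsOn": {
        "taskIds": [ "0", "1", "2" ]
    }
}
```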

### Samples

The following samples use the parametric sweep task factory:
]
```

A `mergeTask` may also be specified to run after all tasks in the task per file factory. This allows for map-reduce patterns, where a single task parses or combines the output of all tasks in the task factory once they have completed. As with the `repeatTask`, the `id` and `dependsOn` properties are not allowed: the `id` is auto-populated with the value `merge`, and the dependencies on the factory tasks are added automatically.

**Note:** It is not advised to use file groups as inputs to a `mergeTask`. File groups are expanded at the time the task is added to the job, and as such will not contain output generated by the dependent tasks. To pass output from dependent tasks to the `mergeTask`, use the `autoStorageContainerName` or `containerUrl` REST API properties instead.

A basic example of using `mergeTask`:
```json
"job": {
    "type": "Microsoft.Batch/batchAccounts/jobs",
    "apiVersion": "2018-12-01",
    "properties": {
        "id": "mergetask",
        "poolInfo": {
            "poolId": "my-mergetask-pool"
        },
        "taskFactory": {
            "type": "taskPerFile",
            "source": {
                "fileGroup": "inputData"
            },
            "repeatTask": {
                "commandLine": "/bin/bash -c 'cat {fileName}'",
                "resourceFiles": [
                    {
                        "httpUrl": "{url}",
                        "filePath": "{fileName}"
                    }
                ],
                "outputFiles": [
                    {
                        "filePattern": "**/stdout.txt",
                        "destination": {
                            "autoStorage": {
                                "path": "output-{fileName}",
                                "fileGroup": "outputData"
                            }
                        },
                        "uploadOptions": {
                            "uploadCondition": "TaskSuccess"
                        }
                    }
                ]
            },
            "mergeTask": {
                "displayName": "myMergeTask",
                "commandLine": "/bin/bash -c 'ls'",
                "resourceFiles": [
                    {
                        "autoStorageContainerName": "fgrp-outputData"
                    }
                ]
            }
        }
    }
}
```
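To make the `{fileName}` and `{url}` substitutions concrete: a file named `sample.txt` in the `inputData` file group would expand into a task roughly like the one below. The file name is a hypothetical example and the SAS URL is a placeholder generated by the service; the `outputFiles` settings are omitted here for brevity.

```json
{
    "id": "0",
    "commandLine": "/bin/bash -c 'cat sample.txt'",
    "resourceFiles": [
        {
            "httpUrl": "<SAS URL for sample.txt>",
            "filePath": "sample.txt"
        }
    ]
}
```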

### Samples

The following samples use the task per file task factory: