
Commit 35790ff

committed: fix
1 parent 312190b commit 35790ff

File tree

1 file changed: +17 additions, -10 deletions


articles/machine-learning/how-to-use-parallel-job-in-pipeline.md

Lines changed: 17 additions & 10 deletions
@@ -23,7 +23,7 @@ For example, in a scenario where you're running an object detection model on a l
 
 Machine learning engineers always have scale requirements on their training or inferencing tasks. For example, when a data scientist provides a single script to train a sales prediction model, machine learning engineers need to apply this training task to each individual data store. Challenges of this scale-out process include long execution times that cause delays, and unexpected issues that require manual intervention to keep the task running.
 
-The core job of Azure Machine Learning parallelization is to split a single serial task into mini-batches and dispatch those mini-batches to multiple computes to execute in parallel. Parallel jobs significantly reduce end-to-end execution time and also handle errors automatically. Consider using Azure Machine Learning Parallel job if you plan to train many models on top of your partitioned data or you want to accelerate your large-scale batch inferencing tasks.
+The core job of Azure Machine Learning parallelization is to split a single serial task into mini-batches and dispatch those mini-batches to multiple computes to execute in parallel. Parallel jobs significantly reduce end-to-end execution time and also handle errors automatically. Consider using Azure Machine Learning Parallel job to train many models on top of your partitioned data or to accelerate your large-scale batch inferencing tasks.
 
 ## Prerequisites
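The mini-batch mechanism described in the changed paragraph above can be illustrated with a plain-Python sketch. This is not the Azure Machine Learning API; `split_into_mini_batches` and `score_mini_batch` are hypothetical names, and a thread pool stands in for the multiple compute nodes a real parallel job would use:

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_mini_batches(items, mini_batch_size):
    """Split a serial task's inputs into fixed-size mini-batches."""
    return [items[i:i + mini_batch_size]
            for i in range(0, len(items), mini_batch_size)]

def score_mini_batch(mini_batch):
    """Stand-in for the user's scoring logic; returns one result per item."""
    return [f"scored:{item}" for item in mini_batch]

files = [f"image_{i}.png" for i in range(10)]
mini_batches = split_into_mini_batches(files, mini_batch_size=3)

# Dispatch mini-batches to workers in parallel, the way a parallel job
# dispatches them to multiple compute nodes.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(score_mini_batch, mini_batches))

print(len(mini_batches))             # 4 mini-batches (3 + 3 + 3 + 1)
print(sum(len(r) for r in results))  # 10 results, one per input file
```

The serial version would score the 10 files one after another; splitting into mini-batches lets independent workers run concurrently, which is the source of the end-to-end speedup the article describes.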

@@ -40,8 +40,12 @@ The core job of Azure Machine Learning parallelization is to split a single seri
 
 - Install the [Azure Machine Learning SDK v2 for Python](/python/api/overview/azure/ai-ml-readme).
 - Understand how to [create and run Azure Machine Learning pipelines and components with the Python SDK v2](how-to-create-component-pipeline-python.md).
 
+---
+
 ## Create and run a pipeline with a parallel job step
 
+An Azure Machine Learning parallel job can be used only as a step in a pipeline job.
+
 # [Azure CLI](#tab/cli)
 
 The following examples come from [Run a pipeline job using parallel job in pipeline](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/pipelines/iris-batch-prediction-using-parallel/) in the [Azure Machine Learning examples](https://github.com/Azure/azureml-examples) repository.
@@ -50,9 +54,11 @@ The following examples come from [Run a pipeline job using parallel job in pipel
 
 The following examples come from the [Build a simple machine learning pipeline with parallel component](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/1g_pipeline_with_parallel_nodes/pipeline_with_parallel_nodes.ipynb) notebook in the [Azure Machine Learning examples](https://github.com/Azure/azureml-examples) repository.
 
+---
+
 ### Prepare for parallelization
 
-An Azure Machine Learning parallel job can be used only as a step in a pipeline job. This parallel job step requires preparation. In your parallel job definition, you need to set attributes that:
+This parallel job step requires preparation. In your parallel job definition, you need to set attributes that:
 
 - Define and bind your input data.
 - Set the data division method.
@@ -69,7 +75,7 @@ Different data formats have different input types, input modes, and data divisio
 
 | Data format | Input type | Input mode | Data division method |
 |: ---------- |: ------------- |: ------------- |: --------------- |
-| File list | `mltable` or `uri_folder` | ro_mount or download | By size (number of files) or by partitios |
+| File list | `mltable` or `uri_folder` | ro_mount or download | By size (number of files) or by partition |
 | Tabular data | `mltable` | direct | By size (estimated physical size) or by partition |
 
 > [!NOTE]
@@ -126,8 +132,8 @@ To use the entry script, set the following two attributes in your parallel job d
 
 | Attribute name | Type | Description |
 |: ------------- | ---- |: ---------- |
-| `code` | string | Local path to the source code directory to upload and use for the job. | |
-| `entry_script` | string | The Python file that contains the implementation of predefined parallel functions. | |
+| `code` | string | Local path to the source code directory to upload and use for the job. |
+| `entry_script` | string | The Python file that contains the implementation of predefined parallel functions. |
 
 #### Examples
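The predefined parallel functions that the `entry_script` attribute refers to are `init()` and `run(mini_batch)`. A minimal sketch of such an entry script is shown below; the stand-in model is hypothetical, and in a real job the framework (not this `__main__` block) calls the two functions:

```python
# entry_script.py: minimal shape of a parallel job entry script.
# The framework calls init() once per worker process, then run() once per
# mini-batch. run() should return one result per processed input, since the
# gap between input count and return count is how failed items are counted.

model = None

def init():
    """Called once per worker before any mini-batch; load shared state here."""
    global model
    model = lambda path: f"prediction-for-{path}"  # hypothetical stand-in model

def run(mini_batch):
    """Called once per mini-batch. For file-list input, mini_batch is a list
    of file paths; for tabular input, it is the mini-batch's rows."""
    results = []
    for file_path in mini_batch:
        results.append(model(file_path))
    return results

if __name__ == "__main__":
    init()
    print(run(["a.png", "b.png"]))
```

Returning fewer results than there are inputs (or raising inside `run()`) is exactly what marks a mini-batch as failed under the error-threshold settings discussed later in the article.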
@@ -152,12 +158,12 @@ Azure Machine Learning parallel job exposes many settings that can automatically
 |--|--|--|--|--|--|--|
 | `mini_batch_error_threshold` | integer | Number of failed mini-batches to ignore in this parallel job. If the count of failed mini-batches is higher than this threshold, the parallel job is marked as failed.<br><br>The mini-batch is marked as failed if:<br>- The count of return from `run()` is less than the mini-batch input count.<br>- Exceptions are caught in custom `run()` code.<br><br>`-1` is the default, meaning to ignore all failed mini-batches. | [-1, int.max] | `-1` | `mini_batch_error_threshold` | N/A |
 | `mini_batch_max_retries` | integer | Number of retries when the mini-batch fails or times out. If all retries fail, the mini-batch is marked as failed per the `mini_batch_error_threshold` calculation. | `[0, int.max]` | `2` | `retry_settings.max_retries` | N/A |
-| `mini_batch_timeout` | integer | Timeout in seconds for executing the custom `run()` function. If execution time is higher than this threshold, the mini-batch is aborted and marked as failed to trigger retry. | `(0, 259200]` | `60` | `retry_settings.timeout` | N/A |
+| `mini_batch_timeout` | integer | Time out in seconds for executing the custom `run()` function. If execution time is higher than this threshold, the mini-batch is aborted and marked as failed to trigger retry. | `(0, 259200]` | `60` | `retry_settings.timeout` | N/A |
 | `item_error_threshold` | integer | The threshold of failed items. Failed items are counted by the number gap between inputs and returns from each mini-batch. If the sum of failed items is higher than this threshold, the parallel job is marked as failed.<br><br>Note: `-1` is the default, meaning to ignore all failures during parallel job. | `[-1, int.max]` | `-1` | N/A | `--error_threshold` |
 | `allowed_failed_percent` | integer | Similar to `mini_batch_error_threshold`, but uses the percent of failed mini-batches instead of the count. | `[0, 100]` | `100` | N/A | `--allowed_failed_percent` |
-| `overhead_timeout` | integer | Timeout in seconds for initialization of each mini-batch. For example, load mini-batch data and pass it to the `run()` function. | `(0, 259200]` | `600` | N/A | `--task_overhead_timeout` |
-| `progress_update_timeout` | integer | Timeout in seconds for monitoring the progress of mini-batch execution. If no progress updates are received within this timeout setting, the parallel job is marked as failed. | `(0, 259200]` | Dynamically calculated by other settings. | N/A | `--progress_update_timeout` |
-| `first_task_creation_timeout` | integer | Timeout in seconds for monitoring the time between the job start and the run of the first mini-batch. | `(0, 259200]` | `600` | N/A | --first_task_creation_timeout |
+| `overhead_timeout` | integer | Time out in seconds for initialization of each mini-batch. For example, load mini-batch data and pass it to the `run()` function. | `(0, 259200]` | `600` | N/A | `--task_overhead_timeout` |
+| `progress_update_timeout` | integer | Time out in seconds for monitoring the progress of mini-batch execution. If no progress updates are received within this timeout setting, the parallel job is marked as failed. | `(0, 259200]` | Dynamically calculated by other settings. | N/A | `--progress_update_timeout` |
+| `first_task_creation_timeout` | integer | Time out in seconds for monitoring the time between the job start and the run of the first mini-batch. | `(0, 259200]` | `600` | N/A | --first_task_creation_timeout |
 | `logging_level` | string | The level of logs to dump to user log files. | `INFO`, `WARNING`, or `DEBUG` | `INFO` | `logging_level` | N/A |
 | `append_row_to` | string | Aggregate all returns from each run of the mini-batch and output it into this file. May refer to one of the outputs of the parallel job by using the expression `${{outputs.<output_name>}}` | | | `task.append_row_to` | N/A |
 | `copy_logs_to_parent` | string | Boolean option whether to copy the job progress, overview, and logs to the parent pipeline job. | `True` or `False` | `False` | N/A | `--copy_logs_to_parent` |
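The failure-accounting settings in the table above interact roughly as sketched here. This is illustrative logic only, not the service's exact implementation; `job_failed` is a hypothetical helper that mirrors the table's descriptions of `mini_batch_error_threshold` and `allowed_failed_percent`:

```python
def job_failed(failed_mini_batches, total_mini_batches,
               mini_batch_error_threshold=-1, allowed_failed_percent=100):
    """Return True if the parallel job should be marked as failed.

    Illustrative only: mini_batch_error_threshold is the count of failed
    mini-batches to tolerate (-1 means ignore all failures, the default);
    allowed_failed_percent is the percent of failed mini-batches to tolerate
    (100, the default, also ignores all failures).
    """
    if (mini_batch_error_threshold != -1
            and failed_mini_batches > mini_batch_error_threshold):
        return True
    failed_percent = 100 * failed_mini_batches / total_mini_batches
    return failed_percent > allowed_failed_percent

# With the defaults (-1 and 100), failures never fail the job.
print(job_failed(5, 10))                                # False
# A count threshold of 3 tolerates at most 3 failed mini-batches.
print(job_failed(5, 10, mini_batch_error_threshold=3))  # True
# A percent threshold of 50 tolerates up to half of the mini-batches failing.
print(job_failed(6, 10, allowed_failed_percent=50))     # True
```

The count-based and percent-based thresholds serve the same purpose at different scales: a fixed count suits small, predictable workloads, while a percentage stays meaningful when the number of mini-batches varies between runs.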
@@ -171,7 +177,8 @@ Sample code to update these settings:
 
 # [Python](#tab/python)
 
-[!notebook-python[] (~/azureml-examples-main/sdk/python/jobs/pipelines/1g_pipeline_with_parallel_nodes/pipeline_with_parallel_nodes.ipynb?name=parallel-job-for-tabular-data)]
+[!Notebook-python[] (~/azureml-examples-main/sdk/python/jobs/pipelines/1g_pipeline_with_parallel_nodes/pipeline_with_parallel_nodes.ipynb?name=parallel-job-for-tabular-data)]
+
 ---
 
 ### Create the pipeline with parallel job step
