Unlike other types of jobs, a parallel job requires preparation. Follow the next sections to prepare for creating your parallel job.
### Declare the inputs to be distributed and data division setting
Parallel job requires only one **major input data** to be split and processed in parallel. The major input data can be either tabular data or a set of files. Different input types support different data division methods.
The following table illustrates the relation between input data types and data division methods:
| Data format | Azure Machine Learning input type | Azure Machine Learning input mode | Data division method |
|:-|:-|:-|:-|
| File list | `mltable` or<br>`uri_folder` | `ro_mount` or<br>`download` | By size (number of files)<br>By partitions |
| Tabular data | `mltable` | `direct` | By size (estimated physical size)<br>By partitions |
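Conceptually, the two division methods can be sketched in plain Python as follows. This is an illustration of the idea only, not the Azure Machine Learning implementation; the helper names are hypothetical:

```python
from itertools import groupby

def split_by_size(files, mini_batch_size):
    """By size: cut a file list into mini-batches of N files each."""
    return [files[i:i + mini_batch_size]
            for i in range(0, len(files), mini_batch_size)]

def split_by_partitions(rows, partition_keys):
    """By partitions: one mini-batch per distinct combination of key values."""
    keyfunc = lambda row: tuple(row[k] for k in partition_keys)
    rows = sorted(rows, key=keyfunc)
    return {key: list(group) for key, group in groupby(rows, key=keyfunc)}

files = ["a.png", "b.png", "c.png", "d.png", "e.png"]
print(split_by_size(files, 2))  # 3 mini-batches: 2 + 2 + 1 files

rows = [
    {"store": 1, "brand": "x", "sales": 10},
    {"store": 1, "brand": "x", "sales": 12},
    {"store": 2, "brand": "y", "sales": 7},
]
print(split_by_partitions(rows, ["store", "brand"]))  # 2 mini-batches, one per (store, brand)
```

Each mini-batch is then handed to one worker invocation of your entry script's `Run()` function.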
You can declare your major input data with the `input_data` attribute in parallel job YAML or Python SDK, and bind it to one of the defined `inputs` of your parallel job by using `${{inputs.<input name>}}`. Then define the data division method for your major input by setting the corresponding attribute:
| Data division method | Attribute name | Attribute type | Job example |
|:-|:-|:-|:-|
| By size | `mini_batch_size` | string | [Iris batch prediction](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/parallel/2a_iris_batch_prediction) |
| By partitions | `partition_keys` | list of string | [Orange juice sales prediction](https://github.com/Azure/azureml-examples/blob/main/cli/jobs/parallel/1a_oj_sales_prediction) |

- When using file list input, `mini_batch_size` defines the number of files for each mini-batch.
- When using tabular input, `mini_batch_size` defines the estimated physical size for each mini-batch.
# [Azure CLI](#tab/cliv2)
Declare `job_data_path` as one of the inputs. Bind it to the `input_data` attribute.
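A minimal YAML sketch of this binding, dividing the data by size. The input name `job_data_path` comes from the text above; the path, mode, and mini-batch size are illustrative assumptions:

```yaml
# Sketch of a parallel job fragment: bind the declared input to input_data
# and divide it by size. Values are illustrative, not prescriptive.
inputs:
  job_data_path:
    type: mltable
    path: ./iris-mltable        # hypothetical data path
    mode: direct
input_data: ${{inputs.job_data_path}}
mini_batch_size: "10kb"
```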
---
Once you have the data division setting defined, you can configure how many resources to use for your parallelization by setting the two attributes below:
| Attribute name | Type | Description | Default value |
|:-|--|:-|--|
> [!IMPORTANT]
> If you want to parse arguments in your `Init()` or `Run(mini_batch)` function, use `parse_known_args` instead of `parse_args` to avoid exceptions. See the [iris_score](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/parallel-run/Code/iris_score.py) example for an entry script with an argument parser.
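A minimal sketch of that pattern in an entry script. The `--model_path` argument is hypothetical, and the argument list is passed explicitly here only to make the demo self-contained:

```python
import argparse

def init():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_path", type=str, default="")  # hypothetical argument
    # parse_known_args tolerates extra arguments appended by the
    # parallel-run framework; parse_args would raise on them.
    args, unknown = parser.parse_known_args(
        ["--model_path", "./model", "--framework_internal_flag", "1"]
    )
    return args

args = init()
print(args.model_path)  # → ./model
```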
> [!IMPORTANT]
> If you use `mltable` as your major input data, you need to install the `mltable` library in your environment. See line 9 of this [conda file](https://github.com/Azure/azureml-examples/blob/main/cli/jobs/parallel/1a_oj_sales_prediction/src/parallel_train/conda.yml) example.
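A minimal conda environment sketch showing `mltable` included. The environment name, package list, and versions are illustrative assumptions, not the linked file's exact contents:

```yaml
# Illustrative conda environment for a parallel job entry script.
name: parallel-train-env
dependencies:
  - python=3.8
  - pip
  - pip:
      - mltable   # required when the major input data is mltable
      - pandas
```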
### Consider automation settings
Azure Machine Learning parallel job exposes numerous settings to automatically control the job without manual intervention. See the following table for the details.
## Parallel job in pipeline examples
- [Azure CLI + YAML example repository](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/parallel)
- [SDK example repository](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/parallel)