# Manage inputs and outputs for components and pipelines
Azure Machine Learning pipelines support inputs and outputs at both the component and pipeline levels. This article describes pipeline and component inputs and outputs and how to manage them.
At the component level, the inputs and outputs define the component interface. You can use the output from one component as an input for another component in the same parent pipeline, allowing for data or models to be passed between components. You can represent this interconnectivity as a graph that illustrates the data flow within the pipeline.
At the pipeline level, you can use inputs and outputs to submit pipeline jobs with varying data inputs or parameters that control training logic, such as `learning_rate`. Inputs and outputs are especially useful when you invoke a pipeline via a REST endpoint. You can assign different values to the pipeline input or access the output of pipeline jobs. For more information, see [Create jobs and input data for batch endpoints](how-to-access-data-batch-endpoints-jobs.md).
## Input and output types
The following types are supported as both inputs and outputs of components or pipelines:
- Data types. For more information, see [Data types](concept-data.md#data-types).
  - `uri_file`
  - `uri_folder`
  - `mltable`
- Model types.
  - `mlflow_model`
  - `custom_model`

The following primitive types are also supported for inputs only:
- `string`
- `number`
- `integer`
- `boolean`

Primitive type output isn't supported.
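If you define components or jobs with the Python SDK v2, these types map to `Input` and `Output` objects. The following is a minimal sketch assuming the `azure-ai-ml` package; the code folder, script, and environment names are placeholders, not part of any specific example.

```python
from azure.ai.ml import command, Input, Output

# Hypothetical command job showing how the type names map to the SDK.
# The code folder, script, and environment names are placeholders.
train_step = command(
    code="./src",  # assumed folder containing train.py
    command="python train.py --data ${{inputs.training_data}} "
    "--rate ${{inputs.learning_rate}} --model ${{outputs.model_output}}",
    environment="azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # placeholder environment
    inputs={
        "training_data": Input(type="uri_folder"),            # data type input
        "learning_rate": Input(type="number", default=0.01),  # primitive type input
    },
    outputs={
        "model_output": Output(type="mlflow_model"),          # model type output
    },
)
```

You could then submit this job with `ml_client.create_or_update(train_step)` from a workspace-connected `MLClient` session.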
Using data or model outputs serializes the outputs and saves them as files in a storage location. Later steps can access the files during job execution by mounting this storage location or by downloading or uploading the files to the compute file system.
This process requires the component source code to serialize the output object, which is usually stored in memory, into files. For example, you could serialize a pandas dataframe as a CSV file. Azure Machine Learning doesn't define any standardized methods for object serialization. You have the flexibility to choose your preferred methods to serialize objects into files. In the downstream component, you can independently deserialize and read these files.
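As a minimal sketch of this pattern (the argument name, file name, and sample data below are illustrative, not taken from any specific component):

```python
# Upstream component script: serialize an in-memory dataframe into the uri_folder output.
import argparse
from pathlib import Path

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--prep_output", type=str, default="./outputs/prep")  # folder path injected by the pipeline
args = parser.parse_args()

df = pd.DataFrame({"trip_distance": [1.2, 3.4], "fare_amount": [6.5, 12.0]})
Path(args.prep_output).mkdir(parents=True, exist_ok=True)
df.to_csv(Path(args.prep_output) / "prepped.csv", index=False)  # serialization step

# A downstream component script would deserialize the same file from its own input folder:
# df = pd.read_csv(Path(args.prep_input) / "prepped.csv")
```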
### Example inputs and outputs
These examples are from the [NYC Taxi Data Regression](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression) pipeline in the [Azure Machine Learning examples](https://github.com/Azure/azureml-examples) GitHub repository.
- The [train component](https://github.com/Azure/azureml-examples/blob/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression/train.yml) has a `number` input named `test_split_ratio`.
- The [prep component](https://github.com/Azure/azureml-examples/blob/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression/prep.yml) has a `uri_folder` type output. The component source code reads the CSV files from the input folder, processes the files, and writes the processed CSV files to the output folder.
- The [train component](https://github.com/Azure/azureml-examples/blob/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression/train.yml) has an `mlflow_model` type output. The component source code saves the trained model using the `mlflow.sklearn.save_model` method (see the sketch after this list).
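The last item can be sketched as follows. This isn't the actual train component source; it's a hypothetical example of saving a scikit-learn model into an `mlflow_model` output folder, with placeholder argument names and toy training data.

```python
import argparse

import mlflow.sklearn
from sklearn.linear_model import LinearRegression

parser = argparse.ArgumentParser()
parser.add_argument("--model_output", type=str, default="./outputs/model")  # mlflow_model output path
args = parser.parse_args()

# Toy training data; a real component would read its data inputs instead.
model = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])

# Serialize the model in MLflow format into the output folder.
mlflow.sklearn.save_model(sk_model=model, path=args.model_output)
```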
## Data input and output paths and modes
For data asset inputs and outputs, you must specify a `path` parameter that points to the data location. The following table shows the different data locations that Azure Machine Learning pipelines support, with `path` parameter examples:
|Location | Examples | Input | Output|
|---------|---------|---------|---------|
|A path on your local computer |`./home/username/data/my_data`| ✓ ||
|A path on a public http(s) server |`https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv`| ✓ ||
|A path on Azure Storage |`wasbs://<container_name>@<account_name>.blob.core.windows.net/<path>`<br>or<br>`abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>`|\*||
|A path on an Azure Machine Learning datastore |`azureml://datastores/<data_store_name>/paths/<path>`| ✓ | ✓ |
|A path to a data asset |`azureml:<my_data>:<version>`| ✓ | ✓ |

\* Using Azure Storage directly isn't recommended for input/output, because it might need extra identity configuration to read the data. It's better to use an Azure Machine Learning datastore path instead of a direct Azure Storage path. Datastore paths are supported across various job types in pipelines.
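For example, a minimal SDK v2 sketch of pointing an input at a datastore path (assuming the `azure-ai-ml` package; the datastore and folder names are placeholders):

```python
from azure.ai.ml import Input

# Reference data through a datastore path rather than a direct storage URL.
taxi_data = Input(
    type="uri_folder",
    path="azureml://datastores/workspaceblobstore/paths/nyc_taxi/raw",  # placeholder datastore and folder
)
```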
For data type inputs and outputs, you can choose from several download, upload, and mount modes to define how to access data in the compute target. The following table shows the possible modes for different types of inputs and outputs.
> In most cases, `ro_mount` or `rw_mount` modes are recommended. For more information, see [Modes](how-to-read-write-data-v2.md#modes).
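A minimal sketch of setting modes with the SDK v2 (assuming the `azure-ai-ml` package; the path is a placeholder, and omitting `mode` accepts the defaults):

```python
from azure.ai.ml import Input, Output

# Read-only mount for a data input, read-write mount for a data output.
train_input = Input(
    type="uri_file",
    path="azureml://datastores/workspaceblobstore/paths/nyc_taxi/raw/data.csv",  # placeholder path
    mode="ro_mount",
)
prep_output = Output(type="uri_folder", mode="rw_mount")
```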
## Inputs and outputs in studio Designer
The following screenshots from the [NYC Taxi Data Regression](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression) pipeline show how Azure Machine Learning studio displays inputs and outputs.
The studio pipeline job page shows data or model type inputs/outputs as small circles called input/output ports in the corresponding component. These ports represent the data flow in a pipeline. Pipeline level output is displayed as a purple box for easy identification.
:::image type="content" source="./media/how-to-manage-pipeline-input-output/input-output-port.png" lightbox="./media/how-to-manage-pipeline-input-output/input-output-port.png" alt-text="Screenshot highlighting the pipeline input and output ports.":::
When you hover over an input/output port, the type is displayed.
Primitive type inputs aren't displayed on the graph. You can find these inputs on the **Settings** tab of the pipeline job overview panel for pipeline level inputs, or the component panel for component level inputs.
The following screenshot shows the **Settings** tab of a pipeline job, which you can open by selecting **Job overview**. To check inputs for a component, double-click the component to open the component panel.
When you edit a pipeline in the Designer, the pipeline inputs and outputs are in the **Pipeline interface** panel, and the component inputs and outputs are in the component's panel.
:::image type="content" source="./media/how-to-manage-pipeline-input-output/pipeline-interface.png" alt-text="Screenshot highlighting the pipeline interface in Designer.":::
## Promote component inputs/outputs to pipeline level
Promoting a component's input/output to the pipeline level lets you overwrite the component's input/output when you submit a pipeline job. This ability is especially useful when triggering the pipeline by using a REST endpoint.
The following examples show how to promote component level inputs/outputs to pipeline level inputs/outputs.
# [Azure CLI](#tab/cli)
The following pipeline promotes three inputs and three outputs to the pipeline level. For example, `pipeline_job_training_max_epocs` is a pipeline level input because it's declared under the `inputs` section at the root level.
Under `train_job` in the `jobs` section, the input named `max_epocs` is referenced as `${{parent.inputs.pipeline_job_training_max_epocs}}`, meaning that the `train_job`'s `max_epocs` input references the pipeline level `pipeline_job_training_max_epocs` input. Pipeline output is promoted by using the same schema.
You can promote a component's input to pipeline level input on the studio Designer authoring page.
1. Open the component's settings panel by double-clicking the component.
1. Find the input you want to promote and select the three dots on the right.
1. Select **Add to pipeline input**.
:::image type="content" source="./media/how-to-manage-pipeline-input-output/promote-pipeline-input.png" alt-text="Screenshot highlighting how to promote to pipeline input in Designer.":::
## Define optional inputs
By default, all inputs are required and must either have a default value or be assigned a value each time you submit a pipeline job. However, you can define an optional input and not assign a value to the input when you submit a pipeline job.
> [!NOTE]
> Optional outputs aren't supported.
If you have an optional data/model type input and don't assign a value to it when submitting the pipeline job, a component in the pipeline lacks a preceding data dependency. The component's input port isn't linked to any component or data/model node. The pipeline service invokes the component directly, instead of waiting for the preceding dependency to be ready.
If you set `continue_on_step_failure = True` for the pipeline and a second node uses required output from the first node, the second node doesn't execute if the first node fails. But if the second node uses optional input from the first node, it executes even if the first node fails. The following screenshot illustrates this scenario.
:::image type="content" source="./media/how-to-manage-pipeline-input-output/continue-on-failure-optional-input.png" alt-text="Screenshot showing the orchestration logic of optional input and continue on failure.":::
The following YAML code example shows how to define optional inputs. When an input is set to `optional = true`, use `$[[]]` to enclose the command-line arguments that use that input, as in the highlighted lines of the example.
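The full YAML from the examples repository isn't reproduced here. As an illustration only, the following is a minimal sketch of the same optional-input pattern using the Python SDK v2 `command` builder; the parameter names, script, and environment are placeholders rather than the article's example.

```python
from azure.ai.ml import command, Input

# The $[[ ]] section of the command is emitted only when the optional input
# receives a value at submission time; otherwise it's dropped from the command line.
train_step = command(
    code="./src",  # assumed folder containing train.py
    command="python train.py --required_param ${{inputs.required_param}} "
    "$[[--optional_param ${{inputs.optional_param}}]]",
    environment="azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # placeholder environment
    inputs={
        "required_param": Input(type="number", default=10),
        "optional_param": Input(type="number", optional=True),
    },
)
```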
To download the outputs of a child component that isn't promoted to pipeline level, first list all child job entities of a pipeline job and then use similar code to download the outputs.
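A minimal sketch of that flow with the Python SDK v2; the subscription, workspace, pipeline job name, component display name, and output name are placeholders:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to the workspace; these identifiers are placeholders.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription_id>",
    resource_group_name="<resource_group>",
    workspace_name="<workspace_name>",
)

# List the child jobs of a pipeline job, then download a named output
# from the child component of interest.
child_jobs = ml_client.jobs.list(parent_job_name="<pipeline_job_name>")

for child in child_jobs:
    if child.display_name == "train_job":  # placeholder component name
        ml_client.jobs.download(
            name=child.name,
            output_name="model_output",      # placeholder named output of the child
            download_path="./outputs/train_job",
        )
```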