
Commit 4bac728

committed
changes before reboot
1 parent 1ed407b commit 4bac728

File tree

1 file changed (+44, -43 lines)

articles/machine-learning/how-to-manage-inputs-outputs-pipeline.md

Lines changed: 44 additions & 43 deletions
@@ -1,7 +1,7 @@
 ---
-title: Manage component and pipeline inputs and outputs
+title: Manage inputs and outputs for components and pipelines
 titleSuffix: Azure Machine Learning
-description: Learn how to manage inputs and outputs of components and pipelines in Azure Machine Learning.
+description: Understand and manage inputs and outputs of pipeline components and pipeline jobs in Azure Machine Learning.
 services: machine-learning
 ms.service: azure-machine-learning
 ms.subservice: core
@@ -12,62 +12,63 @@ ms.date: 09/11/2024
 ms.topic: how-to
 ms.custom: devplatv2, pipeline, devx-track-azurecli, update-code6
 ---
-# Manage component and pipeline inputs and outputs
+# Manage inputs and outputs for components and pipelines
 
 Azure Machine Learning pipelines support inputs and outputs at both the component and pipeline levels. This article describes pipeline and component inputs and outputs and how to manage them.
 
-At the component level, the inputs and outputs define the interface of a component. You can use the output from one component as an input for another component in the same parent pipeline, allowing for data or models to be passed between components. This interconnectivity forms a graph that illustrates the data flow within the pipeline.
+At the component level, the inputs and outputs define the component interface. You can use the output from one component as an input for another component in the same parent pipeline, allowing for data or models to be passed between components. You can represent this interconnectivity as a graph that illustrates the data flow within the pipeline.
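
For illustration, here's a minimal Python SDK v2 sketch of that chaining; the component files and input/output names (`prep.yml`, `train.yml`, `prep_data`, `model_output`) are assumptions, not part of this change:

```python
from azure.ai.ml import dsl, load_component

# Load two component specs from local YAML files (hypothetical paths).
prep = load_component(source="./prep.yml")
train = load_component(source="./train.yml")

@dsl.pipeline(default_compute="cpu-cluster")
def taxi_pipeline(raw_data):
    """Wire the prep component's output into the train component's input."""
    prep_job = prep(raw_data=raw_data)
    # This edge between components is what the pipeline graph visualizes.
    train_job = train(training_data=prep_job.outputs.prep_data)
    return {"trained_model": train_job.outputs.model_output}
```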
 
-At the pipeline level, you can use inputs and outputs to submit pipeline jobs with varying data inputs or parameters that control training logic, such as `learning_rate`. When you invoke a pipeline via a REST endpoint, inputs and outputs are especially useful for assigning different values to the pipeline input or accessing the output of pipeline jobs. For more information, see [Create jobs and input data for batch endpoints](how-to-access-data-batch-endpoints-jobs.md).
+At the pipeline level, you can use inputs and outputs to submit pipeline jobs with varying data inputs or parameters that control training logic, such as `learning_rate`. Inputs and outputs are especially useful when you invoke a pipeline via a REST endpoint. You can assign different values to the pipeline input or access the output of pipeline jobs. For more information, see [Create jobs and input data for batch endpoints](how-to-access-data-batch-endpoints-jobs.md).
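
Continuing the sketch above, a pipeline-level input can be bound to a different value on each submission (the data asset name/version and experiment name are hypothetical):

```python
from azure.ai.ml import MLClient, Input
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Bind the pipeline input to a registered data asset and submit;
# the next run can pass a different asset, version, or path.
pipeline_job = taxi_pipeline(raw_data=Input(type="uri_folder", path="azureml:nyc_taxi_raw:1"))
ml_client.jobs.create_or_update(pipeline_job, experiment_name="taxi-pipelines")
```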
 
 ## Input and output types
 
-The following types are supported as **inputs** or **outputs** of a component or pipeline:
+The following types are supported as both inputs and outputs of components or pipelines:
 
-Data types. For more information, see [Data types](concept-data.md#data-types).
-- `uri_file`
-- `uri_folder`
-- `mltable`
+- Data types. For more information, see [Data types](concept-data.md#data-types).
+  - `uri_file`
+  - `uri_folder`
+  - `mltable`
 
-Model types.
-- `mlflow_model`
-- `custom_model`
+- Model types.
+  - `mlflow_model`
+  - `custom_model`
+
+The following primitive types are also supported for inputs only:
 
-Pipeline or component **inputs** can also be the following primitive types:
 - `string`
 - `number`
 - `integer`
 - `boolean`
 
-Primitive types **output** isn't supported.
+Primitive type output isn't supported.
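
To make the interface concrete, here's a minimal sketch of a command component that declares typed inputs and outputs (the names, script, and environment are assumptions):

```python
from azure.ai.ml import command, Input, Output

train_component = command(
    name="train_model",
    code="./src",  # folder containing train.py (hypothetical)
    command=(
        "python train.py --data ${{inputs.training_data}} "
        "--ratio ${{inputs.test_split_ratio}} --model ${{outputs.model_output}}"
    ),
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    inputs={
        "training_data": Input(type="uri_folder"),  # data type input
        "test_split_ratio": Input(type="number"),   # primitive type input
    },
    outputs={
        "model_output": Output(type="mlflow_model"),  # model type output
    },
)
```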
 
-Using data or model output serializes the outputs and saves them as files in a storage location. Later steps can mount this storage location, or download or upload the files to the compute file system, allowing the steps to access the files during job execution.
+Using data or model outputs serializes the outputs and saves them as files in a storage location. Later steps can access the files during job execution by mounting this storage location or by downloading or uploading the files to the compute file system.
 
-This process requires the component source code to serialize the desired output object, usually stored in memory, into files. For example, you could serialize a pandas dataframe as a CSV file. Azure Machine Learning doesn't define any standardized methods for object serialization. Users have the flexibility to choose their preferred methods to serialize objects into files. In the downstream component, you can independently deserialize and read these files.
+This process requires the component source code to serialize the output object, which is usually stored in memory, into files. For example, you could serialize a pandas dataframe as a CSV file. Azure Machine Learning doesn't define any standardized methods for object serialization. You have the flexibility to choose your preferred methods to serialize objects into files. In the downstream component, you can independently deserialize and read these files.
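
A sketch of that serialize/deserialize pattern inside two component scripts (the argument names are illustrative):

```python
import argparse
import pandas as pd

# Upstream component: serialize an in-memory dataframe into the output folder.
parser = argparse.ArgumentParser()
parser.add_argument("--output_folder", type=str)
args = parser.parse_args()

df = pd.DataFrame({"fare": [12.5, 7.0], "distance": [3.1, 1.2]})
df.to_csv(f"{args.output_folder}/processed.csv", index=False)

# Downstream component: independently deserialize the same file from its input folder.
# df = pd.read_csv(f"{args.input_folder}/processed.csv")
```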
 
-### Examples
+### Example inputs and outputs
 
-The following example inputs and outputs are from the [NYC Taxi Data Regression](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression) pipeline in the [Azure Machine Learning examples](https://github.com/Azure/azureml-examples) GitHub repository.
+These examples are from the [NYC Taxi Data Regression](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression) pipeline in the [Azure Machine Learning examples](https://github.com/Azure/azureml-examples) GitHub repository.
 
 - The [train component](https://github.com/Azure/azureml-examples/blob/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression/train.yml) has a `number` input named `test_split_ratio`.
-- The [prep component](https://github.com/Azure/azureml-examples/blob/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression/prep.yml) has an`uri_folder` type output. The component source code reads the CSV files from the input folder, processes the files, and writes the processed CSV files to the output folder.
-- The [train component](https://github.com/Azure/azureml-examples/blob/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression/train.yml) has a `mlflow_model` type output. The component source code saves the trained model using `mlflow.sklearn.save_model` method.
+- The [prep component](https://github.com/Azure/azureml-examples/blob/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression/prep.yml) has a `uri_folder` type output. The component source code reads the CSV files from the input folder, processes the files, and writes the processed CSV files to the output folder.
+- The [train component](https://github.com/Azure/azureml-examples/blob/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression/train.yml) has a `mlflow_model` type output. The component source code saves the trained model using the `mlflow.sklearn.save_model` method.
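
For instance, the end of a train script might look like this sketch; the toy data and output path stand in for what the real component reads and receives at runtime:

```python
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy stand-ins; the real component reads these from its input folder.
X_train = np.array([[3.1], [1.2], [5.0]])
y_train = np.array([12.5, 7.0, 18.0])

model = LinearRegression().fit(X_train, y_train)

# Serialize the trained model into the component's mlflow_model output folder.
mlflow.sklearn.save_model(sk_model=model, path="./model_output")
```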
 
-## Data type input and output paths and modes
+## Data input and output paths and modes
 
 For data asset inputs and outputs, you must specify a `path` parameter that points to the data location. The following table shows the different data locations that Azure Machine Learning pipelines support, with `path` parameter examples:
 
 |Location | Examples | Input | Output|
 |---------|---------|---------|---------|
 |A path on your local computer | `./home/username/data/my_data` | ✓ | |
 |A path on a public http(s) server | `https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv` | ✓ | |
-|A path on Azure Storage | `wasbs://<container_name>@<account_name>.blob.core.windows.net/<path>`<br>`abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>` | \* | |
+|A path on Azure Storage | `wasbs://<container_name>@<account_name>.blob.core.windows.net/<path>`<br>or<br>`abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>` | \* | |
 |A path on an Azure Machine Learning datastore | `azureml://datastores/<data_store_name>/paths/<path>` | ✓ | ✓ |
 |A path to a data asset | `azureml:<my_data>:<version>` | ✓ | ✓ |
 
 \* Using Azure Storage directly isn't recommended for input/output, because it might need extra identity configuration to read the data. It's better to use an Azure Machine Learning datastore path instead of a direct Azure Storage path. Datastore paths are supported across various job types in pipelines.
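
For example, a datastore-backed input could be declared like this sketch (the datastore folder path is hypothetical):

```python
from azure.ai.ml import Input

taxi_data = Input(
    type="uri_folder",
    path="azureml://datastores/workspaceblobstore/paths/nyc_taxi/raw",
)
```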
 
-For data input/output, you can choose from various download, upload, and mount modes to define how to access data in the compute target. The following table shows the possible modes for different types of inputs and outputs.
+For data type inputs and outputs, you can choose from several download, upload, and mount modes to define how to access data in the compute target. The following table shows the possible modes for different types of inputs and outputs.
 
 Type | `upload` | `download` | `ro_mount` | `rw_mount` | `direct` | `eval_download` | `eval_mount`
 ------ | :---: | :---: | :---: | :---: | :---: | :---: | :---:
@@ -79,13 +80,13 @@ Type | `upload` | `download` | `ro_mount` | `rw_mount` | `direct` | `eval_downlo
 `mltable` output | ✓ | | | ✓ | ✓ | |
 
 > [!NOTE]
->In most cases, `ro_mount` or `rw_mount` modes are suggested. For more information, see [Modes](how-to-read-write-data-v2.md#modes).
+>In most cases, `ro_mount` or `rw_mount` modes are recommended. For more information, see [Modes](how-to-read-write-data-v2.md#modes).
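
A sketch of setting modes explicitly on an input and an output (paths and names are assumptions):

```python
from azure.ai.ml import Input, Output

# Read-only mount for consuming data; read-write mount for producing it.
training_data = Input(type="uri_folder", path="azureml:nyc_taxi_raw:1", mode="ro_mount")
prep_data = Output(type="uri_folder", mode="rw_mount")
```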
 
-## Inputs and outputs in the studio UI
+## Inputs and outputs in studio Designer
 
-The following screenshots show how the Azure Machine Learning studio UI displays inputs and outputs in the [NYC Taxi Data Regression](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression) pipeline from the [Azure Machine Learning examples](https://github.com/Azure/azureml-examples) repository.
+The following screenshots from the [NYC Taxi Data Regression](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/pipelines-with-components/nyc_taxi_data_regression) pipeline show how Azure Machine Learning studio displays inputs and outputs.
 
-The studio pipeline job page shows data or model type inputs/outputs as small circles, called input/output ports, in the corresponding component. These ports represent the data flow in a pipeline. Pipeline level output is displayed as a purple box for easy identification.
+The studio pipeline job page shows data or model type inputs/outputs as small circles called input/output ports in the corresponding component. These ports represent the data flow in a pipeline. Pipeline level output is displayed as a purple box for easy identification.
 
 :::image type="content" source="./media/how-to-manage-pipeline-input-output/input-output-port.png" lightbox="./media/how-to-manage-pipeline-input-output/input-output-port.png" alt-text="Screenshot highlighting the pipeline input and output ports.":::

@@ -95,25 +96,25 @@ When you hover over an input/output port, the type is displayed.
 Primitive type inputs aren't displayed on the graph. You can find these inputs on the **Settings** tab of the pipeline job overview panel for pipeline level inputs, or the component panel for component level inputs.
 
-The following screenshot shows the **Settings** tab of a pipeline job, which you can open by selecting **Job Overview**. To check inputs for a component, double-click the component to open the component panel.
+The following screenshot shows the **Settings** tab of a pipeline job, which you can open by selecting **Job overview**. To check inputs for a component, double-click the component to open the component panel.
 
 :::image type="content" source="./media/how-to-manage-pipeline-input-output/job-overview-setting.png" lightbox="./media/how-to-manage-pipeline-input-output/job-overview-setting.png" alt-text="Screenshot highlighting the job overview setting panel.":::
 
 When you edit a pipeline in the Designer, the pipeline inputs and outputs are in the **Pipeline interface** panel, and the component inputs and outputs are in the component's panel.
 
-:::image type="content" source="./media/how-to-manage-pipeline-input-output/pipeline-interface.png" lightbox="./media/how-to-manage-pipeline-input-output/pipeline-interface.png" alt-text="Screenshot highlighting the pipeline interface in Designer.":::
+:::image type="content" source="./media/how-to-manage-pipeline-input-output/pipeline-interface.png" alt-text="Screenshot highlighting the pipeline interface in Designer.":::
 
 ## Promote component inputs/outputs to pipeline level
 
-Promoting a component's input/output to the pipeline level lets you overwrite the component's input/output when you submit a pipeline job. Promoting to the pipeline level is also useful for triggering the pipeline by using the REST endpoint.
+Promoting a component's input/output to the pipeline level lets you overwrite the component's input/output when you submit a pipeline job. This ability is especially useful when triggering the pipeline by using a REST endpoint.
 
 The following examples show how to promote component level inputs/outputs to pipeline level inputs/outputs.
 
 # [Azure CLI](#tab/cli)
 
-The following pipeline promotes three inputs and three outputs to the pipeline level. For example, `pipeline_job_training_max_epocs` is declared under the `inputs` section on the root level, which means it's pipeline level input.
+The following pipeline promotes three inputs and three outputs to the pipeline level. For example, `pipeline_job_training_max_epocs` is pipeline level input because it's declared under the `inputs` section on the root level.
 
-Under `train_job` in the `jobs` section, the input named `max_epocs` is referenced as `${{parent.inputs.pipeline_job_training_max_epocs}}`, meaning that the `train_job`'s `max_epocs` input references the pipeline level `pipeline_job_training_max_epocs` input. You can promote pipeline output by using the same schema.
+Under `train_job` in the `jobs` section, the input named `max_epocs` is referenced as `${{parent.inputs.pipeline_job_training_max_epocs}}`, meaning that the `train_job`'s `max_epocs` input references the pipeline level `pipeline_job_training_max_epocs` input. Pipeline output is promoted by using the same schema.
 
 :::code language="yaml" source="~/azureml-examples-main/cli/jobs/pipelines-with-components/basics/1b_e2e_registered_components/pipeline.yml" range="1-65" highlight="6-17,30,34,52,57,63,65":::

@@ -195,30 +196,30 @@ pipeline_job.settings.default_datastore = "workspaceblobstore"
 
 ---
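
For comparison, a condensed sketch of the same promotion pattern in the Python SDK v2; the component file and output name are assumptions:

```python
from azure.ai.ml import dsl, load_component

train = load_component(source="./train.yml")  # hypothetical component spec

@dsl.pipeline()
def training_pipeline(pipeline_job_training_max_epocs: int):
    # Binding the component input to the function parameter promotes it to a
    # pipeline-level input that callers can overwrite on each submission.
    train_job = train(max_epocs=pipeline_job_training_max_epocs)
    # Returning the output promotes it to a pipeline-level output.
    return {"pipeline_job_trained_model": train_job.outputs.model_output}

job = training_pipeline(pipeline_job_training_max_epocs=20)
```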

-### Studio UI
+### Promote inputs in the studio UI
 
-You can promote a component's input to pipeline level input in the studio Designer authoring page.
+You can promote a component's input to pipeline level input on the studio Designer authoring page.
 
-1. Open the component's settings panel by double clicking the component.
+1. Open the component's settings panel by double-clicking the component.
 1. Find the input you want to promote and select the three dots on the right.
 1. Select **Add to pipeline input**.
 
-:::image type="content" source="./media/how-to-manage-pipeline-input-output/promote-pipeline-input.png" lightbox="./media/how-to-manage-pipeline-input-output/promote-pipeline-input.png" alt-text="Screenshot highlighting how to promote to pipeline input in Designer.":::
+:::image type="content" source="./media/how-to-manage-pipeline-input-output/promote-pipeline-input.png" alt-text="Screenshot highlighting how to promote to pipeline input in Designer.":::
 
 ## Define optional inputs
 
-By default, all inputs are required and must be assigned a default value or a value each time you submit a pipeline job. However, you can define an optional input and not assign a value to the input when you submit a pipeline job.
+By default, all inputs are required and must either have a default value or be assigned a value each time you submit a pipeline job. However, you can define an optional input and not assign a value to the input when you submit a pipeline job.
 
 > [!NOTE]
 > Optional outputs aren't supported.
 
-If you have an optional data/model type input and don't assign a value to it when submitting the pipeline job, a component in the pipeline lacks a preceding data dependency. The component's input port isn't linked to any component or data/model node. This situation causes the pipeline service to invoke the component directly, instead of waiting for the preceding dependency to be ready.
+If you have an optional data/model type input and don't assign a value to it when submitting the pipeline job, a component in the pipeline lacks a preceding data dependency. The component's input port isn't linked to any component or data/model node. The pipeline service invokes the component directly, instead of waiting for the preceding dependency to be ready.
 
-If you set `continue_on_step_failure = True` for the pipeline and a second node uses required output from the first node, the second node doesn't execute if the first node fails. But if the second node uses optional input from the first node, it executes even if the first node fails. The following screenshot illustrates this scenario:
+If you set `continue_on_step_failure = True` for the pipeline and a second node uses required output from the first node, the second node doesn't execute if the first node fails. But if the second node uses optional input from the first node, it executes even if the first node fails. The following screenshot illustrates this scenario.
 
 :::image type="content" source="./media/how-to-manage-pipeline-input-output/continue-on-failure-optional-input.png" alt-text="Screenshot showing the orchestration logic of optional input and continue on failure.":::
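
For reference, the flag in this scenario is a pipeline job setting; a sketch assuming `pipeline_job` and `ml_client` were created as in the earlier examples:

```python
# Let downstream nodes that depend only on optional inputs run
# even when an upstream node fails.
pipeline_job.settings.continue_on_step_failure = True
ml_client.jobs.create_or_update(pipeline_job, experiment_name="taxi-pipelines")
```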
 
-The following YAML code example shows how to define optional input. When the input is set as `optional = true`, you need to use `$[[]]` to embrace the command line with inputs, as in the highlighted lines of the example.
+The following YAML code example shows how to define optional input. When the input is set as `optional = true`, you need to use `$[[]]` to enclose the command line segments that use the optional inputs, as in the highlighted lines of the example.
 
 :::code language="yaml" source="~/azureml-examples-main/cli/assets/component/train.yml" range="1-34" highlight="11-21,30-32":::
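
A condensed SDK v2 counterpart of that pattern (component details are hypothetical); the `$[[...]]` segment is emitted only when `max_epocs` is supplied:

```python
from azure.ai.ml import command, Input

train_component = command(
    name="train_optional_demo",
    code="./src",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    inputs={
        "training_data": Input(type="uri_folder"),
        "max_epocs": Input(type="integer", optional=True),  # optional input
    },
    # Omitting max_epocs at submission drops the $[[...]] block from the command.
    command=(
        "python train.py --data ${{inputs.training_data}} "
        "$[[--max_epocs ${{inputs.max_epocs}}]]"
    ),
)
```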

@@ -294,9 +295,9 @@ output = client.jobs.download(name=job.name, download_path=tmp_path, output_name
 ```
 ---
 
-### Download child job outputs
+## Download child job outputs
 
-To download the output of a child component that isn't promoted to pipeline level, first list all child job entities of a pipeline job and then use similar code to download the output.
+To download the outputs of a child component that isn't promoted to pipeline level, first list all child job entities of a pipeline job and then use similar code to download the outputs.
 
 # [Azure CLI](#tab/cli)
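
A sketch of that flow with the Python SDK v2 (the parent job name, child display name, and output name are hypothetical):

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# List the child jobs of a pipeline job, then download one child's named output.
for child in ml_client.jobs.list(parent_job_name="shy_apple_123"):
    if child.display_name == "train_job":
        ml_client.jobs.download(
            name=child.name,
            download_path="./outputs",
            output_name="model_output",
        )
```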
