---
title: 'Create and Run Machine Learning Pipelines Using Components with the Machine Learning SDK v2'
titleSuffix: Azure Machine Learning
description: Build a machine learning pipeline for image classification. Focus on machine learning instead of infrastructure and automation.
ms.service: azure-machine-learning
ms.custom:
- build-2023
- ignite-2023
- update-code
#customer intent: As a machine learning engineer, I want to create a component-based machine learning pipeline so that I can take advantage of the flexibility and reuse provided by components.
---
# Create and run machine learning pipelines by using components with the Machine Learning SDK v2

In this article, you learn how to build an [Azure Machine Learning pipeline](concept-ml-pipelines.md) by using the Azure Machine Learning Python SDK v2 to complete an image classification task that contains three steps: prepare data, train an image classification model, and score the model. Machine Learning pipelines optimize your workflow with speed, portability, and reuse, so you can focus on machine learning instead of infrastructure and automation.
The example pipeline trains a small [Keras](https://keras.io/) convolutional neural network to classify images in the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset. The pipeline looks like this:

:::image type="content" source="./media/how-to-create-component-pipeline-python/pipeline-graph.png" alt-text="Screenshot showing a pipeline graph of the image classification example." lightbox="./media/how-to-create-component-pipeline-python/pipeline-graph.png":::
In this article, you complete the following tasks:
> [!div class="checklist"]
> * Prepare input data for the pipeline job
> * Create three components to prepare the data, train an image classification model, and score the model
> * Build a pipeline from the components
> * Get access to a workspace with compute
> * Submit the pipeline job
> * Review the output of the components and the trained neural network
> * (Optional) Register the component for further reuse and sharing within the workspace
If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/) today.
## Prerequisites

* An Azure Machine Learning workspace. If you don't have one, complete the [Create resources tutorial](quickstart-create-resources.md).
* A Python environment in which you've installed the Azure Machine Learning Python SDK v2. For installation instructions, see [Getting started](https://github.com/Azure/azureml-examples/tree/sdk-preview/sdk#getting-started). This environment is for defining and controlling your Azure Machine Learning resources and is separate from the environment that's used at runtime for training.
* A clone of the examples repository.
To run the training examples, first clone the examples repository and go to the `sdk` directory:
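
A minimal sketch of those commands, assuming you want only the latest revision of the [Azure Machine Learning examples](https://github.com/azure/azureml-examples) repository:

```bash
# Clone only the latest revision of the examples repository,
# then change to the SDK examples directory.
git clone --depth 1 https://github.com/Azure/azureml-examples
cd azureml-examples/sdk
```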
## Start an interactive Python session

This article uses the Azure Machine Learning Python SDK to create and control an Azure Machine Learning pipeline. It assumes that you run the code snippets interactively in either a Python REPL environment or a Jupyter notebook.
This article is based on the [image_classification_keras_minist_convnet.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2e_image_classification_keras_minist_convnet/image_classification_keras_minist_convnet.ipynb) notebook, which you can find in the `sdk/python/jobs/pipelines/2e_image_classification_keras_minist_convnet` directory of the [Azure Machine Learning examples](https://github.com/azure/azureml-examples) repository.
## Import required libraries
Import all the Azure Machine Learning libraries that you need for this article:
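
If you're working outside the sample notebook, the following sketch shows the kinds of imports this article relies on. The exact import list in the notebook may differ slightly:

```python
# Authentication to Azure.
from azure.identity import DefaultAzureCredential

# Core SDK v2 client plus input/output and component helpers.
from azure.ai.ml import MLClient, Input, Output, load_component
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline
```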

## Prepare input data for the pipeline job

You need to prepare the input data for the image classification pipeline.
Fashion-MNIST is a dataset of fashion images divided into 10 classes. Each image is a 28 x 28 grayscale image. There are 60,000 training images and 10,000 test images. As an image classification problem, Fashion-MNIST is more challenging than the classic MNIST handwritten digit database. It's distributed in the same compressed binary form as the original [handwritten digit database](http://yann.lecun.com/exdb/mnist/).
By defining an `Input`, you create a reference to the data source location. The data remains in its existing location, so no extra storage cost is incurred.
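
For example, here's a minimal sketch of defining such an `Input`. The path is a placeholder; point it at wherever the compressed Fashion-MNIST files are stored:

```python
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Reference the folder that holds the compressed Fashion-MNIST files.
# The data stays at this location; nothing is copied or re-stored.
fashion_ds = Input(
    type=AssetTypes.URI_FOLDER,
    path="https://<storage-account>.blob.core.windows.net/<container>/mnist-fashion/",
)
```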
## Create components for building the pipeline
The image classification task can be split into three steps: prepare data, train the model, and score the model.
An [Azure Machine Learning component](concept-component.md) is a self-contained piece of code that completes one step in a machine learning pipeline. In this article, you create three components for the image classification task:

* Prepare data for training and testing.
* Train a neural network for image classification by using training data.
* Score the model by using test data.
For each component, you need to complete these steps:
1. Prepare the Python script that contains the execution logic.
1. Define the interface of the component.
1. Add other metadata of the component, including the runtime environment and the command to run the component.

The next section shows how to create the components in two ways. For the first two components, you use a Python function. For the third component, you use a YAML definition.
### Create the data-preparation component

The first component in this pipeline converts the compressed data files of `fashion_ds` into two .csv files, one for training and the other for scoring. You use a Python function to define this component.
If you're following along with the example in the [Azure Machine Learning examples repo](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/pipelines/2e_image_classification_keras_minist_convnet), the source files are already available in the `prep/` folder. This folder contains two files to construct the component: `prep_component.py`, which defines the component, and `conda.yaml`, which defines the runtime environment of the component.

#### Define a component by using a Python function

By using the `command_component()` function as a decorator, you can easily define the component's interface, its metadata, and the code to run from a Python function. Each decorated Python function is transformed into a single static specification (YAML) that the pipeline service can process.
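
The sample's `prep_component.py` follows this pattern. Here's a condensed sketch, assuming the `mldesigner` package supplies the `command_component` decorator; the script in the repo contains the full conversion logic:

```python
from pathlib import Path

from mldesigner import command_component, Input, Output


@command_component(
    name="prep_data",
    version="1",
    display_name="Prep Data",
    description="Convert data to CSV files; split into training and test data.",
    environment=dict(
        # A representative base image; the sample pins its own image and conda file.
        conda_file=Path(__file__).parent / "conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
)
def prepare_data_component(
    input_data: Input(type="uri_folder"),
    training_data: Output(type="uri_folder"),
    test_data: Output(type="uri_folder"),
):
    # Convert the compressed fashion_ds files into two .csv files:
    # one written to training_data, one written to test_data.
    ...
```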

The preceding code defines a component with the display name `Prep Data` by using the `@command_component` decorator:

* `name` is the unique identifier of the component.
* `version` is the current version of the component. A component can have multiple versions.
* `display_name` is a friendly display name of the component in the UI. It isn't unique.
* `description` usually describes the task the component can complete.
* `environment` specifies the runtime environment for the component. The environment of this component specifies a Docker image and refers to the `conda.yaml` file.
The `conda.yaml` file contains all packages used for the component:
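
A representative sketch of such a `conda.yaml`; the exact package list and versions in the sample may differ:

```yaml
name: imagekeras_prep_conda_env
channels:
  - defaults
dependencies:
  - python=3.8
  - pip
  - pip:
      - mldesigner
      - pandas
      - numpy
```
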
* The `prepare_data_component` function defines one input for `input_data` and two outputs for `training_data` and `test_data`.
  `input_data` is the input data path. `training_data` and `test_data` are output data paths for the training data and the test data.
* The component converts the data from `input_data` into a training-data .csv file written to `training_data` and a test-data .csv file written to `test_data`.
This is what a component looks like in the studio UI:
* A component is a block in a pipeline graph.
* `input_data`, `training_data`, and `test_data` are ports of the component, which connect to other components for data streaming.

:::image type="content" source="./media/how-to-create-component-pipeline-python/prep-data-component.png" alt-text="Screenshot of the Prep Data component in the UI and code." lightbox="./media/how-to-create-component-pipeline-python/prep-data-component.png":::