articles/machine-learning/tutorial-pipeline-python-sdk.md
31 additions & 33 deletions
@@ -9,7 +9,7 @@ ms.topic: tutorial
author: lgayhardt
ms.author: lagayhar
ms.reviewer: keli19
-ms.date: 05/15/2024
+ms.date: 09/09/2025
ms.custom:
- sdkv2
- build-2023
@@ -25,11 +25,9 @@ ms.custom:
> [!NOTE]
> For a tutorial that uses SDK v1 to build a pipeline, see [Tutorial: Build an Azure Machine Learning pipeline for image classification](v1/tutorial-pipeline-python-sdk.md)
-The core of a machine learning pipeline is to split a complete machine learning task into a multistep workflow. Each step is a manageable component that can be developed, optimized, configured, and automated individually. Steps are connected through well-defined interfaces. The Azure Machine Learning pipeline service automatically orchestrates all the dependencies between pipeline steps. The benefits of using a pipeline are standardized the MLOps practice, scalable team collaboration, training efficiency and cost reduction. To learn more about the benefits of pipelines, see [What are Azure Machine Learning pipelines](concept-ml-pipelines.md).
+The core of a machine learning pipeline is to split a complete machine learning task into a multistep workflow. Each step is a manageable component that can be developed, optimized, configured, and automated individually. Steps are connected through well-defined interfaces. The Azure Machine Learning pipeline service automatically orchestrates all the dependencies between pipeline steps. The benefits of using a pipeline include standardized MLOps practice, scalable team collaboration, training efficiency, and cost reduction. To learn more about the benefits of pipelines, see [What are Azure Machine Learning pipelines](concept-ml-pipelines.md).
-In this tutorial, you use Azure Machine Learning to create a production ready machine learning project, using Azure Machine Learning Python SDK v2.
-
-This means you will be able to leverage the Azure Machine Learning Python SDK to:
+In this tutorial, you use Azure Machine Learning to create a production-ready machine learning project, using Azure Machine Learning Python SDK v2. You use the Azure Machine Learning Python SDK to:
> [!div class="checklist"]
> - Get a handle to your Azure Machine Learning workspace
@@ -58,7 +56,7 @@ This video shows how to get started in Azure Machine Learning studio so that you
-1. Complete the tutorial [Upload, access and explore your data](tutorial-explore-data.md) to create the data asset you need in this tutorial. Make sure you run all the code to create the initial data asset. Explore the data and revise it if you wish, but you'll only need the initial data in this tutorial.
+1. Complete the tutorial [Upload, access and explore your data](tutorial-explore-data.md) to create the data asset you need in this tutorial. Make sure you run all the code to create the initial data asset. Explore the data and revise it if you wish, but you only need the initial data in this tutorial.
1. [!INCLUDE [open or create notebook](includes/prereq-open-or-create.md)]
@@ -73,21 +71,21 @@ This video shows how to get started in Azure Machine Learning studio so that you
The Azure Machine Learning framework can be used from CLI, Python SDK, or studio interface. In this example, you use the Azure Machine Learning Python SDK v2 to create a pipeline.
-Before creating the pipeline, you need the following resources:
+Before creating the pipeline, you need these resources:
* The data asset for training
* The software environment to run the pipeline
* A compute resource where the job runs
## Create handle to workspace
-Before we dive in the code, you need a way to reference your workspace. You'll create `ml_client` for a handle to the workspace. You'll then use `ml_client` to manage resources and jobs.
+Before we dive into the code, you need a way to reference your workspace. You create `ml_client` as a handle to the workspace, and then use `ml_client` to manage resources and jobs.
-In the next cell, enter your Subscription ID, Resource Group name and Workspace name. To find these values:
+In the next cell, enter your Subscription ID, Resource Group name, and Workspace name. To find these values:
1. In the upper right Azure Machine Learning studio toolbar, select your workspace name.
-1. Copy the value for workspace, resource group and subscription ID into the code.
-1. You'll need to copy one value, close the area and paste, then come back for the next one.
+1. Copy the value for workspace, resource group, and subscription ID into the code.
+1. Copy one value, close the area and paste it, then come back for the next one.
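For reference, the cell this next hunk edits follows the standard SDK v2 handle pattern. A minimal sketch, assuming placeholder values for the three identifiers you copied above:

```python
# Sketch of the workspace-handle cell (substitute your own values)
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WS_NAME>",
)
```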
@@ -111,14 +109,14 @@ ml_client = MLClient(
```
> [!NOTE]
-> Creating MLClient will not connect to the workspace. The client initialization is lazy, it will wait for the first time it needs to make a call (this will happen in the next code cell).
+> Creating MLClient won't connect to the workspace. The client initialization is lazy; it waits until the first time it needs to make a call (this happens in the next code cell).
-Verify the connection by making a call to `ml_client`. Since this is the first time that you're making a call to the workspace, you may be asked to authenticate.
+Verify the connection by making a call to `ml_client`. Since this is the first time that you're making a call to the workspace, you might be asked to authenticate.
```python
# Verify that the handle works correctly.
-# If you ge an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.
+# If you get an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.
ws = ml_client.workspaces.get(WS_NAME)
print(ws.location, ":", ws.resource_group)
```
@@ -175,7 +173,7 @@ dependencies:
The specification contains some common packages that you use in your pipeline (numpy, pip), together with some Azure Machine Learning-specific packages (azureml-mlflow).
-The Azure Machine Learning packages aren't mandatory to run Azure Machine Learning jobs. However, adding these packages let you interact with Azure Machine Learning for logging metrics and registering models, all inside the Azure Machine Learning job. You use them in the training script later in this tutorial.
+The Azure Machine Learning packages aren't mandatory to run Azure Machine Learning jobs. However, adding these packages lets you interact with Azure Machine Learning for logging metrics and registering models, all inside the Azure Machine Learning job. You use them in the training script later in this tutorial.
Use the *yaml* file to create and register this custom environment in your workspace:
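The registration cell that follows this line is elided from the diff. For orientation, a sketch of what it does, assuming the conda specification was saved as *./dependencies/conda.yaml*; the environment name, description, and base image here are illustrative assumptions:

```python
# Create and register a custom environment from the conda yaml file
# (sketch; name, description, and image are illustrative assumptions)
from azure.ai.ml.entities import Environment

custom_env = Environment(
    name="aml-scikit-learn",
    description="Custom environment for the credit-defaults pipeline",
    conda_file="./dependencies/conda.yaml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
custom_env = ml_client.environments.create_or_update(custom_env)
```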
@@ -210,17 +208,17 @@ Azure Machine Learning pipelines are reusable ML workflows that usually consist
- Write the yaml specification of the component, or create it programmatically using `ComponentMethod`.
- Optionally, register the component with a name and version in your workspace, to make it reusable and shareable.
- Load that component from the pipeline code.
-- Implement the pipeline using the component's inputs, outputs and parameters.
+- Implement the pipeline using the component's inputs, outputs, and parameters.
- Submit the pipeline.
-There are two ways to create a component, programmatic and yaml definition. The next two sections walk you through creating a component both ways. You can either create the two components trying both options or pick your preferred method.
+You can create a component in two ways: programmatic and yaml definition. The next two sections walk you through creating a component both ways. You can either create both components by trying each option, or pick your preferred method.
> [!NOTE]
-> In this tutorial for simplicity we are using the same compute for all components. However, you can set different computes for each component, for example by adding a line like `train_step.compute = "cpu-cluster"`. To view an example of building a pipeline with different computes for each component, see the [Basic pipeline job section in the cifar-10 pipeline tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2b_train_cifar_10_with_pytorch/train_cifar_10_with_pytorch.ipynb).
+> In this tutorial, for simplicity, we're using the same compute for all components. However, you can set different computes for each component, for example by adding a line like `train_step.compute = "cpu-cluster"`. To view an example of building a pipeline with different computes for each component, see the [Basic pipeline job section in the cifar-10 pipeline tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2b_train_cifar_10_with_pytorch/train_cifar_10_with_pytorch.ipynb).
### Create component 1: data prep (using programmatic definition)
-Let's start by creating the first component. This component handles the preprocessing of the data. The preprocessing task is performed in the *data_prep.py* Python file.
+Start by creating the first component. This component handles the preprocessing of the data. The preprocessing task is performed in the *data_prep.py* Python file.
First, create a source folder for the data_prep component:
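The folder-creation and component-definition cells fall outside the hunks shown here. For orientation, a condensed sketch of a programmatic data-prep component in this tutorial's pattern; the folder path, input/output names, and environment reference are illustrative assumptions:

```python
# Create the source folder, then define the component programmatically
# (sketch; names, arguments, and environment are illustrative assumptions)
import os
from azure.ai.ml import command, Input, Output

data_prep_src_dir = "./components/data_prep"
os.makedirs(data_prep_src_dir, exist_ok=True)

data_prep_component = command(
    name="data_prep_credit_defaults",
    display_name="Data preparation for training",
    inputs={
        "data": Input(type="uri_folder"),
        "test_train_ratio": Input(type="number"),
    },
    outputs={
        "train_data": Output(type="uri_folder", mode="rw_mount"),
        "test_data": Output(type="uri_folder", mode="rw_mount"),
    },
    code=data_prep_src_dir,  # folder that contains data_prep.py
    command="python data_prep.py "
    "--data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} "
    "--train_data ${{outputs.train_data}} --test_data ${{outputs.test_data}}",
    environment="aml-scikit-learn@latest",
)
```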
@@ -325,10 +323,10 @@ Optionally, register the component in the workspace for future reuse.
-# Create (register) the component in your workspace
+# Create and register the component in your workspace
print(
f"Component {data_prep_component.name} with Version {data_prep_component.version} is registered"
)
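The registration call itself sits just above the `print` and is elided from this hunk; in the SDK v2 pattern it looks roughly like this (a sketch, not the verbatim cell):

```python
# Register the command component so it can be reused and shared (sketch)
data_prep_component = ml_client.create_or_update(data_prep_component.component)
```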
@@ -338,7 +336,7 @@ print(
The second component that you create consumes the training and test data, trains a tree-based model, and returns the output model. Use Azure Machine Learning logging capabilities to record and visualize the learning progress.
-You used the `CommandComponent` class to create your first component. This time you use the yaml definition to define the second component. Each method has its own advantages. A yaml definition can actually be checked-in along the code, and would provide a readable history tracking. The programmatic method using `CommandComponent` can be easier with built-in class documentation and code completion.
+You used the `CommandComponent` class to create your first component. This time you use the yaml definition to define the second component. Each method has its own advantages. A yaml definition can be checked in along with the code and provides readable history tracking. The programmatic method using `CommandComponent` can be easier with built-in class documentation and code completion.
Create the directory for this component:
@@ -488,7 +486,7 @@ command: >-
```
-Now create and register the component. Registering it allows you to re-use it in other pipelines. Also, anyone else with access to your workspace can use the registered component.
+Now create and register the component. Registering it allows you to reuse it in other pipelines. Also, anyone else with access to your workspace can use the registered component.
```python
@@ -498,10 +496,10 @@ from azure.ai.ml import load_component
-# Create (register) the component in your workspace
+# Create and register the component in your workspace
print(
f"Component {train_component.name} with Version {train_component.version} is registered"
)
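As with the first component, the load-and-register lines are elided from this hunk. Based on the `load_component` import in the hunk header, they look roughly like this; the *train.yml* filename and `train_src_dir` variable are assumptions:

```python
# Load the component from its yaml definition, then register it (sketch)
import os
from azure.ai.ml import load_component

train_component = load_component(source=os.path.join(train_src_dir, "train.yml"))
train_component = ml_client.create_or_update(train_component)
```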
@@ -512,14 +510,14 @@ print(
Now that both your components are defined and registered, you can start implementing the pipeline.
-Here, you use *input data*, *split ratio* and *registered model name* as input variables. Then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
+Here, you use *input data*, *split ratio*, and *registered model name* as input variables. Then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
-The Python functions returned by `load_component()` work as any regular Python function that we use within a pipeline to call each step.
+The Python functions returned by `load_component()` work like any regular Python function that you use within a pipeline to call each step.
-To code the pipeline, you use a specific `@dsl.pipeline` decorator that identifies the Azure Machine Learning pipelines. In the decorator, we can specify the pipeline description and default resources like compute and storage. Like a Python function, pipelines can have inputs. You can then create multiple instances of a single pipeline with different inputs.
+To code the pipeline, you use a specific `@dsl.pipeline` decorator that identifies Azure Machine Learning pipelines. In the decorator, you can specify the pipeline description and default resources like compute and storage. Like a Python function, pipelines can have inputs. You can then create multiple instances of a single pipeline with different inputs.
-Here, we used *input data*, *split ratio* and *registered model name* as input variables. We then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
+Here, you use *input data*, *split ratio*, and *registered model name* as input variables. You then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
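The decorated pipeline definition itself is outside the hunks shown here. For orientation, a condensed sketch of a `@dsl.pipeline` function in this tutorial's pattern; the compute setting and all parameter names are illustrative assumptions:

```python
# Sketch of the pipeline definition: wire the registered components
# together by feeding data_prep outputs into train inputs
from azure.ai.ml import dsl

@dsl.pipeline(
    compute="serverless",  # assumption; could also name a compute cluster
    description="E2E data_prep-train pipeline",
)
def credit_defaults_pipeline(
    pipeline_job_data_input,
    pipeline_job_test_train_ratio,
    pipeline_job_registered_model_name,
):
    data_prep_job = data_prep_component(
        data=pipeline_job_data_input,
        test_train_ratio=pipeline_job_test_train_ratio,
    )
    train_job = train_component(
        train_data=data_prep_job.outputs.train_data,
        test_data=data_prep_job.outputs.test_data,
        registered_model_name=pipeline_job_registered_model_name,
    )
```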
-You can track the progress of your pipeline, by using the link generated in the previous cell. When you first select this link, you may see that the pipeline is still running. Once it's complete, you can examine each component's results.
+You can track the progress of your pipeline by using the link generated in the previous cell. When you first select this link, you might see that the pipeline is still running. Once it's complete, you can examine each component's results.
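The submission cell that generates that link isn't part of this diff; it follows the standard SDK v2 pattern, sketched here assuming a pipeline instance named `pipeline` and an illustrative experiment name:

```python
# Submit the pipeline job and stream its logs until it finishes (sketch)
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    experiment_name="e2e_registered_components",  # illustrative name
)
ml_client.jobs.stream(pipeline_job.name)
```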
Double-click the **Train Credit Defaults Model** component.
-There are two important results you'll want to see about training:
+There are two important results you want to see about training:
* View your logs:
1. Select the **Outputs+logs** tab.
1. Open the folders `user_logs` > `std_log.txt`.
This section shows the stdout of the script run.
:::image type="content" source="media/tutorial-pipeline-python-sdk/user-logs.jpg" alt-text="Screenshot of std_log.txt." lightbox="media/tutorial-pipeline-python-sdk/user-logs.jpg":::
-* View your metrics: Select the **Metrics** tab. This section shows different logged metrics. In this example. mlflow `autologging`, has automatically logged the training metrics.
+* View your metrics: Select the **Metrics** tab. This section shows different logged metrics. In this example, mlflow `autologging` has automatically logged the training metrics.