articles/machine-learning/tutorial-pipeline-python-sdk.md
ms.topic: tutorial
author: lgayhardt
ms.author: lagayhar
ms.reviewer: keli19
ms.date: 09/09/2025
ms.custom:
  - sdkv2
  - build-2023

> [!NOTE]
> For a tutorial that uses SDK v1 to build a pipeline, see [Tutorial: Build an Azure Machine Learning pipeline for image classification](v1/tutorial-pipeline-python-sdk.md).

The core of a machine learning pipeline is to split a complete machine learning task into a multistep workflow. Each step is a manageable component that can be developed, optimized, configured, and automated individually. Steps are connected through well-defined interfaces. The Azure Machine Learning pipeline service automatically orchestrates all the dependencies between pipeline steps. The benefits of using a pipeline are standardized MLOps practice, scalable team collaboration, training efficiency, and cost reduction. To learn more about the benefits of pipelines, see [What are Azure Machine Learning pipelines](concept-ml-pipelines.md).

In this tutorial, you use Azure Machine Learning to create a production-ready machine learning project, using Azure Machine Learning Python SDK v2. This means you can use the Azure Machine Learning Python SDK to:

> [!div class="checklist"]
> - Get a handle to your Azure Machine Learning workspace
1. Complete the tutorial [Upload, access and explore your data](tutorial-explore-data.md) to create the data asset you need in this tutorial. Make sure you run all the code to create the initial data asset. Explore the data and revise it if you wish, but you only need the initial data in this tutorial.

1. [!INCLUDE [open or create notebook](includes/prereq-open-or-create.md)]

The Azure Machine Learning framework can be used from CLI, Python SDK, or studio interface. In this example, you use the Azure Machine Learning Python SDK v2 to create a pipeline.

Before creating the pipeline, you need these resources:

* The data asset for training
* The software environment to run the pipeline
* A compute resource where the job runs
## Create handle to workspace

Before you dive into the code, you need a way to reference your workspace. You create `ml_client` as a handle to the workspace, and then use `ml_client` to manage resources and jobs.

In the next cell, enter your Subscription ID, Resource Group name, and Workspace name. To find these values:

1. In the upper right Azure Machine Learning studio toolbar, select your workspace name.
1. Copy the value for workspace, resource group, and subscription ID into the code.
1. You need to copy one value, close the area, and paste, then come back for the next one.
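
The cell follows this general pattern; a minimal sketch, where the placeholder strings are stand-ins for your own values:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Enter the details of your Azure Machine Learning workspace (placeholder values).
SUBSCRIPTION = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WS_NAME = "<AML_WORKSPACE_NAME>"

# Get a handle to the workspace; authentication happens lazily, on first use.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)
```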
> [!NOTE]
> Creating `MLClient` won't connect to the workspace. The client initialization is lazy; it waits until the first time it needs to make a call (this happens in the next code cell).

Verify the connection by making a call to `ml_client`. Since this is the first time that you're making a call to the workspace, you might be asked to authenticate.

```python
# Verify that the handle works correctly.
# If you get an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.
ws = ml_client.workspaces.get(WS_NAME)
print(ws.location, ":", ws.resource_group)
```
Start by getting the data that you previously registered in [Tutorial: Upload, access and explore your data in Azure Machine Learning](tutorial-explore-data.md).
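
A sketch of what that lookup looks like, assuming the asset was registered with the name and version used in that tutorial (`credit-card`, `initial`):

```python
# Retrieve the registered data asset (name and version assumed from the data tutorial)
credit_data = ml_client.data.get(name="credit-card", version="initial")
print(f"Data asset URI: {credit_data.path}")
```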
So far, you've created a development environment on the compute instance, your development machine. You also need an environment to use for each step of the pipeline. Each step can have its own environment, or you can use some common environments for multiple steps.
The specification contains some usual packages that you use in your pipeline (numpy, pip), together with some Azure Machine Learning-specific packages (azureml-mlflow).

The Azure Machine Learning packages aren't mandatory to run Azure Machine Learning jobs. However, adding these packages lets you interact with Azure Machine Learning for logging metrics and registering models, all inside the Azure Machine Learning job. You use them in the training script later in this tutorial.

Use the *yaml* file to create and register this custom environment in your workspace:
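
A minimal sketch of that registration, assuming the environment name and conda file path shown here (the base Docker image is one commonly used in these tutorials):

```python
import os
from azure.ai.ml.entities import Environment

custom_env_name = "aml-scikit-learn"  # assumed name for this tutorial's environment

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Custom environment for the pipeline tutorial",
    conda_file=os.path.join("dependencies", "conda.yaml"),  # assumed location of the yaml file
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)

# Register (or update) the environment in the workspace
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)

print(f"Environment {pipeline_job_env.name} is registered with version {pipeline_job_env.version}")
```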
Now that you have all assets required to run your pipeline, it's time to build the pipeline itself.
Azure Machine Learning pipelines are reusable ML workflows that usually consist of several components. The typical life of a component is:

- Write the yaml specification of the component, or create it programmatically using `ComponentMethod`.
- Optionally, register the component with a name and version in your workspace, to make it reusable and shareable.
- Load that component from the pipeline code.
- Implement the pipeline using the component's inputs, outputs, and parameters.
- Submit the pipeline.

You can create a component in two ways: programmatic and yaml definition. The next two sections walk you through creating a component both ways. You can either create both components trying both options, or pick your preferred method for each.

> [!NOTE]
> In this tutorial, for simplicity, we're using the same compute for all components. However, you can set different computes for each component, for example by adding a line like `train_step.compute = "cpu-cluster"`. To view an example of building a pipeline with different computes for each component, see the [Basic pipeline job section in the cifar-10 pipeline tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2b_train_cifar_10_with_pytorch/train_cifar_10_with_pytorch.ipynb).

### Create component 1: data prep (using programmatic definition)

Start by creating the first component. This component handles the preprocessing of the data. The preprocessing task is performed in the *data_prep.py* Python file.

First, create a source folder for the data_prep component:
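
For example (a small sketch; the folder path is an assumption):

```python
import os

data_prep_src_dir = "./components/data_prep"  # assumed folder layout
os.makedirs(data_prep_src_dir, exist_ok=True)
```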
### Create component 2: training (using yaml definition)

The second component that you create consumes the training and test data, trains a tree-based model, and returns the output model. Use Azure Machine Learning logging capabilities to record and visualize the learning progress.

You used the `CommandComponent` class to create your first component. This time you use the yaml definition to define the second component. Each method has its own advantages. A yaml definition can be checked in alongside the code and provides readable history tracking. The programmatic method using `CommandComponent` can be easier with built-in class documentation and code completion.

Create the directory for this component:
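
As with the first component, a quick sketch (the path is assumed to mirror the data_prep layout):

```python
import os

train_src_dir = "./components/train"  # assumed path
os.makedirs(train_src_dir, exist_ok=True)
```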
Now create and register the component. Registering it allows you to reuse it in other pipelines. Also, anyone else with access to your workspace can use the registered component.

```python
import os

# Import the component loader
from azure.ai.ml import load_component

# Load the component from its yaml specification
# (train.yml is an assumed file name for the spec written above)
train_component = load_component(source=os.path.join(train_src_dir, "train.yml"))

# Register the component in the workspace so it can be reused and shared
train_component = ml_client.create_or_update(train_component)

print(f"Component {train_component.name} with Version {train_component.version} is registered")
```
Now that both your components are defined and registered, you can start implementing the pipeline.

The Python functions returned by `load_component()` work as any regular Python function that you use within a pipeline to call each step.

To code the pipeline, you use a specific `@dsl.pipeline` decorator that identifies the Azure Machine Learning pipelines. In the decorator, you can specify the pipeline description and default resources like compute and storage. Like a Python function, pipelines can have inputs. You can then create multiple instances of a single pipeline with different inputs.

Here, you use *input data*, *split ratio*, and *registered model name* as input variables. You then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
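
A condensed sketch of that pipeline definition. The component functions (`data_prep_component`, `train_component`) are the ones loaded in the previous sections, the input names are illustrative, and `serverless` is one possible default compute:

```python
from azure.ai.ml import dsl

@dsl.pipeline(
    compute="serverless",  # a compute cluster name also works here
    description="E2E data_prep-train pipeline",
)
def credit_defaults_pipeline(
    pipeline_job_data_input,
    pipeline_job_test_train_ratio,
    pipeline_job_registered_model_name,
):
    # Call component 1 with its inputs
    data_prep_job = data_prep_component(
        data=pipeline_job_data_input,
        test_train_ratio=pipeline_job_test_train_ratio,
    )

    # Call component 2, wiring in the outputs of the data prep step
    train_job = train_component(
        train_data=data_prep_job.outputs.train_data,
        test_data=data_prep_job.outputs.test_data,
        registered_model_name=pipeline_job_registered_model_name,
    )

    # Expose pipeline-level outputs
    return {
        "pipeline_job_train_data": data_prep_job.outputs.train_data,
        "pipeline_job_test_data": data_prep_job.outputs.test_data,
    }
```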
You can track the progress of your pipeline by using the link generated in the previous cell. When you first select this link, you might see that the pipeline is still running. Once it's complete, you can examine each component's results.

Double-click the **Train Credit Defaults Model** component.

There are two important results you want to see about training:

* View your logs:

    1. Select the **Outputs+logs** tab.
    1. Open the folders to `user_logs` > `std_log.txt`.

    This section shows the script run stdout.

    :::image type="content" source="media/tutorial-pipeline-python-sdk/user-logs.jpg" alt-text="Screenshot of std_log.txt." lightbox="media/tutorial-pipeline-python-sdk/user-logs.jpg":::

* View your metrics: Select the **Metrics** tab. This section shows different logged metrics. In this example, mlflow `autologging` has automatically logged the training metrics.
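
That autologging comes from MLflow. A minimal sketch of how a training script typically enables it, assuming scikit-learn is the framework:

```python
import mlflow
import mlflow.sklearn

# Start an MLflow run and enable autologging; subsequent scikit-learn
# fit/score calls are logged as metrics, parameters, and model artifacts.
mlflow.start_run()
mlflow.sklearn.autolog()

# ... train the model here ...

mlflow.end_run()
```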