
Commit 222f632

Merge pull request #7006 from s-polly/stp_pipelines_9-9
Freshness - pipelines tutorial
2 parents a03cc76 + 4de080b commit 222f632

File tree: 1 file changed (+69 −33 lines)


articles/machine-learning/tutorial-pipeline-python-sdk.md

Lines changed: 69 additions & 33 deletions
@@ -9,7 +9,7 @@ ms.topic: tutorial
 author: lgayhardt
 ms.author: lagayhar
 ms.reviewer: keli19
-ms.date: 05/15/2024
+ms.date: 09/09/2025
 ms.custom:
   - sdkv2
   - build-2023
@@ -25,11 +25,9 @@ ms.custom:
 > [!NOTE]
 > For a tutorial that uses SDK v1 to build a pipeline, see [Tutorial: Build an Azure Machine Learning pipeline for image classification](v1/tutorial-pipeline-python-sdk.md)
 
-The core of a machine learning pipeline is to split a complete machine learning task into a multistep workflow. Each step is a manageable component that can be developed, optimized, configured, and automated individually. Steps are connected through well-defined interfaces. The Azure Machine Learning pipeline service automatically orchestrates all the dependencies between pipeline steps. The benefits of using a pipeline are standardized the MLOps practice, scalable team collaboration, training efficiency and cost reduction. To learn more about the benefits of pipelines, see [What are Azure Machine Learning pipelines](concept-ml-pipelines.md).
+The core of a machine learning pipeline is to split a complete machine learning task into a multistep workflow. Each step is a manageable component that can be developed, optimized, configured, and automated individually. Steps are connected through well-defined interfaces. The Azure Machine Learning pipeline service automatically orchestrates all the dependencies between pipeline steps. The benefits of using a pipeline include standardized MLOps practice, scalable team collaboration, training efficiency, and cost reduction. To learn more about the benefits of pipelines, see [What are Azure Machine Learning pipelines](concept-ml-pipelines.md).
 
-In this tutorial, you use Azure Machine Learning to create a production ready machine learning project, using Azure Machine Learning Python SDK v2.
-
-This means you will be able to leverage the Azure Machine Learning Python SDK to:
+In this tutorial, you use Azure Machine Learning to create a production-ready machine learning project with the Azure Machine Learning Python SDK v2. You use the SDK to:
 
 > [!div class="checklist"]
 > - Get a handle to your Azure Machine Learning workspace
@@ -58,7 +56,7 @@ This video shows how to get started in Azure Machine Learning studio so that you
 
 1. [!INCLUDE [sign in](includes/prereq-sign-in.md)]
 
-1. Complete the tutorial [Upload, access and explore your data](tutorial-explore-data.md) to create the data asset you need in this tutorial. Make sure you run all the code to create the initial data asset. Explore the data and revise it if you wish, but you'll only need the initial data in this tutorial.
+1. Complete the tutorial [Upload, access and explore your data](tutorial-explore-data.md) to create the data asset you need in this tutorial. Make sure you run all the code to create the initial data asset. Explore the data and revise it if you wish, but you only need the initial data in this tutorial.
 
 1. [!INCLUDE [open or create notebook](includes/prereq-open-or-create.md)]
    * [!INCLUDE [new notebook](includes/prereq-new-notebook.md)]
@@ -73,21 +71,21 @@ This video shows how to get started in Azure Machine Learning studio so that you
 
 The Azure Machine Learning framework can be used from CLI, Python SDK, or studio interface. In this example, you use the Azure Machine Learning Python SDK v2 to create a pipeline.
 
-Before creating the pipeline, you need the following resources:
+Before creating the pipeline, you need these resources:
 
 * The data asset for training
 * The software environment to run the pipeline
 * A compute resource where the job runs
 
 ## Create handle to workspace
 
-Before we dive in the code, you need a way to reference your workspace. You'll create `ml_client` for a handle to the workspace. You'll then use `ml_client` to manage resources and jobs.
+Before we dive into the code, you need a way to reference your workspace. You create `ml_client` as a handle to the workspace, and then use `ml_client` to manage resources and jobs.
 
-In the next cell, enter your Subscription ID, Resource Group name and Workspace name. To find these values:
+In the next cell, enter your Subscription ID, Resource Group name, and Workspace name. To find these values:
 
 1. In the upper right Azure Machine Learning studio toolbar, select your workspace name.
-1. Copy the value for workspace, resource group and subscription ID into the code.
-1. You'll need to copy one value, close the area and paste, then come back for the next one.
+1. Copy the values for workspace, resource group, and subscription ID into the code.
+1. Copy one value, close the area and paste, then come back for the next one.
 
 
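
The cell that creates the handle is elided in this diff (only its closing lines appear in the next hunk). For reference, a minimal sketch of what it contains, with placeholder values you replace with your own:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder values; replace with your own subscription, resource group, and workspace.
SUBSCRIPTION = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WS_NAME = "<AML_WORKSPACE_NAME>"

# Get a handle to the workspace. Authentication is deferred until the first call.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)
```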

@@ -110,19 +108,26 @@ ml_client = MLClient(
 )
 ```
 
+**SDK Reference:**
+- [MLClient](/python/api/azure-ai-ml/azure.ai.ml.mlclient)
+- [DefaultAzureCredential](/python/api/azure-identity/azure.identity.defaultazurecredential)
+
 > [!NOTE]
-> Creating MLClient will not connect to the workspace. The client initialization is lazy, it will wait for the first time it needs to make a call (this will happen in the next code cell).
+> Creating MLClient doesn't connect to the workspace. The client initialization is lazy; it waits until the first time it needs to make a call (which happens in the next code cell).
 
-Verify the connection by making a call to `ml_client`. Since this is the first time that you're making a call to the workspace, you may be asked to authenticate.
+Verify the connection by making a call to `ml_client`. Because this is the first time that you're making a call to the workspace, you might be asked to authenticate.
 
 
 ```python
 # Verify that the handle works correctly.
-# If you ge an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.
+# If you get an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.
 ws = ml_client.workspaces.get(WS_NAME)
 print(ws.location, ":", ws.resource_group)
 ```
 
+**SDK Reference:**
+- [WorkspaceOperations.get](/python/api/azure-ai-ml/azure.ai.ml.operations.workspaceoperations#azure-ai-ml-operations-workspaceoperations-get)
+
 ## Access the registered data asset
 
 Start by getting the data that you previously registered in [Tutorial: Upload, access and explore your data in Azure Machine Learning](tutorial-explore-data.md).
@@ -136,6 +141,9 @@ credit_data = ml_client.data.get(name="credit-card", version="initial")
 print(f"Data asset URI: {credit_data.path}")
 ```
 
+**SDK Reference:**
+- [DataOperations.get](/python/api/azure-ai-ml/azure.ai.ml.operations.dataoperations#azure-ai-ml-operations-dataoperations-get)
+
 ## Create a job environment for pipeline steps
 
 So far, you've created a development environment on the compute instance, your development machine. You also need an environment to use for each step of the pipeline. Each step can have its own environment, or you can use some common environments for multiple steps.
@@ -175,7 +183,7 @@ dependencies:
 
 The specification contains some usual packages that you use in your pipeline (numpy, pip), together with some Azure Machine Learning-specific packages (azureml-mlflow).
 
-The Azure Machine Learning packages aren't mandatory to run Azure Machine Learning jobs. However, adding these packages let you interact with Azure Machine Learning for logging metrics and registering models, all inside the Azure Machine Learning job. You use them in the training script later in this tutorial.
+The Azure Machine Learning packages aren't mandatory to run Azure Machine Learning jobs. However, adding these packages lets you interact with Azure Machine Learning for logging metrics and registering models, all inside the Azure Machine Learning job. You use them in the training script later in this tutorial.
 
 Use the *yaml* file to create and register this custom environment in your workspace:
 
@@ -201,6 +209,10 @@ print(
 )
 ```
 
+**SDK Reference:**
+- [Environment](/python/api/azure-ai-ml/azure.ai.ml.entities.environment)
+- [EnvironmentOperations.create_or_update](/python/api/azure-ai-ml/azure.ai.ml.operations.environmentoperations#azure-ai-ml-operations-environmentoperations-create-or-update)
+
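
The environment-creation cell is mostly elided in this diff (only its closing `print` appears above). A minimal sketch of what it does with the APIs referenced; the environment name, conda file path, and base image are illustrative assumptions:

```python
import os

from azure.ai.ml.entities import Environment

# Illustrative values; the tutorial's actual name, conda file, and base image may differ.
custom_env_name = "aml-scikit-learn"
dependencies_dir = "./dependencies"

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Custom environment for the pipeline steps",
    conda_file=os.path.join(dependencies_dir, "conda.yaml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)

# Register (or update) the environment in the workspace.
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)
print(
    f"Environment {pipeline_job_env.name} is registered with version {pipeline_job_env.version}"
)
```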
 ## Build the training pipeline
 
 Now that you have all assets required to run your pipeline, it's time to build the pipeline itself.
@@ -210,17 +222,17 @@ Azure Machine Learning pipelines are reusable ML workflows that usually consist
 - Write the yaml specification of the component, or create it programmatically using `ComponentMethod`.
 - Optionally, register the component with a name and version in your workspace, to make it reusable and shareable.
 - Load that component from the pipeline code.
-- Implement the pipeline using the component's inputs, outputs and parameters.
+- Implement the pipeline using the component's inputs, outputs, and parameters.
 - Submit the pipeline.
 
-There are two ways to create a component, programmatic and yaml definition. The next two sections walk you through creating a component both ways. You can either create the two components trying both options or pick your preferred method.
+You can create a component in two ways: programmatic definition and yaml definition. The next two sections walk you through creating a component both ways. You can either create the two components by trying both options, or pick your preferred method.
 
 > [!NOTE]
-> In this tutorial for simplicity we are using the same compute for all components. However, you can set different computes for each component, for example by adding a line like `train_step.compute = "cpu-cluster"`. To view an example of building a pipeline with different computes for each component, see the [Basic pipeline job section in the cifar-10 pipeline tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2b_train_cifar_10_with_pytorch/train_cifar_10_with_pytorch.ipynb).
+> In this tutorial, for simplicity, we're using the same compute for all components. However, you can set different computes for each component, for example by adding a line like `train_step.compute = "cpu-cluster"`. To view an example of building a pipeline with different computes for each component, see the [Basic pipeline job section in the cifar-10 pipeline tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2b_train_cifar_10_with_pytorch/train_cifar_10_with_pytorch.ipynb).
 
 ### Create component 1: data prep (using programmatic definition)
 
-Let's start by creating the first component. This component handles the preprocessing of the data. The preprocessing task is performed in the *data_prep.py* Python file.
+Start by creating the first component. This component handles the preprocessing of the data. The preprocessing task is performed in the *data_prep.py* Python file.
 
 First create a source folder for the data_prep component:
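
The next cells (creating the folder, writing *data_prep.py*, and most of the `command(...)` call whose closing lines appear in the next hunk) are elided in this diff. A hedged sketch of the programmatic definition; the input and output names are illustrative and must match the arguments that *data_prep.py* parses:

```python
import os

from azure.ai.ml import command
from azure.ai.ml import Input, Output

data_prep_src_dir = "./components/data_prep"
os.makedirs(data_prep_src_dir, exist_ok=True)

# Illustrative component definition; assumes pipeline_job_env was registered earlier.
data_prep_component = command(
    name="data_prep_credit_defaults",
    display_name="Data preparation for training",
    inputs={
        "data": Input(type="uri_folder"),
        "test_train_ratio": Input(type="number"),
    },
    outputs={
        "train_data": Output(type="uri_folder", mode="rw_mount"),
        "test_data": Output(type="uri_folder", mode="rw_mount"),
    },
    # The source folder that contains data_prep.py
    code=data_prep_src_dir,
    command="""python data_prep.py \
            --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} \
            --train_data ${{outputs.train_data}} --test_data ${{outputs.test_data}} \
            """,
    environment=f"{pipeline_job_env.name}:{pipeline_job_env.version}",
)
```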

@@ -320,25 +332,33 @@ data_prep_component = command(
 )
 ```
 
+**SDK Reference:**
+- [command](/python/api/azure-ai-ml/azure.ai.ml#azure-ai-ml-command)
+- [Input](/python/api/azure-ai-ml/azure.ai.ml.input)
+- [Output](/python/api/azure-ai-ml/azure.ai.ml.output)
+
 Optionally, register the component in the workspace for future reuse.
 
 
 
 ```python
-# Now we register the component to the workspace
+# Now register the component to the workspace
 data_prep_component = ml_client.create_or_update(data_prep_component.component)
 
-# Create (register) the component in your workspace
+# Print the registered component's name and version
 print(
     f"Component {data_prep_component.name} with Version {data_prep_component.version} is registered"
 )
 ```
 
+**SDK Reference:**
+- [MLClient.create_or_update](/python/api/azure-ai-ml/azure.ai.ml.mlclient#azure-ai-ml-mlclient-create-or-update)
+
 ### Create component 2: training (using yaml definition)
 
-The second component that you create consumes the training and test data, train a tree based model and return the output model. Use Azure Machine Learning logging capabilities to record and visualize the learning progress.
+The second component that you create consumes the training and test data, trains a tree-based model, and returns the output model. Use Azure Machine Learning logging capabilities to record and visualize the learning progress.
 
-You used the `CommandComponent` class to create your first component. This time you use the yaml definition to define the second component. Each method has its own advantages. A yaml definition can actually be checked-in along the code, and would provide a readable history tracking. The programmatic method using `CommandComponent` can be easier with built-in class documentation and code completion.
+You used the `CommandComponent` class to create your first component. This time you use the yaml definition to define the second component. Each method has its own advantages. A yaml definition can be checked in alongside the code and provides readable history tracking. The programmatic method using `CommandComponent` can be easier with built-in class documentation and code completion.
 
 Create the directory for this component:
 
@@ -488,7 +508,7 @@ command: >-
 
 ```
 
-Now create and register the component. Registering it allows you to re-use it in other pipelines. Also, anyone else with access to your workspace can use the registered component.
+Now create and register the component. Registering it allows you to reuse it in other pipelines. Also, anyone else with access to your workspace can use the registered component.
 
 
 
```python
@@ -498,28 +518,32 @@ from azure.ai.ml import load_component
498518
# Loading the component from the yml file
499519
train_component = load_component(source=os.path.join(train_src_dir, "train.yml"))
500520

501-
# Now we register the component to the workspace
521+
# Now register the component to the workspace
502522
train_component = ml_client.create_or_update(train_component)
503523

504-
# Create (register) the component in your workspace
524+
# Create and register the component in your workspace
505525
print(
506526
f"Component {train_component.name} with Version {train_component.version} is registered"
507527
)
508528
```
509529

530+
**SDK Reference:**
531+
- [load_component](/python/api/azure-ai-ml/azure.ai.ml#azure-ai-ml-load-component)
532+
- [MLClient.create_or_update](/python/api/azure-ai-ml/azure.ai.ml.mlclient#azure-ai-ml-mlclient-create-or-update)
533+
510534
### Create the pipeline from components
511535

512536
Now that both your components are defined and registered, you can start implementing the pipeline.
513537

514538

515-
Here, you use *input data*, *split ratio* and *registered model name* as input variables. Then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
539+
Here, you use *input data*, *split ratio*, and *registered model name* as input variables. Then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
516540

517541

518-
The Python functions returned by `load_component()` work as any regular Python function that we use within a pipeline to call each step.
542+
The Python functions returned by `load_component()` work as any regular Python function that you use within a pipeline to call each step.
519543

520-
To code the pipeline, you use a specific `@dsl.pipeline` decorator that identifies the Azure Machine Learning pipelines. In the decorator, we can specify the pipeline description and default resources like compute and storage. Like a Python function, pipelines can have inputs. You can then create multiple instances of a single pipeline with different inputs.
544+
To code the pipeline, you use a specific `@dsl.pipeline` decorator that identifies the Azure Machine Learning pipelines. In the decorator, you can specify the pipeline description and default resources like compute and storage. Like a Python function, pipelines can have inputs. You can then create multiple instances of a single pipeline with different inputs.
521545

522-
Here, we used *input data*, *split ratio* and *registered model name* as input variables. We then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
546+
Here, you use *input data*, *split ratio*, and *registered model name* as input variables. You then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
523547

524548

525549
```python
@@ -559,6 +583,11 @@ def credit_defaults_pipeline(
 }
 ```
 
+**SDK Reference:**
+- [dsl.pipeline](/python/api/azure-ai-ml/azure.ai.ml.dsl#azure-ai-ml-dsl-pipeline)
+- [Input](/python/api/azure-ai-ml/azure.ai.ml.input)
+- [Output](/python/api/azure-ai-ml/azure.ai.ml.output)
+
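
The body of `credit_defaults_pipeline` is elided above (only its closing lines appear in this hunk). A hedged sketch of the shape of such a pipeline function; the compute setting, input names, and the `learning_rate` value are illustrative assumptions, and `data_prep_component` and `train_component` are the components registered earlier:

```python
from azure.ai.ml import dsl

@dsl.pipeline(
    compute="serverless",  # illustrative; you can also name a compute cluster here
    description="E2E data prep and train pipeline",
)
def credit_defaults_pipeline(
    pipeline_job_data_input,
    pipeline_job_test_train_ratio,
    pipeline_job_registered_model_name,
):
    # Call the data prep component like a regular Python function.
    data_prep_job = data_prep_component(
        data=pipeline_job_data_input,
        test_train_ratio=pipeline_job_test_train_ratio,
    )

    # Wire the training component to the data prep outputs.
    train_job = train_component(
        train_data=data_prep_job.outputs.train_data,
        test_data=data_prep_job.outputs.test_data,
        learning_rate=0.25,  # illustrative hyperparameter input
        registered_model_name=pipeline_job_registered_model_name,
    )

    # A pipeline returns a dictionary of outputs.
    return {
        "pipeline_job_train_data": data_prep_job.outputs.train_data,
        "pipeline_job_test_data": data_prep_job.outputs.test_data,
    }
```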
 Now use your pipeline definition to instantiate a pipeline with your dataset, split rate of choice, and the name you picked for your model.
 
 
@@ -574,6 +603,9 @@ pipeline = credit_defaults_pipeline(
 )
 ```
 
+**SDK Reference:**
+- [Input](/python/api/azure-ai-ml/azure.ai.ml.input)
+
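
The instantiation cell is mostly elided above. A minimal sketch, where the split ratio and model name are illustrative:

```python
from azure.ai.ml import Input

registered_model_name = "credit_defaults_model"  # illustrative name

# Instantiate the pipeline with concrete inputs; credit_data comes from the earlier data.get call.
pipeline = credit_defaults_pipeline(
    pipeline_job_data_input=Input(type="uri_file", path=credit_data.path),
    pipeline_job_test_train_ratio=0.25,  # illustrative split ratio
    pipeline_job_registered_model_name=registered_model_name,
)
```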
 ## Submit the job
 
 It's now time to submit the job to run in Azure Machine Learning. This time you use `create_or_update` on `ml_client.jobs`.
@@ -593,18 +625,22 @@ pipeline_job = ml_client.jobs.create_or_update(
 ml_client.jobs.stream(pipeline_job.name)
 ```
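
The opening of the submit cell is elided above; a hedged reconstruction of the whole cell, with the experiment name as an illustrative assumption:

```python
# Submit the pipeline job; the experiment name groups related runs in studio.
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    experiment_name="e2e_registered_components",  # illustrative name
)

# Stream the logs until the pipeline job finishes.
ml_client.jobs.stream(pipeline_job.name)
```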

-You can track the progress of your pipeline, by using the link generated in the previous cell. When you first select this link, you may see that the pipeline is still running. Once it's complete, you can examine each component's results.
+**SDK Reference:**
+- [JobOperations.create_or_update](/python/api/azure-ai-ml/azure.ai.ml.operations.joboperations#azure-ai-ml-operations-joboperations-create-or-update)
+- [JobOperations.stream](/python/api/azure-ai-ml/azure.ai.ml.operations.joboperations#azure-ai-ml-operations-joboperations-stream)
+
+You can track the progress of your pipeline by using the link generated in the previous cell. When you first select this link, you might see that the pipeline is still running. Once it's complete, you can examine each component's results.
 
 Double-click the **Train Credit Defaults Model** component.
 
-There are two important results you'll want to see about training:
+There are two important results you want to see about training:
 
 * View your logs:
   1. Select the **Outputs+logs** tab.
   1. Open the folders to `user_logs` > `std_log.txt`.
   This section shows the script run stdout.
   :::image type="content" source="media/tutorial-pipeline-python-sdk/user-logs.jpg" alt-text="Screenshot of std_log.txt." lightbox="media/tutorial-pipeline-python-sdk/user-logs.jpg":::
-* View your metrics: Select the **Metrics** tab. This section shows different logged metrics. In this example. mlflow `autologging`, has automatically logged the training metrics.
+* View your metrics: Select the **Metrics** tab. This section shows different logged metrics. In this example, mlflow `autologging` has automatically logged the training metrics.
 
 :::image type="content" source="media/tutorial-pipeline-python-sdk/metrics.jpg" alt-text="Screenshot shows logged metrics.txt." lightbox="media/tutorial-pipeline-python-sdk/metrics.jpg":::
 
@@ -621,7 +657,7 @@ If you plan to continue now to other tutorials, skip to [Next steps](#next-steps
 
 ### Stop compute instance
 
-If you're not going to use it now, stop the compute instance:
+If you aren't going to use it now, stop the compute instance:
 
 1. In the studio, in the left pane, select **Compute**.
 1. In the top tabs, select **Compute instances**
