
Commit 0518780

freshness edit
1 parent 48b8bce commit 0518780

1 file changed

articles/machine-learning/tutorial-pipeline-python-sdk.md

Lines changed: 31 additions & 33 deletions
@@ -9,7 +9,7 @@ ms.topic: tutorial
 author: lgayhardt
 ms.author: lagayhar
 ms.reviewer: keli19
-ms.date: 05/15/2024
+ms.date: 09/09/2025
 ms.custom:
 - sdkv2
 - build-2023
@@ -25,11 +25,9 @@ ms.custom:
 > [!NOTE]
 > For a tutorial that uses SDK v1 to build a pipeline, see [Tutorial: Build an Azure Machine Learning pipeline for image classification](v1/tutorial-pipeline-python-sdk.md)
 
-The core of a machine learning pipeline is to split a complete machine learning task into a multistep workflow. Each step is a manageable component that can be developed, optimized, configured, and automated individually. Steps are connected through well-defined interfaces. The Azure Machine Learning pipeline service automatically orchestrates all the dependencies between pipeline steps. The benefits of using a pipeline are standardized the MLOps practice, scalable team collaboration, training efficiency and cost reduction. To learn more about the benefits of pipelines, see [What are Azure Machine Learning pipelines](concept-ml-pipelines.md).
+The core of a machine learning pipeline is to split a complete machine learning task into a multistep workflow. Each step is a manageable component that can be developed, optimized, configured, and automated individually. Steps are connected through well-defined interfaces. The Azure Machine Learning pipeline service automatically orchestrates all the dependencies between pipeline steps. The benefits of using a pipeline include a standardized MLOps practice, scalable team collaboration, training efficiency, and cost reduction. To learn more about the benefits of pipelines, see [What are Azure Machine Learning pipelines](concept-ml-pipelines.md).
 
-In this tutorial, you use Azure Machine Learning to create a production ready machine learning project, using Azure Machine Learning Python SDK v2.
-
-This means you will be able to leverage the Azure Machine Learning Python SDK to:
+In this tutorial, you use Azure Machine Learning to create a production-ready machine learning project, using Azure Machine Learning Python SDK v2. This means you can use the Azure Machine Learning Python SDK to:
 
 > [!div class="checklist"]
 > - Get a handle to your Azure Machine Learning workspace
@@ -58,7 +56,7 @@ This video shows how to get started in Azure Machine Learning studio so that you
 
 1. [!INCLUDE [sign in](includes/prereq-sign-in.md)]
 
-1. Complete the tutorial [Upload, access and explore your data](tutorial-explore-data.md) to create the data asset you need in this tutorial. Make sure you run all the code to create the initial data asset. Explore the data and revise it if you wish, but you'll only need the initial data in this tutorial.
+1. Complete the tutorial [Upload, access and explore your data](tutorial-explore-data.md) to create the data asset you need in this tutorial. Make sure you run all the code to create the initial data asset. Explore the data and revise it if you wish, but you only need the initial data in this tutorial.
 
 1. [!INCLUDE [open or create notebook](includes/prereq-open-or-create.md)]
    * [!INCLUDE [new notebook](includes/prereq-new-notebook.md)]
@@ -73,21 +71,21 @@ This video shows how to get started in Azure Machine Learning studio so that you
 
 The Azure Machine Learning framework can be used from CLI, Python SDK, or studio interface. In this example, you use the Azure Machine Learning Python SDK v2 to create a pipeline.
 
-Before creating the pipeline, you need the following resources:
+Before creating the pipeline, you need these resources:
 
 * The data asset for training
 * The software environment to run the pipeline
 * A compute resource where the job runs
 
 ## Create handle to workspace
 
-Before we dive in the code, you need a way to reference your workspace. You'll create `ml_client` for a handle to the workspace. You'll then use `ml_client` to manage resources and jobs.
+Before we dive into the code, you need a way to reference your workspace. You create `ml_client` as a handle to the workspace and then use `ml_client` to manage resources and jobs.
 
-In the next cell, enter your Subscription ID, Resource Group name and Workspace name. To find these values:
+In the next cell, enter your Subscription ID, Resource Group name, and Workspace name. To find these values:
 
 1. In the upper right Azure Machine Learning studio toolbar, select your workspace name.
-1. Copy the value for workspace, resource group and subscription ID into the code.
-1. You'll need to copy one value, close the area and paste, then come back for the next one.
+1. Copy the values for workspace, resource group, and subscription ID into the code.
+1. You need to copy one value, close the area and paste it, then come back for the next one.
 
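For context on the cell these hunks describe, here's a minimal sketch of how such a workspace handle is typically created with SDK v2; the credential choice and placeholder values are assumptions, not part of this commit:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Authenticate through the default Azure credential chain
# (environment variables, managed identity, Azure CLI login, and so on).
credential = DefaultAzureCredential()

# Placeholder values; replace them with the values copied from the
# workspace toolbar as described above.
ml_client = MLClient(
    credential=credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WS_NAME>",
)
```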
@@ -111,14 +109,14 @@ ml_client = MLClient(
 ```
 
 > [!NOTE]
-> Creating MLClient will not connect to the workspace. The client initialization is lazy, it will wait for the first time it needs to make a call (this will happen in the next code cell).
+> Creating MLClient won't connect to the workspace. The client initialization is lazy; it waits until the first time it needs to make a call (which happens in the next code cell).
 
-Verify the connection by making a call to `ml_client`. Since this is the first time that you're making a call to the workspace, you may be asked to authenticate.
+Verify the connection by making a call to `ml_client`. Since this is the first time that you're making a call to the workspace, you might be asked to authenticate.
 
 ```python
 # Verify that the handle works correctly.
-# If you ge an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.
+# If you get an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.
 ws = ml_client.workspaces.get(WS_NAME)
 print(ws.location, ":", ws.resource_group)
 ```
@@ -175,7 +173,7 @@ dependencies:
 
 The specification contains some usual packages that you use in your pipeline (numpy, pip), together with some Azure Machine Learning specific packages (azureml-mlflow).
 
-The Azure Machine Learning packages aren't mandatory to run Azure Machine Learning jobs. However, adding these packages let you interact with Azure Machine Learning for logging metrics and registering models, all inside the Azure Machine Learning job. You use them in the training script later in this tutorial.
+The Azure Machine Learning packages aren't mandatory to run Azure Machine Learning jobs. However, adding these packages lets you interact with Azure Machine Learning for logging metrics and registering models, all inside the Azure Machine Learning job. You use them in the training script later in this tutorial.
 
 Use the *yaml* file to create and register this custom environment in your workspace:
 
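As context for the registration step this hunk leads into, a minimal sketch of registering a conda-based environment with SDK v2; the directory, environment name, and base image are assumptions, not taken from this commit:

```python
import os
from azure.ai.ml.entities import Environment

dependencies_dir = "./dependencies"  # assumed folder holding the conda yaml shown earlier

custom_env = Environment(
    name="aml-scikit-learn",  # assumed environment name
    description="Custom environment for the pipeline tutorial",
    conda_file=os.path.join(dependencies_dir, "conda.yaml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
custom_env = ml_client.environments.create_or_update(custom_env)
print(f"Environment {custom_env.name} registered, version {custom_env.version}")
```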
@@ -210,17 +208,17 @@ Azure Machine Learning pipelines are reusable ML workflows that usually consist
 - Write the yaml specification of the component, or create it programmatically using `ComponentMethod`.
 - Optionally, register the component with a name and version in your workspace, to make it reusable and shareable.
 - Load that component from the pipeline code.
-- Implement the pipeline using the component's inputs, outputs and parameters.
+- Implement the pipeline using the component's inputs, outputs, and parameters.
 - Submit the pipeline.
 
-There are two ways to create a component, programmatic and yaml definition. The next two sections walk you through creating a component both ways. You can either create the two components trying both options or pick your preferred method.
+You can create a component in two ways: programmatic and yaml definition. The next two sections walk you through creating a component both ways. You can either create the two components trying both options or pick your preferred method.
 
 > [!NOTE]
-> In this tutorial for simplicity we are using the same compute for all components. However, you can set different computes for each component, for example by adding a line like `train_step.compute = "cpu-cluster"`. To view an example of building a pipeline with different computes for each component, see the [Basic pipeline job section in the cifar-10 pipeline tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2b_train_cifar_10_with_pytorch/train_cifar_10_with_pytorch.ipynb).
+> In this tutorial for simplicity, we're using the same compute for all components. However, you can set different computes for each component, for example by adding a line like `train_step.compute = "cpu-cluster"`. To view an example of building a pipeline with different computes for each component, see the [Basic pipeline job section in the cifar-10 pipeline tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2b_train_cifar_10_with_pytorch/train_cifar_10_with_pytorch.ipynb).
 
 ### Create component 1: data prep (using programmatic definition)
 
-Let's start by creating the first component. This component handles the preprocessing of the data. The preprocessing task is performed in the *data_prep.py* Python file.
+Start by creating the first component. This component handles the preprocessing of the data. The preprocessing task is performed in the *data_prep.py* Python file.
 
 First create a source folder for the data_prep component:
 
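As a sketch of the programmatic definition this section describes, assuming a *data_prep.py* that accepts the arguments named below (the component name, inputs, outputs, and environment reference are illustrative assumptions):

```python
import os
from azure.ai.ml import command, Input, Output

# Create the source folder that holds the component's script.
data_prep_src_dir = "./components/data_prep"  # assumed path
os.makedirs(data_prep_src_dir, exist_ok=True)

# Define the component with the command() factory; its inputs and outputs
# become the well-defined interface that connects pipeline steps.
data_prep_component = command(
    name="data_prep_credit_defaults",
    display_name="Data preparation for training",
    inputs={
        "data": Input(type="uri_folder"),
        "test_train_ratio": Input(type="number"),
    },
    outputs={
        "train_data": Output(type="uri_folder", mode="rw_mount"),
        "test_data": Output(type="uri_folder", mode="rw_mount"),
    },
    code=data_prep_src_dir,
    command="python data_prep.py "
    "--data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} "
    "--train_data ${{outputs.train_data}} --test_data ${{outputs.test_data}}",
    environment="aml-scikit-learn@latest",  # assumed registered environment
)
```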
@@ -325,10 +323,10 @@ Optionally, register the component in the workspace for future reuse.
 
 
 ```python
-# Now we register the component to the workspace
+# Now register the component to the workspace
 data_prep_component = ml_client.create_or_update(data_prep_component.component)
 
-# Create (register) the component in your workspace
+# Create and register the component in your workspace
 print(
     f"Component {data_prep_component.name} with Version {data_prep_component.version} is registered"
 )
@@ -338,7 +336,7 @@ print(
 
 The second component that you create consumes the training and test data, trains a tree-based model, and returns the output model. Use Azure Machine Learning logging capabilities to record and visualize the learning progress.
 
-You used the `CommandComponent` class to create your first component. This time you use the yaml definition to define the second component. Each method has its own advantages. A yaml definition can actually be checked-in along the code, and would provide a readable history tracking. The programmatic method using `CommandComponent` can be easier with built-in class documentation and code completion.
+You used the `CommandComponent` class to create your first component. This time you use the yaml definition to define the second component. Each method has its own advantages. A yaml definition can actually be checked in along the code and provides readable history tracking. The programmatic method using `CommandComponent` can be easier with built-in class documentation and code completion.
 
 Create the directory for this component:
 
@@ -488,7 +486,7 @@ command: >-
 
 ```
 
-Now create and register the component. Registering it allows you to re-use it in other pipelines. Also, anyone else with access to your workspace can use the registered component.
+Now create and register the component. Registering it allows you to reuse it in other pipelines. Also, anyone else with access to your workspace can use the registered component.
 
 
 ```python
@@ -498,10 +496,10 @@ from azure.ai.ml import load_component
 # Loading the component from the yml file
 train_component = load_component(source=os.path.join(train_src_dir, "train.yml"))
 
-# Now we register the component to the workspace
+# Now register the component to the workspace
 train_component = ml_client.create_or_update(train_component)
 
-# Create (register) the component in your workspace
+# Create and register the component in your workspace
 print(
     f"Component {train_component.name} with Version {train_component.version} is registered"
 )
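Once registered, a component can be fetched by name in later sessions instead of being redefined; a minimal sketch, assuming the name and version printed by the cell above:

```python
# The name and version here are assumptions; use the values printed
# when the component was registered.
fetched = ml_client.components.get(name="train_credit_defaults_model", version="1")
print(fetched.display_name)
```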
@@ -512,14 +510,14 @@ print(
 Now that both your components are defined and registered, you can start implementing the pipeline.
 
 
-Here, you use *input data*, *split ratio* and *registered model name* as input variables. Then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
+Here, you use *input data*, *split ratio*, and *registered model name* as input variables. Then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
 
 
-The Python functions returned by `load_component()` work as any regular Python function that we use within a pipeline to call each step.
+The Python functions returned by `load_component()` work as any regular Python function that you use within a pipeline to call each step.
 
-To code the pipeline, you use a specific `@dsl.pipeline` decorator that identifies the Azure Machine Learning pipelines. In the decorator, we can specify the pipeline description and default resources like compute and storage. Like a Python function, pipelines can have inputs. You can then create multiple instances of a single pipeline with different inputs.
+To code the pipeline, you use a specific `@dsl.pipeline` decorator that identifies the Azure Machine Learning pipelines. In the decorator, you can specify the pipeline description and default resources like compute and storage. Like a Python function, pipelines can have inputs. You can then create multiple instances of a single pipeline with different inputs.
 
-Here, we used *input data*, *split ratio* and *registered model name* as input variables. We then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
+Here, you use *input data*, *split ratio*, and *registered model name* as input variables. You then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
 
 
 ```python
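The pipeline function body is unchanged by this commit, so the diff elides it. For context, a minimal sketch of the `@dsl.pipeline` pattern the paragraphs above describe; the compute setting and the component and parameter names are assumptions:

```python
from azure.ai.ml import dsl

@dsl.pipeline(
    compute="cpu-cluster",  # assumed default compute for every step
    description="Data prep and training pipeline",
)
def credit_defaults_pipeline(
    pipeline_job_data_input,
    pipeline_job_test_train_ratio,
    pipeline_job_registered_model_name,
):
    # Step 1: call the data prep component through its input identifiers.
    data_prep_job = data_prep_component(
        data=pipeline_job_data_input,
        test_train_ratio=pipeline_job_test_train_ratio,
    )

    # Step 2: wire the training component to step 1 via .outputs.
    train_job = train_component(
        train_data=data_prep_job.outputs.train_data,
        test_data=data_prep_job.outputs.test_data,
        registered_model_name=pipeline_job_registered_model_name,
    )
```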
@@ -593,18 +591,18 @@ pipeline_job = ml_client.jobs.create_or_update(
 ml_client.jobs.stream(pipeline_job.name)
 ```
 
-You can track the progress of your pipeline, by using the link generated in the previous cell. When you first select this link, you may see that the pipeline is still running. Once it's complete, you can examine each component's results.
+You can track the progress of your pipeline by using the link generated in the previous cell. When you first select this link, you might see that the pipeline is still running. Once it's complete, you can examine each component's results.
 
 Double-click the **Train Credit Defaults Model** component.
 
-There are two important results you'll want to see about training:
+There are two important results you want to see about training:
 
 * View your logs:
   1. Select the **Outputs+logs** tab.
   1. Open the folders to `user_logs` > `std_log.txt`
      This section shows the script run stdout.
      :::image type="content" source="media/tutorial-pipeline-python-sdk/user-logs.jpg" alt-text="Screenshot of std_log.txt." lightbox="media/tutorial-pipeline-python-sdk/user-logs.jpg":::
-* View your metrics: Select the **Metrics** tab. This section shows different logged metrics. In this example. mlflow `autologging`, has automatically logged the training metrics.
+* View your metrics: Select the **Metrics** tab. This section shows different logged metrics. In this example, mlflow `autologging` has automatically logged the training metrics.
 
 :::image type="content" source="media/tutorial-pipeline-python-sdk/metrics.jpg" alt-text="Screenshot shows logged metrics.txt." lightbox="media/tutorial-pipeline-python-sdk/metrics.jpg":::
 
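The automatic metric logging called out in the metrics bullet typically comes from a call like the following inside the training script; a sketch assuming scikit-learn training, not code changed by this commit:

```python
import mlflow.sklearn

# Enable autologging before fitting the model so parameters, metrics,
# and the model artifact are captured in the Azure Machine Learning job.
mlflow.sklearn.autolog()
```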
@@ -621,7 +619,7 @@ If you plan to continue now to other tutorials, skip to [Next steps](#next-steps
 
 ### Stop compute instance
 
-If you're not going to use it now, stop the compute instance:
+If you aren't going to use it now, stop the compute instance:
 
 1. In the studio, in the left pane, select **Compute**.
 1. In the top tabs, select **Compute instances**
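The studio steps above also have an SDK equivalent; a minimal sketch, assuming a compute instance named `ci-basic` (the name is an assumption):

```python
# Stop a compute instance from the SDK; begin_stop returns a poller to wait on.
ml_client.compute.begin_stop("ci-basic").wait()
```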
