
Commit 222f632

Merge pull request #7006 from s-polly/stp_pipelines_9-9
Freshness - pipelines tutorial
2 parents a03cc76 + 4de080b commit 222f632

File tree: 1 file changed (+69 −33 lines)


articles/machine-learning/tutorial-pipeline-python-sdk.md

Lines changed: 69 additions & 33 deletions
@@ -9,7 +9,7 @@ ms.topic: tutorial
 author: lgayhardt
 ms.author: lagayhar
 ms.reviewer: keli19
-ms.date: 05/15/2024
+ms.date: 09/09/2025
 ms.custom:
   - sdkv2
   - build-2023
@@ -25,11 +25,9 @@ ms.custom:
 > [!NOTE]
 > For a tutorial that uses SDK v1 to build a pipeline, see [Tutorial: Build an Azure Machine Learning pipeline for image classification](v1/tutorial-pipeline-python-sdk.md)
 
-The core of a machine learning pipeline is to split a complete machine learning task into a multistep workflow. Each step is a manageable component that can be developed, optimized, configured, and automated individually. Steps are connected through well-defined interfaces. The Azure Machine Learning pipeline service automatically orchestrates all the dependencies between pipeline steps. The benefits of using a pipeline are standardized the MLOps practice, scalable team collaboration, training efficiency and cost reduction. To learn more about the benefits of pipelines, see [What are Azure Machine Learning pipelines](concept-ml-pipelines.md).
+The core of a machine learning pipeline is to split a complete machine learning task into a multistep workflow. Each step is a manageable component that can be developed, optimized, configured, and automated individually. Steps are connected through well-defined interfaces. The Azure Machine Learning pipeline service automatically orchestrates all the dependencies between pipeline steps. The benefits of using a pipeline include standardized MLOps practice, scalable team collaboration, training efficiency, and cost reduction. To learn more about the benefits of pipelines, see [What are Azure Machine Learning pipelines](concept-ml-pipelines.md).
 
-In this tutorial, you use Azure Machine Learning to create a production ready machine learning project, using Azure Machine Learning Python SDK v2.
-
-This means you will be able to leverage the Azure Machine Learning Python SDK to:
+In this tutorial, you use Azure Machine Learning to create a production-ready machine learning project with the Azure Machine Learning Python SDK v2. You use the SDK to:
 
 > [!div class="checklist"]
 > - Get a handle to your Azure Machine Learning workspace
@@ -58,7 +56,7 @@ This video shows how to get started in Azure Machine Learning studio so that you
 
 1. [!INCLUDE [sign in](includes/prereq-sign-in.md)]
 
-1. Complete the tutorial [Upload, access and explore your data](tutorial-explore-data.md) to create the data asset you need in this tutorial. Make sure you run all the code to create the initial data asset. Explore the data and revise it if you wish, but you'll only need the initial data in this tutorial.
+1. Complete the tutorial [Upload, access and explore your data](tutorial-explore-data.md) to create the data asset you need in this tutorial. Make sure you run all the code to create the initial data asset. Explore the data and revise it if you wish, but you only need the initial data in this tutorial.
 
 1. [!INCLUDE [open or create notebook](includes/prereq-open-or-create.md)]
    * [!INCLUDE [new notebook](includes/prereq-new-notebook.md)]
@@ -73,21 +71,21 @@ This video shows how to get started in Azure Machine Learning studio so that you
 
 The Azure Machine Learning framework can be used from CLI, Python SDK, or studio interface. In this example, you use the Azure Machine Learning Python SDK v2 to create a pipeline.
 
-Before creating the pipeline, you need the following resources:
+Before creating the pipeline, you need these resources:
 
 * The data asset for training
 * The software environment to run the pipeline
 * A compute resource where the job runs
 
 ## Create handle to workspace
 
-Before we dive in the code, you need a way to reference your workspace. You'll create `ml_client` for a handle to the workspace. You'll then use `ml_client` to manage resources and jobs.
+Before we dive into the code, you need a way to reference your workspace. You create `ml_client` as a handle to the workspace, and then use `ml_client` to manage resources and jobs.
 
-In the next cell, enter your Subscription ID, Resource Group name and Workspace name. To find these values:
+In the next cell, enter your Subscription ID, Resource Group name, and Workspace name. To find these values:
 
 1. In the upper right Azure Machine Learning studio toolbar, select your workspace name.
-1. Copy the value for workspace, resource group and subscription ID into the code.
-1. You'll need to copy one value, close the area and paste, then come back for the next one.
+1. Copy the values for workspace, resource group, and subscription ID into the code.
+1. Copy one value, close the area and paste, then come back for the next one.
 
 
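
The cell that creates the handle is elided in this diff (only its closing lines appear in the next hunk). For reference, a minimal sketch of what it contains, with placeholder values you replace with your own:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder values; replace with your own subscription, resource group, and workspace.
SUBSCRIPTION = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP = "<RESOURCE_GROUP>"
WS_NAME = "<AML_WORKSPACE_NAME>"

# Get a handle to the workspace. Authentication is deferred until the first call.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)
```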

@@ -110,19 +108,26 @@ ml_client = MLClient(
 )
 ```
 
+**SDK Reference:**
+- [MLClient](/python/api/azure-ai-ml/azure.ai.ml.mlclient)
+- [DefaultAzureCredential](/python/api/azure-identity/azure.identity.defaultazurecredential)
+
 > [!NOTE]
-> Creating MLClient will not connect to the workspace. The client initialization is lazy, it will wait for the first time it needs to make a call (this will happen in the next code cell).
+> Creating MLClient doesn't connect to the workspace. The client initialization is lazy; it waits until the first time it needs to make a call (which happens in the next code cell).
 
-Verify the connection by making a call to `ml_client`. Since this is the first time that you're making a call to the workspace, you may be asked to authenticate.
+Verify the connection by making a call to `ml_client`. Because this is the first time that you're making a call to the workspace, you might be asked to authenticate.
 
 
 ```python
 # Verify that the handle works correctly.
-# If you ge an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.
+# If you get an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.
 ws = ml_client.workspaces.get(WS_NAME)
 print(ws.location, ":", ws.resource_group)
 ```
 
+**SDK Reference:**
+- [WorkspaceOperations.get](/python/api/azure-ai-ml/azure.ai.ml.operations.workspaceoperations#azure-ai-ml-operations-workspaceoperations-get)
+
 ## Access the registered data asset
 
 Start by getting the data that you previously registered in [Tutorial: Upload, access and explore your data in Azure Machine Learning](tutorial-explore-data.md).
@@ -136,6 +141,9 @@ credit_data = ml_client.data.get(name="credit-card", version="initial")
 print(f"Data asset URI: {credit_data.path}")
 ```
 
+**SDK Reference:**
+- [DataOperations.get](/python/api/azure-ai-ml/azure.ai.ml.operations.dataoperations#azure-ai-ml-operations-dataoperations-get)
+
 ## Create a job environment for pipeline steps
 
 So far, you've created a development environment on the compute instance, your development machine. You also need an environment to use for each step of the pipeline. Each step can have its own environment, or you can use some common environments for multiple steps.
@@ -175,7 +183,7 @@ dependencies:
 
 The specification contains some usual packages that you use in your pipeline (numpy, pip), together with some Azure Machine Learning-specific packages (azureml-mlflow).
 
-The Azure Machine Learning packages aren't mandatory to run Azure Machine Learning jobs. However, adding these packages let you interact with Azure Machine Learning for logging metrics and registering models, all inside the Azure Machine Learning job. You use them in the training script later in this tutorial.
+The Azure Machine Learning packages aren't mandatory to run Azure Machine Learning jobs. However, adding these packages lets you interact with Azure Machine Learning for logging metrics and registering models, all inside the Azure Machine Learning job. You use them in the training script later in this tutorial.
 
 Use the *yaml* file to create and register this custom environment in your workspace:
 
@@ -201,6 +209,10 @@ print(
 )
 ```
 
+**SDK Reference:**
+- [Environment](/python/api/azure-ai-ml/azure.ai.ml.entities.environment)
+- [EnvironmentOperations.create_or_update](/python/api/azure-ai-ml/azure.ai.ml.operations.environmentoperations#azure-ai-ml-operations-environmentoperations-create-or-update)
+
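
The environment-creation cell is mostly elided in this diff (only its closing `print` appears above). A minimal sketch of what it does with the APIs referenced; the environment name, conda file path, and base image are illustrative assumptions:

```python
import os

from azure.ai.ml.entities import Environment

# Illustrative values; the tutorial's actual name, conda file, and base image may differ.
custom_env_name = "aml-scikit-learn"
dependencies_dir = "./dependencies"

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Custom environment for the pipeline steps",
    conda_file=os.path.join(dependencies_dir, "conda.yaml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)

# Register (or update) the environment in the workspace.
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)
print(
    f"Environment {pipeline_job_env.name} is registered with version {pipeline_job_env.version}"
)
```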
 ## Build the training pipeline
 
 Now that you have all assets required to run your pipeline, it's time to build the pipeline itself.
@@ -210,17 +222,17 @@ Azure Machine Learning pipelines are reusable ML workflows that usually consist
 - Write the yaml specification of the component, or create it programmatically using `ComponentMethod`.
 - Optionally, register the component with a name and version in your workspace, to make it reusable and shareable.
 - Load that component from the pipeline code.
-- Implement the pipeline using the component's inputs, outputs and parameters.
+- Implement the pipeline using the component's inputs, outputs, and parameters.
 - Submit the pipeline.
 
-There are two ways to create a component, programmatic and yaml definition. The next two sections walk you through creating a component both ways. You can either create the two components trying both options or pick your preferred method.
+You can create a component in two ways: programmatic definition and yaml definition. The next two sections walk you through creating a component both ways. You can either create the two components by trying both options, or pick your preferred method.
 
 > [!NOTE]
-> In this tutorial for simplicity we are using the same compute for all components. However, you can set different computes for each component, for example by adding a line like `train_step.compute = "cpu-cluster"`. To view an example of building a pipeline with different computes for each component, see the [Basic pipeline job section in the cifar-10 pipeline tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2b_train_cifar_10_with_pytorch/train_cifar_10_with_pytorch.ipynb).
+> In this tutorial, for simplicity, we're using the same compute for all components. However, you can set different computes for each component, for example by adding a line like `train_step.compute = "cpu-cluster"`. To view an example of building a pipeline with different computes for each component, see the [Basic pipeline job section in the cifar-10 pipeline tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2b_train_cifar_10_with_pytorch/train_cifar_10_with_pytorch.ipynb).
 
 ### Create component 1: data prep (using programmatic definition)
 
-Let's start by creating the first component. This component handles the preprocessing of the data. The preprocessing task is performed in the *data_prep.py* Python file.
+Start by creating the first component. This component handles the preprocessing of the data. The preprocessing task is performed in the *data_prep.py* Python file.
 
 First create a source folder for the data_prep component:
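
The next cells (creating the folder, writing *data_prep.py*, and most of the `command(...)` call whose closing lines appear in the next hunk) are elided in this diff. A hedged sketch of the programmatic definition; the input and output names are illustrative and must match the arguments that *data_prep.py* parses:

```python
import os

from azure.ai.ml import command
from azure.ai.ml import Input, Output

data_prep_src_dir = "./components/data_prep"
os.makedirs(data_prep_src_dir, exist_ok=True)

# Illustrative component definition; assumes pipeline_job_env was registered earlier.
data_prep_component = command(
    name="data_prep_credit_defaults",
    display_name="Data preparation for training",
    inputs={
        "data": Input(type="uri_folder"),
        "test_train_ratio": Input(type="number"),
    },
    outputs={
        "train_data": Output(type="uri_folder", mode="rw_mount"),
        "test_data": Output(type="uri_folder", mode="rw_mount"),
    },
    # The source folder that contains data_prep.py
    code=data_prep_src_dir,
    command="""python data_prep.py \
            --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} \
            --train_data ${{outputs.train_data}} --test_data ${{outputs.test_data}} \
            """,
    environment=f"{pipeline_job_env.name}:{pipeline_job_env.version}",
)
```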

@@ -320,25 +332,33 @@ data_prep_component = command(
 )
 ```
 
+**SDK Reference:**
+- [command](/python/api/azure-ai-ml/azure.ai.ml#azure-ai-ml-command)
+- [Input](/python/api/azure-ai-ml/azure.ai.ml.input)
+- [Output](/python/api/azure-ai-ml/azure.ai.ml.output)
+
 Optionally, register the component in the workspace for future reuse.
 
 
 
 ```python
-# Now we register the component to the workspace
+# Now register the component to the workspace
 data_prep_component = ml_client.create_or_update(data_prep_component.component)
 
-# Create (register) the component in your workspace
+# Print the registered component's name and version
 print(
     f"Component {data_prep_component.name} with Version {data_prep_component.version} is registered"
 )
 ```
 
+**SDK Reference:**
+- [MLClient.create_or_update](/python/api/azure-ai-ml/azure.ai.ml.mlclient#azure-ai-ml-mlclient-create-or-update)
+
 ### Create component 2: training (using yaml definition)
 
-The second component that you create consumes the training and test data, train a tree based model and return the output model. Use Azure Machine Learning logging capabilities to record and visualize the learning progress.
+The second component that you create consumes the training and test data, trains a tree-based model, and returns the output model. Use Azure Machine Learning logging capabilities to record and visualize the learning progress.
 
-You used the `CommandComponent` class to create your first component. This time you use the yaml definition to define the second component. Each method has its own advantages. A yaml definition can actually be checked-in along the code, and would provide a readable history tracking. The programmatic method using `CommandComponent` can be easier with built-in class documentation and code completion.
+You used the `CommandComponent` class to create your first component. This time you use the yaml definition to define the second component. Each method has its own advantages. A yaml definition can be checked in alongside the code and provides readable history tracking. The programmatic method using `CommandComponent` can be easier with built-in class documentation and code completion.
 
 Create the directory for this component:
 
@@ -488,7 +508,7 @@ command: >-
 
 ```
 
-Now create and register the component. Registering it allows you to re-use it in other pipelines. Also, anyone else with access to your workspace can use the registered component.
+Now create and register the component. Registering it allows you to reuse it in other pipelines. Also, anyone else with access to your workspace can use the registered component.
 
 
 
```python
@@ -498,28 +518,32 @@ from azure.ai.ml import load_component
498518
# Loading the component from the yml file
499519
train_component = load_component(source=os.path.join(train_src_dir, "train.yml"))
500520

501-
# Now we register the component to the workspace
521+
# Now register the component to the workspace
502522
train_component = ml_client.create_or_update(train_component)
503523

504-
# Create (register) the component in your workspace
524+
# Create and register the component in your workspace
505525
print(
506526
f"Component {train_component.name} with Version {train_component.version} is registered"
507527
)
508528
```
509529

530+
**SDK Reference:**
531+
- [load_component](/python/api/azure-ai-ml/azure.ai.ml#azure-ai-ml-load-component)
532+
- [MLClient.create_or_update](/python/api/azure-ai-ml/azure.ai.ml.mlclient#azure-ai-ml-mlclient-create-or-update)
533+
510534
### Create the pipeline from components
511535

512536
Now that both your components are defined and registered, you can start implementing the pipeline.
513537

514538

515-
Here, you use *input data*, *split ratio* and *registered model name* as input variables. Then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
539+
Here, you use *input data*, *split ratio*, and *registered model name* as input variables. Then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
516540

517541

518-
The Python functions returned by `load_component()` work as any regular Python function that we use within a pipeline to call each step.
542+
The Python functions returned by `load_component()` work as any regular Python function that you use within a pipeline to call each step.
519543

520-
To code the pipeline, you use a specific `@dsl.pipeline` decorator that identifies the Azure Machine Learning pipelines. In the decorator, we can specify the pipeline description and default resources like compute and storage. Like a Python function, pipelines can have inputs. You can then create multiple instances of a single pipeline with different inputs.
544+
To code the pipeline, you use a specific `@dsl.pipeline` decorator that identifies the Azure Machine Learning pipelines. In the decorator, you can specify the pipeline description and default resources like compute and storage. Like a Python function, pipelines can have inputs. You can then create multiple instances of a single pipeline with different inputs.
521545

522-
Here, we used *input data*, *split ratio* and *registered model name* as input variables. We then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
546+
Here, you use *input data*, *split ratio*, and *registered model name* as input variables. You then call the components and connect them via their inputs/outputs identifiers. The outputs of each step can be accessed via the `.outputs` property.
523547

524548

525549
```python
@@ -559,6 +583,11 @@ def credit_defaults_pipeline(
 }
 ```
 
+**SDK Reference:**
+- [dsl.pipeline](/python/api/azure-ai-ml/azure.ai.ml.dsl#azure-ai-ml-dsl-pipeline)
+- [Input](/python/api/azure-ai-ml/azure.ai.ml.input)
+- [Output](/python/api/azure-ai-ml/azure.ai.ml.output)
+
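
The body of `credit_defaults_pipeline` is elided above (only its closing lines appear in this hunk). A hedged sketch of the shape of such a pipeline function; the compute setting, input names, and the `learning_rate` value are illustrative assumptions, and `data_prep_component` and `train_component` are the components registered earlier:

```python
from azure.ai.ml import dsl

@dsl.pipeline(
    compute="serverless",  # illustrative; you can also name a compute cluster here
    description="E2E data prep and train pipeline",
)
def credit_defaults_pipeline(
    pipeline_job_data_input,
    pipeline_job_test_train_ratio,
    pipeline_job_registered_model_name,
):
    # Call the data prep component like a regular Python function.
    data_prep_job = data_prep_component(
        data=pipeline_job_data_input,
        test_train_ratio=pipeline_job_test_train_ratio,
    )

    # Wire the training component to the data prep outputs.
    train_job = train_component(
        train_data=data_prep_job.outputs.train_data,
        test_data=data_prep_job.outputs.test_data,
        learning_rate=0.25,  # illustrative hyperparameter input
        registered_model_name=pipeline_job_registered_model_name,
    )

    # A pipeline returns a dictionary of outputs.
    return {
        "pipeline_job_train_data": data_prep_job.outputs.train_data,
        "pipeline_job_test_data": data_prep_job.outputs.test_data,
    }
```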
 Now use your pipeline definition to instantiate a pipeline with your dataset, split rate of choice, and the name you picked for your model.
 
 
@@ -574,6 +603,9 @@ pipeline = credit_defaults_pipeline(
 )
 ```
 
+**SDK Reference:**
+- [Input](/python/api/azure-ai-ml/azure.ai.ml.input)
+
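
The instantiation cell is mostly elided above. A minimal sketch, where the split ratio and model name are illustrative:

```python
from azure.ai.ml import Input

registered_model_name = "credit_defaults_model"  # illustrative name

# Instantiate the pipeline with concrete inputs; credit_data comes from the earlier data.get call.
pipeline = credit_defaults_pipeline(
    pipeline_job_data_input=Input(type="uri_file", path=credit_data.path),
    pipeline_job_test_train_ratio=0.25,  # illustrative split ratio
    pipeline_job_registered_model_name=registered_model_name,
)
```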
 ## Submit the job
 
 It's now time to submit the job to run in Azure Machine Learning. This time you use `create_or_update` on `ml_client.jobs`.
@@ -593,18 +625,22 @@ pipeline_job = ml_client.jobs.create_or_update(
 ml_client.jobs.stream(pipeline_job.name)
 ```
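
The opening of the submit cell is elided above; a hedged reconstruction of the whole cell, with the experiment name as an illustrative assumption:

```python
# Submit the pipeline job; the experiment name groups related runs in studio.
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    experiment_name="e2e_registered_components",  # illustrative name
)

# Stream the logs until the pipeline job finishes.
ml_client.jobs.stream(pipeline_job.name)
```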

-You can track the progress of your pipeline, by using the link generated in the previous cell. When you first select this link, you may see that the pipeline is still running. Once it's complete, you can examine each component's results.
+**SDK Reference:**
+- [JobOperations.create_or_update](/python/api/azure-ai-ml/azure.ai.ml.operations.joboperations#azure-ai-ml-operations-joboperations-create-or-update)
+- [JobOperations.stream](/python/api/azure-ai-ml/azure.ai.ml.operations.joboperations#azure-ai-ml-operations-joboperations-stream)
+
+You can track the progress of your pipeline by using the link generated in the previous cell. When you first select this link, you might see that the pipeline is still running. Once it's complete, you can examine each component's results.
 
 Double-click the **Train Credit Defaults Model** component.
 
-There are two important results you'll want to see about training:
+There are two important results you want to see about training:
 
 * View your logs:
   1. Select the **Outputs+logs** tab.
   1. Open the folders to `user_logs` > `std_log.txt`.
   This section shows the script run stdout.
   :::image type="content" source="media/tutorial-pipeline-python-sdk/user-logs.jpg" alt-text="Screenshot of std_log.txt." lightbox="media/tutorial-pipeline-python-sdk/user-logs.jpg":::
-* View your metrics: Select the **Metrics** tab. This section shows different logged metrics. In this example. mlflow `autologging`, has automatically logged the training metrics.
+* View your metrics: Select the **Metrics** tab. This section shows different logged metrics. In this example, mlflow `autologging` has automatically logged the training metrics.
 
 :::image type="content" source="media/tutorial-pipeline-python-sdk/metrics.jpg" alt-text="Screenshot shows logged metrics.txt." lightbox="media/tutorial-pipeline-python-sdk/metrics.jpg":::
 
@@ -621,7 +657,7 @@ If you plan to continue now to other tutorials, skip to [Next steps](#next-steps
 
 ### Stop compute instance
 
-If you're not going to use it now, stop the compute instance:
+If you aren't going to use it now, stop the compute instance:
 
 1. In the studio, in the left pane, select **Compute**.
 1. In the top tabs, select **Compute instances**
