#Customer intent: As a data scientist who uses Python, I want to improve my operational efficiency by scheduling the training pipeline of my model with the latest data.
In this article, you'll learn how to programmatically schedule a pipeline to run on Azure. You can create a schedule based on elapsed time or on file-system changes. You can use time-based schedules to accomplish routine tasks, such as monitoring for data drift. You can use change-based schedules to react to irregular or unpredictable changes, such as new data being uploaded or old data being edited. After you learn how to create schedules, you'll learn how to retrieve and deactivate them. Finally, you'll learn how to use other Azure services, Azure Logic Apps and Azure Data Factory, to run pipelines. A logic app enables more complex triggering logic or behavior. Azure Data Factory pipelines allow you to call a machine learning pipeline as part of a larger data orchestration pipeline.
## Prerequisites
* An Azure subscription. If you don’t have an Azure subscription, create a [free account](https://azure.microsoft.com/free/).
* A Python environment in which the Azure Machine Learning SDK for Python is installed. For more information, see [Create and manage reusable environments for training and deployment with Azure Machine Learning](how-to-use-environments.md).
* A Machine Learning workspace with a published pipeline. You can use the one that's created in [Create and run machine learning pipelines with Azure Machine Learning SDK](./how-to-create-machine-learning-pipelines.md).
## Get required values
To schedule a pipeline, you'll need a reference to your workspace, the identifier of your published pipeline, and the name of the experiment in which you want to create the schedule. You can get these values by using the following code:
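
A minimal sketch follows. It assumes that a workspace `config.json` file is available locally for `Workspace.from_config()`, and the experiment name and pipeline ID assigned at the end are placeholders for values taken from the printed lists:

```python
from azureml.core import Workspace
from azureml.core.experiment import Experiment
from azureml.pipeline.core import PublishedPipeline

# Load the workspace from a local config.json file
ws = Workspace.from_config()

# Print the names of the experiments in the workspace
for experiment in Experiment.list(ws):
    print(experiment.name)

# Print the published pipelines and their identifiers
for published_pipeline in PublishedPipeline.list(ws):
    print(f"{published_pipeline.name}: {published_pipeline.id}")

# Placeholders: replace with an experiment name and a pipeline ID printed above
experiment_name = "MyExperiment"
pipeline_id = "<pipeline-id>"
```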

## Create a schedule

To run a pipeline on a recurring basis, you create a schedule. A `Schedule` associates a pipeline, an experiment, and a trigger. The trigger can either be a `ScheduleRecurrence` that defines the wait time between jobs or a datastore path that specifies a directory to watch for changes. In either case, you need the pipeline identifier and the name of the experiment in which to create the schedule.
At the top of your Python file, import the `Schedule` and `ScheduleRecurrence` classes:

```python
from azureml.pipeline.core.schedule import ScheduleRecurrence, Schedule
```

### Create a time-based schedule
The `ScheduleRecurrence` constructor has a required `frequency` argument that must be set to one of the following strings: `"Minute"`, `"Hour"`, `"Day"`, `"Week"`, or `"Month"`. It also requires an integer `interval` argument that specifies how many `frequency` units should elapse between start times. Optional arguments allow you to be more specific about starting times, as described in the [ScheduleRecurrence SDK documentation](/python/api/azureml-pipeline-core/azureml.pipeline.core.schedule.schedulerecurrence).
Create a `Schedule` that begins a job every 15 minutes:
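
A minimal sketch, assuming the `ws`, `pipeline_id`, and `experiment_name` values from earlier (the schedule name and description are arbitrary):

```python
# Run the pipeline every 15 minutes
recurrence = ScheduleRecurrence(frequency="Minute", interval=15)

recurring_schedule = Schedule.create(ws, name="MyRecurringSchedule",
                                     description="Based on time",
                                     pipeline_id=pipeline_id,
                                     experiment_name=experiment_name,
                                     recurrence=recurrence)
```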

### Create a change-based schedule

Pipelines that are triggered by file changes might be more efficient than time-based schedules. For example, when a file is changed or a new file is added to a data directory, you might want to preprocess that file. You can monitor any changes to a datastore or changes in a specific directory within the datastore. If you monitor a specific directory, changes within subdirectories of that directory won't trigger a job.
> [!NOTE]
> Change-based schedules support monitoring Azure Blob Storage only.
To create a file-reactive `Schedule`, you need to set the `datastore` parameter in the call to [Schedule.create](/python/api/azureml-pipeline-core/azureml.pipeline.core.schedule.schedule#create-workspace--name--pipeline-id--experiment-name--recurrence-none--description-none--pipeline-parameters-none--wait-for-provisioning-false--wait-timeout-3600--datastore-none--polling-interval-5--data-path-parameter-name-none--continue-on-step-failure-none--path-on-datastore-none---workflow-provider-none---service-endpoint-none-). To monitor a folder, set the `path_on_datastore` argument.
The `polling_interval` argument enables you to specify, in minutes, the frequency at which the datastore is checked for changes.
If the pipeline was constructed with a [DataPath](/python/api/azureml-core/azureml.data.datapath.datapath) [PipelineParameter](/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelineparameter), you can set that variable to the name of the changed file by setting the `data_path_parameter_name` argument.
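
The following is a minimal sketch. It assumes the workspace's default blob datastore, `workspaceblobstore`, and a hypothetical `DataPath` `PipelineParameter` named `input_data`; the folder path is a placeholder:

```python
from azureml.core.datastore import Datastore

datastore = Datastore(workspace=ws, name="workspaceblobstore")

reactive_schedule = Schedule.create(ws, name="MyReactiveSchedule",
                                    description="Based on input file change",
                                    pipeline_id=pipeline_id,
                                    experiment_name=experiment_name,
                                    datastore=datastore,
                                    path_on_datastore="sample_data/",  # watch this folder (not its subfolders)
                                    polling_interval=5,  # check for changes every 5 minutes
                                    data_path_parameter_name="input_data")
```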

### Optional arguments

In addition to the arguments discussed previously, you can set the `status` argument to `"Disabled"` to create an inactive schedule. The `continue_on_step_failure` argument enables you to pass a Boolean value that overrides the pipeline's default failure behavior.
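
For example, here's a sketch that overrides the failure behavior for the recurring schedule created earlier (the parameter name follows the `Schedule.create` reference linked previously):

```python
tolerant_schedule = Schedule.create(ws, name="MyTolerantSchedule",
                                    description="Continues after a step failure",
                                    pipeline_id=pipeline_id,
                                    experiment_name=experiment_name,
                                    recurrence=recurrence,
                                    continue_on_step_failure=True)
```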
## View your scheduled pipelines
In a browser, go to Azure Machine Learning studio. In the left pane, select the Endpoints icon. In the **Endpoints** pane, select **Pipeline endpoints**. This takes you to a list of the pipelines that are published in the workspace.
:::image type="content" source="./media/how-to-trigger-published-pipeline/scheduled-pipelines.png" alt-text="Screenshot that shows the Endpoints pane." lightbox="./media/how-to-trigger-published-pipeline/scheduled-pipelines.png":::
On this page, you can see summary information about all the pipelines in the workspace: names, descriptions, status, and so on. You can get more information by selecting the name of a pipeline. On the resulting page, you can also get information about individual jobs.
## Deactivate the pipeline
If you have a `Pipeline` that's published but not scheduled, you can disable it with this code:
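
A minimal sketch, assuming the `ws` and `pipeline_id` values from earlier:

```python
pipeline = PublishedPipeline.get(ws, id=pipeline_id)
pipeline.disable()
```

If the pipeline has active schedules, deactivate those first. For example, here's a sketch that disables every schedule in the workspace:

```python
for schedule in Schedule.list(ws):
    schedule.disable(wait_for_provisioning=True)
```
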
If you then run `Schedule.list(ws)` again, you should get an empty list.
## Use Logic Apps for complex triggers
You can create more complex trigger rules or behavior by using [Logic Apps](/azure/logic-apps/logic-apps-overview).
To use a logic app to trigger a Machine Learning pipeline, you need the REST endpoint for a published Machine Learning pipeline. [Create and publish your pipeline](./how-to-create-machine-learning-pipelines.md). Then find the REST endpoint of your `PublishedPipeline` by using the pipeline ID:
```python
# You can find the pipeline ID in Azure Machine Learning studio

# Retrieve the published pipeline by its ID, then read its REST endpoint
published_pipeline = PublishedPipeline.get(ws, id="<pipeline-id-here>")
published_pipeline.endpoint
```
## Create a logic app in Azure
Now create a [logic app](/azure/logic-apps/logic-apps-overview) instance. After your logic app is provisioned, use these steps to configure a trigger for your pipeline:
1. [Create a system-assigned managed identity](/azure/logic-apps/create-managed-service-identity) to give the app access to your Azure Machine Learning workspace.
1. Go to the Logic App Designer view and select the **Blank Logic App** template:

    :::image type="content" source="media/how-to-trigger-published-pipeline/blank-template.png" alt-text="Screenshot that shows the button for the Blank Logic App template.":::
1. In the designer, search for **blob**. Select the **When a blob is added or modified (properties only)** trigger and add this trigger to your logic app.

    :::image type="content" source="media/how-to-trigger-published-pipeline/add-trigger.png" alt-text="Screenshot that shows how to add a trigger to a logic app." lightbox="media/how-to-trigger-published-pipeline/add-trigger.png":::
1. Fill in the connection information for the Blob Storage account that you want to monitor for blob additions or modifications. Select the container to monitor.
    Choose **Interval** and **Frequency** values for polling for updates that work for your scenario.