Skip to content

Commit bf86c32

Browse files
committed
edit pass: monitor-azure-machine-learning
1 parent cad12da commit bf86c32

9 files changed

+19
-20
lines changed

learn-pr/advocates/monitor-azure-machine-learning/7-azure-machine-learning-model-monitoring.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
11
### YamlMime:ModuleUnit
22
uid: learn.introduction-to-azure-machine-learning-monitoring.azure-machine-learning-model-monitoring
3-
title: Monitoring Azure Machine Learning models
3+
title: Azure Machine Learning model monitoring
44
metadata:
5-
title: Monitoring Azure Machine Learning Models
5+
title: Azure Machine Learning Model Monitoring
66
description: Learn how to monitor Azure Machine Learning models.
77
ms.date: 03/15/2025
88
author: Orin-Thomas

learn-pr/advocates/monitor-azure-machine-learning/9-knowledge-check.yml

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -17,17 +17,17 @@ quiz:
1717
- content: "Monitor Azure Machine Learning performance."
1818
isCorrect: true
1919
explanation: "Administrators are primarily responsible for the Azure infrastructure on which an Azure Machine Learning workspace and models run."
20-
- content: "Monitor workflow issues."
20+
- content: "Monitor workflow problems."
2121
isCorrect: false
2222
explanation: "Both administrators and machine learning professionals are responsible for monitoring problems and performance related to Azure Machine Learning workspaces."
2323
- content: "Monitor Azure Machine Learning models."
2424
isCorrect: false
2525
explanation: "Machine learning professionals and data scientists are primarily responsible for monitoring model performance."
26-
- content: "You can use Kusto Query Language (KQL) to query the Log Analytics workspace that's associated with your Azure Machine Learning workspace. What is a necessary first step for accomplishing this goal?"
26+
- content: "You can use KQL to query the Log Analytics workspace that's associated with your Azure Machine Learning workspace. What is a necessary first step for accomplishing this goal?"
2727
choices:
2828
- content: "Log Analytics is configured by default."
2929
isCorrect: false
30-
explanation: "Only metrics are enabled by default on Azure Machine Learning. Logs can be exported to Log Analytics by setting up diagnostic settings."
30+
explanation: "Only metrics are enabled by default on Azure Machine Learning. You can export logs to Log Analytics by setting up diagnostic settings."
3131
- content: "Configure diagnostic settings."
3232
isCorrect: true
3333
explanation: "By configuring diagnostic settings, you can enable logs to be exported to a Log Analytics workspace."

learn-pr/advocates/monitor-azure-machine-learning/includes/1-introduction.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,10 @@
1-
Azure Machine Learning is a cloud service for managing the life cycles of machine learning projects. Machine Learning professionals, data scientists, and engineers can use Azure Machine Learning to train and deploy models and manage machine learning operations.
1+
Azure Machine Learning is a cloud service for managing the life cycles of machine learning projects. Machine learning professionals, data scientists, and engineers can use Azure Machine Learning to train and deploy models and manage machine learning operations.
22

33
When anyone is monitoring an Azure Machine Learning environment, it's important to have visibility into all resources that might affect performance and AI model quality. Monitoring of Azure Machine Learning consists of the following areas:
44

55
- **Azure Machine Learning performance**: Compute resources provide the infrastructure for running a machine learning workflow. They can affect Azure Machine Learning runs, experiments, and overall performance. This area is traditionally for operators and administrators.
6-
- **Workflow problems**: Throughout the machine learning life cycle, problems and errors might occur during deployment of new models, during the running of a job, or in other circumstances. Both administrators and machine learning professionals might be interested in this area.
7-
- **Machine learning models**: Data drift, model prediction drift, data quality, and feature attribution drift can lead to outdated models and cause your AI systems to become obsolete. Machine learning professionals and data scientists are the traditional owners of this monitoring.
6+
- **Workflow problems**: Throughout the life cycle of machine learning, problems and errors might occur during deployment of new models, during the running of a job, or in other circumstances. Both administrators and machine learning professionals might be interested in this area.
7+
- **Machine learning models**: Data drift, model prediction drift, poor data quality, and feature attribution drift can lead to outdated models and cause AI systems to become obsolete. Machine learning professionals and data scientists are the traditional owners of this monitoring.
88

99
Azure Monitor is the primary tool for managing an Azure Machine Learning environment. Azure Monitor provides built-in capabilities to monitor performance and workflow problems in Azure Machine Learning. You can also expand these capabilities for your own needs.
1010

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,6 @@
1-
This module gave you an overview of monitoring in Azure Machine Learning. You learned that Azure Machine Learning consists of resources in Azure, such as compute and workspace, and how to check metrics for these resources. You also learned how to enable diagnostic settings for an Azure Machine Learning workspace, so that you can query logs from the Azure Machine Learning resources. Finally, you learned why it's important to monitor the Machine Learning models to prevent them from becoming obsolete and stale.
1+
This module gave you an overview of monitoring in Azure Machine Learning. You learned that Azure Machine Learning consists of resources in Azure, such as compute and workspace, and how to check metrics for these resources. You also learned how to enable diagnostic settings for an Azure Machine Learning workspace, so that you can query logs from the Azure Machine Learning resources. Finally, you learned why it's important to monitor the machine learning models to prevent them from becoming obsolete and stale.
22

33
## Further reading/learning
44

5-
- [Monitor Azure Machine Learning](/azure/machine-learning/monitor-azure-machine-learning-reference)
65
- [Azure Machine Learning model monitoring](/azure/machine-learning/concept-model-monitoring)
76
- [Azure Machine Learning monitoring data reference](/azure/machine-learning/monitor-azure-machine-learning-reference)

learn-pr/advocates/monitor-azure-machine-learning/includes/2-monitoring-azure-machine-learning-workspace-compute.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ A compute is a designated resource where you run your job or host your endpoint.
2020
- **Compute cluster**: This type of compute is a managed infrastructure where you can create a cluster of CPU or GPU compute nodes.
2121
- **Serverless compute**: When you use serverless compute, you don't need to create your own cluster. All compute life-cycle management is offloaded to Azure Machine Learning.
2222
- **Kubernetes cluster**: This type of compute is a used to deploy trained machine learning models to Azure Kubernetes Service (AKS). You can create an AKS cluster from your Azure Machine Learning workspace or attach an existing AKS cluster.
23-
- **Attached (Unmanaged) compute**: You can attach your own compute resources to your workspace and use them for training and inference. Virtual machines (VM) are supported, along with services such as Azure Databricks and Azure HDInsight. These are unmanaged compute resources. As such, they can require extra steps for you to maintain or improve performance for machine learning workloads.
23+
- **Attached (unmanaged) compute**: You can attach your own compute resources to your workspace and use them for training and inference. Virtual machines are supported, along with services such as Azure Databricks and Azure HDInsight. These are unmanaged compute resources. As such, they can require extra steps for you to maintain or improve performance for machine learning workloads.
2424

2525
For managed compute resources, you can get insights into the performance of the nodes, quota availability, and resilience of the environment directly from Azure Machine Learning.
2626

learn-pr/advocates/monitor-azure-machine-learning/includes/3-azure-monitor-platform-metrics.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
11
Azure Monitor collects and aggregates metrics from every component of Azure Machine Learning by default. Azure Monitor platform metrics provide a view of availability, performance, and resilience.
22

3-
Azure Monitor uses the concept of *resource types* to identify Azure resources. Resource types are also part of the resource IDs for every resource running in Azure. For example, one resource type for Azure Machine Learning is **Microsoft.MachineLearningServices/workspaces**.
3+
Azure Monitor uses the concept of *resource types* to identify Azure resources. Resource types are also part of the resource ID for every resource running in Azure. For example, one resource type for Azure Machine Learning is **Microsoft.MachineLearningServices/workspaces**.
44

5-
Azure Monitor organizes core monitoring data into metrics on resource types, also called *namespaces*. Metrics and logs are available for various resource types. The metric categories in the **Microsoft.MachineLearningServices/workspaces** resource are **Model**, **Quota**, **Resource**, **Run**, and **Traffic**. Quota information is for Machine Learning compute only. **Run** provides information on training runs for the workspace.
5+
Azure Monitor organizes core monitoring data into metrics on resource types, also called *namespaces*. Metrics and logs are available for various resource types. The metric categories in the **Microsoft.MachineLearningServices/workspaces** resource are **Model**, **Quota**, **Resource**, **Run**, and **Traffic**. Quota information is for Machine Learning compute only. The **Run** category provides information on training runs for the workspace.
66

77
You can use that data to analyze the performance of your Azure Machine Learning environment. For example, if you want to check how many cores a workspace is consuming:
88

learn-pr/advocates/monitor-azure-machine-learning/includes/4-azure-monitor-resource-logs.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
Resource logs provide insight into operations that an Azure resource completed, such as Azure Machine Learning. Logs are generated automatically, but you must route them to Log Analytics or a different service to save or query them.
1+
Resource logs provide insight into operations that an Azure resource (such as Azure Machine Learning) completed. Logs are generated automatically, but you must route them to Log Analytics or a different service to save or query them.
22

33
To route Azure Machine Learning logs to Log Analytics, perform the following steps:
44

@@ -34,7 +34,7 @@ After you configure the diagnostic setting, you can query the logs in Log Analyt
3434

3535
1. On the **New Query 1*** tab, select the dropdown menu on the right side to change **Simple mode** to **KQL**. (KQL stands for Kusto Query Language.)
3636

37-
1. In the KQL query editor, you can enter the query that you want to perform. For the example in this module, we check for records for a specific job name:
37+
1. In the KQL query editor, you can enter the query that you want to perform. The following example checks for records for a specific job name:
3838

3939
```kusto
4040
_AmlComputeJobEvent_
@@ -46,4 +46,4 @@ After you configure the diagnostic setting, you can query the logs in Log Analyt
4646

4747
![Screenshot of the KQL code in a log query in the Azure portal.](../media/log-query.png)
4848

49-
After you run the query, you can analyze the results on the **Results** pane.
49+
After you run the query, you can analyze the results on the **Results** tab.

learn-pr/advocates/monitor-azure-machine-learning/includes/5-azure-monitor-alerts.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ There are various types of alerts:
44

55
- *Metric alerts* evaluate resource metrics at regular intervals. Metrics can be platform metrics, custom metrics, logs from Azure Monitor converted to metrics, or Application Insights metrics. Metric alerts can also apply multiple conditions and dynamic thresholds.
66
- *Log alerts* enable users to use a Log Analytics query to evaluate resource logs at a predefined frequency.
7-
- *Activity log alerts* trigger when a new activity log event matches defined conditions. For example, activity log alerts can report on your service health and resource health.
7+
- *Activity log alerts* are triggered when a new activity log event matches defined conditions. For example, activity log alerts can report on service health and resource health.
88

99
When you're monitoring an Azure Machine Learning workspace, you might want to get an alert when a model deployment fails, when quota utilization exceeds a threshold, or when nodes are unusable.
1010

@@ -26,14 +26,14 @@ To create an alert:
2626

2727
1. For **Alert logic**, you can provide thresholds for when the condition is met to trigger the alert. For **Threshold**, enter **5**. This value means that more than five failed deployments trigger this alert. You can change the variables to meet your needs. When you finish, select **Next**.
2828

29-
1. On the **Actions** tab, make sure that **Use quick actions** is selected and provide the details on the right pane. Then select **Save** > **Next**.
29+
1. On the **Actions** tab, make sure that **Use quick actions** is selected, and provide the details on the right pane. Then select **Save** > **Next**.
3030

3131
1. On the **Details** tab, confirm the subscription and resource group to use. Select the appropriate severity level. For **Alert rule name**, provide the name of the alert and a description. Then select **Review + Create**.
3232

3333
The **Alerts** dashboard shows information if the alert is triggered.
3434

3535
![Screenshot of the Alert dashboard in an Azure Machine Learning workspace in the Azure portal.](../media/machine-learning-alerts.png)
3636

37-
If you configure an alert to send an email, you should receive a notification.
37+
If you configure an alert to send an email, you should receive a notification like the following example.
3838

3939
![Screenshot that shows the output of an Azure Monitor alert in the Azure portal.](../media/azure-monitor-alert.png)

learn-pr/advocates/monitor-azure-machine-learning/includes/7-azure-machine-learning-model-monitoring.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ Azure Machine Learning provides the following capabilities for continuous model
1212
- **Ability to define custom monitoring signals**. If the built-in monitoring signals aren't suitable for your business scenario, you can define your own monitoring signal with a custom component.
1313
- **Flexibility to use production inference data from any source**. If you deploy models outside Azure Machine Learning or you deploy models to batch endpoints, you can still collect production inference data yourself to use in Azure Machine Learning model monitoring.
1414

15-
Each machine learning model and its use cases are unique. Therefore, model monitoring is unique for each situation. Here are recommended best practices for model monitoring:
15+
Each machine learning model and its use cases are unique. Therefore, model monitoring is unique for each situation. Here are best practices for model monitoring:
1616

1717
- **Start model monitoring immediately after you deploy a model to production**.
1818
- **Work with data scientists who are familiar with the model to set up monitoring**. Data scientists who have insight into the model and its use cases can recommend monitoring signals and metrics. They can set the right alert thresholds for each metric to avoid alert fatigue.

0 commit comments

Comments
 (0)