articles/machine-learning/concept-model-monitoring.md
Lines changed: 19 additions & 20 deletions
Original file line number
Diff line number
Diff line change
@@ -10,13 +10,13 @@ ms.subservice: mlops
10
10
ms.reviewer: mopeakande
11
11
reviewer: msakande
12
12
ms.topic: conceptual
13
-
ms.date: 05/23/2023
13
+
ms.date: 09/15/2023
14
14
ms.custom: devplatv2
15
15
---
16
16
17
17
# Model monitoring with Azure Machine Learning (preview)
18
18
19
-
In this article, you'll learn about model monitoring in Azure Machine Learning, the signals and metrics you can monitor, and the recommended practices for using model monitoring.
19
+
In this article, you learn about model monitoring in Azure Machine Learning, the signals and metrics you can monitor, and the recommended practices for using model monitoring.
@@ -31,35 +31,34 @@ Azure Machine Learning provides the following capabilities for continuous model
31
31
* **Built-in monitoring signals**. Model monitoring provides built-in monitoring signals for tabular data. These monitoring signals include data drift, prediction drift, data quality, and feature attribution drift.
32
32
* **Out-of-box model monitoring setup with Azure Machine Learning online endpoint**. If you deploy your model to production in an Azure Machine Learning online endpoint, Azure Machine Learning collects production inference data automatically and uses it for continuous monitoring.
33
33
* **Use of multiple monitoring signals for a broad view**. You can easily include several monitoring signals in one monitoring setup. For each monitoring signal, you can select your preferred metric(s) and fine-tune an alert threshold.
34
-
* **Use of recent past production data or training data as comparison baseline dataset**. For model signals and metrics, Azure Machine Learning lets you set these datasets as the baseline dataset for comparison.
35
-
* **Monitoring of data drift or data quality for top n features**. If you use training data as the comparison baseline dataset, you can define data drift or data quality layering over feature importance.
36
-
* **Monitoring of data drift for a population subset**. For some ML models, data drift can occur only for a subset of the population. This can make data drift go undetected and its impact subtle. For such ML models, it's important to monitor drift for specific subsets of the population.
34
+
* **Use of recent past production data or training data as reference data for comparison**. For monitoring signals, Azure Machine Learning lets you set reference data using recent past production data or training data.
35
+
* **Monitoring of top N features for data drift or data quality**. If you use training data as the reference data, you can define data drift or data quality signals layering over feature importance.
37
36
* **Flexibility to define your monitoring signal**. If the built-in monitoring signals aren't suitable for your business scenario, you can define your own monitoring signal with a custom monitoring signal component.
38
-
* **Flexibility to bring your own production inference data**. If you deploy models outside of Azure Machine Learning, or if you deploy models to Azure Machine Learning batch endpoints, you can collect production inference data and use that data in Azure Machine Learning for model monitoring.
39
-
* **Flexibility to select data window**. You have the flexibility to select a data window for both the target dataset and the baseline dataset.
40
-
    * By default, the data window for production inference data (the target dataset) is your monitoring frequency. That is, all data collected in the past monitoring period before the monitoring job is run will be used as the target dataset. You can use `data_window_size` to adjust the data window for the target dataset if needed.
41
-
    * By default, the data window for the baseline dataset is the full dataset. You can adjust the data window by using either the date range or the `trailing_days` parameter.
37
+
* **Flexibility to use production inference data from any source**. If you deploy models outside of Azure Machine Learning, or if you deploy models to Azure Machine Learning batch endpoints, you can collect production inference data. You can then use the inference data in Azure Machine Learning for model monitoring.
38
+
* **Flexibility to select data window**. You have the flexibility to select a data window for both the production data and the reference data.
39
+
    * By default, the data window for production data is your monitoring frequency. That is, all data collected in the past monitoring period before the monitoring job is run will be analyzed. You can use the `production_data.data_window_size` property to adjust the data window for the production data, if needed.
40
+
    * By default, the data window for the reference data is the full dataset. You can adjust the reference data window with the `reference_data.data_window` property. Both rolling and fixed data windows are supported.
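
The default production data window described above can be illustrated with a small sketch. This is not the service's implementation: the function names, the `window_days` argument (a hypothetical stand-in for a setting such as `production_data.data_window_size`), the day-based units, and the half-open `[start, end)` interval are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def production_window(run_time, frequency_days, window_days=None):
    """Return the (start, end) interval of production data to analyze.

    Assumption: the default window equals one monitoring period;
    window_days, when given, overrides that default.
    """
    days = frequency_days if window_days is None else window_days
    return run_time - timedelta(days=days), run_time


def select_rows(rows, start, end):
    """Keep rows whose 'timestamp' falls in the half-open interval [start, end)."""
    return [r for r in rows if start <= r["timestamp"] < end]


# Hypothetical collected inference records.
rows = [
    {"timestamp": datetime(2023, 9, 14, 12), "value": 1},
    {"timestamp": datetime(2023, 9, 10), "value": 2},
    {"timestamp": datetime(2023, 9, 1), "value": 3},
]
run_time = datetime(2023, 9, 15)

# Default: a daily monitor analyzes only the last day of data.
daily = select_rows(rows, *production_window(run_time, frequency_days=1))

# Overridden: the same daily monitor analyzes a seven-day window.
weekly = select_rows(rows, *production_window(run_time, frequency_days=1, window_days=7))
```

Under these assumptions, widening the window lets a frequently run monitor analyze more data per run, which can help when a single monitoring period collects too few records.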
42
41
43
42
## Monitoring signals and metrics
44
43
45
44
Azure Machine Learning model monitoring (preview) supports the following list of monitoring signals and metrics:
46
45
47
-
|Monitoring signal | Description | Metrics | Model task type (supported data format) | Target dataset | Baseline dataset |
46
+
47
+
|Monitoring signal | Description | Metrics | Model tasks (supported data format) | Production data | Reference data |
48
48
|--|--|--|--|--|--|
49
49
| Data drift | Data drift tracks changes in the distribution of a model's input data by comparing it to the model's training data or recent past production data. | Jensen-Shannon Distance, Population Stability Index, Normalized Wasserstein Distance, Two-Sample Kolmogorov-Smirnov Test, Pearson's Chi-Squared Test | Classification (tabular data), Regression (tabular data) | Production data - model inputs | Recent past production data or training data |
50
50
| Prediction drift | Prediction drift tracks changes in the distribution of a model's prediction outputs by comparing it to validation or test labeled data or recent past production data. | Jensen-Shannon Distance, Population Stability Index, Normalized Wasserstein Distance, Chebyshev Distance, Two-Sample Kolmogorov-Smirnov Test, Pearson's Chi-Squared Test | Classification (tabular data), Regression (tabular data) | Production data - model outputs | Recent past production data or validation data |
51
51
| Data quality | Data quality tracks the data integrity of a model's input by comparing it to the model's training data or recent past production data. The data quality checks include checking for null values, type mismatches, and out-of-bounds values. | Null value rate, data type error rate, out-of-bounds rate | Classification (tabular data), Regression (tabular data) | Production data - model inputs | Recent past production data or training data |
52
-
| Feature attribution drift | Feature attribution drift tracks the importance or contributions of features to prediction outputs in production by comparing it to feature importance at training time | Normalized discounted cumulative gain | Classification (tabular data), Regression (tabular data) | Production data - model inputs & outputs (*see the following note*) | Training data (required) |
53
-
|[Generative AI: Generation safety and quality](./prompt-flow/how-to-monitor-generative-ai-applications.md)|Evaluates generative AI applications for safety & quality using GPT-assisted metrics|groundedness, relevance, fluency, similarity, coherence|text_question_answering| prompt, completion, context, and annotation template |N/A|
52
+
| Feature attribution drift | Feature attribution drift tracks the contribution of features to predictions (also known as feature importance) during production by comparing it with feature importance during training.| Normalized discounted cumulative gain | Classification (tabular data), Regression (tabular data) | Production data - model inputs & outputs | Training data (required) |
53
+
|[Generative AI: Generation safety and quality](./prompt-flow/how-to-monitor-generative-ai-applications.md)|Evaluates generative AI applications for safety & quality using GPT-assisted metrics.| Groundedness, relevance, fluency, similarity, coherence|text_question_answering| prompt, completion, context, and annotation template |N/A|
54
+
54
55
55
-
> [!NOTE]
56
-
> For 'feature attribution drift' signal (during Preview), the user must create a custom data asset of type 'uri_folder' that contains joined inputs and outputs (Model Data Collector can be leveraged). Additionally, 'target_column_name' is also a required field, which specifies the prediction column in your training dataset.
57
56
58
57
## How model monitoring works in Azure Machine Learning
59
58
60
59
Azure Machine Learning acquires monitoring signals by performing statistical computations on production inference data and reference data. This reference data can include the model's training data or validation data, while the production inference data refers to the model's input and output data collected in production.
61
60
62
-
The following steps describe an example of the statistical computation used to acquire monitoring signals about data drift for a model that's in production.
61
+
The following steps describe an example of the statistical computation used to acquire a data drift signal for a model that's in production.
63
62
64
63
* For a feature in the training data, calculate the statistical distribution of its values. This distribution is the baseline distribution.
65
64
* Calculate the statistical distribution of the feature's latest values that are seen in production.
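
The two distribution steps above, followed by a distance-based comparison, can be sketched in plain Python. This is an illustrative sketch, not the service's implementation: it uses Jensen-Shannon distance (one of the data drift metrics listed in the table above) on a categorical feature, and the function names, the sample data, and the 0.1 alert threshold are assumptions.

```python
import math
from collections import Counter


def distribution(values, categories):
    """Relative frequency of each category in values, in a fixed category order."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logarithms; the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, divergence))


categories = ["a", "b"]
training_values = ["a"] * 50 + ["b"] * 50      # hypothetical training data
production_values = ["a"] * 90 + ["b"] * 10    # hypothetical latest production data

baseline = distribution(training_values, categories)   # baseline distribution
latest = distribution(production_values, categories)   # production distribution

distance = js_distance(baseline, latest)
threshold = 0.1  # hypothetical alert threshold
drift_detected = distance > threshold
```

A numerical feature would first need to be binned into a histogram before the same comparison applies; the choice of bins is left out of this sketch.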
@@ -71,18 +70,18 @@ The following steps describe an example of the statistical computation used to a
71
70
Take the following steps to enable model monitoring in Azure Machine Learning:
72
71
73
72
* **Enable production inference data collection.** If you deploy a model to an Azure Machine Learning online endpoint, you can enable production inference data collection by using Azure Machine Learning [Model Data Collection](concept-data-collection.md). However, if you deploy a model outside of Azure Machine Learning or to an Azure Machine Learning batch endpoint, you're responsible for collecting production inference data. You can then use this data for Azure Machine Learning model monitoring.
74
-
* **Set up model monitoring.** You can use SDK/CLI 2.0 or the studio UI to easily set up model monitoring. During the setup, you can specify your preferred monitoring signals and metrics and set the alert threshold for each metric.
73
+
* **Set up model monitoring.** You can use SDK/CLI 2.0 or the studio UI to easily set up model monitoring. During the setup, you can specify your preferred monitoring signals and customize metrics and thresholds for each signal.
75
74
* **View and analyze model monitoring results.** Once model monitoring is set up, a monitoring job is scheduled to run at your specified frequency. Each run computes and evaluates metrics for all selected monitoring signals and triggers alert notifications when any specified threshold is exceeded. You can follow the link in the alert notification to your Azure Machine Learning workspace to view and analyze monitoring results.
76
75
77
76
## Recommended best practices for model monitoring
78
77
79
78
Each machine learning model and its use cases are unique. Therefore, model monitoring is unique for each situation. The following is a list of recommended best practices for model monitoring:
80
79
* **Start model monitoring as soon as your model is deployed to production.**
81
-
* **Work with data scientists that are familiar with the model to set up model monitoring.** These data scientists have insight into the model and its use cases and are best positioned to recommend monitoring signals and metrics as well as set the right alert thresholds for each metric, to avoid alert fatigue.
82
-
* **Include multiple monitoring signals in your monitoring setup.** With multiple monitoring signals, you get both a broad view and granular view of monitoring. For example, you can combine both data drift and feature attribution drift signals to get an early warning about your model performance issue. With data drift cohort analysis signal, you can get a granular view about a certain data segment.
83
-
* **Use model training data as the baseline dataset.** For comparison based on the baseline dataset, Azure Machine Learning allows you to use the recent past production data or historical data (such as training data or validation data). For a meaningful comparison, we recommend that you use the training data as the comparison baseline for data drift and data quality. For prediction drift, use the validation data as the comparison baseline.
80
+
* **Work with data scientists that are familiar with the model to set up model monitoring.** Data scientists who have insight into the model and its use cases are in the best position to recommend monitoring signals and metrics as well as set the right alert thresholds for each metric (to avoid alert fatigue).
81
+
* **Include multiple monitoring signals in your monitoring setup.** With multiple monitoring signals, you get both a broad view and granular view of monitoring. For example, you can combine both data drift and feature attribution drift signals to get an early warning about your model performance issue.
82
+
* **Use model training data as the reference data.** For reference data used as the comparison baseline, Azure Machine Learning allows you to use the recent past production data or historical data (such as training data or validation data). For a meaningful comparison, we recommend that you use the training data as the comparison baseline for data drift and data quality. For prediction drift, use the validation data as the comparison baseline.
84
83
* **Specify the monitoring frequency based on how your production data will grow over time**. For example, if your production model receives heavy daily traffic, and the daily data accumulation is sufficient for you to monitor, then you can set the monitoring frequency to daily. Otherwise, you can consider a weekly or monthly monitoring frequency, based on the growth of your production data over time.
85
-
* **Monitor the top N important features or a subset of features.** If you use training data as the comparison baseline, by default, Azure Machine Learning monitors data drift or data quality for the top 10 important features. For models that have a large number of features, consider monitoring a subset of those features to reduce computation cost and monitoring noise.
84
+
* **Monitor the top N important features or a subset of features.** If you use training data as the comparison baseline, you can easily configure data drift monitoring or data quality monitoring for the top N features. For models that have a large number of features, consider monitoring a subset of those features to reduce computation cost and monitoring noise.