
Commit f931c60

edits to articles
1 parent 184dea8 commit f931c60

File tree

3 files changed: +18 −18 lines changed


articles/machine-learning/concept-model-monitoring.md

Lines changed: 9 additions & 9 deletions
@@ -31,13 +31,13 @@ Azure Machine Learning provides the following capabilities for continuous model
 * **Built-in monitoring signals**. Model monitoring provides built-in monitoring signals for tabular data. These monitoring signals include data drift, prediction drift, data quality, and feature attribution drift.
 * **Out-of-box model monitoring setup with Azure Machine Learning online endpoint**. If you deploy your model to production in an Azure Machine Learning online endpoint, Azure Machine Learning collects production inference data automatically and uses it for continuous monitoring.
 * **Use of multiple monitoring signals for a broad view**. You can easily include several monitoring signals in one monitoring setup. For each monitoring signal, you can select your preferred metric(s) and fine-tune an alert threshold.
-* **Use of recent past production data or training data as comparison references data**. For monitoring signals, Azure Machine Learning lets you set reference data using recent past production data or training data.
-* **Monitoring top N features for data drift or data quality**. If you use training data as the reference data, you can define data drift or data quality signals layering over feature importance.
+* **Use of recent past production data or training data as reference data for comparison**. For monitoring signals, Azure Machine Learning lets you set reference data using recent past production data or training data.
+* **Monitoring of top N features for data drift or data quality**. If you use training data as the reference data, you can define data drift or data quality signals layering over feature importance.
 * **Flexibility to define your monitoring signal**. If the built-in monitoring signals aren't suitable for your business scenario, you can define your own monitoring signal with a custom monitoring signal component.
 * **Flexibility to use production inference data from any source**. If you deploy models outside of Azure Machine Learning, or if you deploy models to Azure Machine Learning batch endpoints, you can collect production inference data. You can then use the inference data in Azure Machine Learning for model monitoring.
 * **Flexibility to select data window**. You have the flexibility to select a data window for both the production data and the reference data.
-    * By default, the data window for production data is your monitoring frequency. That is, all data collected in the past monitoring period before the monitoring job is run will be analyzed. You can use `production_data.data_window_size` property to adjust the data window for the production data if needed.
-    * By default, the data window for the reference data is the full dataset. You can adjust the reference data window with `reference_data.data_window` property, both rolling data window and fixed data window are supported.
+    * By default, the data window for production data is your monitoring frequency. That is, all data collected in the past monitoring period before the monitoring job is run will be analyzed. You can use the `production_data.data_window_size` property to adjust the data window for the production data, if needed.
+    * By default, the data window for the reference data is the full dataset. You can adjust the reference data window with the `reference_data.data_window` property. Both rolling data window and fixed data window are supported.

 ## Monitoring signals and metrics

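The two data-window properties changed in the hunk above can be made concrete with a minimal YAML sketch, in the style of the monitoring configuration edited later in this commit. The signal name, window values, and the exact keys under `data_window` are illustrative assumptions rather than confirmed schema:

```yaml
monitoring_signals:
  my_data_drift_signal:                      # hypothetical signal name
    type: data_drift
    production_data:
      input_data:
        path: azureml:my_model_inputs:1      # asset name reused from the diff below
        type: uri_folder
      # Analyze the last seven days instead of defaulting to the monitoring frequency
      data_window_size: 7d                   # value format is an assumption
    reference_data:
      input_data:
        path: azureml:my_model_training_data:1
        type: mltable                        # assumption: tabular training asset
      # Rolling window over the reference data; a fixed window is also supported
      data_window:
        lookback_window_size: 30d            # assumption: rolling-window key
```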
@@ -49,16 +49,16 @@ Azure Machine Learning model monitoring (preview) supports the following list of
 | Data drift | Data drift tracks changes in the distribution of a model's input data by comparing it to the model's training data or recent past production data. | Jensen-Shannon Distance, Population Stability Index, Normalized Wasserstein Distance, Two-Sample Kolmogorov-Smirnov Test, Pearson's Chi-Squared Test | Classification (tabular data), Regression (tabular data) | Production data - model inputs | Recent past production data or training data |
 | Prediction drift | Prediction drift tracks changes in the distribution of a model's prediction outputs by comparing it to validation or test labeled data or recent past production data. | Jensen-Shannon Distance, Population Stability Index, Normalized Wasserstein Distance, Chebyshev Distance, Two-Sample Kolmogorov-Smirnov Test, Pearson's Chi-Squared Test | Classification (tabular data), Regression (tabular data) | Production data - model outputs | Recent past production data or validation data |
 | Data quality | Data quality tracks the data integrity of a model's input by comparing it to the model's training data or recent past production data. The data quality checks include checking for null values, type mismatch, or out-of-bounds of values. | Null value rate, data type error rate, out-of-bounds rate | Classification (tabular data), Regression (tabular data) | production data - model inputs | Recent past production data or training data |
-| Feature attribution drift | Feature attribution drift tracks the importance or contributions of features to prediction outputs in production by comparing it to feature importance at training time | Normalized discounted cumulative gain | Classification (tabular data), Regression (tabular data) | Production data - model inputs & outputs | Training data (required) |
-|[Generative AI: Generation safety and quality](./prompt-flow/how-to-monitor-generative-ai-applications.md)|Evaluates generative AI applications for safety & quality using GPT-assisted metrics|groundedness, relevance, fluency, similarity, coherence|text_question_answering| prompt, completion, context, and annotation template |N/A|
+| Feature attribution drift | Feature attribution drift tracks the contribution of features to predictions (also known as feature importance) during production by comparing it with feature importance during training.| Normalized discounted cumulative gain | Classification (tabular data), Regression (tabular data) | Production data - model inputs & outputs | Training data (required) |
+|[Generative AI: Generation safety and quality](./prompt-flow/how-to-monitor-generative-ai-applications.md)|Evaluates generative AI applications for safety & quality using GPT-assisted metrics.| Groundedness, relevance, fluency, similarity, coherence|text_question_answering| prompt, completion, context, and annotation template |N/A|



 ## How model monitoring works in Azure Machine Learning

 Azure Machine Learning acquires monitoring signals by performing statistical computations on production inference data and reference data. This reference data can include the model's training data or validation data, while the production inference data refers to the model's input and output data collected in production.

-The following steps describe an example of the statistical computation used to acquire monitoring signal about data drift for a model that's in production.
+The following steps describe an example of the statistical computation used to acquire a data drift signal for a model that's in production.

 * For a feature in the training data, calculate the statistical distribution of its values. This distribution is the baseline distribution.
 * Calculate the statistical distribution of the feature's latest values that are seen in production.
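Once the baseline distribution P from the steps above and the production distribution Q are in hand, a distance metric from the table quantifies the drift, and an alert fires when the metric crosses its threshold. As one concrete instance, the Jensen-Shannon distance listed for the data drift signal has the standard definition (general statistics, not specific to Azure Machine Learning):

```latex
% Mixture of the baseline distribution P and the production distribution Q
M = \tfrac{1}{2}\,(P + Q)

% Jensen-Shannon distance: the square root of the JS divergence,
% an average of two Kullback-Leibler divergences against the mixture
\mathrm{JSD}(P, Q) = \sqrt{\tfrac{1}{2}\,D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2}\,D_{\mathrm{KL}}(Q \,\|\, M)}

% With base-2 logarithms the distance lies in [0, 1]:
% 0 means identical distributions, values near 1 mean severe drift
```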
@@ -79,9 +79,9 @@ Each machine learning model and its use cases are unique. Therefore, model monit
 * **Start model monitoring as soon as your model is deployed to production.**
 * **Work with data scientists that are familiar with the model to set up model monitoring.** Data scientists who have insight into the model and its use cases are in the best position to recommend monitoring signals and metrics as well as set the right alert thresholds for each metric (to avoid alert fatigue).
 * **Include multiple monitoring signals in your monitoring setup.** With multiple monitoring signals, you get both a broad view and granular view of monitoring. For example, you can combine both data drift and feature attribution drift signals to get an early warning about your model performance issue.
-* **Use model training data as the reference data.** For reference data used for comparison baseline, Azure Machine Learning allows you to use the recent past production data or historical data (such as training data or validation data). For a meaningful comparison, we recommend that you use the training data as the comparison baseline for data drift and data quality. For prediction drift, use the validation data as the comparison baseline.
+* **Use model training data as the reference data.** For reference data used as the comparison baseline, Azure Machine Learning allows you to use the recent past production data or historical data (such as training data or validation data). For a meaningful comparison, we recommend that you use the training data as the comparison baseline for data drift and data quality. For prediction drift, use the validation data as the comparison baseline.
 * **Specify the monitoring frequency based on how your production data will grow over time**. For example, if your production model has much traffic daily, and the daily data accumulation is sufficient for you to monitor, then you can set the monitoring frequency to daily. Otherwise, you can consider a weekly or monthly monitoring frequency, based on the growth of your production data over time.
-* **Monitor the top N important features or a subset of features.** If you use training data as the comparison baseline, you can easily specify to monitor top N features for data drift or data quality. For models that have a large number of features, consider monitoring a subset of those features to reduce computation cost and monitoring noise.
+* **Monitor the top N important features or a subset of features.** If you use training data as the comparison baseline, you can easily configure data drift monitoring or data quality monitoring for the top N features. For models that have a large number of features, consider monitoring a subset of those features to reduce computation cost and monitoring noise.

 ## Next steps

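A sketch of what that last recommendation might look like in the monitoring YAML; the `features` block and its `top_n_feature_importance` key are assumptions based on the preview schema, and the signal name is hypothetical:

```yaml
monitoring_signals:
  my_data_drift_signal:            # hypothetical signal name
    type: data_drift
    # Assumption: monitor only the ten most important features, ranked by
    # feature importance computed against the training (reference) data
    features:
      top_n_feature_importance: 10
```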
articles/machine-learning/how-to-monitor-model-performance.md

Lines changed: 3 additions & 3 deletions
@@ -267,8 +267,8 @@ create_monitor:
     feature_attribution_drift_signal:
       type: feature_attribution_drift
       # production_data: is not required input here
-      # Please ensure AzureML online endpoint is enabled to collected both model_inputs and model_outputs data
-      # AzureML model monitoring will automatically join both model_inputs and model_outputs data and used it for computation
+      # Please ensure the Azure Machine Learning online endpoint is enabled to collect both model_inputs and model_outputs data
+      # Azure Machine Learning model monitoring will automatically join both model_inputs and model_outputs data and use it for computation
       reference_data:
         input_data:
           path: azureml:my_model_training_data:1
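For orientation, a hedged sketch of how this signal might be completed. Because feature importance at training time requires the label, the training data would plausibly need its target column named; the `target_column_name` and `metric_thresholds` keys and the threshold value are assumptions, not confirmed schema:

```yaml
    feature_attribution_drift_signal:
      type: feature_attribution_drift
      reference_data:
        input_data:
          path: azureml:my_model_training_data:1
          type: mltable                    # assumption: tabular training asset
        target_column_name: my_target      # assumption: hypothetical label column
      metric_thresholds:
        # Assumption: alert when normalized discounted cumulative gain,
        # the metric listed for this signal, falls below the threshold
        normalized_discounted_cumulative_gain: 0.9
```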
@@ -588,7 +588,7 @@ create_monitor:
     feature_attribution_drift_signal:
       type: feature_attribution_drift
       production_data:
-        # using production_data collected outside of AzureML
+        # using production_data collected outside of Azure Machine Learning
         - input_data:
             path: azureml:my_model_inputs:1
             type: uri_folder
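Since feature attribution drift joins model inputs with model outputs, production data collected outside Azure Machine Learning would plausibly be registered as two separate assets and listed together. A sketch under that assumption; the outputs asset name and the `data_context` key are illustrative:

```yaml
      production_data:
        # model inputs registered from externally collected inference data
        - input_data:
            path: azureml:my_model_inputs:1
            type: uri_folder
          data_context: model_inputs             # assumption
        # matching model outputs, joined with the inputs for the computation
        - input_data:
            path: azureml:my_model_outputs:1     # illustrative asset name
            type: uri_folder
          data_context: model_outputs            # assumption
```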
