Commit e2a0638

Merge pull request #111112 from buchananwp/patch-2
Monitoring - Refine Feature Attribution Drift setup
2 parents 7c67dc3 + d9a3343 commit e2a0638

File tree

4 files changed: +37 additions, −18 deletions

articles/machine-learning/concept-model-monitoring.md

Lines changed: 5 additions & 2 deletions
@@ -49,8 +49,11 @@ Azure Machine Learning model monitoring (preview) supports the following list of
 | Data drift | Data drift tracks changes in the distribution of a model's input data by comparing it to the model's training data or recent past production data. | Jensen-Shannon Distance, Population Stability Index, Normalized Wasserstein Distance, Two-Sample Kolmogorov-Smirnov Test, Pearson's Chi-Squared Test | Classification (tabular data), Regression (tabular data) | Production data - model inputs | Recent past production data or training data |
 | Prediction drift | Prediction drift tracks changes in the distribution of a model's prediction outputs by comparing it to validation or test labeled data or recent past production data. | Jensen-Shannon Distance, Population Stability Index, Normalized Wasserstein Distance, Chebyshev Distance, Two-Sample Kolmogorov-Smirnov Test, Pearson's Chi-Squared Test | Classification (tabular data), Regression (tabular data) | Production data - model outputs | Recent past production data or validation data |
 | Data quality | Data quality tracks the data integrity of a model's input by comparing it to the model's training data or recent past production data. The data quality checks include checking for null values, type mismatches, or out-of-bounds values. | Null value rate, data type error rate, out-of-bounds rate | Classification (tabular data), Regression (tabular data) | Production data - model inputs | Recent past production data or training data |
-| Feature attribution drift | Feature attribution drift tracks the importance or contributions of features to prediction outputs in production by comparing it to feature importance at training time | Normalized discounted cumulative gain | Classification (tabular data), Regression (tabular data) | Production data | Training data |
+| Feature attribution drift | Feature attribution drift tracks the importance or contributions of features to prediction outputs in production by comparing it to feature importance at training time. | Normalized discounted cumulative gain | Classification (tabular data), Regression (tabular data) | Production data - model inputs & outputs (*see the following note*) | Training data (required) |
 
+> [!NOTE]
+> For the feature attribution drift signal (during preview), you must create a custom data asset of type `uri_folder` that contains joined model inputs and outputs (the Model Data Collector can be used to produce it). Additionally, `target_column_name` is a required field that specifies the prediction column in your training dataset.
+
 ## How model monitoring works in Azure Machine Learning
 
 Azure Machine Learning acquires monitoring signals by performing statistical computations on production inference data and reference data. This reference data can include the model's training data or validation data, while the production inference data refers to the model's input and output data collected in production.
@@ -84,4 +87,4 @@ Each machine learning model and its use cases are unique. Therefore, model monit
 
 - [Perform continuous model monitoring in Azure Machine Learning](how-to-monitor-model-performance.md)
 - [Model data collection](concept-data-collection.md)
-- [Collect production inference data](how-to-collect-production-data.md)
+- [Collect production inference data](how-to-collect-production-data.md)
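
The note added above assumes a registered `uri_folder` data asset that holds the joined inputs and outputs. As a minimal sketch (the workspace details, asset name, and datastore path are placeholders, not part of this change), registering such an asset with the Azure Machine Learning Python SDK v2 could look like this:

```python
# Minimal sketch: register the folder of joined inputs/outputs (written by the
# Model Data Collector) as a uri_folder data asset for feature attribution drift.
# Subscription, resource group, workspace, and the datastore path are placeholders.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

joined_data = Data(
    name="my_model_production_data",  # referenced later as azureml:my_model_production_data:1
    type=AssetTypes.URI_FOLDER,
    path="azureml://datastores/workspaceblobstore/paths/<path-to-joined-inputs-outputs>/",
    description="Joined model inputs and outputs collected in production",
)
ml_client.data.create_or_update(joined_data)
```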

articles/machine-learning/how-to-collect-production-data.md

Lines changed: 9 additions & 0 deletions
@@ -74,6 +74,7 @@ First, you'll need to add custom logging code to your scoring script (`score.py`
 global inputs_collector, outputs_collector
 inputs_collector = Collector(name='model_inputs')
 outputs_collector = Collector(name='model_outputs')
+inputs_outputs_collector = Collector(name='model_inputs_outputs')
 ```
 
 By default, Azure Machine Learning raises an exception if there's a failure during data collection. Optionally, you can use the `on_error` parameter to specify a function to run if a logging failure happens. For instance, using the `on_error` parameter in the following code, Azure Machine Learning logs the error rather than throwing an exception:
@@ -106,6 +107,7 @@ def init():
     # instantiate collectors with appropriate names, make sure they align with the deployment spec
     inputs_collector = Collector(name='model_inputs')
     outputs_collector = Collector(name='model_outputs')
+    inputs_outputs_collector = Collector(name='model_inputs_outputs')  # note: this is used to enable feature attribution drift
 
 def run(data):
     # json data: { "data" : { "col1": [1,2,3], "col2": [2,3,4] } }
@@ -122,6 +124,13 @@ def run(data):
 
     # collect outputs data, pass in correlation_context so inputs and outputs data can be correlated later
     outputs_collector.collect(output_df, context)
+
+    # create a dataframe with inputs and outputs joined - this data is collected as a uri_folder (not an mltable)
+    input_output_df = input_df.join(output_df)
+
+    # collect your joined inputs and outputs
+    inputs_outputs_collector.collect(input_output_df, context)
 
     return output_df.to_dict()
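
The collector names above must match the deployment specification. As a hedged sketch (the exact preview schema may differ), the corresponding `data_collector` section of the online deployment YAML might look like this:

```yaml
# Sketch: data_collector section of the managed online deployment YAML.
# Collection names must match the Collector names used in score.py.
data_collector:
  collections:
    model_inputs:
      enabled: 'True'
    model_outputs:
      enabled: 'True'
    model_inputs_outputs:
      enabled: 'True'
```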

articles/machine-learning/how-to-monitor-model-performance.md

Lines changed: 23 additions & 16 deletions
@@ -66,7 +66,7 @@ Before following the steps in this article, make sure you have the following pre
 >
 > Model monitoring jobs are scheduled to run on a serverless Spark compute pool with support for the `Standard_E4s_v3` VM instance type only. Support for more VM instance types will come in the future.
 
-## Set up out-of-box model monitoring
+## Set up out-of-the-box model monitoring
 
 If you deploy your model to production in an Azure Machine Learning online endpoint, Azure Machine Learning collects production inference data automatically and uses it for continuous monitoring.
 
@@ -79,6 +79,9 @@ You can use Azure CLI, the Python SDK, or Azure Machine Learning studio for out-
 * smart defaults for metrics and thresholds.
 * A monitoring job is scheduled to run daily at 3:15am (for this example) to acquire monitoring signals and evaluate each metric result against its corresponding threshold. By default, when any threshold is exceeded, an alert email is sent to the user who set up the monitoring.
 
+## Configure feature importance
+
+To enable feature importance with any of your signals (such as data drift or data quality), you need to provide both the `baseline_dataset` (typically your training data) and `target_column_name` fields.
 
 # [Azure CLI](#tab/azure-cli)
 
@@ -88,7 +91,7 @@ Azure Machine Learning model monitoring uses `az ml schedule` for model monitori
 az ml schedule create -f ./out-of-box-monitoring.yaml
 ```
 
-The following YAML contains the definition for out-of-box model monitoring.
+The following YAML contains the definition for out-of-the-box model monitoring.
 
 ```yaml
 # out-of-box-monitoring.yaml
@@ -117,7 +120,7 @@ create_monitor:
 
 # [Python](#tab/python)
 
-You can use the following code to set up out-of-box model monitoring:
+You can use the following code to set up out-of-the-box model monitoring:
 
 ```python
 
@@ -269,18 +272,18 @@ create_monitor:
         dataset:
           input_dataset:
             path: azureml:my_model_production_data:1
-            type: mltable
-          dataset_context: model_inputs
+            type: uri_folder
+          dataset_context: model_inputs_outputs
       baseline_dataset:
         input_dataset:
           path: azureml:my_model_training_data:1
           type: mltable
-        dataset_context: model_inputs
+        dataset_context: training
         target_column_name: fraud_detected
       model_type: classification
       # if no metric_thresholds defined, use the default metric_thresholds
       metric_thresholds:
-        threshold: 0.05
+        threshold: 0.9
 
   alert_notification:
     emails:
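
Taken together, the changed YAML above describes a feature attribution drift signal roughly like the following sketch. The signal name, `type`, and `target_dataset` wrapper are assumed for context; the remaining values come from the hunk itself:

```yaml
# Sketch of the feature attribution drift signal after this change, nested under
# create_monitor -> monitoring_signals. Signal name and outer keys are assumed.
advanced_feature_attribution_drift:
  type: feature_attribution_drift
  target_dataset:
    dataset:
      input_dataset:
        path: azureml:my_model_production_data:1
        type: uri_folder
      dataset_context: model_inputs_outputs
  baseline_dataset:
    input_dataset:
      path: azureml:my_model_training_data:1
      type: mltable
    dataset_context: training
    target_column_name: fraud_detected
  model_type: classification
  metric_thresholds:
    threshold: 0.9
```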
@@ -384,10 +387,10 @@ advanced_data_quality = DataQualitySignal(
 monitor_target_data = TargetDataset(
     dataset=MonitorInputData(
         input_dataset=Input(
-            type="mltable",
-            path="azureml:my_model_production_data:1"
+            type="uri_folder",
+            path="azureml:endpoint_name-deployment_name-model_inputs_outputs:1"
         ),
-        dataset_context=MonitorDatasetContext.MODEL_INPUTS,
+        dataset_context=MonitorDatasetContext.MODEL_INPUTS_OUTPUTS,
     )
 )
 monitor_baseline_data = MonitorInputData(
@@ -398,7 +401,7 @@ monitor_baseline_data = MonitorInputData(
     target_column_name="fraud_detected",
     dataset_context=MonitorDatasetContext.TRAINING,
 )
-metric_thresholds = FeatureAttributionDriftMetricThreshold(threshold=0.05)
+metric_thresholds = FeatureAttributionDriftMetricThreshold(threshold=0.9)
 
 feature_attribution_drift = FeatureAttributionDriftSignal(
     target_dataset=monitor_target_data,
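
For the Python tab, the objects changed in these hunks fit together roughly as follows. The import paths, the baseline dataset path, and the `FeatureAttributionDriftSignal` parameters other than `target_dataset` are assumptions inferred from the YAML schema, not confirmed by this diff:

```python
# Sketch: assembling the feature attribution drift signal from the objects above.
# Import locations and the parameters marked as assumed may differ by azure-ai-ml version.
from azure.ai.ml import Input
from azure.ai.ml.constants import MonitorDatasetContext
from azure.ai.ml.entities import (
    FeatureAttributionDriftMetricThreshold,
    FeatureAttributionDriftSignal,
    MonitorInputData,
    TargetDataset,
)

monitor_target_data = TargetDataset(
    dataset=MonitorInputData(
        input_dataset=Input(
            type="uri_folder",
            # data asset registered by the data collector: <endpoint>-<deployment>-model_inputs_outputs
            path="azureml:endpoint_name-deployment_name-model_inputs_outputs:1",
        ),
        dataset_context=MonitorDatasetContext.MODEL_INPUTS_OUTPUTS,
    )
)

monitor_baseline_data = MonitorInputData(
    # path taken from the YAML example above
    input_dataset=Input(type="mltable", path="azureml:my_model_training_data:1"),
    target_column_name="fraud_detected",
    dataset_context=MonitorDatasetContext.TRAINING,
)

metric_thresholds = FeatureAttributionDriftMetricThreshold(threshold=0.9)

feature_attribution_drift = FeatureAttributionDriftSignal(
    target_dataset=monitor_target_data,
    baseline_dataset=monitor_baseline_data,  # assumed parameter name
    metric_thresholds=metric_thresholds,     # assumed parameter name
    model_type="classification",             # assumed parameter name
)
```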
@@ -447,7 +450,7 @@ created_monitor = poller.result()
 
 # [Studio](#tab/azure-studio)
 
-1. Complete the entires on the basic settings page as described in the [Set up out-of-box model monitoring](#set-up-out-of-box-model-monitoring) section.
+1. Complete the entries on the basic settings page as described in the [Set up out-of-the-box model monitoring](#set-up-out-of-the-box-model-monitoring) section.
 1. Select **More options** to open the advanced setup wizard.
 
 1. In the "Configure dataset" section, add a dataset to be used as the comparison baseline. We recommend using the model training data as the comparison baseline for data drift and data quality, and using the model validation data as the comparison baseline for prediction drift.
@@ -471,18 +474,22 @@ created_monitor = poller.result()
 
 1. Select **Add** to add another signal.
 1. In the "Add Signal" screen, select the **Feature Attribution Drift** panel.
-1. Enter a name for Feature Attribution Drift signal.
+1. Enter a name for the Feature Attribution Drift signal. Feature attribution drift currently requires a few additional steps:
+    1. Configure your data assets for feature attribution drift.
+    1. In the wizard, add the custom data asset from your [custom data collection](how-to-collect-production-data.md) called 'model inputs and outputs', which contains your joined model inputs and outputs as a separate data context.
+
+       :::image type="content" source="media/how-to-monitor-models/feature-attribution-drift-inputs-outputs.png" alt-text="Screenshot showing how to configure a custom data asset with inputs and outputs joined." lightbox="media/how-to-monitor-models/feature-attribution-drift-inputs-outputs.png":::
+
+    1. Specify the training reference dataset to be used by the feature attribution drift component, and select your 'target column name' field, which is required to enable feature importance.
+    1. Confirm that your parameters are correct.
 1. Adjust the data window size according to your business case.
-1. Select the training data as the baseline dataset.
-1. Select the target column name.
 1. Adjust the threshold according to your need.
 1. Select **Save** to return to the "Select monitoring signals" section.
 1. If you're done with editing or adding signals, select **Next**.
 
 :::image type="content" source="media/how-to-monitor-models/model-monitoring-advanced-config-add-signal.png" alt-text="Screenshot showing settings for adding signals." lightbox="media/how-to-monitor-models/model-monitoring-advanced-config-add-signal.png":::
 
 1. In the "Notification" screen, enable alert notification for each signal.
-1. (Optional) Enable "Azure Monitor" for all metrics to be sent to Azure Monitor.
 1. Select **Next**.
 
 :::image type="content" source="media/how-to-monitor-models/model-monitoring-advanced-config-notification.png" alt-text="Screenshot of settings on the notification screen." lightbox="media/how-to-monitor-models/model-monitoring-advanced-config-notification.png":::
articles/machine-learning/media/how-to-monitor-models/feature-attribution-drift-inputs-outputs.png

149 KB (new binary image)
