Commit 689f37c

Merge pull request #2729 from fbsolo-ms1/freshness-updates
Freshness update for how-to-monitor-datasets.md . . .
2 parents c237015 + dffea97 commit 689f37c

1 file changed: +4 -16 lines changed


articles/machine-learning/v1/how-to-monitor-datasets.md

Lines changed: 4 additions & 16 deletions
@@ -8,16 +8,15 @@ ms.subservice: mldata
 ms.reviewer: franksolomon
 ms.author: xunwan
 author: SturgeonMi
-ms.date: 08/08/2023
+ms.date: 02/04/2025
 ms.topic: how-to
 ms.custom: UpdateFrequency5, data4ml, sdkv1
 #Customer intent: As a data scientist, I want to detect data drift in my datasets and set alerts for when drift is large.
 ---
 
 # Data drift (preview) will be retired, and replaced by Model Monitor
 
-Data drift(preview) will be retired at 09/01/2025, and you can start to use [Model Monitor](../how-to-monitor-model-performance.md) for your data drift tasks.
-Please check the content below to understand the replacement, feature gaps and manual change steps.
+Data (preview) will be retired at 09/01/2025, and you can start to use [Model Monitor](../how-to-monitor-model-performance.md) for your data drift tasks. Please check the content below to understand the replacement, feature gaps and manual change steps.
 
 [!INCLUDE [sdk v1](../includes/machine-learning-sdk-v1.md)]
 
@@ -55,7 +54,6 @@ To create and work with dataset monitors, you need:
 ## Prerequisites (Migrate to Model Monitor)
 When you migrate to Model Monitor, please check the prerequisites as mentioned in this article [Prerequisites of Azure Machine Learning model monitoring](../how-to-monitor-model-performance.md#prerequisites).
 
-
 ## What is data drift?
 
 Model accuracy degrades over time, largely because of data drift. For machine learning models, data drift is the change in model input data that leads to model performance degradation. Monitoring data drift helps detect these model performance issues.
@@ -89,7 +87,7 @@ Conceptually, there are three primary scenarios for setting up dataset monitors
 
 Scenario | Description
 ---|---
-Monitor a model's serving data for drift from the training data | Results from this scenario can be interpreted as monitoring a proxy for the model's accuracy, since model accuracy degrades when the serving data drifts from the training data.
+Monitor serving data of a model for drift from the training data | Results from this scenario can be interpreted as monitoring a proxy for the model's accuracy, since model accuracy degrades when the serving data drifts from the training data.
 Monitor a time series dataset for drift from a previous time period. | This scenario is more general, and can be used to monitor datasets involved upstream or downstream of model building. The target dataset must have a timestamp column. The baseline dataset can be any tabular dataset that has features in common with the target dataset.
 Perform analysis on past data. | This scenario can be used to understand historical data and inform decisions in settings for dataset monitors.
 
@@ -115,7 +113,6 @@ In Model Monitor, you can find corresponding concepts as following, and you can
 * Reference dataset: similar to your baseline dataset for data drift detection, it is set as the recent past production inference dataset.
 * Production inference data: similar to your target dataset in data drift detection, the production inference data can be collected automatically from models deployed in production. It can also be inference data you store.
 
-
 ## Create target dataset
 
 The target dataset needs the `timeseries` trait set on it by specifying the timestamp column either from a column in the data or a virtual column derived from the path pattern of the files. Create the dataset with a timestamp through the [Python SDK](#sdk-dataset) or [Azure Machine Learning studio](#studio-dataset). A column representing a "timestamp" must be specified to add `timeseries` trait to the dataset. If your data is partitioned into folder structure with time info, such as '{yyyy/MM/dd}', create a virtual column through the path pattern setting and set it as the "partition timestamp" to enable time series API functionality.
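
To make the path-pattern case in the last context line concrete, here is a minimal standard-library Python sketch of what "a virtual column derived from the path pattern" means for a `yyyy/MM/dd` folder layout. It does not use the Azure Machine Learning SDK; the helper name `partition_timestamp_from_path` and the sample paths are hypothetical and chosen only for illustration.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical helper for illustration only; it is NOT part of the
# Azure Machine Learning SDK. It mimics the idea of a "partition timestamp":
# a virtual column whose value comes from the folder structure, not the data.
_DATE_IN_PATH = re.compile(r"(\d{4})/(\d{2})/(\d{2})")


def partition_timestamp_from_path(path: str) -> Optional[datetime]:
    """Return the date encoded as yyyy/MM/dd in a file path, or None."""
    match = _DATE_IN_PATH.search(path)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return datetime(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a real calendar date.
        return None


# Assumed example paths; a real dataset would list files from a datastore.
paths = [
    "weather/2019/01/15/data.parquet",
    "weather/2019/02/01/data.parquet",
    "weather/readme.txt",
]
virtual_column = [partition_timestamp_from_path(p) for p in paths]
```

Files whose path carries no valid date get `None`, so they can be filtered out before any time-based slicing.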
@@ -180,9 +177,6 @@ Not supported.
 
 ---
 
-
-
-
 ## Create dataset monitor
 
 Create a dataset monitor to detect and alert to data drift on a new dataset. Use either the [Python SDK](#sdk-monitor) or [Azure Machine Learning studio](#studio-monitor).
@@ -254,7 +248,6 @@ monitor = monitor.enable_schedule()
 > [!TIP]
 > For a full example of setting up a `timeseries` dataset and data drift detector, see our [example notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.ipynb).
 
-
 # [Studio](#tab/azure-studio)
 <a name="studio-monitor"></a>
 
@@ -267,7 +260,7 @@ monitor = monitor.enable_schedule()
 
 :::image type="content" source="media/how-to-monitor-datasets/wizard.png" alt-text="Create a monitor wizard":::
 
-1. **Select target dataset**. The target dataset is a tabular dataset with timestamp column specified which to analyze for data drift. The target dataset must have features in common with the baseline dataset, and should be a `timeseries` dataset, which new data is appended to. Historical data in the target dataset can be analyzed, or new data can be monitored.
+1. **Select target dataset**. The target dataset is a tabular dataset with a timestamp column specified which to analyze for data drift. The target dataset must have features in common with the baseline dataset, and should be a `timeseries` dataset, which new data is appended to. Historical data in the target dataset can be analyzed, or new data can be monitored.
 
 1. **Select baseline dataset.** Select the tabular dataset to be used as the baseline for comparison of the target dataset over time. The baseline dataset must have features in common with the target dataset. Select a time range to use a slice of the target dataset, or specify a separate dataset to use as the baseline.
 
@@ -293,7 +286,6 @@ Not supported
 
 ---
 
-
 ## Create Model Monitor (Migrate to Model Monitor)
 When you migrate to Model Monitor, if you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](../how-to-collect-production-data.md) at deployment time, Azure Machine Learning collects production inference data, and automatically stores it in Microsoft Azure Blob Storage. You can then use Azure Machine Learning model monitoring to continuously monitor this production inference data, and you can directly choose the model to create target dataset (production inference data in Model Monitor).
 
@@ -416,7 +408,6 @@ The following YAML contains the definition for the out-of-box model monitoring.
 
 ---
 
-
 ## Create Model Monitor via custom data preprocessing component (Migrate to Model Monitor)
 When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](../how-to-collect-production-data.md), you can also [set up model monitoring with custom signals and metrics](../how-to-monitor-model-performance.md#set-up-model-monitoring-with-custom-signals-and-metrics).
 
@@ -439,8 +430,6 @@ Your custom preprocessing component must have these input and output signatures:
 
 For an example of a custom data preprocessing component, see [custom_preprocessing in the azuremml-examples GitHub repo](https://github.com/Azure/azureml-examples/tree/main/cli/monitoring/components/custom_preprocessing).
 
-
-
 ## Understand data drift results
 
 This section shows you the results of monitoring a dataset, found in the **Datasets** / **Dataset monitors** page in Azure studio. You can update the settings, and analyze existing data for a specific time period on this page.
@@ -449,7 +438,6 @@ Start with the top-level insights into the magnitude of data drift and a highlig
 
 :::image type="content" source="media/how-to-monitor-datasets/drift-overview.png" alt-text="Drift overview":::
 
-
 | Metric | Description |
 | ------ | ----------- |
 | Data drift magnitude | A percentage of drift between the baseline and target dataset over time. This percentage ranges from 0 to 100, 0 indicates identical datasets and 100 indicates the Azure Machine Learning data drift model can completely tell the two datasets apart. Noise in the precise percentage measured is expected due to machine learning techniques being used to generate this magnitude. |
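
The 0 to 100 scale in the "Data drift magnitude" row can be pictured with a small Python sketch. This is an assumption-laden illustration, not the formula Azure Machine Learning uses: it supposes a classifier trained to separate baseline rows from target rows on a balanced sample, and linearly rescales its accuracy so that chance level (0.5) maps to 0 and perfect separation (1.0) maps to 100. The function name `drift_magnitude_from_accuracy` is hypothetical.

```python
def drift_magnitude_from_accuracy(accuracy: float) -> float:
    """Map a baseline-vs-target classifier accuracy to a 0-100 drift score.

    Illustrative assumption only (NOT Azure's actual computation):
    - the evaluation sample is balanced, so chance accuracy is 0.5;
    - accuracy at or below chance means the datasets look identical (0);
    - accuracy of 1.0 means they can be told apart completely (100).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    chance = 0.5
    # Clamp below-chance accuracy to zero drift, then rescale to a percentage.
    return max(0.0, (accuracy - chance) / (1.0 - chance)) * 100.0


identical = drift_magnitude_from_accuracy(0.5)   # indistinguishable datasets
partial = drift_magnitude_from_accuracy(0.75)    # somewhat separable
separable = drift_magnitude_from_accuracy(1.0)   # fully separable
```

Because the underlying classifier is itself learned from sampled data, repeated runs would give slightly different accuracies, which is consistent with the noise the table row mentions.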
