articles/machine-learning/how-to-collect-production-data.md
You can enable data collection for new or existing online endpoint deployments.

If you're interested in collecting production inference data for an MLflow model that is deployed to a real-time endpoint, see [Data collection for MLflow models](#collect-data-for-mlflow-models).
## Prerequisites
# [Azure CLI](#tab/azure-cli)
```python
def predict(input_df):
    # ... run model inference on the input DataFrame ...
    return output_df
```
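The scoring-script pattern above can be sketched end to end with a hypothetical stand-in for the data collector. The real SDK is the `azureml-ai-monitoring` package; the `MockCollector` class below is an assumption for illustration only, showing how inputs and outputs are logged under a shared correlation context:

```python
import uuid
from typing import Any, Optional

class MockCollector:
    """Hypothetical stand-in for a data collector: logs rows with a shared correlation ID."""
    def __init__(self, name: str):
        self.name = name
        self.logged = []  # list of {"correlationid": ..., "data": ...} records

    def collect(self, data: Any, context: Optional[str] = None) -> str:
        # Reuse the caller's correlation context, or autogenerate one per row
        correlationid = context or str(uuid.uuid4())
        self.logged.append({"correlationid": correlationid, "data": data})
        return correlationid

inputs_collector = MockCollector("model_inputs")
outputs_collector = MockCollector("model_outputs")

def predict(input_df):
    context = inputs_collector.collect(input_df)   # log inputs, capture the context
    output_df = [row * 2 for row in input_df]      # placeholder for real inference
    outputs_collector.collect(output_df, context)  # log outputs under the same ID
    return output_df
```

Passing the context returned by the first `collect` call into the second is what lets a monitor later join each output row back to the input row that produced it.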
#### Collect data for model performance monitoring
If you want to use your collected data for model performance monitoring, it's important that each logged row has a unique `correlationid` that can be used to correlate the data with ground truth data, when such data becomes available. The data collector autogenerates a unique `correlationid` for each logged row and includes this autogenerated ID in the `correlationid` field of the JSON object. For more information on the JSON schema, see [store collected data in a blob](#store-collected-data-in-a-blob).
For more information on how to format your deployment YAML for data collection with managed online endpoints, see [CLI (v2) managed online deployment YAML schema](reference-yaml-deployment-managed-online.md).
## Perform payload logging
In addition to custom logging with the provided Python SDK, you can collect request and response HTTP payload data directly without the need to augment your scoring script (`score.py`).
1. To enable payload logging, in your deployment YAML, use the names `request` and `response`:
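A minimal sketch of what that `data_collector` section might look like, based on the managed online deployment YAML schema; verify the exact field names against the schema reference for your CLI version:

```yaml
data_collector:
  collections:
    request:
      enabled: 'True'
    response:
      enabled: 'True'
```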
1. Create your deployment with the YAML file:

    ```azurecli
    az ml online-deployment create -f deployment.yaml
    ```
With payload logging, the collected data is not guaranteed to be in tabular format. Therefore, if you want to use collected payload data with model monitoring, you need to provide a preprocessing component to make the data tabular. If you're interested in a seamless model monitoring experience, we recommend using the [custom logging Python SDK](#perform-custom-logging-for-model-monitoring).
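A minimal sketch of such a preprocessing step, assuming each collected line is a JSON object whose `data` field holds a list of records; the field names here are illustrative, not the documented schema:

```python
import json

def payload_to_rows(jsonl_text: str) -> list:
    """Flatten collected payload events (one JSON object per line) into tabular rows."""
    rows = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        # Promote each field of the nested payload to a column, keep the correlation ID
        for record in event.get("data", []):
            rows.append({"correlationid": event.get("correlationid"), **record})
    return rows
```

A real preprocessing component would also need to handle malformed lines and schema drift between deployments.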
As your deployment is used, the collected data flows to your workspace Blob storage.
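For illustration only, a collected HTTP request event might have roughly the following shape. The field names below (other than `correlationid` and `data`, which the surrounding text describes) are assumptions, not the documented schema:

```json
{
  "specversion": "1.0",
  "type": "azureml.inference.request",
  "datacontenttype": "application/json",
  "time": "2024-05-01T08:27:14Z",
  "correlationid": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "data": [{"f0": 1.0, "f1": 2.0}]
}
```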
### Store collected data in a blob

Data collection allows you to log production inference data to a Blob storage destination of your choice. The data destination settings are configurable at the `collection_name` level.

__Blob storage output/format__:

The collected data follows the following JSON schema.
> [!TIP]
> Line breaks are shown only for readability. In your collected .jsonl files, there won't be any line breaks.
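Because each line in a collected `.jsonl` file is a standalone JSON object, reading the files back is straightforward. This sketch uses a made-up two-line file in place of a real blob download:

```python
import io
import json

# Stand-in for a downloaded .jsonl blob: one JSON object per line, no line
# breaks inside an object (the record contents are made up for illustration)
jsonl = io.StringIO(
    '{"correlationid": "id-1", "data": {"x": 1}}\n'
    '{"correlationid": "id-2", "data": {"x": 2}}\n'
)

# Parse line by line; skip any blank lines
events = [json.loads(line) for line in jsonl if line.strip()]
```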
#### Store large payloads
If your data payload is greater than 4 MB, the event in the `{instance_id}.jsonl` file contained within the `{endpoint_name}/{deployment_name}/request/.../{instance_id}.jsonl` path points to a raw file path instead, which has the following format: `blob_url/{blob_container}/{blob_path}/{endpoint_name}/{deployment_name}/{rolled_time}/{instance_id}.jsonl`. The collected data exists at that path.
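To see whether a given payload would cross that threshold before sending it, you can check its serialized size. This is a sketch; the 4 MB limit comes from the text above:

```python
import json

PAYLOAD_LIMIT_BYTES = 4 * 1024 * 1024  # 4 MB threshold described above

def exceeds_payload_limit(payload: dict) -> bool:
    """Return True if the JSON-serialized payload is larger than 4 MB."""
    return len(json.dumps(payload).encode("utf-8")) > PAYLOAD_LIMIT_BYTES
```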
To view the collected data in Blob Storage from the studio UI:
:::image type="content" source="./media/how-to-collect-production-data/data-view.png" alt-text="Screenshot highlights tree structure of data in Datastore" lightbox="media/how-to-collect-production-data/data-view.png":::
## Collect data for MLflow models

If you're deploying an MLflow model to an Azure Machine Learning online endpoint, you can enable production inference data collection with a single toggle in the studio UI. If data collection is toggled on, Azure Machine Learning auto-instruments your scoring script with custom logging code to ensure that the production data is logged to your workspace Blob Storage. Your model monitors can then use the data to monitor the performance of your MLflow model in production.