
Commit bb3a76b: "change MDC formatting"
1 parent 44bad1e

1 file changed, 63 additions and 63 deletions: articles/machine-learning/how-to-collect-production-data.md
````diff
@@ -26,7 +26,6 @@ You can enable data collection for new or existing online endpoint deployments.
 
 If you're interested in collecting production inference data for an MLflow model that is deployed to a real-time endpoint, see [Data collection for MLflow models](#collect-data-for-mlflow-models).
 
-
 ## Prerequisites
 
 # [Azure CLI](#tab/azure-cli)
````
````diff
@@ -206,7 +205,7 @@ def predict(input_df):
     return output_df
 ```
 
-### Collect data for model performance monitoring
+#### Collect data for model performance monitoring
 
 If you want to use your collected data for model performance monitoring, it's important that each logged row has a unique `correlationid` that can be used to correlate the data with ground truth data, when such data becomes available. The data collector will autogenerate a unique `correlationid` for each logged row and include this autogenerated ID in the `correlationid` field in the JSON object. For more information on the JSON schema, see [store collected data in a blob](#store-collected-data-in-a-blob).
 
````
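The `correlationid` behavior described in the hunk above is what makes later joins against ground truth possible. As a rough illustration (not part of the commit; the record shapes, IDs, and label values below are hypothetical), a Python sketch of that join:

```python
# Collected inference rows, one JSON object per logged row; "correlationid"
# is the per-row ID the data collector autogenerates (values are made up).
collected = [
    {"correlationid": "a1", "data": {"prediction": 0.91}},
    {"correlationid": "a2", "data": {"prediction": 0.12}},
]

# Ground truth arriving later, keyed by the same correlationid
# (a hypothetical shape for illustration only).
ground_truth = {"a1": 1, "a2": 0}

# Join predictions to labels on correlationid so each row can be scored.
joined = [
    {**row, "label": ground_truth[row["correlationid"]]}
    for row in collected
    if row["correlationid"] in ground_truth
]
```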
````diff
@@ -322,7 +321,68 @@ For more information on how to format your deployment YAML for data collection w
 
 For more information on how to format your deployment YAML for data collection with managed online endpoints, see [CLI (v2) managed online deployment YAML schema](reference-yaml-deployment-managed-online.md).
 
-### Store collected data in a blob
+## Perform payload logging
+
+In addition to custom logging with the provided Python SDK, you can collect request and response HTTP payload data directly without the need to augment your scoring script (`score.py`).
+
+1. To enable payload logging, in your deployment YAML, use the names `request` and `response`:
+
+    ```yml
+    $schema: http://azureml/sdk-2-0/OnlineDeployment.json
+
+    endpoint_name: my_endpoint
+    name: blue
+    model: azureml:my-model-m1:1
+    environment: azureml:env-m1:1
+    data_collector:
+      collections:
+        request:
+          enabled: 'True'
+        response:
+          enabled: 'True'
+    ```
+
+1. Deploy the model with payload logging enabled:
+
+    ```bash
+    $ az ml online-deployment create -f deployment.YAML
+    ```
+
+With payload logging, the collected data is not guaranteed to be in tabular format. Therefore, if you want to use collected payload data with model monitoring, you'll be required to provide a preprocessing component to make the data tabular. If you're interested in a seamless model monitoring experience, we recommend using the [custom logging Python SDK](#perform-custom-logging-for-model-monitoring).
+
+As your deployment is used, the collected data flows to your workspace Blob storage. The following JSON code is an example of a collected HTTP _request_:
+
+```json
+{"specversion":"1.0",
+"id":"19790b87-a63c-4295-9a67-febb2d8fbce0",
+"source":"/subscriptions/d511f82f-71ba-49a4-8233-d7be8a3650f4/resourceGroups/mire2etesting/providers/Microsoft.MachineLearningServices/workspaces/mirmasterenvws/onlineEndpoints/localdev-endpoint/deployments/localdev",
+"type":"azureml.inference.request",
+"datacontenttype":"application/json",
+"time":"2022-05-25T08:59:48Z",
+"data":{"data": [ [1,2,3,4,5,6,7,8,9,10], [10,9,8,7,6,5,4,3,2,1]]},
+"path":"/score",
+"method":"POST",
+"contentrange":"bytes 0-59/*",
+"correlationid":"f6e806c9-1a9a-446b-baa2-901373162105","xrequestid":"f6e806c9-1a9a-446b-baa2-901373162105"}
+```
+
+And the following JSON code is an example of a collected HTTP _response_:
+
+```json
+{"specversion":"1.0",
+"id":"bbd80e51-8855-455f-a719-970023f41e7d",
+"source":"/subscriptions/d511f82f-71ba-49a4-8233-d7be8a3650f4/resourceGroups/mire2etesting/providers/Microsoft.MachineLearningServices/workspaces/mirmasterenvws/onlineEndpoints/localdev-endpoint/deployments/localdev",
+"type":"azureml.inference.response",
+"datacontenttype":"application/json",
+"time":"2022-05-25T08:59:48Z",
+"data":[11055.977245525679, 4503.079536107787],
+"contentrange":"bytes 0-38/39",
+"correlationid":"f6e806c9-1a9a-446b-baa2-901373162105","xrequestid":"f6e806c9-1a9a-446b-baa2-901373162105"}
+```
+
+## Store collected data in blob storage
+
+Data collection allows you to log production inference data to a Blob storage destination of your choice. The data destination settings are configurable at the `collection_name` level.
 
 __Blob storage output/format__:
 
````
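For orientation on the two example records above: a collected request and its response share the same `correlationid`/`xrequestid`, so the two `.jsonl` streams can be re-paired offline. A minimal Python sketch (not part of the commit; the sample lines are trimmed versions of the examples, and the helper name is this note's invention):

```python
import json

# One line each from the collected request and response .jsonl files,
# trimmed to the fields this sketch uses (sample values from the examples).
request_line = '{"type":"azureml.inference.request","xrequestid":"f6e806c9-1a9a-446b-baa2-901373162105","data":{"data":[[1,2,3],[3,2,1]]}}'
response_line = '{"type":"azureml.inference.response","xrequestid":"f6e806c9-1a9a-446b-baa2-901373162105","data":[11055.97,4503.07]}'

def index_by_request_id(lines):
    """Map xrequestid -> parsed event for a stream of .jsonl lines."""
    return {evt["xrequestid"]: evt for evt in map(json.loads, lines)}

requests = index_by_request_id([request_line])
responses = index_by_request_id([response_line])

# Pair each request payload with its response by the shared xrequestid.
pairs = [
    (req["data"], responses[rid]["data"])
    for rid, req in requests.items()
    if rid in responses
]
```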
````diff
@@ -356,7 +416,6 @@ The collected data follows the following JSON schema. The collected data is avai
 > [!TIP]
 > Line breaks are shown only for readability. In your collected .jsonl files, there won't be any line breaks.
 
-
 #### Store large payloads
 
 If the payload of your data is greater than 4 MB, there will be an event in the `{instance_id}.jsonl` file contained within the `{endpoint_name}/{deployment_name}/request/.../{instance_id}.jsonl` path that points to a raw file path, which should have the following path: `blob_url/{blob_container}/{blob_path}/{endpoint_name}/{deployment_name}/{rolled_time}/{instance_id}.jsonl`. The collected data will exist at this path.
````
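As a side note on the large-payload path template above, a tiny sketch of assembling such a path (not part of the commit; every segment value is a made-up placeholder, and the `TEMPLATE` name and use of `str.format` are this note's assumptions):

```python
# Path convention from the article for where oversized (>4 MB) raw payloads
# land, expressed as a format string (placeholders made explicit).
TEMPLATE = ("{blob_url}/{blob_container}/{blob_path}/{endpoint_name}/"
            "{deployment_name}/{rolled_time}/{instance_id}.jsonl")

# Hypothetical values purely for illustration.
raw_path = TEMPLATE.format(
    blob_url="https://myaccount.blob.core.windows.net",
    blob_container="raw-payloads",
    blob_path="collected",
    endpoint_name="my_endpoint",
    deployment_name="blue",
    rolled_time="2024/05/25/08",
    instance_id="instance-0",
)
```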
````diff
@@ -419,65 +478,6 @@ To view the collected data in Blob Storage from the studio UI:
 
 :::image type="content" source="./media/how-to-collect-production-data/data-view.png" alt-text="Screenshot highlights tree structure of data in Datastore" lightbox="media/how-to-collect-production-data/data-view.png":::
 
-## Log payload
-
-In addition to custom logging with the provided Python SDK, you can collect request and response HTTP payload data directly without the need to augment your scoring script (`score.py`).
-
-1. To enable payload logging, in your deployment YAML, use the names `request` and `response`:
-
-    ```yml
-    $schema: http://azureml/sdk-2-0/OnlineDeployment.json
-
-    endpoint_name: my_endpoint
-    name: blue
-    model: azureml:my-model-m1:1
-    environment: azureml:env-m1:1
-    data_collector:
-      collections:
-        request:
-          enabled: 'True'
-        response:
-          enabled: 'True'
-    ```
-
-1. Deploy the model with payload logging enabled:
-
-    ```bash
-    $ az ml online-deployment create -f deployment.YAML
-    ```
-
-With payload logging, the collected data is not guaranteed to be in tabular format. Therefore, if you want to use collected payload data with model monitoring, you'll be required to provide a preprocessing component to make the data tabular. If you're interested in a seamless model monitoring experience, we recommend using the [custom logging Python SDK](#perform-custom-logging-for-model-monitoring).
-
-As your deployment is used, the collected data flows to your workspace Blob storage. The following JSON code is an example of an HTTP _request_ collected:
-
-```json
-{"specversion":"1.0",
-"id":"19790b87-a63c-4295-9a67-febb2d8fbce0",
-"source":"/subscriptions/d511f82f-71ba-49a4-8233-d7be8a3650f4/resourceGroups/mire2etesting/providers/Microsoft.MachineLearningServices/workspaces/mirmasterenvws/onlineEndpoints/localdev-endpoint/deployments/localdev",
-"type":"azureml.inference.request",
-"datacontenttype":"application/json",
-"time":"2022-05-25T08:59:48Z",
-"data":{"data": [ [1,2,3,4,5,6,7,8,9,10], [10,9,8,7,6,5,4,3,2,1]]},
-"path":"/score",
-"method":"POST",
-"contentrange":"bytes 0-59/*",
-"correlationid":"f6e806c9-1a9a-446b-baa2-901373162105","xrequestid":"f6e806c9-1a9a-446b-baa2-901373162105"}
-```
-
-And the following JSON code is another example of an HTTP _response_ collected:
-
-```json
-{"specversion":"1.0",
-"id":"bbd80e51-8855-455f-a719-970023f41e7d",
-"source":"/subscriptions/d511f82f-71ba-49a4-8233-d7be8a3650f4/resourceGroups/mire2etesting/providers/Microsoft.MachineLearningServices/workspaces/mirmasterenvws/onlineEndpoints/localdev-endpoint/deployments/localdev",
-"type":"azureml.inference.response",
-"datacontenttype":"application/json",
-"time":"2022-05-25T08:59:48Z",
-"data":[11055.977245525679, 4503.079536107787],
-"contentrange":"bytes 0-38/39",
-"correlationid":"f6e806c9-1a9a-446b-baa2-901373162105","xrequestid":"f6e806c9-1a9a-446b-baa2-901373162105"}
-```
-
 ## Collect data for MLflow models
 
 If you're deploying an MLflow model to an Azure Machine Learning online endpoint, you can enable production inference data collection with a single toggle in the studio UI. If data collection is toggled on, Azure Machine Learning auto-instruments your scoring script with custom logging code to ensure that the production data is logged to your workspace Blob Storage. Your model monitors can then use the data to monitor the performance of your MLflow model in production.
````
