Commit b5e1c45

Edit content
1 parent c30a0db commit b5e1c45

6 files changed, with 76 additions and 77 deletions

articles/machine-learning/how-to-monitor-online-endpoints.md

Lines changed: 38 additions & 39 deletions
@@ -8,8 +8,8 @@ ms.reviewer: None
88
author: msakande
99
ms.author: mopeakande
1010
ms.subservice: mlops
11-
ms.date: 10/24/2023
12-
ms.topic: conceptual
11+
ms.date: 12/30/2024
12+
ms.topic: how-to
1313
ms.custom: how-to, devplatv2
1414
---
1515

@@ -23,22 +23,21 @@ Azure Machine Learning uses integration with Azure Monitor to track and monitor
2323

2424
* **Application insights**: Curated environments include integration with Application Insights, and you can enable or disable this integration when you create an online deployment. Built-in metrics and logs are sent to Application Insights, and you can use the built-in features of Application Insights (such as Live metrics, Transaction search, Failures, and Performance) for further analysis.
2525

26-
In this article you learn how to:
26+
In this article you see how to:
2727

28-
> [!div class="checklist"]
29-
> * Choose the right method to view and track metrics and logs
30-
> * View metrics for your online endpoint
31-
> * Create a dashboard for your metrics
32-
> * Create a metric alert
33-
> * View logs for your online endpoint
34-
> * Use Application Insights to track metrics and logs
28+
* Choose the right method to view and track metrics and logs.
29+
* View metrics for your online endpoint.
30+
* Create a dashboard for your metrics.
31+
* Create a metric alert.
32+
* View logs for your online endpoint.
33+
* Use Application Insights to track metrics and logs.
3534

3635
## Prerequisites
3736

38-
- An Azure Machine Learning online endpoint.
39-
- At least [Reader access](/azure/role-based-access-control/role-assignments-portal) on the endpoint.
37+
- An Azure Machine Learning online endpoint
38+
- At least [Reader access](/azure/role-based-access-control/role-assignments-portal) on the endpoint
4039

41-
## Metrics
40+
## Use metrics
4241

4342
In the Azure portal, you can view metrics pages for online endpoints or deployments. An easy way to access these metrics pages is through links that are available in the Azure Machine Learning studio user interface. You can find these links in the **Details** tab of an endpoint's page. These links lead to the metrics page in the Azure portal for the endpoint or deployment. Alternatively, you can also go to the Azure portal and search for the metrics page for the endpoint or deployment.
4443

@@ -78,33 +77,33 @@ To access metrics directly from the Azure portal, take the following steps:
7877

7978
The metrics that you see depend on the resource that you select. Metrics are scoped differently for online endpoints and online deployments.
8079

81-
#### Metrics at endpoint scope
80+
#### Metrics at the endpoint scope
8281

83-
[!INCLUDE [Microsoft.MachineLearningServices/workspaces](~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/metrics/microsoft-machinelearningservices-workspaces-onlineendpoints-metrics-include.md)]
82+
For information about metrics that are available at the online endpoint scope, see [Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints](monitor-azure-machine-learning-reference.md#supported-metrics-for-microsoftmachinelearningservicesworkspacesonlineendpoints).
8483

85-
**Bandwidth throttling**
84+
##### Bandwidth throttling
8685

87-
Bandwidth will be throttled if the quota limits are exceeded for _managed_ online endpoints. For more information on limits, see the article on [limits for online endpoints](how-to-manage-quotas.md#azure-machine-learning-online-endpoints-and-batch-endpoints). To determine if requests are throttled:
88-
- Monitor the "Network bytes" metric
89-
- The response trailers will have the fields: `ms-azureml-bandwidth-request-delay-ms` and `ms-azureml-bandwidth-response-delay-ms`. The values of the fields are the delays, in milliseconds, of the bandwidth throttling.
86+
Bandwidth is throttled if quota limits are exceeded for _managed_ online endpoints. For more information about limits for online endpoints, see [Azure Machine Learning online endpoints and batch endpoints](how-to-manage-quotas.md#azure-machine-learning-online-endpoints-and-batch-endpoints) in the article about quotas and limits in Azure Machine Learning. To determine whether requests are throttled:
87+
- Monitor the Network bytes metric.
88+
- Check for the following fields in the response trailers: `ms-azureml-bandwidth-request-delay-ms` and `ms-azureml-bandwidth-response-delay-ms`. The values of the fields are the delays, in milliseconds, of the bandwidth throttling.
9089

9190
For more information, see [Bandwidth limit issues](how-to-troubleshoot-online-endpoints.md#bandwidth-limit-issues).
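
As a minimal sketch of the trailer check described above, the following Python helper reads the two documented trailer fields and reports the delays. It assumes that your HTTP client already exposes the response trailers as a name-to-value mapping and that the values are numeric strings. The function names are hypothetical and aren't part of any Azure SDK.

```python
from typing import Dict, Mapping

# Trailer field names documented for managed online endpoints.
REQUEST_DELAY_FIELD = "ms-azureml-bandwidth-request-delay-ms"
RESPONSE_DELAY_FIELD = "ms-azureml-bandwidth-response-delay-ms"


def throttling_delays_ms(trailers: Mapping[str, str]) -> Dict[str, float]:
    """Return the bandwidth-throttling delays found in the response trailers."""
    # Field names are matched case-insensitively, so normalize them first.
    normalized = {name.lower(): value for name, value in trailers.items()}
    delays: Dict[str, float] = {}
    for field in (REQUEST_DELAY_FIELD, RESPONSE_DELAY_FIELD):
        if field in normalized:
            # Assumption: the value is a numeric string in milliseconds.
            delays[field] = float(normalized[field])
    return delays


def was_throttled(trailers: Mapping[str, str]) -> bool:
    """Treat any delay above zero as a sign of throttling (an assumption)."""
    return any(delay > 0 for delay in throttling_delays_ms(trailers).values())
```

How you obtain the trailers depends on the HTTP client that you use, because not every client surfaces trailers. The helper only interprets the fields after you have them.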
9291

93-
#### Metrics at deployment scope
92+
#### Metrics at the deployment scope
9493

95-
[!INCLUDE [Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments](~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/metrics/microsoft-machinelearningservices-workspaces-onlineendpoints-deployments-metrics-include.md)]
94+
For information about metrics that are available at the online deployment scope, see [Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments](monitor-azure-machine-learning-reference.md#supported-metrics-for-microsoftmachinelearningservicesworkspacesonlineendpointsdeployments).
9695

9796
### Create dashboards and alerts
9897

99-
Azure Monitor allows you to create dashboards and alerts that are based on metrics.
98+
In Azure Monitor, you can create dashboards and alerts that are based on metrics.
10099

101100
#### Create dashboards and visualize queries
102101

103-
You can create custom dashboards and visualize metrics from multiple sources in the Azure portal, including the metrics for your online endpoint. For more information on creating dashboards and visualizing queries, see [Dashboards using log data](/azure/azure-monitor/visualize/tutorial-logs-dashboards) and [Dashboards using application data](/azure/azure-monitor/app/overview-dashboard#create-custom-kpi-dashboards-using-application-insights).
102+
You can create custom dashboards so that you can visualize metrics from multiple sources in the Azure portal, including the metrics for your online endpoint. For more information about creating dashboards and visualizing queries, see [Dashboards using log data](/azure/azure-monitor/visualize/tutorial-logs-dashboards) and [Dashboards using application data](/azure/azure-monitor/app/overview-dashboard#create-custom-kpi-dashboards-using-application-insights).
104103

105104
#### Create alerts
106105

107-
You can also create custom alerts so you can receive notifications about important status updates to your online endpoint:
106+
You can also create custom alerts so that you can receive notifications about important status updates to your online endpoint:
108107

109108
1. In the Azure portal, go to a metrics page, and then select **New alert rule**.
110109

@@ -124,20 +123,20 @@ You can also create custom alerts so you can receive notifications about importa
124123

125124
You can configure deployments to scale automatically based on metrics. To turn on the autoscale feature, you can use the UI or code. The options for code are the Azure Machine Learning CLI and the Azure Machine Learning SDK for Python. When you use code, you provide the IDs of metrics in the conditions for triggering automatic scaling. For those IDs, you can use the metrics that the table lists in the [Available metrics](#available-metrics) section. For more information, see [Autoscaling online endpoints](how-to-autoscale-endpoints.md).
126125

127-
## Logs
126+
## Use logs
128127

129128
There are three logs that you can turn on for online endpoints:
130129

131130
* **AmlOnlineEndpointTrafficLog**: This traffic log provides a way for you to check the information of requests to the endpoint. This log is useful in the following cases:
132-
* A request response isn't 200, and you want more information. The `ResponseCodeReason` column in the log lists the reason. For a description of status codes and reasons, you can also see [HTTPS status codes](how-to-troubleshoot-online-endpoints.md#http-status-codes) in the article about troubleshooting online endpoints.
131+
* A request response isn't 200, and you want more information. The `ResponseCodeReason` column in the log lists the reason. For a description of status codes and reasons, see [HTTP status codes](how-to-troubleshoot-online-endpoints.md#http-status-codes) in the article about troubleshooting online endpoints.
133132
* You want to look up the response code and response reason of your model for a request. The `ModelStatusCode` and `ModelStatusReason` columns provide this information.
134-
* You want to know the duration of a request. The logs provide a breakdown of the latency that shows the total duration, the request duration, the response duration, and the delay caused by network throttling.
133+
* You want to know the duration of a request. The logs provide a breakdown of the latency that shows the total duration, the request duration, the response duration, and the delay that's caused by network throttling.
135134
* You want to check how many recent requests succeeded and failed. The logs provide this information.
136135
* **AmlOnlineEndpointConsoleLog**: This log contains statements that the containers write as output to the console. This log is useful in the following cases:
137136
* A container fails to start. The console log can be useful for debugging.
138137
* You want to monitor container behavior and make sure that all requests are correctly handled.
139-
* You want to write request IDs in the console log. Joining the request ID, the AmlOnlineEndpointConsoleLog, and AmlOnlineEndpointTrafficLog in the Log Analytics workspace, you can trace a request from the network entry point of an online endpoint to the container.
140-
* You want to run a performance analysis. For instance, you want to determine the time the model needs to process each request.
138+
* You want to trace a request from the network entry point of an online endpoint to the container. You can use a Log Analytics query that joins the request ID with information from the AmlOnlineEndpointConsoleLog and AmlOnlineEndpointTrafficLog logs.
139+
* You want to run a performance analysis, for instance, to determine the time the model takes to process each request.
141140
* **AmlOnlineEndpointEventLog**: This log contains event information about the container life cycle. Currently, the log provides information about the following types of events:
142141

143142
| Name | Message |
@@ -156,10 +155,10 @@ There are three logs that you can turn on for online endpoints:
156155
| Killing | Stopping container inference-server |
157156
| Killing | Stopping container model-mount |
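
The request tracing case above can be sketched offline in Python, for example against rows that you export from Log Analytics. This sketch assumes that each row is a dictionary keyed by column name, that your scoring code writes the request ID into the console output so that it appears in the `Message` column, and that `TimeGenerated` values are ISO 8601 strings that sort chronologically. The function name is hypothetical.

```python
from typing import Dict, Iterable, List


def trace_request(
    request_id: str,
    traffic_rows: Iterable[dict],
    console_rows: Iterable[dict],
) -> Dict[str, List[dict]]:
    """Collect the traffic and console records that belong to one request."""
    # Traffic log rows carry the request ID in the XRequestId column.
    traffic = [row for row in traffic_rows if row.get("XRequestId") == request_id]
    # Assumption: the container writes the request ID into its console output,
    # so matching console rows contain the ID somewhere in Message.
    console = [row for row in console_rows if request_id in row.get("Message", "")]
    # Order console output by time to follow the request through the container.
    console = sorted(console, key=lambda row: row.get("TimeGenerated", ""))
    return {"traffic": traffic, "console": console}
```

In a Log Analytics workspace, a query over the two tables can perform the same match. The Python version is only meant to show the matching logic.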
158157

159-
### Turn logs on or off
158+
### Turn on logs
160159

161160
> [!IMPORTANT]
162-
> Logging uses Azure Log Analytics. If you don't currently have a Log Analytics workspace, you can create one by using the steps in [Create a Log Analytics workspace in the Azure portal](/azure/azure-monitor/logs/quick-create-workspace#create-a-workspace).
161+
> Logging uses Azure Log Analytics. If you don't currently have a Log Analytics workspace, you can create one by following the steps in [Create a Log Analytics workspace in the Azure portal](/azure/azure-monitor/logs/quick-create-workspace#create-a-workspace).
163162
164163
1. In the [Azure portal](https://portal.azure.com), go to the resource group that contains your endpoint, and then select the endpoint.
165164

@@ -175,7 +174,7 @@ There are three logs that you can turn on for online endpoints:
175174
1. Select **Save**.
176175

177176
> [!IMPORTANT]
178-
> It may take up to an hour for the connection to the Log Analytics workspace to be enabled. Wait an hour before continuing with the steps in the next section.
177+
> It may take up to an hour for the connection to the Log Analytics workspace to be available. Wait an hour before continuing with the steps in the next section.
179178
180179
### View logs
181180

@@ -185,7 +184,7 @@ There are three logs that you can turn on for online endpoints:
185184
- Go to the properties page for your online endpoint. Under **Monitoring**, select **Logs**.
186185
- Go to your Log Analytics workspace. On the left, select **Logs**.
187186

188-
1. Close the **Queries hub** window that automatically opens.
187+
1. Close the **Queries hub** window that opens by default.
189188

190189
1. Under **Other**, double-click **AmlOnlineEndpointConsoleLog**. If you don't see **AmlOnlineEndpointConsoleLog**, enter that value into the search field.
191190

@@ -205,19 +204,19 @@ Example queries are available for you to use. Take the following steps to view t
205204

206205
:::image type="content" source="./media/how-to-monitor-online-endpoints/example-queries.png" alt-text="Screenshot of the Queries tab of the Azure portal Logs page. Two example queries are visible, and the Queries tab and the search box are highlighted.":::
207206

208-
### Log column details
207+
### Log column details
209208

210209
The following tables provide detailed information about the data that's stored in each log:
211210

212-
**AmlOnlineEndpointTrafficLog**
211+
#### AmlOnlineEndpointTrafficLog
213212

214213
[!INCLUDE [endpoint-monitor-traffic-reference](includes/endpoint-monitor-traffic-reference.md)]
215214

216-
**AmlOnlineEndpointConsoleLog**
215+
#### AmlOnlineEndpointConsoleLog
217216

218217
[!INCLUDE [endpoint-monitor-console-reference](includes/endpoint-monitor-console-reference.md)]
219218

220-
**AmlOnlineEndpointEventLog**
219+
#### AmlOnlineEndpointEventLog
221220

222221
[!INCLUDE [endpoint-monitor-event-reference](includes/endpoint-monitor-event-reference.md)]
223222

@@ -237,5 +236,5 @@ When you turn on Application Insights, you can see high-level activity monitor g
237236

238237
## Related content
239238

240-
* Learn how to [view costs for your deployed endpoint](./how-to-view-online-endpoints-costs.md).
241-
* Read more about [metrics explorer](/azure/azure-monitor/essentials/metrics-charts).
239+
* [View costs for an Azure Machine Learning managed online endpoint](how-to-view-online-endpoints-costs.md).
240+
* [Analyze metrics with Azure Monitor metrics explorer](/azure/azure-monitor/essentials/analyze-metrics).

articles/machine-learning/includes/endpoint-monitor-console-reference.md

Lines changed: 7 additions & 7 deletions
@@ -2,15 +2,15 @@
22
author: Blackmist
33
ms.service: azure-machine-learning
44
ms.topic: include
5-
ms.date: 04/22/2023
5+
ms.date: 12/30/2024
66
ms.author: larryfr
77
---
88

99
| Property | Description |
1010
|:--- |:--- |
11-
| TimeGenerated | The timestamp (UTC) of when the log was generated.
12-
| OperationName | The operation associated with log record.
13-
| InstanceId | The ID of the instance that generated this log record.
14-
| DeploymentName | The name of the deployment associated with the log record.
15-
| ContainerName | The name of the container where the log was generated.
16-
| Message | The content of the log.
11+
| TimeGenerated | The UTC time stamp of the time the log is generated |
12+
| OperationName | The operation associated with the log record |
13+
| InstanceId | The ID of the instance that generates the log record |
14+
| DeploymentName | The name of the deployment associated with the log record |
15+
| ContainerName | The name of the container where the log is generated |
16+
| Message | The content of the log |

articles/machine-learning/includes/endpoint-monitor-event-reference.md

Lines changed: 7 additions & 7 deletions
@@ -2,15 +2,15 @@
22
author: Blackmist
33
ms.service: azure-machine-learning
44
ms.topic: include
5-
ms.date: 04/22/2023
5+
ms.date: 12/30/2024
66
ms.author: larryfr
77
---
88

99
| Property | Description |
1010
|:--- |:--- |
11-
| TimeGenerated | The timestamp (UTC) of when the log was generated.
12-
| OperationName | The operation associated with log record.
13-
| InstanceId | The ID of the instance that generated this log record.
14-
| DeploymentName | The name of the deployment associated with the log record.
15-
| Name | The name of the event.
16-
| Message | The content of the event.
11+
| TimeGenerated | The UTC time stamp of the time the log is generated |
12+
| OperationName | The operation associated with the log record |
13+
| InstanceId | The ID of the instance that generates the log record |
14+
| DeploymentName | The name of the deployment associated with the log record |
15+
| Name | The name of the event |
16+
| Message | The content of the event |

articles/machine-learning/includes/endpoint-monitor-traffic-reference.md

Lines changed: 23 additions & 23 deletions
@@ -2,31 +2,31 @@
22
author: Blackmist
33
ms.service: azure-machine-learning
44
ms.topic: include
5-
ms.date: 04/22/2023
5+
ms.date: 12/30/2024
66
ms.author: larryfr
77
---
88

99
| Property | Description |
1010
|:--- |:--- |
11-
| Method | The requested method from client.
12-
| Path | The requested path from client.
13-
| SubscriptionId | The machine learning subscription ID of the online endpoint.
14-
| AzureMLWorkspaceId | The machine learning workspace ID of the online endpoint.
15-
| AzureMLWorkspaceName | The machine learning workspace name of the online endpoint.
16-
| EndpointName | The name of the online endpoint.
17-
| DeploymentName | The name of the online deployment.
18-
| Protocol | The protocol of the request.
19-
| ResponseCode | The final response code returned to the client.
20-
| ResponseCodeReason | The final response code reason returned to the client.
21-
| ModelStatusCode | The response status code from model.
22-
| ModelStatusReason | The response status reason from model.
23-
| RequestPayloadSize | The total bytes received from the client.
24-
| ResponsePayloadSize | The total bytes sent back to the client.
25-
| UserAgent | The user-agent header of the request, including comments but truncated to a max of 70 characters.
26-
| XRequestId | The request ID generated by Azure Machine Learning for internal tracing.
27-
| XMSClientRequestId | The tracking ID generated by the client.
28-
| TotalDurationMs | Duration in milliseconds from the request start time to the last response byte sent back to the client. If the client disconnected, it measures from the start time to client disconnect time.
29-
| RequestDurationMs | Duration in milliseconds from the request start time to the last byte of the request received from the client.
30-
| ResponseDurationMs | Duration in milliseconds from the request start time to the first response byte read from the model.
31-
| RequestThrottlingDelayMs | Delay in milliseconds in request data transfer due to network throttling.
32-
| ResponseThrottlingDelayMs | Delay in milliseconds in response data transfer due to network throttling.
11+
| Method | The method that the client requests. |
12+
| Path | The path that the client requests. |
13+
| SubscriptionId | The machine learning subscription ID of the online endpoint. |
14+
| AzureMLWorkspaceId | The machine learning workspace ID of the online endpoint. |
15+
| AzureMLWorkspaceName | The machine learning workspace name of the online endpoint. |
16+
| EndpointName | The name of the online endpoint. |
17+
| DeploymentName | The name of the online deployment. |
18+
| Protocol | The protocol of the request. |
19+
| ResponseCode | The final response code that's returned to the client. |
20+
| ResponseCodeReason | The final response code reason that's returned to the client. |
21+
| ModelStatusCode | The response status code from the model. |
22+
| ModelStatusReason | The response status reason from the model. |
23+
| RequestPayloadSize | The total bytes received from the client. |
24+
| ResponsePayloadSize | The total bytes sent back to the client. |
25+
| UserAgent | The user-agent header of the request, including comments but truncated to a maximum of 70 characters. |
26+
| XRequestId | The request ID that Azure Machine Learning generates for internal tracing. |
27+
| XMSClientRequestId | The tracking ID that the client generates. |
28+
| TotalDurationMs | The duration in milliseconds from the request start time to the time the last response byte is sent back to the client. If the client disconnects, the duration is taken from the start time to the client disconnect time. |
29+
| RequestDurationMs | The duration in milliseconds from the request start time to the time the last byte of the request is received from the client. |
30+
| ResponseDurationMs | The duration in milliseconds from the request start time to the time the first response byte is read from the model. |
31+
| RequestThrottlingDelayMs | The delay in milliseconds in the request data transfer due to network throttling. |
32+
| ResponseThrottlingDelayMs | The delay in milliseconds in the response data transfer due to network throttling. |
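
To illustrate how these columns fit together, the following Python sketch summarizes a batch of traffic log rows. It assumes that the rows are dictionaries keyed by the property names in the table, and that a `ResponseCode` of 200 counts as success and anything else as failure. The function name and the shape of the summary are hypothetical.

```python
from typing import Iterable, Optional


def summarize_traffic(rows: Iterable[dict]) -> dict:
    """Summarize success counts, latency, and throttling delay for traffic rows."""
    rows = list(rows)
    # Assumption: a response code of 200 is a success, anything else a failure.
    succeeded = sum(1 for row in rows if int(row["ResponseCode"]) == 200)
    total_ms = [float(row["TotalDurationMs"]) for row in rows]
    mean_total: Optional[float] = sum(total_ms) / len(total_ms) if total_ms else None
    # Add the request and response throttling delays across all rows.
    throttling_ms = sum(
        float(row.get("RequestThrottlingDelayMs", 0))
        + float(row.get("ResponseThrottlingDelayMs", 0))
        for row in rows
    )
    return {
        "succeeded": succeeded,
        "failed": len(rows) - succeeded,
        "mean_total_duration_ms": mean_total,
        "throttling_delay_ms": throttling_ms,
    }
```

A nonzero `throttling_delay_ms` value suggests that part of the measured latency comes from network throttling rather than from the model.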

articles/machine-learning/toc.yml

Lines changed: 1 addition & 1 deletion
@@ -949,7 +949,7 @@ items:
949949
href: reference-managed-online-endpoints-vm-sku-list.md
950950
- name: Viewing managed online endpoint costs
951951
href: how-to-view-online-endpoints-costs.md
952-
- name: Monitoring online endpoints
952+
- name: Monitor online endpoints
953953
href: how-to-monitor-online-endpoints.md
954954
- name: Debug online endpoints locally VS Code
955955
href: how-to-debug-managed-online-endpoints-visual-studio-code.md
