writing

Larry Franks · Larry Franks · commit 26fee0a2f4c1 · 2022-06-24T15:33:22.000-04:00
diff --git a/articles/machine-learning/how-to-deploy-managed-online-endpoints.md b/articles/machine-learning/how-to-deploy-managed-online-endpoints.md
@@ -320,32 +320,13 @@ Autoscale automatically runs the right amount of resources to handle the load on
 
 ### (Optional) Monitor SLA by using Azure Monitor
 
-To view metrics and set alerts based on your SLA, complete the steps that are described in [Monitor managed online endpoints](how-to-monitor-online-endpoints.md).
+To view metrics and set alerts based on your SLA, complete the steps that are described in [Monitor managed online endpoints](how-to-monitor-online-endpoints.md#monitor).
 
 ### (Optional) Integrate with Log Analytics
 
-The `get-logs` command provides only the last few hundred lines of logs from an automatically selected instance. However, Log Analytics provides a way to durably store and analyze logs. 
+The `get-logs` command provides only the last few hundred lines of logs from an automatically selected instance. However, Log Analytics provides a way to durably store and analyze logs. For more information on using logging, see [Monitor online endpoints](how-to-monitor-online-endpoints.md#logs).
 
-First, create a Log Analytics workspace by completing the steps in [Create a Log Analytics workspace in the Azure portal](../azure-monitor/logs/quick-create-workspace.md#create-a-workspace).
-
-Then, in the Azure portal:
-
-1. Go to the resource group.
-1. Select your endpoint.
-1. Select the **ARM resource page**.
-1. Select **Diagnostic settings**.
-1. Select **Add settings**.
-1. Select to enable sending console logs to the Log Analytics workspace.
-
-The logs might take up to an hour to connect. After an hour, send some scoring requests, and then check the logs by using the following steps:
-
-1. Open the Log Analytics workspace. 
-1. In the left menu, select **Logs**.
-1. Close the **Queries** dialog that automatically opens.
-1. Double-click **AmlOnlineEndpointConsoleLog**.
-1. Select **Run**.
-
-  [!INCLUDE [Email Notification Include](../../includes/machine-learning-email-notifications.md)]
+[!INCLUDE [Email Notification Include](../../includes/machine-learning-email-notifications.md)]
 
 ## Delete the endpoint and the deployment
 
diff --git a/articles/machine-learning/how-to-monitor-online-endpoints.md b/articles/machine-learning/how-to-monitor-online-endpoints.md
@@ -7,7 +7,7 @@ ms.service: machine-learning
 ms.author: larryfr
 author: blackmist
 ms.subservice: mlops
-ms.date: 06/01/2022
+ms.date: 06/24/2022
 ms.topic: conceptual
 ms.custom: how-to, devplatv2, event-tier1-build-2022
 ---
@@ -28,7 +28,7 @@ In this article you learn how to:
 - Deploy an Azure Machine Learning online endpoint.
 - You must have at least [Reader access](../role-based-access-control/role-assignments-portal.md) on the endpoint.
 
-## View metrics
+## Metrics
 
 Use the following steps to view metrics for a managed endpoint or deployment:
 1. Go to the [Azure portal](https://portal.azure.com).
@@ -38,11 +38,11 @@ Use the following steps to view metrics for a managed endpoint or deployment:
 
 1. In the left-hand column, select **Metrics**.
 
-## Available metrics
+### Available metrics
 
 Depending on the resource that you select, the metrics that you see will be different. Metrics are scoped differently for online endpoints and online deployments.
 
-### Metrics at endpoint scope
+#### Metrics at endpoint scope
 
 - Request Latency
 - Request Latency P50 (Request latency at the 50th percentile)
@@ -59,13 +59,13 @@ Split on the following dimensions:
 - Status Code
 - Status Code Class
 
-#### Bandwidth throttling
+**Bandwidth throttling**
 
 Bandwidth will be throttled if the limits are exceeded for _managed_ online endpoints (see managed online endpoints section in [Manage and increase quotas for resources with Azure Machine Learning](how-to-manage-quotas.md#azure-machine-learning-managed-online-endpoints)). To determine if requests are throttled:
 - Monitor the "Network bytes" metric
 - The response trailers will have the fields: `ms-azureml-bandwidth-request-delay-ms` and `ms-azureml-bandwidth-response-delay-ms`. The values of the fields are the delays, in milliseconds, of the bandwidth throttling.
 
-### Metrics at deployment scope
+#### Metrics at deployment scope
 
 - CPU Utilization Percentage
 - Deployment Capacity (the number of instances of the requested instance type)
@@ -78,11 +78,11 @@ Split on the following dimension:
 
 - InstanceId
 
-## Create a dashboard
+### Create a dashboard
 
 You can create custom dashboards to visualize data from multiple sources in the Azure portal, including the metrics for your online endpoint. For more information, see [Create custom KPI dashboards using Application Insights](../azure-monitor/app/tutorial-app-dashboards.md#add-custom-metric-chart).
     
-## Create an alert
+### Create an alert
 
 You can also create custom alerts to notify you of important status updates to your online endpoint:
 
@@ -97,6 +97,133 @@ You can also create custom alerts to notify you of important status updates to y
 1. Select **Add action groups** > **Create action groups** to specify what should happen when your alert is triggered.
 
 1. Choose **Create alert rule** to finish creating your alert.
+
+## Logs
+
+There are three logs that can be enabled for online endpoints:
+
+* **AMLOnlineEndpointTrafficLog**: Enable traffic logs to inspect information about your requests. Some example cases:
+
+    * If the response isn't 200, check the value of the `ResponseCodeReason` column to see what happened. The possible reasons are described in the [Envoy response code details](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/response_code_details) documentation.
+
+    * To check the response code and response reason returned by your model, use the `ModelStatusCode` and `ModelStatusReason` columns.
+
+    * To check the duration of a request, such as the total duration, the request and response durations, and the delay caused by network throttling, use the logs to see the latency breakdown.
+
+    * To check how many requests, or how many failed requests, were received recently, use the logs.
+
+* **AMLOnlineEndpointConsoleLog**: Contains logs that the user containers output to the console. Some example cases:
+
+    * If the user container fails to start, the console log may be useful for debugging.
+
+    * Monitor user container behavior, and make sure that all requests are correctly handled.
+
+    * Write request IDs in the console log. By joining AMLOnlineEndpointConsoleLog and AMLOnlineEndpointTrafficLog on the request ID in the Log Analytics workspace, you can trace a request from the network entry point of an online endpoint to the container.
+
+    * Use this log for performance analysis, such as determining the time that the model requires to process each request.
+
+* **AMLOnlineEndpointEventLog**: Contains event information regarding the user container’s life cycle. Currently, we provide information on the following types of events: 
+
+    | Name | Message |
+    | ----- | ----- | 
+    | BackOff | Back-off restarting failed container 
+    | Pulled | Container image "\<IMAGE\_NAME\>" already present on machine 
+    | Killing | Container inference-server failed liveness probe, will be restarted 
+    | Created | Created container image-fetcher 
+    | Created | Created container inference-server 
+    | Created | Created container model-mount 
+    | Unhealthy | Liveness probe failed: \<FAILURE\_CONTENT\> 
+    | Unhealthy | Readiness probe failed: \<FAILURE\_CONTENT\> 
+    | Started | Started container image-fetcher 
+    | Started | Started container inference-server 
+    | Started | Started container model-mount 
+    | Killing | Stopping container inference-server 
+    | Killing | Stopping container model-mount 
+
+### How to enable/disable logs
+
+> [!IMPORTANT]
+> Logging uses Azure Log Analytics. If you do not currently have a Log Analytics workspace, you can create one using the steps in [Create a Log Analytics workspace in the Azure portal](../azure-monitor/logs/quick-create-workspace.md#create-a-workspace).
+
+1. In the [Azure portal](https://portal.azure.com), go to the resource group that contains your endpoint and then select the endpoint.
+1. From the **Monitoring** section on the left of the page, select **Diagnostic settings** and then **Add settings**.
+1. Select the log categories to enable, select **Send to Log Analytics workspace**, and then select the Log Analytics workspace to use. Finally, enter a **Diagnostic setting name** and select **Save**.
+
+    :::image type="content" source="{source}" alt-text="{alt-text}":::
+
+    > [!IMPORTANT]
+    > It may take up to an hour for the connection to the Log Analytics workspace to be enabled. Wait an hour before continuing with the next steps.
+    
+1. Submit scoring requests to the endpoint. This activity should create entries in the logs.
+1. Open the Log Analytics workspace and select **Logs** from the left of the screen.
+1. Close the **Queries** dialog that automatically opens, and then double-click **AmlOnlineEndpointConsoleLog**. If you don't see it, use the **Search** field.
+
+    :::image type="content" source="{source}" alt-text="{alt-text}":::
+
+1. Select **Run**.
+
+    :::image type="content" source="{source}" alt-text="{alt-text}":::
+
+### Example queries
+
+The following example queries are available in the Log Analytics workspace:
+
+* Online endpoint console logs
+* Online endpoint failed requests
+
+### Log column details 
+
+The following tables provide details on the data stored in each log:
+
+**AMLOnlineEndpointTrafficLog**
+
+| Field Name | Description |
+| ---- | ---- |
+| Method | The method requested by the client.
+| Path | The path requested by the client.
+| SubscriptionId | The machine learning subscription ID of the online endpoint. 
+| WorkspaceId | The machine learning workspace ID of the online endpoint. 
+| EndpointName | The name of the online endpoint. 
+| DeploymentName | The name of the online deployment. 
+| Protocol | The protocol of the request. 
+| ResponseCode | The final response code returned to the user. 
+| ResponseCodeReason | The final response code reason returned to the user. 
+| ModelStatusCode | The response status code from the model.
+| ModelStatusReason | The response status reason from the model.
+| RequestPayloadSize | The total bytes received from the user client. 
+| ResponsePayloadSize | The total bytes sent back to the user client. 
+| UserAgent | The user-agent header of the request. 
+| XRequestId | The request ID generated by Azure Machine Learning for internal tracing. 
+| XMSClientRequestId | The tracking ID generated by the user client.
+| TotalDurationMs | Duration in milliseconds from the request start time to the last response byte sent back to the user client. If the user client disconnected, it measures from the start time to client disconnect time. 
+| RequestDurationMs | Duration in milliseconds from the request start time to the last byte of the request received from the user client. 
+| ResponseDurationMs | Duration in milliseconds from the request start time to the first response byte read from the model. 
+| RequestThrottlingDelayMs | Delay in milliseconds in request data transfer due to network throttling. 
+| ResponseThrottlingDelayMs | Delay in milliseconds in response data transfer due to network throttling. 
+
+**AMLOnlineEndpointConsoleLog**
+
+| Field Name | Description |
+| ----- | ----- |
+| TimeGenerated | The timestamp (UTC) of when the log was generated. 
+| OperationName | The operation associated with the log record.
+| InstanceId | The ID of the instance that generated this log record. 
+| DeploymentName | The name of the deployment associated with the log record. 
+| ContainerName | The name of the container where the log was generated. 
+| Message | The content of the log. 
+
+**AMLOnlineEndpointEventLog**
+
+
+| Field Name | Description |
+| ----- | ----- |
+| TimeGenerated | The timestamp (UTC) of when the log was generated. 
+| OperationName | The operation associated with the log record.
+| InstanceId | The ID of the instance that generated this log record. 
+| DeploymentName | The name of the deployment associated with the log record. 
+| Name | The name of the event. 
+| Message | The content of the event. 
+
 
 
 ## Next steps
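
The traffic-log cases described in the added **Logs** section (finding responses that aren't 200 and checking the delay caused by network throttling) can be illustrated offline. The following is a minimal Python sketch, assuming the rows have already been exported from Log Analytics as dictionaries keyed by the field names in the AMLOnlineEndpointTrafficLog table. The sample values and the helper names `failed_requests` and `throttling_delay_ms` are hypothetical; they are not part of this commit or of any Azure SDK.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int, float]]

# Hypothetical sample rows. The keys are field names from the
# AMLOnlineEndpointTrafficLog table; the values are made up for illustration.
SAMPLE_ROWS: List[Row] = [
    {
        "XRequestId": "req-1",
        "ResponseCode": 200,
        "TotalDurationMs": 120,
        "RequestThrottlingDelayMs": 0,
        "ResponseThrottlingDelayMs": 0,
    },
    {
        "XRequestId": "req-2",
        "ResponseCode": 503,
        "TotalDurationMs": 900,
        "RequestThrottlingDelayMs": 30,
        "ResponseThrottlingDelayMs": 15,
    },
]


def failed_requests(rows: List[Row]) -> List[Row]:
    """Return the rows whose final response code is not 200."""
    return [row for row in rows if int(row["ResponseCode"]) != 200]


def throttling_delay_ms(row: Row) -> float:
    """Sum the request and response throttling delays for one row."""
    return float(row["RequestThrottlingDelayMs"]) + float(
        row["ResponseThrottlingDelayMs"]
    )


if __name__ == "__main__":
    # Print each failed request with its total duration and throttling delay.
    for row in failed_requests(SAMPLE_ROWS):
        print(
            row["XRequestId"],
            row["ResponseCode"],
            row["TotalDurationMs"],
            throttling_delay_ms(row),
        )
```

Treating every non-200 response as a failure follows the wording of the added text; a real analysis might instead group by `ResponseCode` or inspect `ResponseCodeReason` before deciding what counts as a failure.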