Commit 78bb1c7

Curate for Azure Monitor duplication
1 parent e26c847 commit 78bb1c7

5 files changed: 12 additions and 92 deletions

articles/machine-learning/concept-endpoints-online.md

Lines changed: 0 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -396,12 +396,6 @@ For more information, see [Network isolation with managed online endpoints](conc
396396

397397
Monitoring for Azure Machine Learning endpoints is possible via integration with [Azure Monitor](monitor-azure-machine-learning.md#what-is-azure-monitor). This integration allows you to view metrics in charts, configure alerts, query from log tables, use Application Insights to analyze events from user containers, and so on.
398398

399-
* **Metrics**: Use Azure Monitor to track various endpoint metrics, such as request latency, and drill down to deployment or status level. You can also track deployment-level metrics, such as CPU/GPU utilization and drill down to instance level. Azure Monitor allows you to track these metrics in charts and set up dashboards and alerts for further analysis.
400-
401-
* **Logs**: Send metrics to the Log Analytics Workspace where you can query logs using the Kusto query syntax. You can also send metrics to Storage Account and/or Event Hubs for further processing. In addition, you can use dedicated Log tables for online endpoint related events, traffic, and container logs. Kusto query allows complex analysis joining multiple tables.
402-
403-
* **Application insights**: Curated environments include the integration with Application Insights, and you can enable/disable it when you create an online deployment. Built-in metrics and logs are sent to Application insights, and you can use its built-in features such as Live metrics, Transaction search, Failures, and Performance for further analysis.
404-
405399
For more information on monitoring, see [Monitor online endpoints](how-to-monitor-online-endpoints.md).
406400

407401
### Secret injection in online deployments (preview)

articles/machine-learning/how-to-monitor-online-endpoints.md

Lines changed: 3 additions & 62 deletions
Original file line number | Diff line number | Diff line change
@@ -67,78 +67,19 @@ Depending on the resource that you select, the metrics that you see will be diff
6767

6868
#### Metrics at endpoint scope
6969

70-
- __Traffic__
71-
72-
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
73-
| ---- | --- | --- | --- | --- | --- |
74-
| RequestsPerMinute | Count | The number of requests sent to Endpoint within a minute | Average | Deployment, ModelStatusCode, StatusCode, StatusCodeClass | Alert me when I have <= 0 transactions in the system |
75-
| RequestLatency | Milliseconds | The complete interval of time taken for a request to be responded | Average | Deployment | Alert me when average latency > 2 sec |
76-
| RequestLatency_P50 | Milliseconds | The request latency at the 50th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
77-
| RequestLatency_P90 | Milliseconds | The request latency at the 90th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
78-
| RequestLatency_P95 | Milliseconds | The request latency at the 95th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
79-
| RequestLatency_P99 | Milliseconds | The request latency at the 99th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
80-
81-
- __Network__
82-
83-
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
84-
| ---- | --- | --- | --- | --- | --- |
85-
| NetworkBytes | Bytes per second | The bytes per second served for the endpoint | Average | - | - |
86-
| ConnectionsActive | Count | The total number of concurrent TCP connections active from clients | Average | - | - |
87-
| NewConnectionsPerSecond | Count | The average number of new TCP connections per second established from clients | Average | - | - |
88-
89-
- __Model Data Collection__
90-
91-
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
92-
| ---- | --- | --- | --- | --- | --- |
93-
| DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | Deployment, Type | - |
94-
| DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | Deployment, Type, Reason | - |
95-
96-
For example, you can split along the deployment dimension to compare the request latency of different deployments under an endpoint.
70+
[!INCLUDE [Microsoft.MachineLearningServices/workspaces](~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/metrics/microsoft-machinelearningservices-workspaces-onlineendpoints-metrics-include.md)]
9771

9872
**Bandwidth throttling**
9973

10074
Bandwidth will be throttled if the quota limits are exceeded for _managed_ online endpoints. For more information on limits, see the article on [limits for online endpoints](how-to-manage-quotas.md#azure-machine-learning-online-endpoints-and-batch-endpoints). To determine if requests are throttled:
10175
- Monitor the "Network bytes" metric
10276
- The response trailers will have the fields: `ms-azureml-bandwidth-request-delay-ms` and `ms-azureml-bandwidth-response-delay-ms`. The values of the fields are the delays, in milliseconds, of the bandwidth throttling.
77+
10378
For more information, see [Bandwidth limit issues](how-to-troubleshoot-online-endpoints.md#bandwidth-limit-issues).
10479

10580
#### Metrics at deployment scope
10681

107-
- __Saturation__
108-
109-
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
110-
| ---- | --- | --- | --- | --- | --- |
111-
| CpuUtilizationPercentage | Percent | How much percentage of CPU was utilized | Minimun, Maximum, Average | InstanceId | Alert me when % Capacity Used > 75% |
112-
| CpuMemoryUtilizationPercentage | Percent | How much percent of Memory was utilized | Minimun, Maximum, Average | InstanceId | |
113-
| DiskUtilization | Percent | How much disk space was utilized | Minimun, Maximum, Average | InstanceId, Disk | |
114-
| GpuUtilizationPercentage | Percent | Percentage of GPU utilization on an instance - Utilization is reported at one minute intervals | Minimun, Maximum, Average | InstanceId | |
115-
| GpuMemoryUtilizationPercentage | Percent | Percentage of GPU memory utilization on an instance - Utilization is reported at one minute intervals | Minimun, Maximum, Average | InstanceId | |
116-
| GpuEnergyJoules | Joule | Interval energy in Joules on a GPU node - Energy is reported at one minute intervals | Minimun, Maximum, Average | InstanceId | |
117-
118-
- __Availability__
119-
120-
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
121-
| ---- | --- | --- | --- | --- | --- |
122-
| DeploymentCapacity | Count | The number of instances in the deployment | Minimum, Maximum, Average | InstanceId, State | Alert me when the % Availability of my service drops below 100% |
123-
124-
- __Traffic__
125-
126-
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
127-
| ---- | --- | --- | --- | --- | --- |
128-
| RequestsPerMinute | Count | The number of requests sent to online deployment within a minute | Average | StatusCode | Alert me when I have <= 0 transactions in the system |
129-
| RequestLatency_P50 | Milliseconds | The average P50 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
130-
| RequestLatency_P90 | Milliseconds | The average P90 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
131-
| RequestLatency_P95 | Milliseconds | The average P95 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
132-
| RequestLatency_P99 | Milliseconds | The average P99 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
133-
134-
- __Model Data Collection__
135-
136-
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
137-
| ---- | --- | --- | --- | --- | --- |
138-
| DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | InstanceId, Type | - |
139-
| DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | InstanceId, Type, Reason | - |
140-
141-
For instance, you can compare CPU and/or memory utilization between difference instances for an online deployment.
82+
[!INCLUDE [Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments](~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/metrics/microsoft-machinelearningservices-workspaces-onlineendpoints-deployments-metrics-include.md)]
14283

14384
### Create dashboards and alerts
14485
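
Reviewer note on the bandwidth-throttling context lines in this file: the text says throttled responses carry the trailer fields `ms-azureml-bandwidth-request-delay-ms` and `ms-azureml-bandwidth-response-delay-ms`, with values in milliseconds. The sketch below shows one way a client could interpret them. It assumes the trailers have already been collected into a plain mapping of strings (many HTTP clients do not expose trailers directly) and that the values are integer strings. The helper names `throttling_delays` and `was_throttled` are hypothetical and not part of any Azure SDK.

```python
from typing import Dict, Mapping, Optional

# Trailer field names quoted in the documentation text above.
REQUEST_DELAY = "ms-azureml-bandwidth-request-delay-ms"
RESPONSE_DELAY = "ms-azureml-bandwidth-response-delay-ms"


def throttling_delays(trailers: Mapping[str, str]) -> Dict[str, Optional[int]]:
    """Return the two throttling delays in milliseconds, or None when absent.

    Assumption: values are integer strings; anything unparsable maps to None.
    """
    # Header and trailer names are case-insensitive, so normalize the keys.
    lowered = {name.lower(): value for name, value in trailers.items()}
    delays: Dict[str, Optional[int]] = {}
    for field in (REQUEST_DELAY, RESPONSE_DELAY):
        raw = lowered.get(field)
        try:
            delays[field] = int(raw) if raw is not None else None
        except ValueError:
            delays[field] = None
    return delays


def was_throttled(trailers: Mapping[str, str]) -> bool:
    """True when either reported delay is greater than zero."""
    return any((delay or 0) > 0 for delay in throttling_delays(trailers).values())
```

For example, `was_throttled({"ms-azureml-bandwidth-request-delay-ms": "120"})` reports a throttled request, while an empty mapping does not.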

articles/machine-learning/how-to-track-monitor-analyze-runs.md

Lines changed: 2 additions & 17 deletions
Original file line number | Diff line number | Diff line change
@@ -112,24 +112,9 @@ To cancel a job in the studio:
112112

113113
## Monitor job status by email notification
114114

115-
1. In the [Azure portal](https://portal.azure.com), in the left navigation bar, select the **Monitor** tab.
115+
You can use diagnostic settings to trigger email notifications. To learn how to create diagnostic settings, see [Create diagnostic settings in Azure Monitor](/azure/azure-monitor/essentials/create-diagnostic-settings).
116116

117-
1. Select **Diagnostic settings**, then choose **+ Add diagnostic setting**.
118-
119-
:::image type="content" source="media/how-to-track-monitor-analyze-runs/diagnostic-setting.png" alt-text="Screenshot of diagnostic settings for email notification.":::
120-
121-
1. Under **Category details**, select **AmlRunStatusChangedEvent**. Under **Destination details**, select **Send to Log Analytics workspace** and specify the **Subscription** and **Log Analytics workspace**.
122-
123-
:::image type="content" source="media/how-to-track-monitor-analyze-runs/log-location.png" alt-text="Screenshot of where to save email notification.":::
124-
125-
> [!NOTE]
126-
> The **Azure Log Analytics Workspace** is a different type of Azure resource than the **Azure Machine Learning service workspace**. If there are no options in that list, you can [create a Log Analytics workspace](../azure-monitor/logs/quick-create-workspace.md).
127-
128-
1. In the **Logs** tab, select **New alert rule**.
129-
130-
:::image type="content" source="media/how-to-track-monitor-analyze-runs/new-alert-rule.png" alt-text="Screenshot of button to add new alert rule.":::
131-
132-
1. To learn how to create and manage log alerts using Azure Monitor, see [Create or edit a log search alert rule](../azure-monitor/alerts/alerts-log.md).
117+
To learn how to create and manage log alerts using Azure Monitor, see [Create or edit a log search alert rule](/azure/azure-monitor/alerts/alerts-create-log-alert-rule).
133118

134119
## Related content
135120

articles/machine-learning/monitor-azure-machine-learning-reference.md

Lines changed: 6 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -25,19 +25,19 @@ The metrics categories are **Model**, **Quota**, **Resource**, **Run**, and **Tr
2525
The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces resource type.
2626

2727
[!INCLUDE [horz-monitor-ref-metrics-tableheader](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-ref-metrics-tableheader.md)]
28-
[!INCLUDE [Microsoft.MachineLearningServices/workspaces](~/azure-reference-other-repo/azure-monitor-ref/supported-metrics/includes/microsoft-machinelearningservices-workspaces-metrics-include.md)]
28+
[!INCLUDE [Microsoft.MachineLearningServices/workspaces](~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/metrics/microsoft-machinelearningservices-workspaces-metrics-include.md)]
2929

3030
### Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints
3131
The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints resource type.
3232

3333
[!INCLUDE [horz-monitor-ref-metrics-tableheader](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-ref-metrics-tableheader.md)]
34-
[!INCLUDE [Microsoft.MachineLearningServices/workspaces/onlineEndpoints](~/azure-reference-other-repo/azure-monitor-ref/supported-metrics/includes/microsoft-machinelearningservices-workspaces-onlineendpoints-metrics-include.md)]
34+
[!INCLUDE [Microsoft.MachineLearningServices/workspaces](~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/metrics/microsoft-machinelearningservices-workspaces-onlineendpoints-metrics-include.md)]
3535

3636
### Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments
3737
The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource type.
3838

3939
[!INCLUDE [horz-monitor-ref-metrics-tableheader](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-ref-metrics-tableheader.md)]
40-
[!INCLUDE [Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments](~/azure-reference-other-repo/azure-monitor-ref/supported-metrics/includes/microsoft-machinelearningservices-workspaces-onlineendpoints-deployments-metrics-include.md)]
40+
[!INCLUDE [Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments](~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/metrics/microsoft-machinelearningservices-workspaces-onlineendpoints-deployments-metrics-include.md)]
4141

4242
[!INCLUDE [horz-monitor-ref-metrics-dimensions-intro](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-ref-metrics-dimensions-intro.md)]
4343
[!INCLUDE [horz-monitor-ref-metrics-dimensions](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-ref-metrics-dimensions.md)]
@@ -67,13 +67,13 @@ The valid values for the RunType dimension are:
6767
[!INCLUDE [horz-monitor-ref-resource-logs](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-ref-resource-logs.md)]
6868

6969
### Supported resource logs for Microsoft.MachineLearningServices/registries
70-
[!INCLUDE [Microsoft.MachineLearningServices/registries](~/azure-reference-other-repo/azure-monitor-ref/supported-logs/includes/microsoft-machinelearningservices-registries-logs-include.md)]
70+
[!INCLUDE [Microsoft.MachineLearningServices/registries](~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/logs/microsoft-machinelearningservices-registries-logs-include.md)]
7171

7272
### Supported resource logs for Microsoft.MachineLearningServices/workspaces
73-
[!INCLUDE [Microsoft.MachineLearningServices/workspaces](~/azure-reference-other-repo/azure-monitor-ref/supported-logs/includes/microsoft-machinelearningservices-workspaces-logs-include.md)]
73+
[!INCLUDE [Microsoft.MachineLearningServices/workspaces](~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/logs/microsoft-machinelearningservices-workspaces-logs-include.md)]
7474

7575
### Supported resource logs for Microsoft.MachineLearningServices/workspaces/onlineEndpoints
76-
[!INCLUDE [Microsoft.MachineLearningServices/workspaces/onlineEndpoints](~/azure-reference-other-repo/azure-monitor-ref/supported-logs/includes/microsoft-machinelearningservices-workspaces-onlineendpoints-logs-include.md)]
76+
[!INCLUDE [Microsoft.MachineLearningServices/workspaces/onlineEndpoints](~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/logs/microsoft-machinelearningservices-workspaces-onlineendpoints-logs-include.md)]
7777

7878
[!INCLUDE [horz-monitor-ref-logs-tables](~/reusable-content/ce-skilling/azure/includes/azure-monitor/horizontals/horz-monitor-ref-logs-tables.md)]
7979
### Machine Learning
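
Reviewer note: the six path changes in this file follow a single pattern, where `~/azure-reference-other-repo/azure-monitor-ref/supported-<kind>/includes/` becomes `~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/<kind>/` for `<kind>` in `metrics` or `logs`. The sketch below expresses that pattern as a regular-expression rewrite. It is inferred only from the lines in this diff, not from any official migration tool, and the function name `rewrite_include_path` is hypothetical. It rewrites the path only and leaves the link label untouched.

```python
import re

# Old include prefix as it appears on the removed lines of this diff.
OLD_PREFIX = re.compile(
    r"~/azure-reference-other-repo/azure-monitor-ref/supported-(metrics|logs)/includes/"
)
# New prefix as it appears on the added lines; \1 carries over "metrics" or "logs".
NEW_PREFIX = r"~/reusable-content/ce-skilling/azure/includes/azure-monitor/reference/\1/"


def rewrite_include_path(line: str) -> str:
    """Rewrite an old-style Azure Monitor include path to the new location.

    Lines that do not contain the old prefix are returned unchanged.
    """
    return OLD_PREFIX.sub(NEW_PREFIX, line)
```

Applied to the removed `registries` logs include line, the function produces the corresponding added line from the hunk above.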

articles/machine-learning/toc.yml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1401,7 +1401,7 @@
14011401
- name: Manage and optimize cost
14021402
displayName: cost-management,cost-optimization
14031403
href: how-to-manage-optimize-cost.md
1404-
- name: Monitor
1404+
- name: Monitor Machine Learning
14051405
href: monitor-azure-machine-learning.md
14061406
- name: Secure code
14071407
displayName: security threat
