File: articles/batch/batch-diagnostics.md
Lines changed: 50 additions & 42 deletions
Original file line number
Diff line number
Diff line change
@@ -1,75 +1,79 @@
1
1
---
2
2
title: Metrics, alerts, and diagnostic logs
3
-
description: Record and analyze diagnostic log events for Azure Batch account resources like pools and tasks.
3
+
description: Learn how to record and analyze diagnostic log events for Azure Batch account resources like pools and tasks.
4
4
ms.topic: how-to
5
-
ms.date: 04/13/2021
5
+
ms.date: 04/05/2023
6
6
ms.custom: seodec18
7
7
8
8
---
9
9
# Batch metrics, alerts, and logs for diagnostic evaluation and monitoring
10
10
11
11
Azure Monitor collects [metrics](../azure-monitor/essentials/data-platform-metrics.md) and [diagnostic logs](../azure-monitor/essentials/platform-logs-overview.md) for resources in your Azure Batch account.
12
12
13
-
You can collect and consume this data in a variety of ways to monitor your Batch account and diagnose issues. You can also configure [metric alerts](../azure-monitor/alerts/alerts-overview.md) so you receive notifications when a metric reaches a specified value.
13
+
You can collect and consume this data in various ways to monitor your Batch account and diagnose issues. You can also configure [metric alerts](../azure-monitor/alerts/alerts-overview.md) so you receive notifications when a metric reaches a specified value.
14
14
15
15
## Batch metrics
16
16
17
-
[Metrics](../azure-monitor/essentials/data-platform-metrics.md)are Azure telemetry data (also called performance counters) that are emitted by your Azure resources and consumed by the Azure Monitor service. Examples of metrics in a Batch account are Pool Create Events, Low-Priority Node Count, and Task Complete Events. These metrics can help identify trends and can be used for data analysis.
17
+
[Metrics](../azure-monitor/essentials/data-platform-metrics.md) are Azure telemetry data (also called performance counters) that your Azure resources emit and the Azure Monitor service consumes. Examples of metrics in a Batch account are Pool Create Events, Low-Priority Node Count, and Task Complete Events. These metrics can help you identify trends and can be used for data analysis.
18
18
19
19
See the [list of supported Batch metrics](../azure-monitor/essentials/metrics-supported.md#microsoftbatchbatchaccounts).
20
20
21
21
Metrics are:
22
22
23
-
- Enabled by default in each Batch account without additional configuration
24
-
- Generated every 1 minute
25
-
- Not persisted automatically, but have a 30-day rolling history. You can persist activity metrics as part of diagnostic logging.
23
+
- Enabled by default in each Batch account without extra configuration.
24
+
- Generated every 1 minute.
25
+
- Not persisted automatically, but they have a 30-day rolling history. You can persist activity metrics as part of diagnostic logging.
26
26
27
27
## View Batch metrics
28
28
29
-
In the Azure portal, the **Overview** page for the Batch account will show key node, core, and task metrics by default.
29
+
In the Azure portal, the **Overview** page for the Batch account shows key node, core, and task metrics by default.
30
30
31
-
To view additional metrics for a Batch account:
31
+
To view other metrics for a Batch account:
32
32
33
-
1. In the Azure portal, select **All services** >**Batch accounts**, and then select the name of your Batch account.
34
-
1. Under **Monitoring**, select **Metrics**.
33
+
1. In the Azure portal, search for and select **Batch accounts**, and then select the name of your Batch account.
34
+
1. Under **Monitoring** in the left side navigation menu, select **Metrics**.
35
35
1. Select **Add metric** and then choose a metric from the dropdown list.
36
-
1. Select an **Aggregation** option for the metric. For count-based metrics (like "Dedicated Core Count" or "Low-Priority Node Count"), use the **Avg** aggregation. For event-based metrics (like "Pool Resize Complete Events"), use the **Count**" aggregation. Avoid using the **Sum** aggregation, which adds up the values of all data points received over the period of the chart.
37
-
1. To add additional metrics, repeat steps 3 and 4.
36
+
1. Select an **Aggregation** option for the metric. For count-based metrics (like "Dedicated Core Count" or "Low-Priority Node Count"), use the **Avg** aggregation. For event-based metrics (like "Pool Resize Complete Events"), use the **Count** aggregation. Avoid using the **Sum** aggregation, which adds up the values of all data points received over the period of the chart.
37
+
1. To add other metrics, repeat steps 3 and 4.
38
+
39
+
:::image type="content" source="./media/batch-diagnostics/add-metric.png" alt-text="Screenshot of the metrics page of a batch account in the Azure portal. Metrics is highlighted in the left side navigation menu. The Metric and Aggregation options for a metric are highlighted as well.":::
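
The aggregation guidance in the steps above can be illustrated with a small sketch. It uses made-up one-minute samples (not real Batch data) and a hypothetical `aggregate` helper to show why **Sum** is misleading for a count-based metric: it adds up every sample in the chart period instead of reporting how many nodes existed.

```python
from statistics import mean

# Hypothetical one-minute samples of a count-based metric (a node count).
node_count_samples = [10, 10, 12, 12, 12]

# Hypothetical samples of an event-based metric: one data point per event.
resize_complete_events = [1, 1, 1]


def aggregate(samples, aggregation):
    """Apply a chart-style aggregation to a list of samples (illustrative only)."""
    if aggregation == "Avg":
        return mean(samples)
    if aggregation == "Count":
        return len(samples)
    if aggregation == "Sum":
        return sum(samples)
    raise ValueError(f"Unsupported aggregation: {aggregation}")


# Avg describes the typical node count over the period.
average_nodes = aggregate(node_count_samples, "Avg")

# Sum adds every sample together, which overstates a count-based metric.
summed_nodes = aggregate(node_count_samples, "Sum")

# Count reports how many events were recorded in the period.
resize_count = aggregate(resize_complete_events, "Count")
```

Here the average stays close to the real node count, while the sum grows with the length of the chart period.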
40
+
38
41
39
42
You can also retrieve metrics programmatically with the Azure Monitor APIs. For an example, see [Retrieve Azure Monitor metrics with .NET](/samples/azure-samples/monitor-dotnet-metrics-api/monitor-dotnet-metrics-api/).
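
As a rough sketch of what a programmatic request might look like, the following Python builds a URL for the Azure Monitor metrics REST endpoint. The resource ID, the metric names (`PoolCreateEvent`, `TaskCompleteEvent`), and the `api-version` value are assumptions for illustration; check them against the supported metrics list and the REST reference before use. Authentication and the HTTP call itself are omitted.

```python
from urllib.parse import urlencode


def build_metrics_url(resource_id, metric_names, timespan,
                      interval="PT1M", aggregation="Average"):
    """Build a metrics query URL for a resource (illustrative helper, not an SDK call)."""
    query = urlencode({
        "api-version": "2018-01-01",           # assumed API version
        "metricnames": ",".join(metric_names),
        "timespan": timespan,                  # ISO 8601 start/end
        "interval": interval,                  # one-minute granularity
        "aggregation": aggregation,
    })
    return (
        "https://management.azure.com"
        f"{resource_id}/providers/Microsoft.Insights/metrics?{query}"
    )


# Hypothetical Batch account resource ID.
batch_account_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-rg/providers/Microsoft.Batch/batchAccounts/mybatchaccount"
)

metrics_url = build_metrics_url(
    batch_account_id,
    ["PoolCreateEvent", "TaskCompleteEvent"],
    "2023-04-05T00:00:00Z/2023-04-05T01:00:00Z",
)
```

A caller would send a `GET` request to `metrics_url` with a bearer token for Azure Resource Manager.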
40
43
41
44
> [!NOTE]
42
-
> Metrics emitted in the last 3 minutes may still be aggregating, so values may be under-reported during this timeframe. Metric delivery is not guaranteed, and may be affected by out-of-order delivery, data loss, or duplication.
45
+
> Metrics emitted in the last 3 minutes might still be aggregating, so values might be under-reported during this time frame. Metric delivery is not guaranteed and might be affected by out-of-order delivery, data loss, or duplication.
43
46
44
47
## Batch metric alerts
45
48
46
-
You can configure near real-time metric alerts that trigger when the value of a specified metric crosses a threshold that you assign. The alert generates a notification when the alert is "Activated" (when the threshold is crossed and the alert condition is met) as well as when it is "Resolved" (when the threshold is crossed again and the condition is no longer met).
49
+
You can configure near real-time metric alerts that trigger when the value of a specified metric crosses a threshold that you assign. The alert generates a notification when the alert is *Activated* (when the threshold is crossed and the alert condition is met). The alert also generates a notification when it's *Resolved* (when the threshold is crossed again and the condition is no longer met).
47
50
48
-
Because metric delivery can be subject to inconsistencies such as out-of-order delivery, data loss, or duplication, we recommend avoiding alerts that trigger on a single data point. Instead, use thresholds to account for any inconsistencies such as out-of-order delivery, data loss, and duplication over a period of time.
51
+
Because metric delivery can be subject to inconsistencies such as out-of-order delivery, data loss, or duplication, you should avoid alerts that trigger on a single data point. Instead, evaluate the threshold over a period of time so that these inconsistencies are smoothed out.
49
52
50
-
For example, you might want to configure a metric alert when your low priority core count falls to a certain level, so you can adjust the composition of your pools. For best results, set a period of 10 or more minutes, where the alert will be triggered if the average low priority core count falls below the threshold value for the entire period. This allows time for metrics to aggregate so that you get more accurate results.
53
+
For example, you might want to configure a metric alert when your low-priority core count falls to a certain level. You could then use this alert to adjust the composition of your pools. For best results, set a period of 10 or more minutes, so that the alert is triggered only if the average low-priority core count stays below the threshold value for the entire period. This time period allows metrics to aggregate so that you get more accurate results.
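
The windowed evaluation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Azure Monitor evaluates alert rules internally; the `should_alert` helper, the sample values, and the threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def should_alert(samples, now, threshold, period=timedelta(minutes=10)):
    """Return True if the average of samples in the period is below the threshold.

    samples is a list of (timestamp, value) pairs; data points outside the
    evaluation period are ignored.
    """
    window = [value for timestamp, value in samples
              if now - period <= timestamp <= now]
    if not window:
        return False  # no data, so don't fire on missing points
    return mean(window) < threshold


now = datetime(2023, 4, 5, 12, 0, tzinfo=timezone.utc)

# A sustained drop: ten one-minute samples of 3 low-priority cores.
sustained_drop = [(now - timedelta(minutes=i), 3) for i in range(10)]

# A single dropped or out-of-order data point among healthy values.
single_dip = [(now - timedelta(minutes=i), 0 if i == 4 else 8) for i in range(10)]

fires_on_sustained_drop = should_alert(sustained_drop, now, threshold=5)
fires_on_single_dip = should_alert(single_dip, now, threshold=5)
```

Averaging over the period means that one missing or duplicated data point doesn't trigger the alert on its own, while a sustained drop does.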
51
54
52
55
To configure a metric alert in the Azure portal:
53
56
54
-
1. Select **All services** > **Batch accounts**, and then select the name of your Batch account.
55
-
1. Under **Monitoring**, select **Alerts**, then select **New alert rule**.
56
-
1. Select **Add condition**, then choose a metric.
57
-
1. Select the desired values for **Chart period**, **Threshold**, **Operator**, and **Aggregation type**.
58
-
1. Enter a **Threshold value** and select the **Unit** for the threshold. Then select **Done**.
59
-
1. Add an [action group](../azure-monitor/alerts/action-groups.md) to the alert either by selecting an existing action group or creating a new action group.
60
-
1. In the **Alert rule details** section, enter an **Alert rule name** and **Description**. If you want the alert to be enabled immediately, ensure that the **Enable alert rule upon creation** box is checked.
61
-
1. Select **Create alert rule**.
57
+
1. In the Azure portal, search for and select **Batch accounts**, and then select the name of your Batch account.
58
+
1. Under **Monitoring** in the left side navigation menu, select **Alerts**, and then select **Create** > **Alert Rule**.
59
+
1. On the **Condition** page, select a **Signal** from the dropdown list.
60
+
1. Enter the logic for your **Alert Rule** in the fields specific to the **Signal** you choose. The following screenshot shows the options for **Task Fail Events**.
61
+
62
+
:::image type="content" source="./media/batch-diagnostics/create-alert-rule.png" alt-text="Screenshot of the Conditions tab on the Create an alert rule page." lightbox="./media/batch-diagnostics/create-alert-rule-lightbox.png":::
62
63
63
-
For more information about creating metric alerts, see [Understand how metric alerts work in Azure Monitor](../azure-monitor/alerts/alerts-metric-overview.md) and [Create, view, and manage metric alerts using Azure Monitor](../azure-monitor/alerts/alerts-metric.md).
64
+
1. Enter the name for your alert on the **Details** page.
65
+
1. Select **Review + create** > **Create**.
64
66
65
-
You can also configure a near real-time alert using the [Azure Monitor REST API](/rest/api/monitor/). For more information, see [Overview of alerts in Microsoft Azure](../azure-monitor/alerts/alerts-overview.md). To include job, task, or pool-specific information in your alerts, see [Azure Monitor log Alerts](../azure-monitor/alerts/alerts-log.md).
67
+
For more information about creating metric alerts, see [Types of Azure Monitor alerts](../azure-monitor/alerts/alerts-metric-overview.md) and [Create a new alert rule](../azure-monitor/alerts/alerts-metric.md).
68
+
69
+
You can also configure a near real-time alert by using the [Azure Monitor REST API](/rest/api/monitor/). For more information, see [Overview of alerts in Microsoft Azure](../azure-monitor/alerts/alerts-overview.md). To include job, task, or pool-specific information in your alerts, see [Create a new alert rule](../azure-monitor/alerts/alerts-log.md).
66
70
67
71
## Batch diagnostics
68
72
69
73
[Diagnostic logs](../azure-monitor/essentials/platform-logs-overview.md) contain information emitted by Azure resources that describe the operation of each resource. For Batch, you can collect the following logs:
70
74
71
75
- **ServiceLog**: [events emitted by the Batch service](#service-log-events) during the lifetime of an individual resource such as a pool or task.
72
-
-**AllMetrics**: Metrics at the Batch account level.
76
+
- **AllMetrics**: metrics at the Batch account level.
73
77
74
78
You must explicitly enable diagnostic settings for each Batch account you want to monitor.
75
79
@@ -79,25 +83,29 @@ A common scenario is to select an Azure Storage account as the log destination.
79
83
80
84
Alternatively, you can:
81
85
82
-
- Stream Batch diagnostic log events to an [Azure Event Hub](../event-hubs/event-hubs-about.md). Event Hubs can ingest millions of events per second, which you can then transform and store using any real-time analytics provider.
86
+
- Stream Batch diagnostic log events to [Azure Event Hubs](../event-hubs/event-hubs-about.md). Event Hubs can ingest millions of events per second, which you can then transform and store by using any real-time analytics provider.
83
87
- Send diagnostic logs to [Azure Monitor logs](../azure-monitor/logs/log-query-overview.md), where you can analyze them or export them for analysis in Power BI or Excel.
84
88
85
89
> [!NOTE]
86
-
> You may incur additional costs to store or process diagnostic log data with Azure services.
90
+
> You might incur additional costs to store or process diagnostic log data with Azure services.
87
91
88
92
### Enable collection of Batch diagnostic logs
89
93
90
-
To create a new diagnostic setting in the Azure portal, follow the steps below.
94
+
To create a new diagnostic setting in the Azure portal, use the following steps.
91
95
92
-
1. In the Azure portal, select **All services** >**Batch accounts**, and then select the name of your Batch account.
93
-
2. Under **Monitoring**, select **Diagnostic settings**.
96
+
1. In the Azure portal, search for and select **Batch accounts**, and then select the name of your Batch account.
97
+
2. Under **Monitoring** in the left side navigation menu, select **Diagnostic settings**.
94
98
3. In **Diagnostic settings**, select **Add diagnostic setting**.
95
99
4. Enter a name for the setting.
96
-
5. Select a destination: **Send to Log Analytics**, **Archive to a storage account**, or **Stream to an event hub**. If you select a storage account, you can optionally select the number of days to retain data for each log. If you don't specify a number of days for retention, data is retained during the life of the storage account.
97
-
6. Select **ServiceLog**, **AllMetrics**, or both.
100
+
5. Select a destination: **Send to Log Analytics workspace**, **Archive to a storage account**, **Stream to an event hub**, or **Send to partner solution**. If you select a storage account, you can optionally select the number of days to retain data for each log. If you don't specify the number of days for retention, data is retained during the life of the storage account.
101
+
6. Select any options in either the **Logs** or **Metrics** section.
98
102
7. Select **Save** to create the diagnostic setting.
99
103
100
-
You can also enable log collection by [creating diagnostic settings in the Azure portal](../azure-monitor/essentials/diagnostic-settings.md), using a [Resource Manager template](../azure-monitor/essentials/resource-manager-diagnostic-settings.md), or using Azure PowerShell or the Azure CLI. For more information, see [Overview of Azure platform logs](../azure-monitor/essentials/platform-logs-overview.md).
104
+
The following screenshot shows an example diagnostic setting called *My diagnostic setting*. It sends **allLogs** and **AllMetrics** to a Log Analytics workspace.
105
+
106
+
:::image type="content" source="./media/batch-diagnostics/configure-diagnostic-setting.png" alt-text="Screenshot of the Diagnostic setting page that shows an example." lightbox="./media/batch-diagnostics/configure-diagnostic-setting-lightbox.png":::
107
+
108
+
You can also enable log collection by [creating diagnostic settings in the Azure portal](../azure-monitor/essentials/diagnostic-settings.md), by using a [Resource Manager template](../azure-monitor/essentials/resource-manager-diagnostic-settings.md), or by using Azure PowerShell or the Azure CLI. For more information, see [Overview of Azure platform logs](../azure-monitor/essentials/platform-logs-overview.md).
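
For automation, a diagnostic setting is ultimately a small JSON document. The following sketch builds one in Python, assuming the general shape of an Azure Monitor diagnostic setting resource (`workspaceId`, `logs`, and `metrics` under `properties`). The helper name and workspace ID are hypothetical, and the exact schema should be confirmed against the Resource Manager template reference linked above.

```python
import json


def build_diagnostic_setting(workspace_id,
                             log_categories=("ServiceLog",),
                             metric_categories=("AllMetrics",)):
    """Build a diagnostic setting body that sends data to a Log Analytics workspace.

    The property names are an assumption about the resource schema.
    """
    return {
        "properties": {
            "workspaceId": workspace_id,
            "logs": [{"category": name, "enabled": True}
                     for name in log_categories],
            "metrics": [{"category": name, "enabled": True}
                        for name in metric_categories],
        }
    }


# Hypothetical Log Analytics workspace resource ID.
workspace_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-rg/providers/Microsoft.OperationalInsights"
    "/workspaces/my-workspace"
)

setting_body = build_diagnostic_setting(workspace_id)
setting_json = json.dumps(setting_body, indent=2)
```

The resulting `setting_json` could be used as the request body for a REST call or adapted into a template resource.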
Each `PT1H.json` blob file contains JSON-formatted events that occurred within the hour specified in the blob URL (for example, `h=12`). During the present hour, events are appended to the `PT1H.json` file as they occur. The minute value (`m=00`) is always `00`, since diagnostic log events are broken into individual blobs per hour. (All times are in UTC.)
123
131
124
-
Below is an example of a `PoolResizeCompleteEvent` entry in a `PT1H.json` log file. It includes information about the current and target number of dedicated and low-priority nodes, as well as the start and end time of the operation:
132
+
The following example shows a `PoolResizeCompleteEvent` entry in a `PT1H.json` log file. It includes information about the current and target number of dedicated and low-priority nodes, as well as the start and end time of the operation:
To access the logs in your storage account programmatically, use the Storage APIs.
138
+
To access the logs in your storage account programmatically, use the [Storage APIs](/rest/api/storageservices/).
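
After a `PT1H.json` blob is downloaded, its contents can be filtered with the standard library alone. The sketch below assumes the blob holds one JSON record per line and that each record carries an `operationName` field and a `properties` object. The sample records are invented for illustration, and downloading the blob through the Storage APIs is left out.

```python
import json


def iter_events(blob_text, operation_name=None):
    """Yield log records from blob text, optionally filtered by operation name.

    Assumes newline-delimited JSON: one record per non-empty line.
    """
    for line in blob_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if operation_name is None or record.get("operationName") == operation_name:
            yield record


# Hypothetical blob contents with two events.
sample_blob_text = "\n".join([
    json.dumps({
        "category": "ServiceLog",
        "operationName": "PoolCreateEvent",
        "properties": {"id": "myPool1"},
    }),
    json.dumps({
        "category": "ServiceLog",
        "operationName": "PoolResizeCompleteEvent",
        "properties": {"id": "myPool1",
                       "currentDedicatedNodes": 10,
                       "targetDedicatedNodes": 100},
    }),
])

resize_events = list(iter_events(sample_blob_text, "PoolResizeCompleteEvent"))
all_events = list(iter_events(sample_blob_text))
```

Filtering by `operationName` this way makes it straightforward to pull out only the events of interest, such as pool resize completions.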
131
139
132
140
### Service log events
133
141
134
-
Azure Batch service logs contain events emitted by the Batch service during the lifetime of an individual Batch resource, such as a pool or task. Each event emitted by Batch is logged in JSON format. For example, this is the body of a sample **pool create event**:
142
+
Azure Batch service logs contain events emitted by the Batch service during the lifetime of an individual Batch resource, such as a pool or task. Each event emitted by Batch is logged in JSON format. The following example shows the body of a sample **pool create event**:
135
143
136
144
```json
137
145
{
@@ -166,7 +174,7 @@ Azure Batch service logs contain events emitted by the Batch service during the
166
174
}
167
175
```
168
176
169
-
Service log events emitted by the Batch service include the following: