Commit 1a3fd2b

Merge pull request #233560 from mohitp930/mp452023-freshness-pass-79612
Freshness Pass for User Story: 79612
2 parents 93df5f0 + ba3123d commit 1a3fd2b

6 files changed

+50
-42
lines changed

articles/batch/batch-diagnostics.md

Lines changed: 50 additions & 42 deletions
@@ -1,75 +1,79 @@
11
---
22
title: Metrics, alerts, and diagnostic logs
3-
description: Record and analyze diagnostic log events for Azure Batch account resources like pools and tasks.
3+
description: Learn how to record and analyze diagnostic log events for Azure Batch account resources like pools and tasks.
44
ms.topic: how-to
5-
ms.date: 04/13/2021
5+
ms.date: 04/05/2023
66
ms.custom: seodec18
77

88
---
99
# Batch metrics, alerts, and logs for diagnostic evaluation and monitoring
1010

1111
Azure Monitor collects [metrics](../azure-monitor/essentials/data-platform-metrics.md) and [diagnostic logs](../azure-monitor/essentials/platform-logs-overview.md) for resources in your Azure Batch account.
1212

13-
You can collect and consume this data in a variety of ways to monitor your Batch account and diagnose issues. You can also configure [metric alerts](../azure-monitor/alerts/alerts-overview.md) so you receive notifications when a metric reaches a specified value.
13+
You can collect and consume this data in various ways to monitor your Batch account and diagnose issues. You can also configure [metric alerts](../azure-monitor/alerts/alerts-overview.md) so you receive notifications when a metric reaches a specified value.
1414

1515
## Batch metrics
1616

17-
[Metrics](../azure-monitor/essentials/data-platform-metrics.md) are Azure telemetry data (also called performance counters) that are emitted by your Azure resources and consumed by the Azure Monitor service. Examples of metrics in a Batch account are Pool Create Events, Low-Priority Node Count, and Task Complete Events. These metrics can help identify trends and can be used for data analysis.
17+
[Metrics](../azure-monitor/essentials/data-platform-metrics.md) are Azure telemetry data (also called performance counters) that your Azure resources emit and the Azure Monitor service consumes. Examples of metrics in a Batch account are Pool Create Events, Low-Priority Node Count, and Task Complete Events. These metrics can help identify trends and can be used for data analysis.
1818

1919
See the [list of supported Batch metrics](../azure-monitor/essentials/metrics-supported.md#microsoftbatchbatchaccounts).
2020

2121
Metrics are:
2222

23-
- Enabled by default in each Batch account without additional configuration
24-
- Generated every 1 minute
25-
- Not persisted automatically, but have a 30-day rolling history. You can persist activity metrics as part of diagnostic logging.
23+
- Enabled by default in each Batch account without extra configuration.
24+
- Generated every 1 minute.
25+
- Not persisted automatically, but they have a 30-day rolling history. You can persist activity metrics as part of diagnostic logging.
2626

2727
## View Batch metrics
2828

29-
In the Azure portal, the **Overview** page for the Batch account will show key node, core, and task metrics by default.
29+
In the Azure portal, the **Overview** page for the Batch account shows key node, core, and task metrics by default.
3030

31-
To view additional metrics for a Batch account:
31+
To view other metrics for a Batch account:
3232

33-
1. In the Azure portal, select **All services** > **Batch accounts**, and then select the name of your Batch account.
34-
1. Under **Monitoring**, select **Metrics**.
33+
1. In the Azure portal, search for and select **Batch accounts**, and then select the name of your Batch account.
34+
1. Under **Monitoring** in the left side navigation menu, select **Metrics**.
3535
1. Select **Add metric** and then choose a metric from the dropdown list.
36-
1. Select an **Aggregation** option for the metric. For count-based metrics (like "Dedicated Core Count" or "Low-Priority Node Count"), use the **Avg** aggregation. For event-based metrics (like "Pool Resize Complete Events"), use the **Count**" aggregation. Avoid using the **Sum** aggregation, which adds up the values of all data points received over the period of the chart.
37-
1. To add additional metrics, repeat steps 3 and 4.
36+
1. Select an **Aggregation** option for the metric. For count-based metrics (like "Dedicated Core Count" or "Low-Priority Node Count"), use the **Avg** aggregation. For event-based metrics (like "Pool Resize Complete Events"), use the **Count** aggregation. Avoid using the **Sum** aggregation, which adds up the values of all data points received over the period of the chart.
37+
1. To add other metrics, repeat steps 3 and 4.
38+
39+
:::image type="content" source="./media/batch-diagnostics/add-metric.png" alt-text="Screenshot of the metrics page of a batch account in the Azure portal. Metrics is highlighted in the left side navigation menu. The Metric and Aggregation options for a metric are highlighted as well.":::
40+
3841

3942
You can also retrieve metrics programmatically with the Azure Monitor APIs. For an example, see [Retrieve Azure Monitor metrics with .NET](/samples/azure-samples/monitor-dotnet-metrics-api/monitor-dotnet-metrics-api/).
4043

4144
> [!NOTE]
42-
> Metrics emitted in the last 3 minutes may still be aggregating, so values may be under-reported during this timeframe. Metric delivery is not guaranteed, and may be affected by out-of-order delivery, data loss, or duplication.
45+
> Metrics emitted in the last 3 minutes might still be aggregating, so values might be under-reported during this time frame. Metric delivery is not guaranteed and might be affected by out-of-order delivery, data loss, or duplication.
4346
4447
## Batch metric alerts
4548

46-
You can configure near real-time metric alerts that trigger when the value of a specified metric crosses a threshold that you assign. The alert generates a notification when the alert is "Activated" (when the threshold is crossed and the alert condition is met) as well as when it is "Resolved" (when the threshold is crossed again and the condition is no longer met).
49+
You can configure near real-time metric alerts that trigger when the value of a specified metric crosses a threshold that you assign. The alert generates a notification when the alert is *Activated* (when the threshold is crossed and the alert condition is met). The alert also generates a notification when it's *Resolved* (when the threshold is crossed again and the condition is no longer met).
4750

48-
Because metric delivery can be subject to inconsistencies such as out-of-order delivery, data loss, or duplication, we recommend avoiding alerts that trigger on a single data point. Instead, use thresholds to account for any inconsistencies such as out-of-order delivery, data loss, and duplication over a period of time.
51+
Because metric delivery can be subject to inconsistencies such as out-of-order delivery, data loss, or duplication, you should avoid alerts that trigger on a single data point. Instead, evaluate the metric against a threshold over a period of time so that isolated inconsistencies don't trigger the alert.
4952

50-
For example, you might want to configure a metric alert when your low priority core count falls to a certain level, so you can adjust the composition of your pools. For best results, set a period of 10 or more minutes, where the alert will be triggered if the average low priority core count falls below the threshold value for the entire period. This allows time for metrics to aggregate so that you get more accurate results.
53+
For example, you might want to configure a metric alert when your low priority core count falls to a certain level. You could then use this alert to adjust the composition of your pools. For best results, set a period of 10 or more minutes, so that the alert is triggered only if the average low priority core count falls below the threshold value for the entire period. This time period allows metrics to aggregate so that you get more accurate results.
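
As a rough illustration of why a 10-minute period is more robust than a single data point, the following Python sketch compares a threshold against the average over a window. The function name `should_alert`, the sample values, and the threshold are hypothetical, and the sketch assumes one sample per minute. It isn't how Azure Monitor evaluates alert rules internally.

```python
from statistics import mean


def should_alert(samples, threshold, window=10):
    """Return True when the average of the last `window` samples is below `threshold`.

    Assumes one sample per minute, so window=10 approximates a 10-minute period.
    """
    if len(samples) < window:
        return False  # not enough data yet to cover the whole period
    return mean(samples[-window:]) < threshold


# Hypothetical low-priority core counts, one value per minute.
steady_with_glitch = [40, 40, 40, 0, 40, 40, 40, 40, 40, 40]  # one lost data point
sustained_drop = [40, 30, 20, 10, 10, 10, 10, 10, 10, 10]

print(should_alert(steady_with_glitch, threshold=20))  # False: average is 36
print(should_alert(sustained_drop, threshold=20))      # True: average is 16
```

A rule that fired on any single value lower than 20 would have triggered on the one-minute glitch in the first series, while the windowed average ignores it.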
5154

5255
To configure a metric alert in the Azure portal:
5356

54-
1. Select **All services** > **Batch accounts**, and then select the name of your Batch account.
55-
1. Under **Monitoring**, select **Alerts**, then select **New alert rule**.
56-
1. Select **Add condition**, then choose a metric.
57-
1. Select the desired values for **Chart period**, **Threshold**, **Operator**, and **Aggregation type**.
58-
1. Enter a **Threshold value** and select the **Unit** for the threshold. Then select **Done**.
59-
1. Add an [action group](../azure-monitor/alerts/action-groups.md) to the alert either by selecting an existing action group or creating a new action group.
60-
1. In the **Alert rule details** section, enter an **Alert rule name** and **Description**. If you want the alert to be enabled immediately, ensure that the **Enable alert rule upon creation** box is checked.
61-
1. Select **Create alert rule**.
57+
1. In the Azure portal, search for and select **Batch accounts**, and then select the name of your Batch account.
58+
1. Under **Monitoring** in the left side navigation menu, select **Alerts**, and then select **Create** > **Alert Rule**.
59+
1. On the **Condition** page, select a **Signal** from the dropdown list.
60+
1. Enter the logic for your **Alert Rule** in the fields specific to the **Signal** you choose. The following screenshot shows the options for **Task Fail Events**.
61+
62+
:::image type="content" source="./media/batch-diagnostics/create-alert-rule.png" alt-text="Screenshot of the Conditions tab on the Create an alert rule page." lightbox="./media/batch-diagnostics/create-alert-rule-lightbox.png":::
6263

63-
For more information about creating metric alerts, see [Understand how metric alerts work in Azure Monitor](../azure-monitor/alerts/alerts-metric-overview.md) and [Create, view, and manage metric alerts using Azure Monitor](../azure-monitor/alerts/alerts-metric.md).
64+
1. Enter the name for your alert on the **Details** page.
65+
1. Select **Review + create** > **Create**.
6466

65-
You can also configure a near real-time alert using the [Azure Monitor REST API](/rest/api/monitor/). For more information, see [Overview of alerts in Microsoft Azure](../azure-monitor/alerts/alerts-overview.md). To include job, task, or pool-specific information in your alerts, see [Azure Monitor log Alerts](../azure-monitor/alerts/alerts-log.md).
67+
For more information about creating metric alerts, see [Types of Azure Monitor alerts](../azure-monitor/alerts/alerts-metric-overview.md) and [Create a new alert rule](../azure-monitor/alerts/alerts-metric.md).
68+
69+
You can also configure a near real-time alert by using the [Azure Monitor REST API](/rest/api/monitor/). For more information, see [Overview of alerts in Microsoft Azure](../azure-monitor/alerts/alerts-overview.md). To include job, task, or pool-specific information in your alerts, see [Create a new alert rule](../azure-monitor/alerts/alerts-log.md).
6670

6771
## Batch diagnostics
6872

6973
[Diagnostic logs](../azure-monitor/essentials/platform-logs-overview.md) contain information emitted by Azure resources that describe the operation of each resource. For Batch, you can collect the following logs:
7074

7175
- **ServiceLog**: [events emitted by the Batch service](#service-log-events) during the lifetime of an individual resource such as a pool or task.
72-
- **AllMetrics**: Metrics at the Batch account level.
76+
- **AllMetrics**: metrics at the Batch account level.
7377

7478
You must explicitly enable diagnostic settings for each Batch account you want to monitor.
7579

@@ -79,25 +83,29 @@ A common scenario is to select an Azure Storage account as the log destination.
7983

8084
Alternately, you can:
8185

82-
- Stream Batch diagnostic log events to an [Azure Event Hub](../event-hubs/event-hubs-about.md). Event Hubs can ingest millions of events per second, which you can then transform and store using any real-time analytics provider.
86+
- Stream Batch diagnostic log events to [Azure Event Hubs](../event-hubs/event-hubs-about.md). Event Hubs can ingest millions of events per second, which you can then transform and store by using any real-time analytics provider.
8387
- Send diagnostic logs to [Azure Monitor logs](../azure-monitor/logs/log-query-overview.md), where you can analyze them or export them for analysis in Power BI or Excel.
8488

8589
> [!NOTE]
86-
> You may incur additional costs to store or process diagnostic log data with Azure services.
90+
> You might incur additional costs to store or process diagnostic log data with Azure services.
8791
8892
### Enable collection of Batch diagnostic logs
8993

90-
To create a new diagnostic setting in the Azure portal, follow the steps below.
94+
To create a new diagnostic setting in the Azure portal, use the following steps.
9195

92-
1. In the Azure portal, select **All services** > **Batch accounts**, and then select the name of your Batch account.
93-
2. Under **Monitoring**, select **Diagnostic settings**.
96+
1. In the Azure portal, search for and select **Batch accounts**, and then select the name of your Batch account.
97+
2. Under **Monitoring** in the left side navigation menu, select **Diagnostic settings**.
9498
3. In **Diagnostic settings**, select **Add diagnostic setting**.
9599
4. Enter a name for the setting.
96-
5. Select a destination: **Send to Log Analytics**, **Archive to a storage account**, or **Stream to an event hub**. If you select a storage account, you can optionally select the number of days to retain data for each log. If you don't specify a number of days for retention, data is retained during the life of the storage account.
97-
6. Select **ServiceLog**, **AllMetrics**, or both.
100+
5. Select a destination: **Send to Log Analytics workspace**, **Archive to a storage account**, **Stream to an event hub**, or **Send to partner solution**. If you select a storage account, you can optionally select the number of days to retain data for each log. If you don't specify the number of days for retention, data is retained during the life of the storage account.
101+
6. Select any options in either the **Logs** or **Metrics** section.
98102
7. Select **Save** to create the diagnostic setting.
99103

100-
You can also enable log collection by [creating diagnostic settings in the Azure portal](../azure-monitor/essentials/diagnostic-settings.md), using a [Resource Manager template](../azure-monitor/essentials/resource-manager-diagnostic-settings.md), or using Azure PowerShell or the Azure CLI. For more information, see [Overview of Azure platform logs](../azure-monitor/essentials/platform-logs-overview.md).
104+
The following screenshot shows an example diagnostic setting called *My diagnostic setting*. It sends **allLogs** and **AllMetrics** to a Log Analytics workspace.
105+
106+
:::image type="content" source="./media/batch-diagnostics/configure-diagnostic-setting.png" alt-text="Screenshot of the Diagnostic setting page that shows an example." lightbox="./media/batch-diagnostics/configure-diagnostic-setting-lightbox.png":::
107+
108+
You can also enable log collection by [creating diagnostic settings in the Azure portal](../azure-monitor/essentials/diagnostic-settings.md), by using a [Resource Manager template](../azure-monitor/essentials/resource-manager-diagnostic-settings.md), or by using Azure PowerShell or the Azure CLI. For more information, see [Overview of Azure platform logs](../azure-monitor/essentials/platform-logs-overview.md).
101109

102110
### Access diagnostics logs in storage
103111

@@ -121,17 +129,17 @@ BATCHACCOUNTS/MYBATCHACCOUNT/y=2018/m=03/d=05/h=22/m=00/PT1H.json
121129

122130
Each `PT1H.json` blob file contains JSON-formatted events that occurred within the hour specified in the blob URL (for example, `h=12`). During the present hour, events are appended to the `PT1H.json` file as they occur. The minute value (`m=00`) is always `00`, since diagnostic log events are broken into individual blobs per hour. (All times are in UTC.)
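
The hourly naming pattern described above can be sketched in Python as follows. The helper name `hourly_blob_suffix` is hypothetical, and it builds only the time-based tail of the blob name shown in the example path. The container and resource ID prefix aren't reproduced here.

```python
from datetime import datetime, timezone


def hourly_blob_suffix(moment: datetime) -> str:
    """Build the time-based tail of a diagnostic log blob name for `moment`.

    The timestamp is converted to UTC, and the minute segment is always 00
    because events are grouped into one blob per hour.
    """
    utc = moment.astimezone(timezone.utc)
    return f"y={utc:%Y}/m={utc:%m}/d={utc:%d}/h={utc:%H}/m=00/PT1H.json"


# An event logged at 22:41 UTC lands in the blob for hour 22.
print(hourly_blob_suffix(datetime(2018, 3, 5, 22, 41, tzinfo=timezone.utc)))
# y=2018/m=03/d=05/h=22/m=00/PT1H.json
```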
123131

124-
Below is an example of a `PoolResizeCompleteEvent` entry in a `PT1H.json` log file. It includes information about the current and target number of dedicated and low-priority nodes, as well as the start and end time of the operation:
132+
The following example shows a `PoolResizeCompleteEvent` entry in a `PT1H.json` log file. It includes information about the current and target number of dedicated and low-priority nodes, as well as the start and end time of the operation:
125133

126134
```json
127135
{ "Tenant": "65298bc2729a4c93b11c00ad7e660501", "time": "2019-08-22T20:59:13.5698778Z", "resourceId": "/SUBSCRIPTIONS/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/RESOURCEGROUPS/MYRESOURCEGROUP/PROVIDERS/MICROSOFT.BATCH/BATCHACCOUNTS/MYBATCHACCOUNT/", "category": "ServiceLog", "operationName": "PoolResizeCompleteEvent", "operationVersion": "2017-06-01", "properties": {"id":"MYPOOLID","nodeDeallocationOption":"Requeue","currentDedicatedNodes":10,"targetDedicatedNodes":100,"currentLowPriorityNodes":0,"targetLowPriorityNodes":0,"enableAutoScale":false,"isAutoPool":false,"startTime":"2019-08-22 20:50:59.522","endTime":"2019-08-22 20:59:12.489","resultCode":"Success","resultMessage":"The operation succeeded"}}
128136
```
129137

130-
To access the logs in your storage account programmatically, use the Storage APIs.
138+
To access the logs in your storage account programmatically, use the [Storage APIs](/rest/api/storageservices/).
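
Once a `PT1H.json` blob is downloaded, its events can be processed with any JSON parser. The following Python sketch is a minimal example that assumes the file holds one JSON object per line, as in the sample entry above. The function name `resize_durations` is hypothetical, and the sample line is a trimmed copy of that entry. Downloading the blob is out of scope here.

```python
import json
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # matches "2019-08-22 20:50:59.522"


def resize_durations(lines):
    """Yield (pool id, duration in seconds) for each PoolResizeCompleteEvent."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("operationName") != "PoolResizeCompleteEvent":
            continue  # ignore other service log events
        props = event["properties"]
        start = datetime.strptime(props["startTime"], TIME_FORMAT)
        end = datetime.strptime(props["endTime"], TIME_FORMAT)
        yield props["id"], (end - start).total_seconds()


# Trimmed copy of the sample entry; a real file would be read line by line.
sample = (
    '{"category": "ServiceLog", "operationName": "PoolResizeCompleteEvent", '
    '"properties": {"id": "MYPOOLID", "startTime": "2019-08-22 20:50:59.522", '
    '"endTime": "2019-08-22 20:59:12.489"}}'
)

for pool_id, seconds in resize_durations([sample]):
    print(f"{pool_id}: resize took {seconds:.0f} seconds")
```

For the sample entry, the resize took a little over eight minutes.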
131139

132140
### Service log events
133141

134-
Azure Batch service logs contain events emitted by the Batch service during the lifetime of an individual Batch resource, such as a pool or task. Each event emitted by Batch is logged in JSON format. For example, this is the body of a sample **pool create event**:
142+
Azure Batch service logs contain events emitted by the Batch service during the lifetime of an individual Batch resource, such as a pool or task. Each event emitted by Batch is logged in JSON format. The following example shows the body of a sample **pool create event**:
135143

136144
```json
137145
{
@@ -166,7 +174,7 @@ Azure Batch service logs contain events emitted by the Batch service during the
166174
}
167175
```
168176

169-
Service log events emitted by the Batch service include the following:
177+
The Batch service emits the following log events:
170178

171179
- [Pool create](batch-pool-create-event.md)
172180
- [Pool delete start](batch-pool-delete-start-event.md)
@@ -181,5 +189,5 @@ Service log events emitted by the Batch service include the following:
181189

182190
## Next steps
183191

184-
- Learn about the [Batch APIs and tools](batch-apis-tools.md) available for building Batch solutions.
185-
- Learn more about [monitoring Batch solutions](monitoring-overview.md).
192+
- [Overview of Batch APIs and tools](batch-apis-tools.md)
193+
- [Monitor Batch solutions](monitoring-overview.md)