articles/azure-monitor/best-practices-cost.md
Lines changed: 1 addition & 1 deletion
@@ -116,7 +116,7 @@ Diagnostic settings don't allow granular filtering of resource logs. You might r

 See the documentation for other services that store their data in a Log Analytics workspace for recommendations on optimizing their data usage:

-- **Container insights**: [Understand monitoring costs for Container insights](containers/container-insights-cost.md#controlling-ingestion-to-reduce-cost)
+- **Container insights**: [Understand monitoring costs for Container insights](containers/container-insights-cost.md#control-ingestion-to-reduce-cost)
 - **Microsoft Sentinel**: [Reduce costs for Microsoft Sentinel](../sentinel/billing-reduce-costs.md)
 - **Defender for Cloud**: [Setting the security event option at the workspace level](../defender-for-cloud/working-with-log-analytics-agent.md#data-collection-tier)
-- To disable environment variable collection for a specific container, set the key/value `[log_collection_settings.env_var] enabled = true` to enable variable collection globally. Then follow the steps [here](container-insights-manage-agent.md#how-to-disable-environment-variable-collection-on-a-container) to complete configuration for the specific container.
+- To disable environment variable collection for a specific container, set the key/value `[log_collection_settings.env_var] enabled = true` to enable variable collection globally. Then follow the steps [here](container-insights-manage-agent.md#disable-environment-variable-collection-on-a-container) to complete configuration for the specific container.
 - To disable stderr log collection cluster-wide, configure the key/value by using the following example: `[log_collection_settings.stderr] enabled = false`.
articles/azure-monitor/containers/container-insights-log-alerts.md
Lines changed: 30 additions & 33 deletions
@@ -7,45 +7,47 @@ ms.reviewer: viviandiec

 ---

-# How to create log alerts from Container insights
+# Create log alerts from Container insights

-Container insights monitors the performance of container workloads that are deployed to managed or self-managed Kubernetes clusters. To alert on what matters, this article describes how to create log-based alerts for the following situations with AKS clusters:
+Container insights monitors the performance of container workloads that are deployed to managed or self-managed Kubernetes clusters. To alert on what matters, this article describes how to create log-based alerts for the following situations with Azure Kubernetes Service (AKS) clusters:

 - When CPU or memory utilization on cluster nodes exceeds a threshold
 - When CPU or memory utilization on any container within a controller exceeds a threshold as compared to a limit that's set on the corresponding resource
-- *NotReady* status node counts
-- *Failed*, *Pending*, *Unknown*, *Running*, or *Succeeded* pod-phase counts
+- `NotReady` status node counts
+- `Failed`, `Pending`, `Unknown`, `Running`, or `Succeeded` pod-phase counts
 - When free disk space on cluster nodes exceeds a threshold

-To alert for high CPU or memory utilization, or low free disk space on cluster nodes, use the queries that are provided to create a metric alert or a metric measurement alert. While metric alerts have lower latency than log alerts, log alerts provide advanced querying and greater sophistication. Log alert queries compare a datetime to the present by using the *now* operator and going back one hour. (Container insights stores all dates in Coordinated Universal Time (UTC) format.)
+To alert for high CPU or memory utilization, or low free disk space on cluster nodes, use the queries that are provided to create a metric alert or a metric measurement alert. Metric alerts have lower latency than log alerts, but log alerts provide advanced querying and greater sophistication. Log alert queries compare a datetime to the present by using the `now` operator and going back one hour. (Container insights stores all dates in Coordinated Universal Time [UTC] format.)

 > [!IMPORTANT]
-> Most alert rules have a cost that's dependent on the type of rule, how many dimensions it includes, and how frequently it's run. Refer to **Alert rules** in [Azure Monitor pricing](https://azure.microsoft.com/pricing/details/monitor/) before you create any alert rules.
+> Most alert rules have a cost that's dependent on the type of rule, how many dimensions it includes, and how frequently it's run. Before you create alert rules, see the "Alert rules" section in [Azure Monitor pricing](https://azure.microsoft.com/pricing/details/monitor/).

-If you're not familiar with Azure Monitor alerts, see [Overview of alerts in Microsoft Azure](../alerts/alerts-overview.md) before you start. To learn more about alerts that use log queries, see [Log alerts in Azure Monitor](../alerts/alerts-unified-log.md). For more about metric alerts, see [Metric alerts in Azure Monitor](../alerts/alerts-metric-overview.md).
+If you aren't familiar with Azure Monitor alerts, see [Overview of alerts in Microsoft Azure](../alerts/alerts-overview.md) before you start. To learn more about alerts that use log queries, see [Log alerts in Azure Monitor](../alerts/alerts-unified-log.md). For more about metric alerts, see [Metric alerts in Azure Monitor](../alerts/alerts-metric-overview.md).

 ## Log query measurements
 [Log alerts](../alerts/alerts-unified-log.md) can measure two different things, which can be used to monitor virtual machines in different scenarios:

-- [Result count](../alerts/alerts-unified-log.md#result-count): Counts the number of rows returned by the query, and can be used to work with events such as Windows event logs, syslog, application exceptions.
-- [Calculation of a value](../alerts/alerts-unified-log.md#calculation-of-a-value): Makes a calculation based on a numeric column, and can be used to include any number of resources. For example, CPU percentage.
-### Targeting resources and dimensions
+- [Result count](../alerts/alerts-unified-log.md#result-count): Counts the number of rows returned by the query and can be used to work with events such as Windows event logs, Syslog, and application exceptions.
+- [Calculation of a value](../alerts/alerts-unified-log.md#calculation-of-a-value): Makes a calculation based on a numeric column and can be used to include any number of resources. An example is CPU percentage.

-You can monitor multiple instances’ values with one rule using dimensions. For example, you would use dimensions if you want to monitor the CPU usage on multiple instances running your web site or app, and create an alert for CPU usage of over 80%.
+### Target resources and dimensions

-To create resource-centric alerts at scale for a subscription or resource group, you can **Split by dimensions**. When you want to monitor the same condition on multiple Azure resources, splitting by dimensions splits the alerts into separate alerts by grouping unique combinations using numerical or string columns. Splitting on Azure resource ID column makes the specified resource into the alert target.
+You can use one rule to monitor the values of multiple instances by using dimensions. For example, you would use dimensions if you wanted to monitor the CPU usage on multiple instances running your website or app, and create an alert for CPU usage of over 80%.

-You may also decide not to split when you want a condition on multiple resources in the scope. For example, if you want to create an alert if at least five machines in the resource group scope have CPU usage over 80%.
+To create resource-centric alerts at scale for a subscription or resource group, you can *split by dimensions*. When you want to monitor the same condition on multiple Azure resources, splitting by dimensions splits the alerts into separate alerts by grouping unique combinations by using numerical or string columns. Splitting an Azure resource ID column makes the specified resource into the alert target.

-:::image type="content" source="../vm/media/monitor-virtual-machines/log-alert-rule.png" alt-text="Screenshot of a new log alert rule with split by dimensions." lightbox="../vm/media/monitor-virtual-machines/log-alert-rule.png":::
+You might also decide not to split when you want a condition on multiple resources in the scope. For example, you might want to create an alert if at least five machines in the resource group scope have CPU usage over 80%.
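
To make the difference between splitting and not splitting concrete, the following Python sketch evaluates both kinds of condition over a few sample rows. It only illustrates the logic described in the changed paragraphs and is not how Azure Monitor evaluates alert rules internally. The `Sample` type, both function names, and the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    """One hypothetical CPU reading for one computer (the dimension)."""
    computer: str
    cpu_percent: float


def split_by_dimension(samples, threshold):
    """Evaluate the condition per computer.

    Returns the sorted names of computers whose average CPU exceeds the
    threshold; each name stands for one separate alert.
    """
    by_computer = defaultdict(list)
    for sample in samples:
        by_computer[sample.computer].append(sample.cpu_percent)
    return sorted(name for name, values in by_computer.items() if mean(values) > threshold)


def aggregate_condition(samples, threshold, min_count):
    """Evaluate one condition for the whole scope (no split).

    Fires only if at least min_count computers exceed the threshold.
    """
    return len(split_by_dimension(samples, threshold)) >= min_count


# Hypothetical readings for three computers.
samples = [
    Sample("vm-a", 91.0), Sample("vm-a", 85.0),
    Sample("vm-b", 40.0), Sample("vm-b", 55.0),
    Sample("vm-c", 88.0),
]

alerts_per_computer = split_by_dimension(samples, threshold=80)
scope_alert = aggregate_condition(samples, threshold=80, min_count=5)
print(alerts_per_computer)
print(scope_alert)
```

With a split, every computer over the threshold becomes its own alert target. Without a split, the rule produces a single yes-or-no result for the whole scope.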
+
+:::image type="content" source="../vm/media/monitor-virtual-machines/log-alert-rule.png" alt-text="Screenshot that shows a new log alert rule with split by dimensions." lightbox="../vm/media/monitor-virtual-machines/log-alert-rule.png":::
+
+You might want to see a list of the alerts by affected computer. You can use a custom workbook that uses a custom [resource graph](../../governance/resource-graph/overview.md) to provide this view. Use the following query to display alerts, and use the data source **Azure Resource Graph** in the workbook.

-You might want to see a list of the alerts by affected computer. You can use a custom workbook that uses a custom [Resource Graph](../../governance/resource-graph/overview.md) to provide this view. Use the following query to display alerts, and use the data source **Azure Resource Graph** in the workbook.
 ## Create a log query alert rule
-[This example of a log query alert](../vm/monitor-virtual-machine-alerts.md#example-log-query-alert) provides a complete walkthrough of creating a log query alert rule. You can use these same processes to create alert rules for AKS clusters using queries similar to the ones in this article.
+[This example of a log query alert](../vm/monitor-virtual-machine-alerts.md#example-log-query-alert) provides a complete walkthrough of creating a log query alert rule. You can use these same processes to create alert rules for AKS clusters by using queries similar to the ones in this article.

-## Resource utilization
+## Resource utilization

-**Average CPU utilization as an average of member nodes' CPU utilization every minute (metric measurement)**
+Average CPU utilization as an average of member nodes' CPU utilization every minute (metric measurement):

 ```kusto
 let endDateTime = now();
@@ -80,7 +82,7 @@ KubeNodeInventory
 | summarize AggregatedValue = avg(UsagePercent) by bin(TimeGenerated, trendBinSize), ClusterName
 ```

-**Average memory utilization as an average of member nodes' memory utilization every minute (metric measurement)**
+Average memory utilization as an average of member nodes' memory utilization every minute (metric measurement):

 ```kusto
 let endDateTime = now();
@@ -115,11 +117,10 @@ KubeNodeInventory
 | summarize AggregatedValue = avg(UsagePercent) by bin(TimeGenerated, trendBinSize), ClusterName
 ```

-
 >[!IMPORTANT]
 >The following queries use the placeholder values \<your-cluster-name> and \<your-controller-name> to represent your cluster and controller. Replace them with values specific to your environment when you set up alerts.

-**Average CPU utilization of all containers in a controller as an average of CPU utilization of every container instance in a controller every minute (metric measurement)**
+Average CPU utilization of all containers in a controller as an average of CPU utilization of every container instance in a controller every minute (metric measurement):

 ```kusto
 let endDateTime = now();
@@ -159,7 +160,7 @@ KubePodInventory
 | summarize AggregatedValue = avg(UsagePercent) by bin(TimeGenerated, trendBinSize) , ContainerName
 ```

-**Average memory utilization of all containers in a controller as an average of memory utilization of every container instance in a controller every minute (metric measurement)**
+Average memory utilization of all containers in a controller as an average of memory utilization of every container instance in a controller every minute (metric measurement):

 ```kusto
 let endDateTime = now();
@@ -199,9 +200,9 @@ KubePodInventory
 | summarize AggregatedValue = avg(UsagePercent) by bin(TimeGenerated, trendBinSize) , ContainerName
 ```

-## Resource availability
+## Resource availability

-**Nodes and counts that have a status of Ready and NotReady (metric measurement)**
+Nodes and counts that have a status of Ready and NotReady (metric measurement):
 | order by ClusterName asc, Computer asc, TimeGenerated desc
 ```
-The following query returns pod phase counts based on all phases: *Failed*, *Pending*, *Unknown*, *Running*, or *Succeeded*.
+
+The following query returns pod phase counts based on all phases: `Failed`, `Pending`, `Unknown`, `Running`, or `Succeeded`.

 ```kusto
 let endDateTime = now();
@@ -265,7 +267,7 @@ KubePodInventory
 ```

 >[!NOTE]
->To alert on certain pod phases, such as *Pending*, *Failed*, or *Unknown*, modify the last line of the query. For example, to alert on *FailedCount* use: <br/>`| summarize AggregatedValue = avg(FailedCount) by bin(TimeGenerated, trendBinSize)`
+>To alert on certain pod phases, such as `Pending`, `Failed`, or `Unknown`, modify the last line of the query. For example, to alert on `FailedCount`, use `| summarize AggregatedValue = avg(FailedCount) by bin(TimeGenerated, trendBinSize)`.
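
As a rough illustration of what `avg(FailedCount) by bin(TimeGenerated, trendBinSize)` computes, the Python sketch below rounds each timestamp down to the start of its bin and averages the values that fall into each bin. Kusto's `bin` function rounds values down to a multiple of the bin size. The `bin_average` helper, the one-minute bin size, and the sample rows are assumptions made for this example and aren't taken from the query in the article.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bin_average(rows, bin_size):
    """Average values per time bin.

    Each timestamp is floored to a whole multiple of bin_size counted
    from the Unix epoch, then values sharing a bin are averaged.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    buckets = defaultdict(list)
    for timestamp, value in rows:
        # Whole number of bins elapsed since the epoch, converted back to a datetime.
        bin_start = epoch + ((timestamp - epoch) // bin_size) * bin_size
        buckets[bin_start].append(value)
    return {start: sum(values) / len(values) for start, values in sorted(buckets.items())}


# Hypothetical (TimeGenerated, FailedCount) rows.
rows = [
    (datetime(2023, 1, 1, 12, 0, 5, tzinfo=timezone.utc), 2),
    (datetime(2023, 1, 1, 12, 0, 50, tzinfo=timezone.utc), 4),
    (datetime(2023, 1, 1, 12, 1, 10, tzinfo=timezone.utc), 1),
]

averages = bin_average(rows, timedelta(minutes=1))
for bin_start, average in averages.items():
    print(bin_start.isoformat(), average)
```

The first two rows land in the same one-minute bin and are averaged together; the third row falls into the next bin on its own.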

 The following query returns cluster nodes disks that exceed 90% free space used. To get the cluster ID, first run the following query and copy the value from the `ClusterId` property:

@@ -294,12 +296,8 @@ InsightsMetrics
 | where AggregatedValue >= 90
 ```

+Individual container restarts (number of results) alert when the individual system container restart count exceeds a threshold for the last 10 minutes:

-
-**Individual container restarts (number of results)**<br>
-Alerts when the individual system container restart count exceeds a threshold for last 10 minutes.
-
-
 ```kusto
 let _threshold = 10m;
 let _alertThreshold = 2;
@@ -317,6 +315,5 @@ KubePodInventory

 ## Next steps

-- View [log query examples](container-insights-log-query.md) to see pre-defined queries and examples to evaluate or customize for alerting, visualizing, or analyzing your clusters.
-
+- View [log query examples](container-insights-log-query.md) to see predefined queries and examples to evaluate or customize for alerting, visualizing, or analyzing your clusters.
 - To learn more about Azure Monitor and how to monitor other aspects of your Kubernetes cluster, see [View Kubernetes cluster performance](container-insights-analyze.md) and [View Kubernetes cluster health](./container-insights-overview.md).