Commit 6924eaf

Merge pull request #217970 from paulth1/containers-articles-batch-3
edit pass: Containers articles batch 3
2 parents 17c517b + 5635ede commit 6924eaf

8 files changed: 176 additions & 185 deletions

articles/azure-monitor/best-practices-cost.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -116,7 +116,7 @@ Diagnostic settings don't allow granular filtering of resource logs. You might r
116116

117117
See the documentation for other services that store their data in a Log Analytics workspace for recommendations on optimizing their data usage:
118118

119-
- **Container insights**: [Understand monitoring costs for Container insights](containers/container-insights-cost.md#controlling-ingestion-to-reduce-cost)
119+
- **Container insights**: [Understand monitoring costs for Container insights](containers/container-insights-cost.md#control-ingestion-to-reduce-cost)
120120
- **Microsoft Sentinel**: [Reduce costs for Microsoft Sentinel](../sentinel/billing-reduce-costs.md)
121121
- **Defender for Cloud**: [Setting the security event option at the workspace level](../defender-for-cloud/working-with-log-analytics-agent.md#data-collection-tier)
122122

articles/azure-monitor/containers/container-insights-agent-config.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -55,7 +55,7 @@ To configure and deploy your ConfigMap configuration file to your cluster:
5555

5656
- To exclude specific namespaces for stdout log collection, configure the key/value by using the following example:
5757
`[log_collection_settings.stdout] enabled = true exclude_namespaces = ["my-namespace-1", "my-namespace-2"]`.
58-
- To disable environment variable collection for a specific container, set the key/value `[log_collection_settings.env_var] enabled = true` to enable variable collection globally. Then follow the steps [here](container-insights-manage-agent.md#how-to-disable-environment-variable-collection-on-a-container) to complete configuration for the specific container.
58+
- To disable environment variable collection for a specific container, set the key/value `[log_collection_settings.env_var] enabled = true` to enable variable collection globally. Then follow the steps [here](container-insights-manage-agent.md#disable-environment-variable-collection-on-a-container) to complete configuration for the specific container.
5959
- To disable stderr log collection cluster-wide, configure the key/value by using the following example: `[log_collection_settings.stderr] enabled = false`.
6060

6161
Save your changes in the editor.
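
Taken together, the three settings shown in this hunk might sit in the ConfigMap's log collection section roughly as follows. This is only a sketch assembled from the key/value examples above: the namespace names are the placeholders from the example, the surrounding ConfigMap structure is omitted, and the per-container step for environment variables still has to be done separately, as the linked article describes.

```toml
# Sketch only: combines the key/value examples quoted in the list above.

# Collect stdout logs, but skip the listed namespaces (placeholder names).
[log_collection_settings.stdout]
    enabled = true
    exclude_namespaces = ["my-namespace-1", "my-namespace-2"]

# Turn off stderr log collection cluster-wide.
[log_collection_settings.stderr]
    enabled = false

# Keep environment variable collection enabled globally; disabling it for
# one container is a separate, per-container step.
[log_collection_settings.env_var]
    enabled = true
```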

articles/azure-monitor/containers/container-insights-cost.md

Lines changed: 50 additions & 53 deletions
Large diffs are not rendered by default.

articles/azure-monitor/containers/container-insights-log-alerts.md

Lines changed: 30 additions & 33 deletions
Original file line number | Diff line number | Diff line change
@@ -7,45 +7,47 @@ ms.reviewer: viviandiec
77

88
---
99

10-
# How to create log alerts from Container insights
10+
# Create log alerts from Container insights
1111

12-
Container insights monitors the performance of container workloads that are deployed to managed or self-managed Kubernetes clusters. To alert on what matters, this article describes how to create log-based alerts for the following situations with AKS clusters:
12+
Container insights monitors the performance of container workloads that are deployed to managed or self-managed Kubernetes clusters. To alert on what matters, this article describes how to create log-based alerts for the following situations with Azure Kubernetes Service (AKS) clusters:
1313

1414
- When CPU or memory utilization on cluster nodes exceeds a threshold
1515
- When CPU or memory utilization on any container within a controller exceeds a threshold as compared to a limit that's set on the corresponding resource
16-
- *NotReady* status node counts
17-
- *Failed*, *Pending*, *Unknown*, *Running*, or *Succeeded* pod-phase counts
16+
- `NotReady` status node counts
17+
- `Failed`, `Pending`, `Unknown`, `Running`, or `Succeeded` pod-phase counts
1818
- When free disk space on cluster nodes exceeds a threshold
1919

20-
To alert for high CPU or memory utilization, or low free disk space on cluster nodes, use the queries that are provided to create a metric alert or a metric measurement alert. While metric alerts have lower latency than log alerts, log alerts provide advanced querying and greater sophistication. Log alert queries compare a datetime to the present by using the *now* operator and going back one hour. (Container insights stores all dates in Coordinated Universal Time (UTC) format.)
20+
To alert for high CPU or memory utilization, or low free disk space on cluster nodes, use the queries that are provided to create a metric alert or a metric measurement alert. Metric alerts have lower latency than log alerts, but log alerts provide advanced querying and greater sophistication. Log alert queries compare a datetime to the present by using the `now` operator and going back one hour. (Container insights stores all dates in Coordinated Universal Time [UTC] format.)
2121

2222
> [!IMPORTANT]
23-
> Most alert rules have a cost that's dependent on the type of rule, how many dimensions it includes, and how frequently it's run. Refer to **Alert rules** in [Azure Monitor pricing](https://azure.microsoft.com/pricing/details/monitor/) before you create any alert rules.
23+
> Most alert rules have a cost that's dependent on the type of rule, how many dimensions it includes, and how frequently it's run. Before you create alert rules, see the "Alert rules" section in [Azure Monitor pricing](https://azure.microsoft.com/pricing/details/monitor/).
2424
25-
If you're not familiar with Azure Monitor alerts, see [Overview of alerts in Microsoft Azure](../alerts/alerts-overview.md) before you start. To learn more about alerts that use log queries, see [Log alerts in Azure Monitor](../alerts/alerts-unified-log.md). For more about metric alerts, see [Metric alerts in Azure Monitor](../alerts/alerts-metric-overview.md).
25+
If you aren't familiar with Azure Monitor alerts, see [Overview of alerts in Microsoft Azure](../alerts/alerts-overview.md) before you start. To learn more about alerts that use log queries, see [Log alerts in Azure Monitor](../alerts/alerts-unified-log.md). For more about metric alerts, see [Metric alerts in Azure Monitor](../alerts/alerts-metric-overview.md).
2626

2727
## Log query measurements
2828
[Log alerts](../alerts/alerts-unified-log.md) can measure two different things, which can be used to monitor virtual machines in different scenarios:
2929

30-
- [Result count](../alerts/alerts-unified-log.md#result-count): Counts the number of rows returned by the query, and can be used to work with events such as Windows event logs, syslog, application exceptions.
31-
- [Calculation of a value](../alerts/alerts-unified-log.md#calculation-of-a-value): Makes a calculation based on a numeric column, and can be used to include any number of resources. For example, CPU percentage.
32-
### Targeting resources and dimensions
30+
- [Result count](../alerts/alerts-unified-log.md#result-count): Counts the number of rows returned by the query and can be used to work with events such as Windows event logs, Syslog, and application exceptions.
31+
- [Calculation of a value](../alerts/alerts-unified-log.md#calculation-of-a-value): Makes a calculation based on a numeric column and can be used to include any number of resources. An example is CPU percentage.
3332

34-
You can monitor multiple instances’ values with one rule using dimensions. For example, you would use dimensions if you want to monitor the CPU usage on multiple instances running your web site or app, and create an alert for CPU usage of over 80%.
33+
### Target resources and dimensions
3534

36-
To create resource-centric alerts at scale for a subscription or resource group, you can **Split by dimensions**. When you want to monitor the same condition on multiple Azure resources, splitting by dimensions splits the alerts into separate alerts by grouping unique combinations using numerical or string columns. Splitting on Azure resource ID column makes the specified resource into the alert target.
35+
You can use one rule to monitor the values of multiple instances by using dimensions. For example, you would use dimensions if you wanted to monitor the CPU usage on multiple instances running your website or app, and create an alert for CPU usage of over 80%.
3736

38-
You may also decide not to split when you want a condition on multiple resources in the scope. For example, if you want to create an alert if at least five machines in the resource group scope have CPU usage over 80%.
37+
To create resource-centric alerts at scale for a subscription or resource group, you can *split by dimensions*. When you want to monitor the same condition on multiple Azure resources, splitting by dimensions splits the alerts into separate alerts by grouping unique combinations by using numerical or string columns. Splitting an Azure resource ID column makes the specified resource into the alert target.
3938

40-
:::image type="content" source="../vm/media/monitor-virtual-machines/log-alert-rule.png" alt-text="Screenshot of a new log alert rule with split by dimensions." lightbox="../vm/media/monitor-virtual-machines/log-alert-rule.png":::
39+
You might also decide not to split when you want a condition on multiple resources in the scope. For example, you might want to create an alert if at least five machines in the resource group scope have CPU usage over 80%.
40+
41+
:::image type="content" source="../vm/media/monitor-virtual-machines/log-alert-rule.png" alt-text="Screenshot that shows a new log alert rule with split by dimensions." lightbox="../vm/media/monitor-virtual-machines/log-alert-rule.png":::
42+
43+
You might want to see a list of the alerts by affected computer. You can use a custom workbook that uses a custom [resource graph](../../governance/resource-graph/overview.md) to provide this view. Use the following query to display alerts, and use the data source **Azure Resource Graph** in the workbook.
4144

42-
You might want to see a list of the alerts by affected computer. You can use a custom workbook that uses a custom [Resource Graph](../../governance/resource-graph/overview.md) to provide this view. Use the following query to display alerts, and use the data source **Azure Resource Graph** in the workbook.
4345
## Create a log query alert rule
44-
[This example of a log query alert](../vm/monitor-virtual-machine-alerts.md#example-log-query-alert) provides a complete walkthrough of creating a log query alert rule. You can use these same processes to create alert rules for AKS clusters using queries similar to the ones in this article.
46+
[This example of a log query alert](../vm/monitor-virtual-machine-alerts.md#example-log-query-alert) provides a complete walkthrough of creating a log query alert rule. You can use these same processes to create alert rules for AKS clusters by using queries similar to the ones in this article.
4547

46-
## Resource utilization
48+
## Resource utilization
4749

48-
**Average CPU utilization as an average of member nodes' CPU utilization every minute (metric measurement)**
50+
Average CPU utilization as an average of member nodes' CPU utilization every minute (metric measurement):
4951

5052
```kusto
5153
let endDateTime = now();
@@ -80,7 +82,7 @@ KubeNodeInventory
8082
| summarize AggregatedValue = avg(UsagePercent) by bin(TimeGenerated, trendBinSize), ClusterName
8183
```
8284

83-
**Average memory utilization as an average of member nodes' memory utilization every minute (metric measurement)**
85+
Average memory utilization as an average of member nodes' memory utilization every minute (metric measurement):
8486

8587
```kusto
8688
let endDateTime = now();
@@ -115,11 +117,10 @@ KubeNodeInventory
115117
| summarize AggregatedValue = avg(UsagePercent) by bin(TimeGenerated, trendBinSize), ClusterName
116118
```
117119

118-
119120
>[!IMPORTANT]
120121
>The following queries use the placeholder values \<your-cluster-name> and \<your-controller-name> to represent your cluster and controller. Replace them with values specific to your environment when you set up alerts.
121122
122-
**Average CPU utilization of all containers in a controller as an average of CPU utilization of every container instance in a controller every minute (metric measurement)**
123+
Average CPU utilization of all containers in a controller as an average of CPU utilization of every container instance in a controller every minute (metric measurement):
123124

124125
```kusto
125126
let endDateTime = now();
@@ -159,7 +160,7 @@ KubePodInventory
159160
| summarize AggregatedValue = avg(UsagePercent) by bin(TimeGenerated, trendBinSize) , ContainerName
160161
```
161162

162-
**Average memory utilization of all containers in a controller as an average of memory utilization of every container instance in a controller every minute (metric measurement)**
163+
Average memory utilization of all containers in a controller as an average of memory utilization of every container instance in a controller every minute (metric measurement):
163164

164165
```kusto
165166
let endDateTime = now();
@@ -199,9 +200,9 @@ KubePodInventory
199200
| summarize AggregatedValue = avg(UsagePercent) by bin(TimeGenerated, trendBinSize) , ContainerName
200201
```
201202

202-
## Resource availability
203+
## Resource availability
203204

204-
**Nodes and counts that have a status of Ready and NotReady (metric measurement)**
205+
Nodes and counts that have a status of Ready and NotReady (metric measurement):
205206

206207
```kusto
207208
let endDateTime = now();
@@ -228,7 +229,8 @@ KubeNodeInventory
228229
NotReadyCount = todouble(NotReadyCount) / ClusterSnapshotCount
229230
| order by ClusterName asc, Computer asc, TimeGenerated desc
230231
```
231-
The following query returns pod phase counts based on all phases: *Failed*, *Pending*, *Unknown*, *Running*, or *Succeeded*.
232+
233+
The following query returns pod phase counts based on all phases: `Failed`, `Pending`, `Unknown`, `Running`, or `Succeeded`.
232234

233235
```kusto
234236
let endDateTime = now();
@@ -265,7 +267,7 @@ KubePodInventory
265267
```
266268

267269
>[!NOTE]
268-
>To alert on certain pod phases, such as *Pending*, *Failed*, or *Unknown*, modify the last line of the query. For example, to alert on *FailedCount* use: <br/>`| summarize AggregatedValue = avg(FailedCount) by bin(TimeGenerated, trendBinSize)`
270+
>To alert on certain pod phases, such as `Pending`, `Failed`, or `Unknown`, modify the last line of the query. For example, to alert on `FailedCount`, use `| summarize AggregatedValue = avg(FailedCount) by bin(TimeGenerated, trendBinSize)`.
269271
270272
The following query returns cluster nodes disks that exceed 90% free space used. To get the cluster ID, first run the following query and copy the value from the `ClusterId` property:
271273

@@ -294,12 +296,8 @@ InsightsMetrics
294296
| where AggregatedValue >= 90
295297
```
296298

299+
Individual container restarts (number of results) alert when the individual system container restart count exceeds a threshold for the last 10 minutes:
297300

298-
299-
**Individual container restarts (number of results)**<br>
300-
Alerts when the individual system container restart count exceeds a threshold for last 10 minutes.
301-
302-
303301
```kusto
304302
let _threshold = 10m;
305303
let _alertThreshold = 2;
@@ -317,6 +315,5 @@ KubePodInventory
317315

318316
## Next steps
319317

320-
- View [log query examples](container-insights-log-query.md) to see pre-defined queries and examples to evaluate or customize for alerting, visualizing, or analyzing your clusters.
321-
318+
- View [log query examples](container-insights-log-query.md) to see predefined queries and examples to evaluate or customize for alerting, visualizing, or analyzing your clusters.
322319
- To learn more about Azure Monitor and how to monitor other aspects of your Kubernetes cluster, see [View Kubernetes cluster performance](container-insights-analyze.md) and [View Kubernetes cluster health](./container-insights-overview.md).
