Commit 37e5ecd

Merge pull request #253726 from bwren/waf-containers
Containers best practices guide
2 parents 67fe88c + 7a7a45f commit 37e5ecd

8 files changed: +199 -16 lines changed
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+---
+title: Best practices for monitoring Kubernetes
+description: Provides a template for a Well-Architected Framework (WAF) article specific to monitoring Kubernetes with Azure Monitor.
+ms.topic: conceptual
+author: bwren
+ms.author: bwren
+ms.date: 03/29/2023
+ms.reviewer: bwren
+---
+
+# Best practices for monitoring Kubernetes with Azure Monitor
+This article provides best practices for monitoring the health and performance of your [Azure Kubernetes Service (AKS)](../aks/intro-kubernetes.md) and [Azure Arc-enabled Kubernetes](../azure-arc/kubernetes/overview.md) clusters. The guidance is based on the five pillars of architecture excellence described in [Azure Well-Architected Framework](/azure/architecture/framework/).
+
+
+
+## Reliability
+In the cloud, we acknowledge that failures happen. Instead of trying to prevent failures altogether, the goal is to minimize the effects of a single failing component. Use the following information to best leverage Azure Monitor to ensure the reliability of your Kubernetes clusters and monitoring environment.
+
+[!INCLUDE [waf-containers-reliability](includes/waf-containers-reliability.md)]
+
+
+## Security
+Security is one of the most important aspects of any architecture. Azure Monitor provides features to employ both the principle of least privilege and defense-in-depth. Use the following information to monitor your Kubernetes clusters and ensure that only authorized users access collected data.
+
+[!INCLUDE [waf-containers-security](includes/waf-containers-security.md)]
+
+
+## Cost optimization
+Cost optimization refers to ways to reduce unnecessary expenses and improve operational efficiencies. You can significantly reduce your cost for Azure Monitor by understanding your different configuration options and opportunities to reduce the amount of data that it collects. See [Azure Monitor cost and usage](usage-estimated-costs.md) to understand the different ways that Azure Monitor charges and how to view your monthly bill.
+
+> [!NOTE]
+> See [Optimize costs in Azure Monitor](best-practices-cost.md) for cost optimization recommendations across all features of Azure Monitor.
+
+[!INCLUDE [waf-containers-cost](includes/waf-containers-cost.md)]
+
+
+## Operational excellence
+Operational excellence refers to the operations processes required to keep a service running reliably in production. Use the following information to minimize the operational requirements for monitoring your Kubernetes clusters.
+
+[!INCLUDE [waf-containers-operation](includes/waf-containers-operation.md)]
+
+
+## Performance efficiency
+Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. Use the following information to monitor the performance of your Kubernetes clusters and ensure they're configured for maximum performance.
+
+[!INCLUDE [waf-containers-performance](includes/waf-containers-performance.md)]
+
+## Next step
+
+- [Get best practices for a complete deployment of Azure Monitor](best-practices.md).

articles/azure-monitor/best-practices-cost.md

Lines changed: 7 additions & 16 deletions
@@ -40,28 +40,19 @@ This article describes [Cost optimization](/azure/architecture/framework/cost/)
 |:---|:---|
 | Collect only critical resource log data from Azure resources. | When you create [diagnostic settings](essentials/diagnostic-settings.md) to send [resource logs](essentials/resource-logs.md) for your Azure resources to a Log Analytics database, only specify those categories that you require. Since diagnostic settings don't allow granular filtering of resource logs, you can use a [workspace transformation](essentials/data-collection-transformations.md?#workspace-transformation-dcr) to further filter unneeded data for those resources that use a [supported table](logs/tables-feature-support.md). See [Diagnostic settings in Azure Monitor](essentials/diagnostic-settings.md#controlling-costs) for details on how to configure diagnostic settings and using transformations to filter their data. |
 
-## Virtual machines
+## Alerts
 
-[!INCLUDE [waf-vm-cost](includes/waf-vm-cost.md)]
+[!INCLUDE [waf-alerts-cost](includes/waf-alerts-cost.md)]
 
-## Container insights
 
-### Design checklist
+## Virtual machines
 
-> [!div class="checklist"]
-> - Configure agent collection to remove unneeded data.
-> - Modify settings for collection of metric data.
-> - Limit Prometheus metrics collected.
-> - Configure Basic Logs.
-### Configuration recommendations
+[!INCLUDE [waf-vm-cost](includes/waf-vm-cost.md)]
+
+## Containers
 
-| Recommendation | Benefit |
-|:---|:---|
-| Configure agent collection to remove unneeded data. | Analyze the data collected by Container insights as described in [Controlling ingestion to reduce cost](containers/container-insights-cost.md#control-ingestion-to-reduce-cost) and adjust your configuration to stop collection of data in ContainerLogs you don't need. |
-| Modify settings for collection of metric data. | You can reduce your costs by modifying the default collection settings Container insights uses for the collection of metric data. See [Enable cost optimization settings](containers/container-insights-cost-config.md) for details on modifying both the frequency that metric data is collected and the namespaces that are collected. |
-| Limit Prometheus metrics collected. | If you configured Prometheus metric scraping, then follow the recommendations at [Controlling ingestion to reduce cost](containers/container-insights-cost.md#prometheus-metrics-scraping) to optimize your data collection for cost. |
-| Configure Basic Logs. | [Convert your schema to ContainerLogV2](containers/container-insights-logging-v2.md) which is compatible with Basic logs and can provide significant cost savings as described in [Controlling ingestion to reduce cost](containers/container-insights-cost.md#configure-basic-logs). |
 
+[!INCLUDE [waf-containers-cost](includes/waf-containers-cost.md)]
 
 
 
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+---
+author: bwren
+ms.author: bwren
+ms.service: azure-monitor
+ms.topic: include
+ms.date: 03/30/2023
+---
+
+### Design checklist
+
+> [!div class="checklist"]
+> - Don't enable Container insights collection of Prometheus metrics.
+> - Configure agent collection to modify data collection in Container insights.
+> - Modify settings for collection of metric data by Container insights.
+> - Disable Container insights collection of metric data if you don't use the Container insights experience in the Azure portal.
+> - If you don't query the container logs table regularly or use it for alerts, configure it as basic logs.
+> - Limit collection of resource logs you don't need.
+> - Use resource-specific logging for AKS resource logs and configure tables as basic logs.
+> - Use OpenCost to collect details about your Kubernetes costs.
+
+### Configuration recommendations
+
+
+| Recommendation | Benefit |
+|:---|:---|
+| Don't enable Container insights collection of Prometheus metrics in your Log Analytics workspace if you've enabled scraping of metrics with Prometheus. | In addition to scraping Prometheus metrics from your cluster using [Azure Monitor managed service for Prometheus](../containers/prometheus-metrics-enable.md), you can configure Container insights to [collect Prometheus metrics in your Log Analytics workspace](../containers/container-insights-prometheus-logs.md). This is redundant with the data in Managed Prometheus and will result in additional cost. |
+| Configure agent to modify data collection in Container insights. | Analyze the data collected by Container insights as described in [Controlling ingestion to reduce cost](../containers/container-insights-cost.md#control-ingestion-to-reduce-cost) and adjust your configuration to stop collection of data you don't need. |
+| Modify settings for collection of metric data by Container insights. | See [Enable cost optimization settings](../containers/container-insights-cost-config.md) for details on modifying both the frequency that metric data is collected and the namespaces that are collected by Container insights. |
+| Disable Container insights collection of metric data if you don't use the Container insights experience in the Azure portal. | Container insights collects many of the same metric values as [Managed Prometheus](../containers/prometheus-metrics-enable.md). You can disable collection of these metrics by configuring Container insights to only collect **Logs and events** as described in [Enable cost optimization settings in Container insights](../containers/container-insights-cost-config.md#custom-data-collection). This configuration disables the Container insights experience in the Azure portal, but you can use Grafana to visualize Prometheus metrics and Log Analytics to analyze log data collected by Container insights. |
+| If you don't query the container logs table regularly or use it for alerts, configure it as basic logs. | [Convert your Container insights schema to ContainerLogV2](../containers/container-insights-logging-v2.md), which is compatible with Basic logs and can provide significant cost savings as described in [Controlling ingestion to reduce cost](../containers/container-insights-cost.md#configure-basic-logs). |
+| Limit collection of resource logs you don't need. | Control plane logs for AKS clusters are implemented as resource logs in Azure Monitor. [Create a diagnostic setting](../../aks/monitor-aks.md#resource-logs) to send this data to a Log Analytics workspace. See [Collect control plane logs for AKS clusters](../containers/monitor-kubernetes.md#collect-control-plane-logs-for-aks-clusters) for recommendations on which categories you should collect. |
+| Use resource-specific logging for AKS resource logs and configure tables as basic logs. | AKS supports either Azure diagnostics mode or resource-specific mode for [resource logs](../../aks/monitor-aks.md#resource-logs). Use resource-specific mode to enable the option to configure the tables for [basic logs](../logs/basic-logs-configure.md), which provide a reduced ingestion charge for logs that you only occasionally query and don't use for alerting. |
+| Use OpenCost to collect details about your Kubernetes costs. | [OpenCost](https://www.opencost.io/docs/configuration/azure-prices) is an open-source, vendor-neutral CNCF sandbox project for understanding your Kubernetes costs and providing AKS cost visibility. It exports detailed costing data in addition to customer-specific Azure pricing to Azure storage to assist the cluster administrator in analyzing and categorizing costs. |
+
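As an illustration of the basic logs recommendation in the table above (this is not part of the commit), the following minimal Python sketch assembles the Azure CLI call `az monitor log-analytics workspace table update` with `--plan Basic` for the `ContainerLogV2` table. It only builds the argument list and doesn't run anything. The function name `basic_logs_command` and the values `my-rg` and `my-workspace` are hypothetical placeholders.

```python
import shlex


def basic_logs_command(resource_group: str, workspace: str,
                       table: str = "ContainerLogV2") -> list[str]:
    """Build (but do not run) the Azure CLI call that sets a table to the Basic plan."""
    return [
        "az", "monitor", "log-analytics", "workspace", "table", "update",
        "--resource-group", resource_group,
        "--workspace-name", workspace,
        "--name", table,
        "--plan", "Basic",
    ]


if __name__ == "__main__":
    # "my-rg" and "my-workspace" are placeholder names, not values from the article.
    print(shlex.join(basic_logs_command("my-rg", "my-workspace")))
```

Returning the arguments as a list keeps the sketch easy to inspect or pass to `subprocess.run` after review. Whether Basic logs suit a given table still depends on how often it's queried and whether it's used for alerts, as the recommendation notes.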
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+---
+author: bwren
+ms.author: bwren
+ms.service: azure-monitor
+ms.topic: include
+ms.date: 03/30/2023
+---
+
+### Design checklist
+
+> [!div class="checklist"]
+> - Review guidance for monitoring all layers of your Kubernetes environment.
+> - Use Azure Arc-enabled Kubernetes to monitor your clusters outside of Azure.
+> - Use Azure managed services for cloud native tools.
+> - Integrate AKS clusters into your existing monitoring tools.
+> - Use Azure policy to enable data collection from your Kubernetes cluster.
+
+
+### Configuration recommendations
+
+| Recommendation | Benefit |
+|:---|:---|
+| Review guidance for monitoring all layers of your Kubernetes environment. | [Monitor your Kubernetes cluster performance with Container insights](../containers/container-insights-analyze.md) includes guidance and best practices for monitoring your entire Kubernetes environment from the network, cluster, and application layers. |
+| Use Azure Arc-enabled Kubernetes to monitor your clusters outside of Azure. | [Azure Arc-enabled Kubernetes](../containers/container-insights-enable-arc-enabled-clusters.md) allows your Kubernetes clusters running in other clouds to be monitored using the same tools as your AKS clusters, including Container insights and Azure Monitor managed service for Prometheus. |
+| Use Azure managed services for cloud native tools. | [Azure Monitor managed service for Prometheus](../essentials/prometheus-metrics-overview.md) and [Azure Managed Grafana](../../managed-grafana/overview.md) support all the features of the cloud native tools Prometheus and Grafana without having to operate their underlying infrastructure. You can quickly provision these tools and onboard your Kubernetes clusters with minimal overhead. These services allow you to access an extensive library of community rules and dashboards to monitor your Kubernetes environment. |
+| Integrate AKS clusters into your existing monitoring tools. | If you have an existing investment in Prometheus and Grafana, integrate your AKS clusters and Azure managed services into your existing environment using the guidance in [Monitor Kubernetes clusters using Azure services and cloud native tools](../containers/monitor-kubernetes.md). |
+| Use Azure policy to enable data collection from your Kubernetes cluster. | Use [Azure Policy](../../governance/policy/overview.md) to enable data collection, including [Prometheus metrics](../essentials/prometheus-metrics-enable.md?tabs=azurepolicy), [Container insights](../containers/container-insights-enable-aks-policy.md), and [diagnostic settings](../essentials/diagnostic-settings-policy.md). This ensures that any new clusters are automatically monitored and enforces their monitoring configuration. |
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+---
+author: bwren
+ms.author: bwren
+ms.service: azure-monitor
+ms.topic: include
+ms.date: 03/30/2023
+---
+
+### Design checklist
+
+> [!div class="checklist"]
+> - Enable collection of Prometheus metrics for your cluster.
+> - Enable Container insights to track performance of your cluster.
+> - Enable recommended Prometheus alerts.
+
+### Configuration recommendations
+
+| Recommendation | Benefit |
+|:---|:---|
+| Enable collection of Prometheus metrics for your cluster. | [Prometheus](https://prometheus.io) is a cloud-native metrics solution from the Cloud Native Computing Foundation and the most common tool used for collecting and analyzing metric data from Kubernetes clusters. [Enable Prometheus](../containers/prometheus-metrics-enable.md) on your cluster with [Azure Monitor managed service for Prometheus](../essentials/prometheus-metrics-overview.md) if you don't already have a Prometheus environment. Use [Azure Managed Grafana](../../managed-grafana/overview.md) to analyze the Prometheus data collected.<br><br>See [Customize scraping of Prometheus metrics in Azure Monitor managed service for Prometheus](../containers/prometheus-metrics-scrape-configuration.md) to collect additional metrics beyond the [default configuration](../containers/prometheus-metrics-scrape-default.md). |
+| Enable Container insights to track performance of your cluster. | When you [enable Container insights](../containers/prometheus-metrics-enable.md) for your Kubernetes cluster, you can use [views](../containers/container-insights-analyze.md) and [workbooks](../containers/container-insights-reports.md) to track the performance of the components of your cluster. This data may overlap with data collected by Prometheus. See [Cost optimization](../best-practices-containers.md#cost-optimization) for recommendations regarding cost. |
+| Enable recommended Prometheus alerts. | [Alerts](../alerts/alerts-overview.md) in Azure Monitor proactively notify you when issues are detected. Start with a set of [recommended Prometheus alert rules](../containers/container-insights-metric-alerts.md#enable-prometheus-alert-rules) that detect the most common availability and performance issues with your cluster. Consider adding [log query alerts](../containers/container-insights-log-alerts.md) using data collected by Container insights. |
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+---
+author: bwren
+ms.author: bwren
+ms.service: azure-monitor
+ms.topic: include
+ms.date: 03/30/2023
+---
+
+
+### Design checklist
+
+> [!div class="checklist"]
+> - Enable scraping of Prometheus metrics for your cluster.
+> - Enable Container insights for collection of logs and performance data from your cluster.
+> - Create diagnostic settings to collect control plane logs for AKS clusters.
+> - Enable recommended Prometheus alerts.
+> - Ensure the availability of the Log Analytics workspace supporting Container insights.
+
+
+### Configuration recommendations
+
+| Recommendation | Benefit |
+|:---|:---|
+| Enable scraping of Prometheus metrics for your cluster. | [Enable Prometheus](../containers/prometheus-metrics-enable.md) on your cluster with [Azure Monitor managed service for Prometheus](../essentials/prometheus-metrics-overview.md) if you don't already have a Prometheus environment. Use [Azure Managed Grafana](../../managed-grafana/overview.md) to analyze the Prometheus data collected. See [Customize scraping of Prometheus metrics in Azure Monitor managed service for Prometheus](../containers/prometheus-metrics-scrape-configuration.md) to collect additional metrics beyond the [default configuration](../containers/prometheus-metrics-scrape-default.md). |
+| Enable Container insights for collection of logs and performance data from your cluster. | [Container insights](../containers/container-insights-overview.md) collects stdout/stderr logs, performance metrics, and Kubernetes events from each node in your cluster. It provides dashboards and reports for analyzing this data, including the availability of your nodes and other components. Use [Log Analytics](../logs/log-analytics-overview.md) to identify any availability errors in your collected logs. |
+| Create diagnostic settings to collect control plane logs for AKS clusters. | AKS implements control plane logs as [resource logs](../essentials/resource-logs.md) in Azure Monitor. [Create a diagnostic setting](../essentials/diagnostic-settings.md) to send these logs to your Log Analytics workspace so you can use [log queries](../logs/log-query-overview.md) to identify errors and issues affecting availability. |
+| Enable recommended Prometheus alerts. | [Alerts](../alerts/alerts-overview.md) in Azure Monitor proactively notify you when issues are detected. Start with a set of [recommended Prometheus alert rules](../containers/container-insights-metric-alerts.md#enable-prometheus-alert-rules) that detect the most common availability and performance issues with your cluster. Consider adding [log query alerts](../containers/container-insights-log-alerts.md) using data collected by Container insights. |
+| Ensure the availability of the Log Analytics workspace supporting Container insights. | Container insights relies on a Log Analytics workspace. See [Best practices for Azure Monitor Logs](../best-practices-logs.md#reliability) for recommendations to ensure the reliability of the workspace. |
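
As an illustration of the diagnostic settings recommendation in the table above (this is not part of the commit), the following minimal Python sketch builds the `--logs` JSON payload and the argument list for `az monitor diagnostic-settings create`. It assumes you want a subset of AKS control plane categories; the subset in `EXAMPLE_CATEGORIES` is an arbitrary example rather than a recommendation from the article, and the function names and resource IDs are hypothetical placeholders. Nothing is executed.

```python
import json

# Example subset of AKS control plane log categories (an assumption for
# illustration; choose categories based on the guidance linked in the table).
EXAMPLE_CATEGORIES = (
    "kube-apiserver",
    "kube-controller-manager",
    "kube-scheduler",
    "kube-audit-admin",
)


def logs_setting(categories):
    """Return the list of category entries expected by the --logs argument."""
    return [{"category": name, "enabled": True} for name in categories]


def diagnostic_setting_command(name, cluster_id, workspace_id,
                               categories=EXAMPLE_CATEGORIES):
    """Build (but do not run) the Azure CLI call that creates the diagnostic setting."""
    return [
        "az", "monitor", "diagnostic-settings", "create",
        "--name", name,
        "--resource", cluster_id,        # resource ID of the AKS cluster
        "--workspace", workspace_id,     # resource ID of the Log Analytics workspace
        "--logs", json.dumps(logs_setting(categories)),
    ]


if __name__ == "__main__":
    # Placeholder IDs; replace with real resource IDs before use.
    for arg in diagnostic_setting_command("aks-control-plane",
                                          "<aks-cluster-resource-id>",
                                          "<workspace-resource-id>"):
        print(arg)
```

Listing categories explicitly, instead of enabling all of them, also supports the cost guidance elsewhere in this change, which recommends limiting the collection of resource logs you don't need.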
