Commit bb36418

Merge pull request #250823 from bwren/aks
Updates to AKS monitoring
2 parents 72dcca5 + af1495c commit bb36418

5 files changed

+60
-26
lines changed


articles/aks/monitor-aks-reference.md

Lines changed: 17 additions & 17 deletions
@@ -49,23 +49,23 @@ The following table lists [dimensions](../azure-monitor/essentials/data-platform
4949

5050
AKS implements control plane logs for the cluster as [resource logs in Azure Monitor](../azure-monitor/essentials/resource-logs.md). See [Resource logs](monitor-aks.md#resource-logs) for details on creating a diagnostic setting to collect these logs and [Sample queries](monitor-aks-reference.md#resource-logs) for query examples.
5151

52-
The following table lists the resource log categories you can collect for AKS. All logs are written to the [AzureDiagnostics](/azure/azure-monitor/reference/tables/azurediagnostics) table.
53-
54-
55-
| Category | Description |
56-
|:---|:---|
57-
| kube-apiserver | Logs from the API server. |
58-
| kube-audit | Audit log data for every audit event including get, list, create, update, delete, patch, and post. |
59-
| kube-audit-admin | Subset of the kube-audit log category. Significantly reduces the number of logs by excluding the get and list audit events from the log. |
60-
| kube-controller-manager | Gain deeper visibility of issues that may arise between Kubernetes and the Azure control plane. A typical example is the AKS cluster having a lack of permissions to interact with Azure. |
61-
| kube-scheduler | Logs from the scheduler. |
62-
| cluster-autoscaler | Understand why the AKS cluster is scaling up or down, which may not be expected. This information is also useful to correlate time intervals where something interesting may have happened in the cluster. |
63-
| cloud-controller-manager | Logs from the cloud-node-manager component of the Kubernetes cloud controller manager.|
64-
| guard | Managed Azure Active Directory and Azure RBAC audits. For managed Azure AD, this includes token in and user info out. For Azure RBAC, this includes access reviews in and out. |
65-
| csi-azuredisk-controller | Logs from the Azure Disk CSI storage driver. |
66-
| csi-azurefile-controller | Logs from the Azure Files CSI storage driver. |
67-
| csi-snapshot-controller | Logs from the Azure CSI snapshot driver controller. |
68-
| AllMetrics | Includes all platform metrics. Sends these values to Log Analytics workspace where it can be evaluated with other data using log queries. |
52+
The following table lists the resource log categories you can collect for AKS. It also includes the table the logs for each category are sent to when you send the logs to a Log Analytics workspace using [resource-specific mode](../azure-monitor/essentials/resource-logs.md#resource-specific). In [Azure diagnostics mode](../azure-monitor/essentials/resource-logs.md#azure-diagnostics-mode), all logs are written to the [AzureDiagnostics](/azure/azure-monitor/reference/tables/azurediagnostics) table.
53+
54+
55+
| Category | Description | Table<br>(resource-specific mode) |
56+
|:---|:---|:---|
57+
| kube-apiserver | Logs from the API server. | AKSControlPlane |
58+
| kube-audit | Audit log data for every audit event including get, list, create, update, delete, patch, and post. | AKSAudit |
59+
| kube-audit-admin | Subset of the kube-audit log category. Significantly reduces the number of logs by excluding the get and list audit events from the log. | AKSAuditAdmin |
60+
| kube-controller-manager | Gain deeper visibility of issues that may arise between Kubernetes and the Azure control plane. A typical example is the AKS cluster having a lack of permissions to interact with Azure. | AKSControlPlane |
61+
| kube-scheduler | Logs from the scheduler. | AKSControlPlane |
62+
| cluster-autoscaler | Understand why the AKS cluster is scaling up or down, which may not be expected. This information is also useful to correlate time intervals where something interesting may have happened in the cluster. | AKSControlPlane |
63+
| cloud-controller-manager | Logs from the cloud-node-manager component of the Kubernetes cloud controller manager.| AKSControlPlane |
64+
| guard | Managed Azure Active Directory and Azure RBAC audits. For managed Azure AD, this includes token in and user info out. For Azure RBAC, this includes access reviews in and out. | AKSControlPlane |
65+
| csi-azuredisk-controller | Logs from the Azure Disk CSI storage driver. | AKSControlPlane |
66+
| csi-azurefile-controller | Logs from the Azure Files CSI storage driver. | AKSControlPlane |
67+
| csi-snapshot-controller | Logs from the Azure CSI driver snapshot controller. | AKSControlPlane |
68+
| AllMetrics | Includes all platform metrics. Sends these values to Log Analytics workspace where it can be evaluated with other data using log queries. | AzureMetrics |
6969

7070
For reference, see a list of [all resource logs category types supported in Azure Monitor](../azure-monitor/essentials/resource-logs-schema.md).
7171

articles/aks/monitor-aks.md

Lines changed: 35 additions & 6 deletions
@@ -5,7 +5,7 @@ author: bwren
55
ms.author: bwren
66
ms.topic: conceptual
77
ms.custom: subject-monitoring
8-
ms.date: 08/01/2023
8+
ms.date: 09/11/2023
99
---
1010

1111

@@ -38,8 +38,12 @@ AKS generates the same kinds of monitoring data as other Azure resources that ar
3838

3939
The **Monitoring** tab on the **Overview** page offers a quick way to get started viewing monitoring data in the Azure portal for each AKS cluster. This includes graphs with common metrics for the cluster separated by node pool. Click on any of these graphs to further analyze the data in [metrics explorer](../azure-monitor/essentials/metrics-getting-started.md).
4040

41+
The **Overview** page also includes links to [Managed Prometheus](#integrations) and [Container insights](#integrations) for the current cluster. If you haven't already enabled these tools, you'll be prompted to do so. You may also see a banner at the top of the screen recommending that you enable additional features to improve monitoring of your cluster.
42+
4143
:::image type="content" source="media/monitor-aks/overview.png" alt-text="Screenshot of AKS overview page." lightbox="media/monitor-aks/overview.png":::
4244

45+
46+
4347
> [!TIP]
4448
> Access monitoring features for all AKS clusters in your subscription from the **Monitoring** menu in the Azure portal, or for a single AKS cluster from the **Monitor** section of the **Kubernetes services** menu.
4549
@@ -51,24 +55,49 @@ Control plane logs for AKS clusters are implemented as [resource logs](../azure-
5155
See [Create diagnostic settings](../azure-monitor/essentials/diagnostic-settings.md#create-diagnostic-settings) for the detailed process for creating a diagnostic setting using the Azure portal, CLI, or PowerShell. When you create a diagnostic setting, you specify which categories of logs to collect. The categories for AKS are listed in [AKS monitoring data reference](monitor-aks-reference.md#resource-logs).
5256

5357
> [!IMPORTANT]
54-
> There can be substantial cost when collecting resource logs for AKS, particularly for *kube-audit* logs. Consider disabling kube-audit logging when not required. An alternative approach to significantly reduce cost is by enabling collection from *kube-audit-admin*, which excludes the get and list audit events. See [Monitor Azure Kubernetes Service (AKS) with Azure Monitor]() for further recommendations and [Cost optimization and Azure Monitor][cost-optimization-azure-monitor] for further strategies to reduce your Azure Monitor.
58+
> There can be substantial cost when collecting resource logs for AKS, particularly for *kube-audit* logs. Consider the following recommendations to reduce the amount of data collected:
59+
>
60+
> - Disable kube-audit logging when not required.
61+
> - Enable collection from *kube-audit-admin*, which excludes the get and list audit events.
62+
> - Enable resource-specific logs as described below and configure `AKSAudit` table as [basic logs](../azure-monitor/logs/basic-logs-configure.md).
63+
>
64+
> See [Monitor Kubernetes clusters using Azure services and cloud native tools](../azure-monitor/containers/monitor-kubernetes.md) for further recommendations and [Cost optimization and Azure Monitor](../azure-monitor/best-practices-cost.md) for further strategies to reduce your monitoring costs.
5565
5666
:::image type="content" source="media/monitor-aks/diagnostic-setting-categories.png" alt-text="Screenshot of AKS diagnostic setting dialog box." lightbox="media/monitor-aks/diagnostic-setting-categories.png":::
5767

68+
AKS supports either [Azure diagnostics mode](../azure-monitor/essentials/resource-logs.md#azure-diagnostics-mode) or [resource-specific mode](../azure-monitor/essentials/resource-logs.md#resource-specific) for resource logs. This specifies the tables in the Log Analytics workspace where the data is sent. Azure diagnostics mode sends all data to the [AzureDiagnostics table](/azure/azure-monitor/reference/tables/azurediagnostics), while resource-specific mode sends data to [AKS Audit](/azure/azure-monitor/reference/tables/aksaudit), [AKS Audit Admin](/azure/azure-monitor/reference/tables/aksauditadmin), and [AKS Control Plane](/azure/azure-monitor/reference/tables/akscontrolplane) as shown in the table at [Resource logs](monitor-aks-reference.md#resource-logs).
69+
70+
Resource-specific mode is recommended for AKS for the following reasons:
71+
72+
- Data is easier to query because it's in individual tables dedicated to AKS.
73+
- Supports configuration as [basic logs](../azure-monitor/logs/basic-logs-configure.md) for significant cost savings.
74+
75+
For more details on the difference between collection modes including how to change an existing setting, see [Select the collection mode](../azure-monitor/essentials/resource-logs.md#select-the-collection-mode).
76+
77+
> [!NOTE]
78+
> The ability to select the collection mode isn't available in the Azure portal in all regions yet. For those regions where it's not yet available, use CLI to create the diagnostic setting with a command such as the following:
79+
>
80+
> ```azurecli
81+
> az monitor diagnostic-settings create --name AKS-Diagnostics --resource /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myresourcegroup/providers/Microsoft.ContainerService/managedClusters/my-cluster --logs '[{""category"": ""kube-audit"",""enabled"": true}, {""category"": ""kube-audit-admin"", ""enabled"": true}, {""category"": ""kube-apiserver"", ""enabled"": true}, {""category"": ""kube-controller-manager"", ""enabled"": true}, {""category"": ""kube-scheduler"", ""enabled"": true}, {""category"": ""cluster-autoscaler"", ""enabled"": true}, {""category"": ""cloud-controller-manager"", ""enabled"": true}, {""category"": ""guard"", ""enabled"": true}, {""category"": ""csi-azuredisk-controller"", ""enabled"": true}, {""category"": ""csi-azurefile-controller"", ""enabled"": true}, {""category"": ""csi-snapshot-controller"", ""enabled"": true}]' --workspace /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/myresourcegroup/providers/microsoft.operationalinsights/workspaces/myworkspace --export-to-resource-specific true
82+
> ```
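
The command in the note above writes the `--logs` value with doubled quotes (`""category""`), which matches the PowerShell client form shown in the diagnostic settings article. In a Bash client, the same article uses plain JSON quoting. As a minimal sketch under that assumption, the JSON array could be assembled from a list of category names instead of being typed by hand. The function name `build_logs_json` and the variable `logs_json` are hypothetical helpers introduced only for this illustration, and the three categories passed in are an arbitrary selection from the AKS category list.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the Azure CLI): prints a JSON array with one
# {"category": ..., "enabled": true} entry per argument.
# The body runs in a subshell so its variables do not leak into the caller.
build_logs_json() (
  json=""
  sep=""
  for category in "$@"; do
    json="${json}${sep}{\"category\": \"${category}\", \"enabled\": true}"
    sep=", "
  done
  printf '[%s]' "$json"
)

# Illustrative selection of AKS resource log categories.
logs_json=$(build_logs_json kube-audit kube-audit-admin kube-apiserver)
echo "$logs_json"
```

The resulting string could then be passed as `--logs "$logs_json"` to `az monitor diagnostic-settings create`, together with the `--resource`, `--workspace`, and `--export-to-resource-specific true` parameters shown in the note.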
5883
5984
## Sample log queries
6085
6186
> [!IMPORTANT]
6287
> When you select **Logs** from the menu for an AKS cluster, Log Analytics is opened with the query scope set to the current cluster. This means that log queries will only include data from that resource. If you want to run a query that includes data from other clusters or data from other Azure services, select **Logs** from the **Azure Monitor** menu. See [Log query scope and time range in Azure Monitor Log Analytics](../azure-monitor/logs/scope.md) for details.
6388
6489
65-
The resource logs for AKS are stored in the [AzureDiagnostics](/azure/azure-monitor/reference/tables/azurediagnostics) table. You can distinguish different logs with the **Category** column. For a description of each category, see [AKS reference resource logs](monitor-aks-reference.md).
90+
If the [diagnostic setting for your cluster](#resource-logs) uses Azure diagnostics mode, the resource logs for AKS are stored in the [AzureDiagnostics](/azure/azure-monitor/reference/tables/azurediagnostics) table. You can distinguish different logs with the **Category** column. For a description of each category, see [AKS reference resource logs](monitor-aks-reference.md).
6691
6792
| Description | Log query |
6893
|:---|:---|
69-
| Count logs for each category | AzureDiagnostics<br>\| where ResourceType == "MANAGEDCLUSTERS"<br>\| summarize count() by Category |
70-
| All API server logs | AzureDiagnostics<br>\| where Category == "kube-apiserver" |
71-
| All kube-audit logs withing a time range | let starttime = datetime("2023-02-23");<br>let endtime = datetime("2023-02-24");<br>AzureDiagnostics<br>\| where TimeGenerated between(starttime..endtime)<br>\| where Category == "kube-audit"<br>\| extend event = parse_json(log_s)<br>\| extend HttpMethod = tostring(event.verb)<br>\| extend User = tostring(event.user.username)<br>\| extend Apiserver = pod_s<br>\| extend SourceIP = tostring(event.sourceIPs[0])<br>\| project TimeGenerated, Category, HttpMethod, User, Apiserver, SourceIP, OperationName, event |
94+
| Count logs for each category<br>(Azure diagnostics mode) | AzureDiagnostics<br>\| where ResourceType == "MANAGEDCLUSTERS"<br>\| summarize count() by Category |
95+
| All API server logs<br>(Azure diagnostics mode) | AzureDiagnostics<br>\| where Category == "kube-apiserver" |
96+
| All kube-audit logs in a time range<br>(Azure diagnostics mode) | let starttime = datetime("2023-02-23");<br>let endtime = datetime("2023-02-24");<br>AzureDiagnostics<br>\| where TimeGenerated between(starttime..endtime)<br>\| where Category == "kube-audit"<br>\| extend event = parse_json(log_s)<br>\| extend HttpMethod = tostring(event.verb)<br>\| extend User = tostring(event.user.username)<br>\| extend Apiserver = pod_s<br>\| extend SourceIP = tostring(event.sourceIPs[0])<br>\| project TimeGenerated, Category, HttpMethod, User, Apiserver, SourceIP, OperationName, event |
97+
| All audit logs<br>(resource-specific mode) | AKSAudit |
98+
| All audit logs excluding the get and list audit events <br>(resource-specific mode) | AKSAuditAdmin |
99+
| All API server logs<br>(resource-specific mode) | AKSControlPlane<br>\| where Category == "kube-apiserver" |
100+
72101
73102
74103
To access a set of prebuilt queries in the Log Analytics workspace, see the [Log Analytics queries interface](../azure-monitor/logs/queries.md#queries-interface) and select resource type **Kubernetes Services**. For a list of common queries for Container insights, see [Container insights queries](../azure-monitor/containers/container-insights-log-query.md).

articles/azure-monitor/essentials/diagnostic-settings.md

Lines changed: 8 additions & 3 deletions
@@ -250,6 +250,8 @@ Use the [az monitor diagnostic-settings create](/cli/azure/monitor/diagnostic-se
250250
251251
The following example CLI command creates a diagnostic setting by using all three destinations. The syntax is slightly different depending on your client.
252252

253+
To specify [resource-specific mode](resource-logs.md#resource-specific) if the service supports it, add the `export-to-resource-specific` parameter with a value of `true`.
254+
253255
**CMD client**
254256

255257
```azurecli
@@ -260,7 +262,8 @@ az monitor diagnostic-settings create ^
260262
--metrics "[{""category"": ""AllMetrics"",""enabled"": true}]" ^
261263
--storage-account /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myresourcegroup/providers/Microsoft.Storage/storageAccounts/mystorageaccount ^
262264
--workspace /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/myresourcegroup/providers/microsoft.operationalinsights/workspaces/myworkspace ^
263-
--event-hub-rule /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myresourcegroup/providers/Microsoft.EventHub/namespaces/myeventhub/authorizationrules/RootManageSharedAccessKey
265+
--event-hub-rule /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myresourcegroup/providers/Microsoft.EventHub/namespaces/myeventhub/authorizationrules/RootManageSharedAccessKey ^
266+
--export-to-resource-specific true
264267
```
265268

266269
**PowerShell client**
@@ -273,7 +276,8 @@ az monitor diagnostic-settings create `
273276
--metrics '[{""category"": ""AllMetrics"",""enabled"": true}]' `
274277
--storage-account /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myresourcegroup/providers/Microsoft.Storage/storageAccounts/mystorageaccount `
275278
--workspace /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/myresourcegroup/providers/microsoft.operationalinsights/workspaces/myworkspace `
276-
--event-hub-rule /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myresourcegroup/providers/Microsoft.EventHub/namespaces/myeventhub/authorizationrules/RootManageSharedAccessKey
279+
--event-hub-rule /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myresourcegroup/providers/Microsoft.EventHub/namespaces/myeventhub/authorizationrules/RootManageSharedAccessKey `
280+
--export-to-resource-specific true
277281
```
278282

279283
**Bash client**
@@ -286,7 +290,8 @@ az monitor diagnostic-settings create \
286290
--metrics '[{"category": "AllMetrics","enabled": true}]' \
287291
--storage-account /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myresourcegroup/providers/Microsoft.Storage/storageAccounts/mystorageaccount \
288292
--workspace /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/myresourcegroup/providers/microsoft.operationalinsights/workspaces/myworkspace \
289-
--event-hub-rule /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myresourcegroup/providers/Microsoft.EventHub/namespaces/myeventhub/authorizationrules/RootManageSharedAccessKey
293+
--event-hub-rule /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myresourcegroup/providers/Microsoft.EventHub/namespaces/myeventhub/authorizationrules/RootManageSharedAccessKey \
294+
--export-to-resource-specific true
290295
```
291296

292297
# [Resource Manager](#tab/arm)
