ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
---
# Troubleshoot API server and etcd problems in Azure Kubernetes Services
## Prerequisites
- The [Azure CLI](/cli/azure/install-azure-cli).
- The Kubernetes [kubectl](https://kubernetes.io/docs/reference/kubectl/overview/) tool. To install kubectl by using Azure CLI, run the [az aks install-cli](/cli/azure/aks#az-aks-install-cli) command.
### Step 2: Identify and chart the average latency of API server requests per user agent
**1.a.** Use the API server resource intensive listing detector in the Azure portal
> **New:** Azure Kubernetes Service now provides a built-in analyzer to help you identify agents making resource-intensive LIST calls, which are a leading cause of API server and etcd performance issues.
**How to access the detector:**
1. Open your AKS cluster in the Azure portal.
2. Go to **Diagnose and solve problems**.
3. Select **Cluster and Control Plane Availability and Performance**.
4. Select **API server resource intensive listing detector**.
This detector analyzes recent API server activity and highlights agents or workloads that generate large or frequent LIST calls. It provides a summary of potential impacts, such as request timeouts, increased 408/503 errors, node instability, health probe failures, and OOM kills in the API server or etcd.
#### How to interpret the detector output
- **Summary:** Indicates if resource-intensive LIST calls were detected and describes possible impacts on your cluster.
- **Analysis window:** Shows the 30-minute window analyzed, with peak memory and CPU usage.
- **Read types:** Explains whether LIST calls were served from the API server cache (preferred) or required fetching from etcd (most impactful).
- **Charts and tables:** Identify which agents, namespaces, or workloads are generating the most resource-intensive LIST calls.
> Only successful LIST calls are counted. Failed or throttled calls are excluded.
The analyzer also provides actionable recommendations directly in the Azure portal, tailored to the detected patterns, to help you remediate and optimize your cluster.
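
If you want to cross-check the detector's findings against your own logs, a Log Analytics query along the following lines can surface the heaviest LIST callers. This is only a sketch: it assumes that API server audit logs are collected in resource-specific mode, and the `AKSAudit` table and the `Verb`, `UserAgent`, and `RequestUri` columns are assumptions that you might need to adjust to your workspace schema.

```kusto
// Sketch only: the table and column names below are assumptions.
// Substitute the audit log table and fields available in your Log Analytics workspace.
AKSAudit
| where TimeGenerated > ago(30m)          // Match the detector's 30-minute analysis window.
| where Verb =~ "list"                    // Focus on LIST calls.
| summarize ListCalls = count() by UserAgent, RequestUri
| top 10 by ListCalls desc                // Show the heaviest LIST callers.
```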
> [!NOTE]
> The API server resource intensive listing detector is available to all users with access to the AKS resource in the Azure portal. No special permissions or prerequisites are required.
>
> After you identify the offending agents and apply the preceding recommendations, you can use [API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/) or refer to [this section](/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/troubleshoot-apiserver-etcd?tabs=resource-specific#cause-3-an-offending-client-makes-excessive-list-or-put-calls) to throttle or isolate problematic clients.
**1.b.** Alternatively, you can run a Log Analytics query to identify the average latency of API server requests per user agent, plotted on a time chart.
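
A minimal sketch of such a query is shown below. It assumes that diagnostic logs are sent to a Log Analytics workspace in resource-specific mode, with an `AKSAudit` table that exposes `UserAgent` and `Latency` columns; adjust the table and column names to match what your workspace actually contains.

```kusto
// Sketch only: AKSAudit, UserAgent, and Latency are assumed names.
// Replace them with the audit log table and columns available in your workspace.
AKSAudit
| where TimeGenerated between (now(-8h) .. now())    // Customize the time range.
| summarize AvgLatency = avg(Latency) by UserAgent, bin(TimeGenerated, 5m)
| render timechart
```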