Commit 8dd3a0d

Amson Liu authored
Merge pull request #9260 from kthakar1990/patch-1
AB#6542: Update troubleshoot-apiserver-etcd.md
2 parents 49170be + b1fa3d0 commit 8dd3a0d

1 file changed, +37 -4 lines changed

support/azure/azure-kubernetes/create-upgrade-delete/troubleshoot-apiserver-etcd.md

Lines changed: 37 additions & 4 deletions
@@ -3,9 +3,9 @@ title: Troubleshoot API server and etcd problems in AKS
 description: Provides a troubleshooting guide for API server and etcd problems in Azure Kubernetes Services.
 author: seguler
 ms.author: segule
-ms.date: 11/15/2024
+ms.date: 07/22/2025
 ms.service: azure-kubernetes-service
-ms.reviewer: mikerooney, v-weizhu, axelg, josebl, aritraghosh, v-leedennis
+ms.reviewer: kthakar1990, v-weizhu, axelg, josebl, aritraghosh, v-leedennis, v-liuamson
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
 ---
 # Troubleshoot API server and etcd problems in Azure Kubernetes Services
@@ -16,7 +16,7 @@ Microsoft has tested the reliability and performance of the API server at a scal
 
 ## Prerequisites
 
-- [Azure CLI](/cli/azure/install-azure-cli).
+- The [Azure CLI](/cli/azure/install-azure-cli).
 
 - The Kubernetes [kubectl](https://kubernetes.io/docs/reference/kubectl/overview/) tool. To install kubectl by using Azure CLI, run the [az aks install-cli](/cli/azure/aks#az-aks-install-cli) command.
 
@@ -89,7 +89,40 @@ Although it's helpful to know which clients generate the highest request volume,
 
 ### Step 2: Identify and chart the average latency of API server requests per user agent
 
-To identify the average latency of API server requests per user agent as plotted on a time chart, run the following query:
+**1.a.** Use the API Server Resource Intensive Listing Detector in Azure Portal
+
+> **New:** Azure Kubernetes Service now provides a built-in analyzer to help you identify agents making resource-intensive LIST calls, which are a leading cause of API server and etcd performance issues.
+
+**How to access the detector:**
+
+1. Open your AKS cluster in the Azure portal.
+2. Go to **Diagnose and solve problems**.
+3. Click **Cluster and Control Plane Availability and Performance**.
+4. Select **API server resource intensive listing detector**.
+
+This detector analyzes recent API server activity and highlights agents or workloads generating large or frequent LIST calls. It provides a summary of potential impacts, such as request timeouts, increased 408/503 errors, node instability, health probe failures, and OOM-Kills in API server or etcd.
+
+#### How to interpret the detector output
+
+- **Summary:**
+Indicates if resource-intensive LIST calls were detected and describes possible impacts on your cluster.
+- **Analysis window:**
+Shows the 30-minute window analyzed, with peak memory and CPU usage.
+- **Read types:**
+Explains whether LIST calls were served from the API server cache (preferred) or required fetching from etcd (most impactful).
+- **Charts and tables:**
+Identify which agents, namespaces, or workloads are generating the most resource-intensive LIST calls.
+
+> Only successful LIST calls are counted. Failed or throttled calls are excluded.
+
+The analyzer also provides actionable recommendations directly in the Azure portal, tailored to the detected patterns, to help you remediate and optimize your cluster.
+
+> [!NOTE]
+> The API server resource intensive listing detector is available to all users with access to the AKS resource in the Azure portal. No special permissions or prerequisites are required.
+>
+> After identifying the offending agents and applying the above recommendations, you can further use [Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/) or refer to [this section](/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/troubleshoot-apiserver-etcd?branch=pr-en-us-9260&tabs=resource-specific#cause-3-an-offending-client-makes-excessive-list-or-put-calls) to throttle or isolate problematic clients.
+
+**1.b.** Additionally, you can also run following query to identify the average latency of API server requests per user agent as plotted on a time chart:
 
 ### [Resource-specific](#tab/resource-specific)
 
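Reviewer note: the added text says the detector highlights agents that generate frequent LIST calls and counts only successful ones. The detector's actual implementation is not shown in this commit, so the following is only a rough sketch of that counting rule, not the detector itself. It assumes audit events are already available as dictionaries using the Kubernetes audit event field names `verb`, `userAgent`, and `responseStatus.code`, and it assumes that "successful" means a 2xx status code. The function name and sample data are hypothetical.

```python
from collections import Counter


def count_successful_list_calls(events):
    """Rank user agents by their number of successful LIST calls.

    Assumption: an event counts as successful when its HTTP status
    code is in the 2xx range. Failed or throttled calls (for example
    408, 429, 503) are skipped, mirroring the rule stated in the doc.
    """
    counts = Counter()
    for event in events:
        if event.get("verb") != "list":
            continue  # only LIST calls are of interest
        code = event.get("responseStatus", {}).get("code")
        if code is None or not 200 <= code < 300:
            continue  # exclude failed or throttled calls
        counts[event.get("userAgent", "unknown")] += 1
    # Highest counts first
    return counts.most_common()


# Hypothetical sample data, invented for illustration only.
sample_events = [
    {"verb": "list", "userAgent": "agent-a", "responseStatus": {"code": 200}},
    {"verb": "list", "userAgent": "agent-a", "responseStatus": {"code": 200}},
    {"verb": "list", "userAgent": "agent-b", "responseStatus": {"code": 200}},
    {"verb": "list", "userAgent": "agent-b", "responseStatus": {"code": 429}},
    {"verb": "get", "userAgent": "agent-c", "responseStatus": {"code": 200}},
]

ranked = count_successful_list_calls(sample_events)
for agent, total in ranked:
    print(agent, total)
```

A count alone does not capture how expensive each call is. The detector also reports whether reads were served from the API server cache or from etcd, which this sketch does not model.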