
Commit 0c21853

Update troubleshoot-apiserver-etcd.md
1 parent 6cfb8d7 commit 0c21853


1 file changed: +33 -0 lines changed


support/azure/azure-kubernetes/create-upgrade-delete/troubleshoot-apiserver-etcd.md

Lines changed: 33 additions & 0 deletions
@@ -279,6 +279,39 @@ The following procedure shows you how to throttle an offending client's LIST Pod
kubectl get --raw /metrics | grep "restrict-bad-client"
```

### Solution 3c: Use the API server resource intensive listing detector in the Azure portal

> **New:** Azure Kubernetes Service now provides a built-in analyzer to help you identify agents that make resource-intensive LIST calls, which are a leading cause of API server and etcd performance issues.

**How to access the detector:**

1. Open your AKS cluster in the Azure portal.
2. Go to **Diagnose and solve problems**.
3. Select **Cluster and Control Plane Availability and Performance**.
4. Select **API server resource intensive listing detector**.

This detector analyzes recent API server activity and highlights the agents or workloads that generate large or frequent LIST calls. It also summarizes the potential impact, such as request timeouts, increased 408/503 errors, node instability, health probe failures, and OOM kills in the API server or etcd.
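
If you want to corroborate the detector's findings from the command line, you can query the API server metrics endpoint for LIST traffic, similar to the metrics check earlier in this article. This is a minimal sketch that uses the standard `apiserver_request_total` and `apiserver_request_duration_seconds` metrics; the exact label sets can vary by Kubernetes version.

```bash
# Show LIST request counts per resource and response code to see which object types are listed most often.
kubectl get --raw /metrics | grep 'apiserver_request_total{' | grep 'verb="LIST"'

# Inspect cumulative LIST latency for a resource that you suspect is expensive (pods is only an example).
kubectl get --raw /metrics | grep 'apiserver_request_duration_seconds_sum' | grep 'verb="LIST"' | grep 'resource="pods"'
```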

#### How to interpret the detector output

- **Summary:** Indicates whether resource-intensive LIST calls were detected and describes the possible impact on your cluster.
- **Analysis window:** Shows the 30-minute window that was analyzed, along with peak memory and CPU usage.
- **Read types:** Explains whether LIST calls were served from the API server cache (preferred) or had to be fetched from etcd (most impactful). See the sketch after this list for how the two read paths differ.
- **Charts and tables:** Identify the agents, namespaces, or workloads that generate the most resource-intensive LIST calls.

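To make the **Read types** distinction concrete, the following sketch shows the two read paths for the same LIST request. In most Kubernetes versions, setting `resourceVersion=0` lets the API server answer from its watch cache, while omitting `resourceVersion` forces a quorum read from etcd, which is the most expensive form. This is standard Kubernetes API behavior, not something specific to the detector.

```bash
# LIST answered from the API server watch cache: cheaper for the control plane, but the data may be slightly stale.
kubectl get --raw "/api/v1/pods?resourceVersion=0"

# LIST with no resourceVersion: a consistent (quorum) read that has to be served from etcd.
kubectl get --raw "/api/v1/pods"
```
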
> Only successful LIST calls are counted. Failed or throttled calls are excluded.

The analyzer also provides actionable recommendations directly in the Azure portal, tailored to the detected patterns, to help you remediate the issue and optimize your cluster.

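The portal recommendations are tailored to the detected patterns, but a common way to reduce LIST pressure is to have clients filter and paginate their requests instead of repeatedly listing every object in the cluster. The following is an illustrative sketch only; `app=my-agent` is a placeholder for your own workload's labels.

```bash
# Narrow the LIST to the objects that the client actually needs by using a label selector.
kubectl get pods --all-namespaces -l app=my-agent

# Retrieve large result sets in smaller pages instead of one large LIST response.
kubectl get pods --all-namespaces --chunk-size=250
```
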
> **Note:**
> The API server resource intensive listing detector is available to all users who have access to the AKS resource in the Azure portal. No special permissions or prerequisites are required.
>
> After you identify the offending agents and apply the preceding recommendations, you can also use [Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/) to throttle or isolate problematic clients.

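If you do apply Priority and Fairness rules, you can verify that they're taking effect by inspecting the flow-control objects and metrics on the cluster. This is a minimal sketch; metric names and labels can vary by Kubernetes version.

```bash
# List the API Priority and Fairness objects that are defined on the cluster.
kubectl get flowschemas
kubectl get prioritylevelconfigurations

# Check whether requests are being queued or rejected under flow control.
kubectl get --raw /metrics | grep 'apiserver_flowcontrol_rejected_requests_total'
kubectl get --raw /metrics | grep 'apiserver_flowcontrol_current_inqueue_requests'
```
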
## Cause 4: A custom webhook might cause a deadlock in API server pods

A custom webhook, such as Kyverno, might be causing a deadlock within API server pods.
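
As a starting point for this cause, you can list the admission webhook configurations that are registered on the cluster to see which custom webhooks intercept API server requests. This is a minimal sketch for identifying the registered webhooks.

```bash
# List the custom admission webhooks that intercept API server requests.
kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations
```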
