Commit 45ffb19
Merge pull request #9223 from genlin/main6342
Troubleshoot Pod Scheduling Errors in AKS
2 parents 0cf0e00 + ce522f9 commit 45ffb19

File tree

2 files changed: +133 −0 lines changed

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
---
title: Troubleshoot Pod Scheduler Errors in Azure Kubernetes Service
description: Explains common Pod scheduler errors in AKS, their causes, and how to resolve them.
ms.date: 06/30/2025
ms.reviewer:
ms.service: azure-kubernetes-service
ms.custom: sap:Node/node pool availability and performance
---

# Troubleshoot pod scheduler errors in Azure Kubernetes Service

When you deploy workloads in Azure Kubernetes Service (AKS), you might encounter scheduler errors that prevent Pods from running. This article provides solutions to common scheduler errors.

## Error: 0/(X) nodes are available: Y node(s) had volume node affinity conflict

> [!NOTE]
> X and Y represent the number of nodes. These values depend on your cluster configuration.

Pods remain in the Pending state with the following scheduler error:

> 0/(X) nodes are available: Y node(s) had volume node affinity conflict.

### Cause

[Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#node-affinity) define `nodeAffinity` rules that restrict which nodes can access the volume. If none of the available nodes satisfy the volume's affinity rules, the scheduler can't assign the Pod to any node.

### Solution

1. Review the node affinity set on your Persistent Volume resource:

```bash
kubectl get pv <pv-name> -o yaml
```

2. Check node labels in the cluster:

```bash
kubectl get nodes --show-labels
```

3. Make sure that at least one node's labels match the `nodeAffinity` specified in the Persistent Volume's YAML spec.
4. To resolve the conflict, either update the Persistent Volume's `nodeAffinity` rules to match existing node labels, or add the required labels to the correct node:

```bash
kubectl label nodes <node-name> <key>=<value>
```

5. After resolving the conflict, monitor the Pod status or retry the deployment.
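
For reference, a Persistent Volume that pins itself to one availability zone might look like the following sketch. The name, disk handle, and zone value are hypothetical placeholders; only nodes labeled `topology.kubernetes.io/zone=eastus-1` could host Pods that mount this volume:

```yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv           # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: disk.csi.azure.com
    volumeHandle: <disk-resource-id>
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - eastus-1   # must match a label on at least one node
```
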
## Error: 0/(X) nodes are available: Insufficient CPU

Pods remain in the Pending state with the following scheduler error:

> Error: 0/(X) nodes are available: Insufficient CPU.

### Cause

This issue occurs when one or more of the following conditions are met:

- All node resources are in use.
- The pending Pod's resource requests exceed the available CPU on the nodes.
- The node pools lack sufficient resources or have incorrect configuration settings.

### Solution

1. Review CPU usage on all nodes and verify whether there's enough unallocated CPU to meet the Pod's request:

```bash
kubectl describe pod <pod-name>
kubectl describe nodes
```

2. If no node has enough CPU, increase the number of nodes or use larger VM sizes in the node pool:
72+
73+
```bash
74+
75+
az aks nodepool scale \
76+
  --resource-group <resource-group> \
77+
  --cluster-name <aks-name> \
78+
  --name <nodepool-name> \
79+
  --node-count <desired-node-count>
80+
```
81+
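
Alternatively, instead of scaling manually, you can enable the cluster autoscaler on the node pool so that AKS adds nodes automatically when Pods can't be scheduled. This is a sketch; the minimum and maximum counts shown are illustrative:

```bash
az aks nodepool update \
  --resource-group <resource-group> \
  --cluster-name <aks-name> \
  --name <nodepool-name> \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5
```
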
3. Optimize Pod resource requests. Make sure that CPU requests and limits are appropriate for your node sizes.
4. Verify whether any scheduling constraints, such as node selectors or affinity rules, are restricting Pod placement across the available nodes.

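
Note that the values the scheduler compares are the Pod's CPU requests, not its actual usage. A minimal (hypothetical) container resource spec looks like the following; the request must fit into some node's unallocated CPU for the Pod to schedule:

```yml
resources:
  requests:
    cpu: "500m"   # the scheduler reserves 0.5 vCPU on the chosen node
  limits:
    cpu: "1"      # the runtime throttles the container above 1 vCPU
```
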
## Error: 0/(X) nodes are available: Y node(s) had untolerated taint

Pods remain in the Pending state with the following scheduler error:

> Error: 0/(X) nodes are available: Y node(s) had untolerated taint.

### Cause

The Kubernetes scheduler tries to assign the Pod to a node, but all nodes are rejected for one of the following reasons:

- The node has a taint (`key=value:effect`) that the Pod doesn't tolerate.
- The node has other taint-based restrictions that prevent the Pod from being scheduled.

### Solution

1. Check node taints:

```bash
kubectl get nodes -o json | jq '.items[].spec.taints'
```

2. Add the necessary tolerations to the Pod spec. Edit your deployment or Pod YAML to include tolerations that match the taints on your nodes. For example, if a node has the taint `key=value:NoSchedule`, your Pod spec must include:

```yml
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
```

3. If the taint isn't needed, remove it from the node:

```bash
kubectl taint nodes <node-name> <key>:<effect>-
```

4. Redeploy or monitor the Pod status:

```bash
kubectl get pods -o wide
```

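Optionally, verify that the redeployed Pod carries the expected tolerations, using `jq` in the same style as step 1:

```bash
kubectl get pod <pod-name> -o json | jq '.spec.tolerations'
```
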
## Reference

- [Kubernetes: Use Azure Disks with Azure Kubernetes Service](/azure/aks/azure-disks-dynamic-pv)
- [Kubernetes: Use node taints](/azure/aks/use-node-taints)
- [Kubernetes Documentation: Insufficient CPU](https://kubernetes.io/docs/concepts/scheduling-eviction/resource-bin-packing/#insufficient-resource)
- [Kubernetes Documentation: Assign and Schedule Pods with Taints and Tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/)

[!INCLUDE [Third-party disclaimer](../../../includes/third-party-contact-disclaimer.md)]
[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]

support/azure/azure-kubernetes/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -172,6 +172,8 @@
   href: availability-performance/high-memory-consumption-disk-intensive-applications.md
 - name: Troubleshoot cluster service health probe mode issues
   href: availability-performance/cluster-service-health-probe-mode-issues.md
+- name: Troubleshoot pod scheduler errors
+  href: availability-performance/troubleshoot-pod-scheduler-errors.md
 - name: Troubleshoot node not ready
   items:
 - name: Basic troubleshooting
