Commit 646c0e7

author
Jess Egler
committed
Adds page with troubleshooting a NotReady node in a KubernetesCluster
1 parent eca28d2 commit 646c0e7

1 file changed

+83
-0
lines changed

@@ -0,0 +1,83 @@
1+
---
2+
title: Troubleshoot KubernetesCluster problems with a NotReady node
3+
description: Learn what to do when a node in your KubernetesCluster is in the NotReady state.
4+
ms.service: azure-operator-nexus
5+
ms.custom: troubleshooting
6+
ms.topic: troubleshooting
7+
ms.date: 02/19/2025
8+
ms.author: jessegler
9+
author: jessegler
10+
---
11+
# Troubleshoot a KubernetesCluster with a node in NotReady state
12+
13+
Follow this troubleshooting guide if a node in your KubernetesCluster is in the **NotReady** state.
14+
15+
## Prerequisites
16+
17+
- Ability to run `kubectl` commands against the KubernetesCluster.
18+
- Familiarity with the capabilities referenced in this article. To get started, review the [BMM actions](howto-baremetal-functions.md).
19+
20+
## Cause
21+
22+
- After a Baremetalmachine restart or a Cluster runtime upgrade, a node may enter the **NotReady** status.
23+
- Tainting, cordoning, or powering off a Baremetalmachine causes the nodes running on that Baremetalmachine to become **NotReady**. If possible, remove the taint, uncordon, or power on the Baremetalmachine. If that isn't possible, the following procedure may allow the node to reschedule to a different Baremetalmachine.
24+
25+
## Procedure
26+
27+
Delete the node as shown in the following example. Deleting the node allows the Cluster to attempt to reschedule and restart it.
28+
29+
~~~bash
30+
# List the nodes with wide output:
31+
32+
$ kubectl get nodes -owide
33+
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
34+
mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b Ready <none> 6d3h v1.27.3 10.4.74.30 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
35+
mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw Ready <none> 6d3h v1.27.3 10.4.74.31 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
36+
mytest-naks1-3b466a17-agentpool1-md-6bg5h-lkhhq NotReady <none> 6d3h v1.27.3 10.4.74.29 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
37+
mytest-naks1-3b466a17-control-plane-6q7ns Ready control-plane 6d3h v1.27.3 10.4.74.14 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
38+
mytest-naks1-3b466a17-control-plane-8qqvz Ready control-plane 6d3h v1.27.3 10.4.74.28 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
39+
mytest-naks1-3b466a17-control-plane-g42mh Ready control-plane 6d3h v1.27.3 10.4.74.32 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
40+
41+
# Find the NotReady node, then delete it with kubectl:
42+
43+
$ kubectl delete node mytest-naks1-3b466a17-agentpool1-md-6bg5h-lkhhq
44+
node "mytest-naks1-3b466a17-agentpool1-md-6bg5h-lkhhq" deleted
45+
46+
47+
# The node list now shows that the node is gone:
48+
49+
$ kubectl get nodes -owide
50+
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
51+
mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b Ready <none> 6d3h v1.27.3 10.4.74.30 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
52+
mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw Ready <none> 6d3h v1.27.3 10.4.74.31 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
53+
mytest-naks1-3b466a17-control-plane-6q7ns Ready control-plane 6d3h v1.27.3 10.4.74.14 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
54+
mytest-naks1-3b466a17-control-plane-8qqvz Ready control-plane 6d3h v1.27.3 10.4.74.28 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
55+
mytest-naks1-3b466a17-control-plane-g42mh Ready control-plane 6d3h v1.27.3 10.4.74.32 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
56+
57+
58+
# Wait 5-15 minutes while the node is replaced. It returns with a new name:
59+
60+
$ kubectl get nodes -owide
61+
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
62+
mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b Ready <none> 6d3h v1.27.3 10.4.74.30 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
63+
mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw Ready <none> 6d3h v1.27.3 10.4.74.31 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
64+
mytest-naks1-3b466a17-agentpool1-md-6bg5h-nxkks NotReady <none> 42s v1.27.3 10.4.74.12 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
65+
mytest-naks1-3b466a17-control-plane-6q7ns Ready control-plane 6d3h v1.27.3 10.4.74.14 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
66+
mytest-naks1-3b466a17-control-plane-8qqvz Ready control-plane 6d3h v1.27.3 10.4.74.28 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
67+
mytest-naks1-3b466a17-control-plane-g42mh Ready control-plane 6d3h v1.27.3 10.4.74.32 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
68+
69+
70+
# wait a bit longer, while the new NotReady node becomes Ready:
71+
72+
$ kubectl get nodes -owide
73+
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
74+
mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b Ready <none> 6d3h v1.27.3 10.4.74.30 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
75+
mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw Ready <none> 6d3h v1.27.3 10.4.74.31 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
76+
mytest-naks1-3b466a17-agentpool1-md-6bg5h-nxkks Ready <none> 97s v1.27.3 10.4.74.12 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
77+
mytest-naks1-3b466a17-control-plane-6q7ns Ready control-plane 6d3h v1.27.3 10.4.74.14 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
78+
mytest-naks1-3b466a17-control-plane-8qqvz Ready control-plane 6d3h v1.27.3 10.4.74.28 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
79+
mytest-naks1-3b466a17-control-plane-g42mh Ready control-plane 6d3h v1.27.3 10.4.74.32 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
80+
~~~
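
On a larger cluster, picking the **NotReady** node out of the full list by eye can be error-prone. The following is a minimal sketch, not part of the procedure above, of a helper that filters the node list. The function name `not_ready_nodes` is hypothetical, and the sketch assumes the default `kubectl get nodes` column layout, where the node name is the first column and the status is the second.

```bash
# Hypothetical helper: print the names of nodes whose STATUS starts with "NotReady".
# Reads `kubectl get nodes --no-headers` output on stdin.
# Assumption: column 1 is NAME and column 2 is STATUS (the default kubectl layout).
# Cordoned but healthy nodes ("Ready,SchedulingDisabled") are intentionally skipped.
not_ready_nodes() {
  awk '$2 ~ /^NotReady/ { print $1 }'
}
```

Against a live cluster, `kubectl get nodes --no-headers | not_ready_nodes` lists the candidates for deletion. After the replacement node appears, `kubectl wait --for=condition=Ready node/<new-node-name> --timeout=15m` can stand in for repeated polling; `<new-node-name>` is a placeholder for the name shown in the node list, and the 15-minute timeout is only a suggestion based on the wait time mentioned above.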
81+
82+
If you still have questions, [contact support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
83+
For more information about Support plans, see [Azure Support plans](https://azure.microsoft.com/support/plans/response/).
