Commit 645b865

Merge pull request #294980 from j-egler/main
Adds page with troubleshooting a NotReady node in a KubernetesCluster
2 parents 983727e + ab5caf7 commit 645b865

2 files changed: +91 -0 lines

2 files changed

+91
-0
lines changed

articles/operator-nexus/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -367,6 +367,8 @@
      href: troubleshoot-internet-host-virtual-machine.md
    - name: Troubleshoot VM errors after BMM restart
      href: troubleshoot-vm-error-after-reboot.md
+   - name: Troubleshoot NotReady KubernetesCluster node
+     href: troubleshoot-not-ready-kubernetes-cluster-node.md
    - name:
        Troubleshooting dual-stack configuration issues for Nexus Kubernetes
        cluster
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
---
title: Troubleshoot a KubernetesCluster with a node in NotReady state
description: Learn what to do when you see a node in the NotReady state in your KubernetesCluster.
ms.service: azure-operator-nexus
ms.custom: troubleshooting
ms.topic: troubleshooting
ms.date: 02/19/2025
ms.author: jessegler
author: jessegler
---
# Troubleshoot a KubernetesCluster with a node in NotReady state

Follow this troubleshooting guide if you see a KubernetesCluster with a node in the **NotReady** state.

## Prerequisites

- Ability to run kubectl commands against the KubernetesCluster.
- Familiarity with the capabilities referenced in this article. To review them, see [BareMetalMachine actions](howto-baremetal-functions.md).

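Before starting, it can help to confirm that your kubeconfig reaches the cluster and that you hold the RBAC permissions the procedure needs. A minimal sketch: `kubectl auth can-i` is standard kubectl, and the shell check at the end only illustrates the go/no-go decision on sample answers, not real cluster output.

~~~bash
# Verify access before starting (requires a configured kubeconfig):
#   kubectl auth can-i list nodes      # should print "yes"
#   kubectl auth can-i delete nodes    # should print "yes"
# Offline illustration of the decision, using sample answers:
can_list=yes
can_delete=yes
if [ "$can_list" = "yes" ] && [ "$can_delete" = "yes" ]; then
  echo "ok to proceed"
fi
~~~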
## Cause

- After a BareMetalMachine restart or a Cluster runtime upgrade, a node may enter the **NotReady** status.
- Tainting, cordoning, or powering off a BareMetalMachine causes the nodes running on that BareMetalMachine to become **NotReady**. If possible, remove the taint, uncordon the machine, or power it back on. If that isn't possible, following the procedure below may allow the node to be rescheduled to a different BareMetalMachine.

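To see which cause applies, you can inspect the node's taints and conditions before deleting it. The `kubectl` commands below are standard; the node name is the example used in the procedure, and the `awk` filter is an illustrative helper shown here running on sample output rather than a live cluster.

~~~bash
# Inspect a NotReady node (requires kubectl access; example node name assumed):
#   kubectl describe node mytest-naks1-3b466a17-agentpool1-md-6bg5h-lkhhq | grep -A3 'Taints:'
#   kubectl get nodes --no-headers
# Offline illustration: pick out NotReady node names from sample 'get nodes' output.
sample='node-a Ready <none> 6d3h v1.27.3
node-b NotReady <none> 6d3h v1.27.3'
printf '%s\n' "$sample" | awk '$2 == "NotReady" {print $1}'
~~~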
## Procedure

Delete the node by following the steps below. Deleting it allows the Cluster to reschedule and restart the node.

1. Use kubectl to list the nodes with the wide output flag. Observe the node in **NotReady** status.

   ~~~bash
   $ kubectl get nodes -owide
   NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
   mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b Ready <none> 6d3h v1.27.3 10.4.74.30 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw Ready <none> 6d3h v1.27.3 10.4.74.31 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-agentpool1-md-6bg5h-lkhhq NotReady <none> 6d3h v1.27.3 10.4.74.29 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-6q7ns Ready control-plane 6d3h v1.27.3 10.4.74.14 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-8qqvz Ready control-plane 6d3h v1.27.3 10.4.74.28 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-g42mh Ready control-plane 6d3h v1.27.3 10.4.74.32 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   ~~~

1. Issue the kubectl command to delete the node.

   ~~~bash
   $ kubectl delete node mytest-naks1-3b466a17-agentpool1-md-6bg5h-lkhhq
   node "mytest-naks1-3b466a17-agentpool1-md-6bg5h-lkhhq" deleted
   ~~~

1. List the nodes again and confirm that the deleted node is gone.

   ~~~bash
   $ kubectl get nodes -owide
   NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
   mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b Ready <none> 6d3h v1.27.3 10.4.74.30 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw Ready <none> 6d3h v1.27.3 10.4.74.31 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-6q7ns Ready control-plane 6d3h v1.27.3 10.4.74.14 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-8qqvz Ready control-plane 6d3h v1.27.3 10.4.74.28 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-g42mh Ready control-plane 6d3h v1.27.3 10.4.74.32 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   ~~~

1. Wait 5-15 minutes for the node to be replaced. Confirm that it's returned with a new name. It shows **NotReady** while it comes up.

   ~~~bash
   $ kubectl get nodes -owide
   NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
   mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b Ready <none> 6d3h v1.27.3 10.4.74.30 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw Ready <none> 6d3h v1.27.3 10.4.74.31 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-agentpool1-md-6bg5h-nxkks NotReady <none> 42s v1.27.3 10.4.74.12 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-6q7ns Ready control-plane 6d3h v1.27.3 10.4.74.14 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-8qqvz Ready control-plane 6d3h v1.27.3 10.4.74.28 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-g42mh Ready control-plane 6d3h v1.27.3 10.4.74.32 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   ~~~

1. Wait a bit longer until the **NotReady** node becomes **Ready**.

   ~~~bash
   $ kubectl get nodes -owide
   NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
   mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b Ready <none> 6d3h v1.27.3 10.4.74.30 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw Ready <none> 6d3h v1.27.3 10.4.74.31 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-agentpool1-md-6bg5h-nxkks Ready <none> 97s v1.27.3 10.4.74.12 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-6q7ns Ready control-plane 6d3h v1.27.3 10.4.74.14 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-8qqvz Ready control-plane 6d3h v1.27.3 10.4.74.28 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   mytest-naks1-3b466a17-control-plane-g42mh Ready control-plane 6d3h v1.27.3 10.4.74.32 <none> CBL-Mariner/Linux 5.15.153.1-2.cm2 containerd://1.6.26
   ~~~
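
Instead of polling with repeated `kubectl get nodes`, you can block until the replacement node reports **Ready**. This is a sketch: the node name is the example from the step above, and the 15-minute timeout is an assumption matching the wait suggested earlier; the trailing snippet only illustrates, on sample status JSON, the condition the wait checks.

~~~bash
# Block until the replacement node is Ready (name and timeout are assumptions):
#   kubectl wait --for=condition=Ready \
#     node/mytest-naks1-3b466a17-agentpool1-md-6bg5h-nxkks --timeout=15m
# Offline illustration of the condition being checked, on sample status JSON:
status='{"type":"Ready","status":"True"}'
case "$status" in
  *'"type":"Ready"'*'"status":"True"'*) echo "Ready" ;;
esac
~~~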

If you still have questions, [contact support](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade).
For more information about Support plans, see [Azure Support plans](https://azure.microsoft.com/support/plans/response/).
