Commit c48d2dc

Merge pull request #45004 from mburke5678/BZ-2078674
BZ2078674: Restoring cluster to a previous state fail in case nodes certificate are updated after the last backup
2 parents 81664d2 + f90684d commit c48d2dc

1 file changed (+47, -0 lines changed)

modules/dr-restoring-cluster-state.adoc

Lines changed: 47 additions & 0 deletions
@@ -139,6 +139,53 @@ static-pod-resources/kube-controller-manager-pod-7/kube-controller-manager-pod.y
starting kube-scheduler-pod.yaml
static-pod-resources/kube-scheduler-pod-8/kube-scheduler-pod.yaml
----
+
[NOTE]
====
The restore process can cause nodes to enter the `NotReady` state if the node certificates were updated after the last etcd backup.
====

. Check the nodes to ensure they are in the `Ready` state.

.. Run the following command:
+
[source,terminal]
----
$ oc get nodes -w
----
+
.Sample output
[source,terminal]
----
NAME                STATUS   ROLES          AGE     VERSION
host-172-25-75-28   Ready    master         3d20h   v1.23.3+e419edf
host-172-25-75-38   Ready    infra,worker   3d20h   v1.23.3+e419edf
host-172-25-75-40   Ready    master         3d20h   v1.23.3+e419edf
host-172-25-75-65   Ready    master         3d20h   v1.23.3+e419edf
host-172-25-75-74   Ready    infra,worker   3d20h   v1.23.3+e419edf
host-172-25-75-79   Ready    worker         3d20h   v1.23.3+e419edf
host-172-25-75-86   Ready    worker         3d20h   v1.23.3+e419edf
host-172-25-75-98   Ready    infra,worker   3d20h   v1.23.3+e419edf
----
+
It can take several minutes for all nodes to report their state.
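+
The node check can be narrowed to the nodes that still need attention. The following is a minimal sketch, not part of the documented procedure: `not_ready_nodes` is a hypothetical helper name, and it assumes the default column layout shown in the sample output, with the node name in the first column and the status in the second.
+
```shell
# Hypothetical helper (not part of oc): print the name of every node whose
# primary status is not "Ready".
# Reads `oc get nodes --no-headers` style output on standard input.
not_ready_nodes() {
  # A status such as "Ready,SchedulingDisabled" still counts as Ready,
  # so only the part before the first comma is compared.
  # Lines with fewer than two fields (for example, blank lines) are skipped.
  awk 'NF >= 2 { split($2, status, ","); if (status[1] != "Ready") print $1 }'
}

# Example on a live cluster (assumes an authenticated oc session):
#   oc get nodes --no-headers | not_ready_nodes
```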

.. If any nodes are in the `NotReady` state, log in to each affected node and remove all of the PEM files from the `/var/lib/kubelet/pki` directory. You can connect to the nodes over SSH or use the terminal window in the web console.
+
[source,terminal]
----
$ ssh -i <ssh-key-path> core@<master-hostname>
----
+
.Sample `pki` directory
[source,terminal]
----
sh-4.4# pwd
/var/lib/kubelet/pki
sh-4.4# ls
kubelet-client-2022-04-28-11-24-09.pem  kubelet-server-2022-04-28-11-24-15.pem
kubelet-client-current.pem              kubelet-server-current.pem
----
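+
The listing shows the files but not the removal itself. The following is a minimal sketch of that removal, not a command taken from the procedure: `remove_kubelet_pems` is a hypothetical helper name, and it assumes a root shell on the affected node, as in the sample prompt.
+
```shell
# Hypothetical helper: delete every *.pem file in a kubelet PKI directory.
# The directory defaults to the path named in this step.
remove_kubelet_pems() {
  pki_dir="${1:-/var/lib/kubelet/pki}"
  # The glob matches both the dated files and the *-current.pem entries.
  # -f keeps rm quiet when no PEM files are present.
  rm -f -- "$pki_dir"/*.pem
}

# Example, run as root on a node that reports NotReady:
#   remove_kubelet_pems
```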

. Restart the kubelet service on all control plane hosts.