
Commit 27091dd

Merge pull request #90359 from openshift-cherrypick-robot/cherry-pick-90079-to-enterprise-4.16
[enterprise-4.16] [OCPBUGS-49434]: Revising etcd restore procedure
2 parents 4d57f25 + bb58eac commit 27091dd


1 file changed: 71 additions, 0 deletions

modules/dr-restoring-cluster-state.adoc

@@ -413,6 +413,77 @@ $ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node --field-selector
----
+

.. Check the status of the OVN pods by running the following command:
+
[source,terminal]
----
$ oc get po -n openshift-ovn-kubernetes
----
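+
On a cluster with many nodes, you can narrow this output to the nodes that need attention. The following is a minimal sketch, not part of the documented procedure: a hypothetical `terminating_ovn_nodes` shell function that reads `oc get pods -o wide` output and prints the node of each pod in the `Terminating` status. It assumes the default wide column layout, in which STATUS is the third column and NODE is followed only by the NOMINATED NODE and READINESS GATES columns.
+
[source,bash]
----
# Hypothetical helper (not an oc feature): reads `oc get pods -o wide` output
# on stdin and prints the NODE of every pod whose STATUS is "Terminating",
# once per node.
# Assumption: default wide layout, where STATUS is field 3 and the last three
# fields are NODE, NOMINATED NODE, and READINESS GATES. Counting NODE from the
# end keeps the match stable when RESTARTS contains spaces, as in "2 (3h ago)".
terminating_ovn_nodes() {
  awk 'NR > 1 && $3 == "Terminating" { print $(NF - 2) }' | sort -u
}

# Usage:
# oc get pods -n openshift-ovn-kubernetes -o wide | terminating_ovn_nodes
----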
+

... If any OVN pods are in the `Terminating` status, delete the node that is running that pod by running the following command, replacing `<node>` with the name of the node:
+
[source,terminal]
----
$ oc delete node <node>
----
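+
If more than one node is affected, a loop can delete them in turn. This hypothetical `delete_nodes` function is only a sketch: it reads node names from standard input, one per line, and runs `oc delete node` for each. The `OC` variable is an assumption added for illustration so that setting `OC=echo` prints the commands instead of running them.
+
[source,bash]
----
# Hypothetical helper: reads node names on stdin (one per line) and runs
# `oc delete node <name>` for each. Blank lines are skipped.
# Assumption for illustration: set OC=echo to print the commands instead.
delete_nodes() {
  local node
  while IFS= read -r node; do
    [ -n "$node" ] || continue
    "${OC:-oc}" delete node "$node"
  done
}

# Usage (replace <node> with the node name):
# printf '%s\n' <node> | delete_nodes
----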
+

... Use SSH to log in to the node that was running the `Terminating` OVN pod by running the following command:
+
[source,terminal]
----
$ ssh -i <ssh-key-path> core@<node>
----
+

... Move all PEM files out of the `/var/lib/kubelet/pki` directory to `/tmp` by running the following command:
+
[source,terminal]
----
$ sudo mv /var/lib/kubelet/pki/* /tmp
----
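+
The command above moves every entry in the directory. If you want to move only the files that end in `.pem`, as the step describes, the following hypothetical `move_pem_files` function is one way to sketch it. It takes the source and destination directories as optional arguments, defaulting to the paths used in this step, and fails if it finds no PEM file. Because `sudo` cannot run a shell function, define and call it from a root shell, for example after `sudo -i`.
+
[source,bash]
----
# Hypothetical helper: moves every *.pem entry from SRC to DEST. This is
# narrower than `mv /var/lib/kubelet/pki/* /tmp`, which moves all entries.
# Defaults are the paths used in this procedure; run it as root on the node.
move_pem_files() {
  local src="${1:-/var/lib/kubelet/pki}" dest="${2:-/tmp}" f
  local -a files=()
  for f in "$src"/*.pem; do
    # With no match the glob stays literal, so keep only entries that exist;
    # -L also keeps symbolic links.
    [ -e "$f" ] || [ -L "$f" ] || continue
    files+=("$f")
  done
  if [ "${#files[@]}" -eq 0 ]; then
    echo "no PEM files found in $src" >&2
    return 1
  fi
  mv -- "${files[@]}" "$dest"/
}
----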
+

... Restart the kubelet service by running the following command:
+
[source,terminal]
----
$ sudo systemctl restart kubelet.service
----
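+
Before you continue, you might want to wait until the kubelet reports that it is active. As a sketch, the hypothetical `retry` function below re-runs any command until it succeeds or a fixed number of attempts is used up; the attempt count and delay in the usage comment are arbitrary example values.
+
[source,bash]
----
# Hypothetical helper: runs "$@" up to ATTEMPTS times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage on the node (example values: 12 attempts, 5 seconds apart):
# retry 12 5 systemctl is-active --quiet kubelet.service
----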
+

... Return to the recovery etcd machine and check for pending certificate signing requests (CSRs) by running the following command:
+
[source,terminal]
----
$ oc get csr
----
+
.Example output
+
[source,terminal]
----
NAME         AGE    SIGNERNAME                      REQUESTOR                 CONDITION
csr-<uuid>   8m3s   kubernetes.io/kubelet-serving   system:node:<node_name>   Pending
----
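+
When many CSRs are listed, you can print only the names of those that are still pending. This is a sketch with a hypothetical `pending_csrs` function; it assumes that NAME is the first column and CONDITION is the last column of the `oc get csr` output.
+
[source,bash]
----
# Hypothetical helper: reads `oc get csr` output on stdin and prints the NAME
# of every CSR whose CONDITION is exactly "Pending".
# Assumption: CONDITION is the last column. Matching on the last field keeps
# the filter working if extra columns appear between NAME and CONDITION.
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Usage:
# oc get csr | pending_csrs
----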
... Approve each pending CSR by running the following command, replacing `csr-<uuid>` with the name of the CSR:
+
[source,terminal]
----
$ oc adm certificate approve csr-<uuid>
----
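+
If several CSRs are pending, you can approve them in one call, because `oc adm certificate approve` accepts more than one name. The hypothetical `approve_csrs` function below reads CSR names from standard input and passes them to that command through `xargs -r`, a GNU option that skips the call when the input is empty. The `OC` variable is an illustrative assumption that lets `OC=echo` print the command instead of running it.
+
[source,bash]
----
# Hypothetical helper: reads CSR names on stdin (one per line) and approves
# them with a single `oc adm certificate approve NAME...` call.
# `xargs -r` (GNU) does not run the command when stdin is empty.
# Assumption for illustration: set OC=echo to print the command instead.
approve_csrs() {
  xargs -r "${OC:-oc}" adm certificate approve
}

# Usage (replace csr-<uuid> with the names of your pending CSRs):
# printf '%s\n' csr-<uuid> | approve_csrs
----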
+

... Verify that the node has rejoined the cluster by running the following command:
+
[source,terminal]
----
$ oc get nodes
----
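+
To check a single node in a script rather than by reading the table, the hypothetical `node_is_ready` function below reads `oc get nodes` output and succeeds only if the named node is listed with a STATUS that begins with `Ready`. Treat it as a sketch; it assumes the default layout, in which NAME and STATUS are the first two columns.
+
[source,bash]
----
# Hypothetical helper: reads `oc get nodes` output on stdin and exits 0 only if
# the node named in $1 has a STATUS (column 2) starting with "Ready", which
# also matches "Ready,SchedulingDisabled" but not "NotReady".
node_is_ready() {
  awk -v node="$1" '
    NR > 1 && $1 == node && $2 ~ /^Ready/ { found = 1 }
    END { exit !found }
  '
}

# Usage (replace <node> with the node name):
# oc get nodes | node_is_ready <node> && echo "node is Ready"
----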
+

.. Run the following command to verify that the `ovnkube-node` pod is running again:
+
[source,terminal]
