Merge pull request #36639 from chinmayi-chandrasekar/RFE1932_node_reboot_steps

Bob Furu · web-flow · commit ad4a4bf2d6ab · 2021-10-04T10:38:06.000-04:00
diff --git a/modules/nodes-nodes-rebooting-affinity.adoc b/modules/nodes-nodes-rebooting-affinity.adoc
@@ -10,10 +10,10 @@ violated if there are no other suitable locations to deploy a pod. Pod
 anti-affinity can be set to either required or preferred.
 
 With this in place, if only two infrastructure nodes are available and one is rebooted, the container image registry
-pod is prevented from running on the other node. `*oc get pods*` reports the pod as unready until a suitable node is available. 
+pod is prevented from running on the other node. `*oc get pods*` reports the pod as unready until a suitable node is available.
 Once a node is available and all pods are back in ready state, the next node can be restarted.
 
-.Procedure 
+.Procedure
 
 To reboot a node using pod anti-affinity:
 
@@ -35,7 +35,7 @@ spec:
             matchExpressions:
             - key: registry <4>
               operator: In <5>
-              values: 
+              values:
               - default
           topologyKey: kubernetes.io/hostname
 ----
@@ -49,5 +49,5 @@ This example assumes the container image registry pod has a label of
 `registry=default`. Pod anti-affinity can use any Kubernetes match
 expression.
 
-. Enable the `MatchInterPodAffinity` scheduler predicate in the scheduling policy file. 
-
+. Enable the `MatchInterPodAffinity` scheduler predicate in the scheduling policy file.
+. Perform a graceful restart of the node.
diff --git a/modules/nodes-nodes-rebooting-gracefully.adoc b/modules/nodes-nodes-rebooting-gracefully.adoc
@@ -0,0 +1,61 @@
+// Module included in the following assemblies:
+//
+// * nodes/nodes-nodes-rebooting.adoc
+
+[id="nodes-nodes-rebooting-gracefully_{context}"]
+= Rebooting a node gracefully
+
+Before rebooting a node, it is recommended to back up etcd data to avoid any data loss on the node.
+
+.Procedure
+
+To perform a graceful restart of a node:
+
+. Mark the node as unschedulable:
++
+[source,terminal]
+----
+$ oc adm cordon <node1>
+----
++
+. Drain the node to remove all the running pods:
++
+[source,terminal]
+----
+$ oc adm drain <node1> --ignore-daemonsets --delete-emptydir-data
+----
++
+. Access the node in debug mode:
++
+[source,terminal]
+----
+$ oc debug node/<node1>
+----
++
+. Restart the node:
++
+[source,terminal]
+----
+$ chroot /host systemctl reboot
+----
++
+. Mark the node as schedulable after the reboot is complete:
++
+[source,terminal]
+----
+$ oc adm uncordon <node1>
+----
++
+. Verify that the node is ready:
++
+[source,terminal]
+----
+$ oc get node <node1>
+----
++
+.Example output
+[source,terminal]
+----
+NAME      STATUS   ROLES    AGE     VERSION
+<node1>   Ready    worker   6d22h   v1.18.3+b0068a8
+----
diff --git a/nodes/nodes/nodes-nodes-rebooting.adoc b/nodes/nodes/nodes-nodes-rebooting.adoc
@@ -9,7 +9,7 @@ toc::[]
 
 
 To reboot a node without causing an outage for applications running on the
-platform, it is important to first evacuate the pods. For pods that are 
+platform, it is important to first evacuate the pods. For pods that are
 made highly available by the routing tier, nothing
 else needs to be done. For other pods needing storage, typically databases, it
 is critical to ensure that they can remain in operation with one pod
@@ -37,3 +37,8 @@ include::modules/nodes-nodes-rebooting-affinity.adoc[leveloffset=+1]
 
 include::modules/nodes-nodes-rebooting-router.adoc[leveloffset=+1]
 
+include::modules/nodes-nodes-rebooting-gracefully.adoc[leveloffset=+1]
+
+.Additional information
+
+For information on etcd data backup, see xref:../../backup_and_restore/backing-up-etcd.adoc#backup-etcd[Backing up etcd data].
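
The graceful restart steps added in `modules/nodes-nodes-rebooting-gracefully.adoc` could be wrapped in a small shell helper along the following lines. This is only a sketch and is not part of the commit. The function names `run`, `reboot_node_gracefully`, and `return_node_to_service` and the `DRY_RUN` switch are hypothetical. Issuing the reboot through a non-interactive `oc debug node/<node> -- chroot /host ...` call is also an assumption; the module describes an interactive debug session instead. The script deliberately stops after the reboot, because the module only says to uncordon "after the reboot is complete" and does not say how to detect that.

```shell
# Sketch of the cordon / drain / reboot / uncordon sequence.
# Assumption: `oc` is on PATH and already logged in to the cluster.
# The helper names and DRY_RUN are illustrative, not from the commit.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "+ $*"
  else
    "$@"
  fi
}

# Steps 1-4 of the procedure: cordon, drain, then reboot the node.
reboot_node_gracefully() {
  node="${1:-}"
  if [ -z "$node" ]; then
    echo "usage: reboot_node_gracefully <node>" >&2
    return 1
  fi

  # Mark the node as unschedulable.
  run oc adm cordon "$node" || return 1

  # Evict running pods; flags are the ones used in the module.
  run oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1

  # Assumption: run the reboot non-interactively from a debug pod,
  # entering the host file system with chroot first.
  run oc debug "node/$node" -- chroot /host systemctl reboot
}

# Steps 5-6: run this only after the reboot has completed.
return_node_to_service() {
  node="${1:-}"
  if [ -z "$node" ]; then
    echo "usage: return_node_to_service <node>" >&2
    return 1
  fi

  # Mark the node as schedulable again, then show its status.
  run oc adm uncordon "$node" || return 1
  run oc get node "$node"
}
```

With `DRY_RUN=1` set, calling `reboot_node_gracefully <node1>` only prints the three `oc` commands, which makes it possible to review the sequence before running it against a cluster. Keeping the uncordon step in a separate function leaves the decision about when the reboot has finished to the operator.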