fix issue#41216

borball · openshift-cherrypick-robot · commit 40e99e6a303c · 2022-06-17T19:58:16.000Z
diff --git a/modules/nodes-nodes-rebooting-gracefully.adoc b/modules/nodes-nodes-rebooting-gracefully.adoc
@@ -8,6 +8,13 @@
 
 Before rebooting a node, it is recommended to backup etcd data to avoid any data loss on the node.
 
+[NOTE]
+====
+For Single Node OpenShift (SNO) clusters that require users to perform the `oc login` command rather than having the certificates in the `kubeconfig` file to manage the cluster, the `oc adm` commands might not be available after cordoning and draining the node. This is because the `openshift-oauth-apiserver` pod is not running due to the cordon. You can use SSH to access the nodes as indicated in the following procedure.
+
+In an SNO cluster, pods cannot be rescheduled when you cordon and drain the node. However, doing so gives the pods, especially your workload pods, time to properly stop and release associated resources.
+====
+
 .Procedure
 
 To perform a graceful restart of a node:
@@ -23,7 +30,22 @@ $ oc adm cordon <node1>
 +
 [source,terminal]
 ----
-$ oc adm drain <node1> --ignore-daemonsets --delete-emptydir-data
+$ oc adm drain <node1> --ignore-daemonsets --delete-emptydir-data --force
+----
++
+You might receive errors that pods associated with custom pod disruption budgets (PDB) cannot be evicted.
++
+.Example error
+[source,terminal]
+----
+error when evicting pods/"rails-postgresql-example-1-72v2w" -n "rails" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
+----
++
+In this case, run the drain command again, adding the `--disable-eviction` flag, which bypasses the PDB checks:
++
+[source,terminal]
+----
+$ oc adm drain <node1> --ignore-daemonsets --delete-emptydir-data --force --disable-eviction
 ----
 
 . Access the node in debug mode:
@@ -48,13 +70,43 @@ $ systemctl reboot
 ----
 +
 In a moment, the node enters the `NotReady` state.
++
+[NOTE]
+====
+With some SNO clusters, the `oc` commands might not be available after you cordon and drain the node because the `openshift-oauth-apiserver` pod is not running. You can use SSH to connect to the node and perform the reboot.
+
+[source,terminal]
+----
+$ ssh core@<master_node>.<cluster_name>.<base_domain>
+----
+
+[source,terminal]
+----
+$ sudo systemctl reboot
+----
+====
 
-. Mark the node as schedulable after the reboot is complete:
+. After the reboot is complete, mark the node as schedulable by running the following command:
 +
 [source,terminal]
 ----
 $ oc adm uncordon <node1>
 ----
++
+[NOTE]
+====
+With some SNO clusters, the `oc` commands might not be available after you cordon and drain the node because the `openshift-oauth-apiserver` pod is not running. You can use SSH to connect to the node and uncordon it.
+
+[source,terminal]
+----
+$ ssh core@<target_node>
+----
+
+[source,terminal]
+----
+$ sudo oc adm uncordon <node1> --kubeconfig /etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost.kubeconfig
+----
+====
 
 . Verify that the node is ready:
 +
@@ -69,3 +121,6 @@ $ oc get node <node1>
 NAME    STATUS  ROLES    AGE     VERSION
 <node1> Ready   worker   6d22h   v1.18.3+b0068a8
 ----
+
+. If you undeployed any applications in the previous step, revert the changes.
+