Commit 40e99e6

borball authored and openshift-cherrypick-robot committed
fix issue#41216
1 parent e5086d0 commit 40e99e6

1 file changed: +57 -2 lines changed

modules/nodes-nodes-rebooting-gracefully.adoc

Lines changed: 57 additions & 2 deletions
@@ -8,6 +8,13 @@
88

99
Before rebooting a node, it is recommended to back up etcd data to avoid any data loss on the node.
1010

11+
[NOTE]
12+
====
13+
For Single Node OpenShift (SNO) clusters that require users to perform the `oc login` command rather than having the certificates in the `kubeconfig` file to manage the cluster, the `oc adm` commands might not be available after cordoning and draining the node. This is because the `openshift-oauth-apiserver` pod is not running due to the cordon. You can use SSH to access the nodes as indicated in the following procedure.
14+
15+
In an SNO cluster, pods cannot be rescheduled when cordoning and draining. However, doing so gives the pods, especially your workload pods, time to properly stop and release associated resources.
16+
====
17+
1118
.Procedure
1219

1320
To perform a graceful restart of a node:
@@ -23,7 +30,22 @@ $ oc adm cordon <node1>
2330
+
2431
[source,terminal]
2532
----
26-
$ oc adm drain <node1> --ignore-daemonsets --delete-emptydir-data
33+
$ oc adm drain <node1> --ignore-daemonsets --delete-emptydir-data --force
34+
----
35+
+
36+
You might receive errors that pods associated with custom pod disruption budgets (PDB) cannot be evicted.
37+
+
38+
.Example error
39+
[source,terminal]
40+
----
41+
error when evicting pods/"rails-postgresql-example-1-72v2w" -n "rails" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
42+
----
43+
+
44+
In this case, run the drain command again, adding the `--disable-eviction` flag, which bypasses the PDB checks:
45+
+
46+
[source,terminal]
47+
----
48+
$ oc adm drain <node1> --ignore-daemonsets --delete-emptydir-data --force --disable-eviction
2749
----
2850

2951
. Access the node in debug mode:
@@ -48,13 +70,43 @@ $ systemctl reboot
4870
----
4971
+
5072
In a moment, the node enters the `NotReady` state.
73+
+
74+
[NOTE]
75+
====
76+
With some SNO clusters, the `oc` commands might not be available after you cordon and drain the node because the `openshift-oauth-apiserver` pod is not running. You can use SSH to connect to the node and perform the reboot.
77+
78+
[source,terminal]
79+
----
80+
$ ssh core@<master-node>.<cluster_name>.<base_domain>
81+
----
82+
83+
[source,terminal]
84+
----
85+
$ sudo systemctl reboot
86+
----
87+
====
5188
52-
. Mark the node as schedulable after the reboot is complete:
89+
. After the reboot is complete, mark the node as schedulable by running the following command:
5390
+
5491
[source,terminal]
5592
----
5693
$ oc adm uncordon <node1>
5794
----
95+
+
96+
[NOTE]
97+
====
98+
With some SNO clusters, the `oc` commands might not be available after you cordon and drain the node because the `openshift-oauth-apiserver` pod is not running. You can use SSH to connect to the node and uncordon it.
99+
100+
[source,terminal]
101+
----
102+
$ ssh core@<target_node>
103+
----
104+
105+
[source,terminal]
106+
----
107+
$ sudo oc adm uncordon <node> --kubeconfig /etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost.kubeconfig
108+
----
109+
====
58110
59111
. Verify that the node is ready:
60112
+
@@ -69,3 +121,6 @@ $ oc get node <node1>
69121
NAME STATUS ROLES AGE VERSION
70122
<node1> Ready worker 6d22h v1.18.3+b0068a8
71123
----
124+
125+
. If you undeployed any applications in the previous step, revert the changes.
126+
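
The drain-and-verify flow in this change could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the function names `is_pdb_eviction_error`, `node_status`, and `drain_with_pdb_fallback` are hypothetical, and the `--timeout=120s` value is an assumption added so that the first drain attempt returns instead of retrying evictions indefinitely. The error text matched is the one shown in the example error above.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the procedure; names are not from the commit.

# Succeeds if the given drain output contains the PDB eviction error
# shown in the "Example error" block.
is_pdb_eviction_error() {
  case "$1" in
    *"would violate the pod's disruption budget"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads `oc get node` table output on stdin and prints the STATUS
# column for the node named in $1 (header row is skipped).
node_status() {
  awk -v node="$1" 'NR > 1 && $1 == node { print $2 }'
}

# Drains a node; if the first attempt fails with a PDB error, retries
# with --disable-eviction. The 120s timeout is an assumed value.
drain_with_pdb_fallback() {
  node="$1"
  if out=$(oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data \
             --force --timeout=120s 2>&1); then
    printf '%s\n' "$out"
    return 0
  fi
  printf '%s\n' "$out" >&2
  if is_pdb_eviction_error "$out"; then
    # Bypasses PDB checks, as described in the procedure.
    oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data \
      --force --disable-eviction
  else
    return 1
  fi
}
```

After the reboot and uncordon, something like `oc get node <node1> | node_status <node1>` should print the node's status column. Because `--disable-eviction` deletes pods without honoring disruption budgets, a script like this is probably best kept for cases where that trade-off is acceptable.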
