Commit ad4a4bf

Bob Furu authored
Merge pull request #36639 from chinmayi-chandrasekar/RFE1932_node_reboot_steps
2 parents 7c42e4b + 5e3ceed commit ad4a4bf

3 files changed: +72 -6 lines changed

modules/nodes-nodes-rebooting-affinity.adoc

Lines changed: 5 additions & 5 deletions
@@ -10,10 +10,10 @@ violated if there are no other suitable locations to deploy a pod. Pod
 anti-affinity can be set to either required or preferred.

 With this in place, if only two infrastructure nodes are available and one is rebooted, the container image registry
-pod is prevented from running on the other node. `*oc get pods*` reports the pod as unready until a suitable node is available.
+pod is prevented from running on the other node. `*oc get pods*` reports the pod as unready until a suitable node is available.
 Once a node is available and all pods are back in ready state, the next node can be restarted.

-.Procedure
+.Procedure

 To reboot a node using pod anti-affinity:

@@ -35,7 +35,7 @@ spec:
 matchExpressions:
 - key: registry <4>
 operator: In <5>
-values:
+values:
 - default
 topologyKey: kubernetes.io/hostname
 ----
@@ -49,5 +49,5 @@ This example assumes the container image registry pod has a label of
 `registry=default`. Pod anti-affinity can use any Kubernetes match
 expression.

-. Enable the `MatchInterPodAffinity` scheduler predicate in the scheduling policy file.
-
+. Enable the `MatchInterPodAffinity` scheduler predicate in the scheduling policy file.
+. Perform a graceful restart of the node.
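
The module text says the next node can be restarted only once all pods are back in a ready state. As a rough sketch of that check, a wrapper around `oc wait` could block until the matching pods report `Ready`. The function name `wait_for_pods_ready`, the `OC` override variable, the namespace argument, and the 300-second default timeout are all assumptions for illustration and are not part of this commit.

```shell
#!/bin/sh
# OC is a hypothetical override so the command can be previewed with OC=echo.
OC="${OC:-oc}"

# Block until every pod matching the label selector reports Ready, or time out.
wait_for_pods_ready() {
    selector="$1"          # for example registry=default, as in the module
    namespace="$2"         # assumption: the caller knows the pod's namespace
    timeout="${3:-300s}"   # assumption: 300s default, not taken from the module
    "$OC" wait pod --for=condition=Ready -l "$selector" -n "$namespace" --timeout="$timeout"
}
```

Calling `wait_for_pods_ready registry=default <namespace>` between node reboots would pause until the registry pod is ready again. `oc wait` returns an error if no pods match the selector, so the label must already exist on the pod.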
modules/nodes-nodes-rebooting-gracefully.adoc

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+// Module included in the following assemblies:
+//
+// * nodes/nodes-nodes-rebooting.adoc
+
+[id="nodes-nodes-rebooting-gracefully_{context}"]
+= Rebooting a node gracefully
+
+Before rebooting a node, it is recommended to backup etcd data to avoid any data loss on the node.
+
+.Procedure
+
+To perform a graceful restart of a node:
+
+. Mark the node as unschedulable:
++
+[source,terminal]
+----
+$ oc adm cordon <node1>
+----
++
+. Drain the node to remove all the running pods:
++
+[source,terminal]
+----
+$ oc adm drain <node1> --ignore-daemonsets --delete-emptydir-data
+----
++
+. Access the node in debug mode:
++
+[source,terminal]
+----
+$ oc debug node/<node1>
+----
++
+. Restart the node:
++
+[source,terminal]
+----
+$ systemctl reboot
+----
++
+. Mark the node as schedulable after the reboot is complete:
++
+[source,terminal]
+----
+$ oc adm uncordon <node1>
+----
++
+. Verify that the node is ready:
++
+[source,terminal]
+----
+$ oc get node <node1>
+----
++
+.Example output
+[source,terminal]
+----
+NAME      STATUS   ROLES    AGE     VERSION
+<node1>   Ready    worker   6d22h   v1.18.3+b0068a8
+----
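
The procedure added in this module can be sketched as a small shell helper, split into the steps that run before the reboot and the steps that run once the reboot has completed. This is only an illustration: the names `run`, `drain_and_reboot`, `restore_node`, and the `DRY_RUN` variable are hypothetical. The module runs `systemctl reboot` interactively inside the debug session, so the non-interactive form `oc debug node/<node> -- chroot /host systemctl reboot` used here is an assumption, not part of the commit.

```shell
#!/bin/sh
# Hypothetical helper; only the oc subcommands and flags come from the module.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps before the reboot: cordon, drain, then reboot through a debug pod.
drain_and_reboot() {
    node="$1"
    run oc adm cordon "$node" &&
    run oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data &&
    # Assumption: chroot /host so that systemctl acts on the host itself.
    run oc debug "node/$node" -- chroot /host systemctl reboot
}

# Steps after the reboot has completed: uncordon, then check node status.
restore_node() {
    node="$1"
    run oc adm uncordon "$node" &&
    run oc get node "$node"
}
```

With the default `DRY_RUN=1`, `drain_and_reboot <node1>` only prints the three commands for review. Setting `DRY_RUN=0` executes them against the cluster. `restore_node <node1>` is meant to be called separately, once the node has finished rebooting.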

nodes/nodes/nodes-nodes-rebooting.adoc

Lines changed: 6 additions & 1 deletion
@@ -9,7 +9,7 @@ toc::[]


 To reboot a node without causing an outage for applications running on the
-platform, it is important to first evacuate the pods. For pods that are
+platform, it is important to first evacuate the pods. For pods that are
 made highly available by the routing tier, nothing
 else needs to be done. For other pods needing storage, typically databases, it
 is critical to ensure that they can remain in operation with one pod
@@ -37,3 +37,8 @@ include::modules/nodes-nodes-rebooting-affinity.adoc[leveloffset=+1]

 include::modules/nodes-nodes-rebooting-router.adoc[leveloffset=+1]

+include::modules/nodes-nodes-rebooting-gracefully.adoc[leveloffset=+1]
+
+.Additional information
+
+For information on etcd data backup, see xref:../../backup_and_restore/backing-up-etcd.adoc#backup-etcd[Backing up etcd data].
