
Commit 6bb16f3

tmulqueerohennes
authored andcommitted
TELCODOCS-903: graceful node shutodwn - taking over from Tony
1 parent 44e8b8e commit 6bb16f3


4 files changed, +190 -0 lines changed


_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions
@@ -2209,6 +2209,8 @@ Topics:
 File: nodes-nodes-working
 - Name: Managing nodes
 File: nodes-nodes-managing
+- Name: Managing graceful node shutdown
+File: nodes-nodes-graceful-shutdown
 - Name: Managing the maximum number of pods per node
 File: nodes-nodes-managing-max-pods
 - Name: Using the Node Tuning Operator
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
// Module included in the following assembly:
// * nodes/nodes-nodes-graceful-shutdown

:_content-type: CONCEPT
[id="nodes-nodes-cluster-timeout-graceful-shutdown_{context}"]
= About graceful node shutdown

During a graceful node shutdown, the kubelet sends a termination signal to pods running on the node and postpones the node shutdown until all the pods are evicted. If a node shuts down unexpectedly, the graceful node shutdown feature minimizes interruption to the workloads running on these pods.

During a graceful node shutdown, the kubelet stops pods in two phases:

* Regular pod termination
* Critical pod termination

You can define shutdown grace periods for regular and critical pods by configuring the following specifications in the `KubeletConfig` custom resource:

* `shutdownGracePeriod`: Specifies the total duration for pod termination for regular and critical pods.
* `shutdownGracePeriodCriticalPods`: Specifies the duration for critical pod termination. This value must be less than the `shutdownGracePeriod` value.

For example, if the `shutdownGracePeriod` value is `30s` and the `shutdownGracePeriodCriticalPods` value is `10s`, the kubelet delays the node shutdown by 30 seconds. During the shutdown, the first 20 seconds (30 minus 10) are reserved for gracefully shutting down regular pods, and the last 10 seconds are reserved for gracefully shutting down critical pods.
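
The arithmetic in this example can be sketched in Python. This is a minimal illustration, not kubelet source code: the helper name `split_shutdown_windows` is hypothetical, and both grace periods are assumed to be expressed in whole seconds.

```python
from __future__ import annotations


def split_shutdown_windows(shutdown_grace_period: int,
                           critical_pods_grace_period: int) -> tuple[int, int]:
    """Return (regular_pod_window, critical_pod_window) in seconds.

    Hypothetical helper: both arguments are whole seconds that mirror
    shutdownGracePeriod and shutdownGracePeriodCriticalPods.
    """
    if critical_pods_grace_period < 0:
        raise ValueError("grace periods cannot be negative")
    if critical_pods_grace_period >= shutdown_grace_period:
        # The critical-pod period must be strictly less than the total period.
        raise ValueError(
            "shutdownGracePeriodCriticalPods must be less than shutdownGracePeriod")
    # Regular pods get the first part of the total period; critical pods get the rest.
    return shutdown_grace_period - critical_pods_grace_period, critical_pods_grace_period


if __name__ == "__main__":
    regular, critical = split_shutdown_windows(30, 10)
    # prints: regular pods: first 20s, critical pods: last 10s
    print(f"regular pods: first {regular}s, critical pods: last {critical}s")
```

The sketch rejects a critical-pod period that is equal to or greater than the total, which matches the requirement that `shutdownGracePeriodCriticalPods` be less than `shutdownGracePeriod`.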

To define a critical pod, assign a pod priority value greater than or equal to `2000000000`. To define a regular pod, assign a pod priority value of less than `2000000000`.
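
The following Python sketch restates the `2000000000` threshold as a function that assigns a pod to one of the two termination phases. The function `shutdown_phase` and the sample pod names are hypothetical; the sketch does not query a cluster.

```python
# Threshold from the rule above: a priority of 2000000000 or higher means "critical".
CRITICAL_PRIORITY_THRESHOLD = 2_000_000_000


def shutdown_phase(priority: int) -> str:
    """Return the termination phase for a pod with the given priority value."""
    return "critical" if priority >= CRITICAL_PRIORITY_THRESHOLD else "regular"


if __name__ == "__main__":
    # Hypothetical pods and priority values, for illustration only.
    pods = {"web-frontend": 0, "batch-report": 1_000_000, "cluster-dns": 2_000_000_000}
    for name, priority in pods.items():
        print(f"{name}: {shutdown_phase(priority)}")
```

For reference, the built-in `system-cluster-critical` (`2000000000`) and `system-node-critical` (`2000001000`) priority classes both meet this threshold.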

For more information about how to define a priority value for pods, see the _Additional resources_ section.
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
// Module included in the following assembly:
// * nodes/nodes-nodes-graceful-shutdown

:_content-type: PROCEDURE
[id="nodes-nodes-activating-graceful-shutdown_{context}"]
= Configuring graceful node shutdown

To configure graceful node shutdown, create a `KubeletConfig` custom resource (CR) to specify a shutdown grace period for pods on a set of nodes. The graceful node shutdown feature minimizes interruption to workloads that run on these pods.

[NOTE]
====
If you do not configure graceful node shutdown, the default grace period is `0` and pods are forcefully evicted from the node.
====

.Prerequisites

* You have access to the cluster with the `cluster-admin` role.
* You have defined priority classes for pods that require critical or regular classification.

.Procedure

. Define shutdown grace periods in the `KubeletConfig` CR by saving the following YAML in the `kubelet-gns.yaml` file:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: graceful-shutdown
  namespace: openshift-machine-config-operator
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: "" <1>
  kubeletConfig:
    shutdownGracePeriod: "3m" <2>
    shutdownGracePeriodCriticalPods: "2m" <3>
----
<1> This example applies shutdown grace periods to nodes with the `worker` role.
<2> Define the total time period for regular and critical pods to shut down.
<3> Define the portion of the total time period that is reserved for critical pods to shut down. This value must be less than the `shutdownGracePeriod` value.

. Create the `KubeletConfig` CR by running the following command:
+
[source,terminal]
----
$ oc create -f kubelet-gns.yaml
----
+
.Example output
[source,terminal]
----
kubeletconfig.machineconfiguration.openshift.io/graceful-shutdown created
----
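
Before you create the CR, you might want to check that the two durations satisfy the rule that `shutdownGracePeriodCriticalPods` is less than `shutdownGracePeriod`. The following Python sketch assumes the values are simple Go-style duration strings made of whole-number hours, minutes, and seconds, such as `3m` or `3m0s`. It does not implement Go's full duration syntax, and the helper names `parse_duration` and `check_grace_periods` are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: only whole-number h, m, and s units are supported. Go's full
# duration syntax (fractions, ms, us, ns, signs) is out of scope for this sketch.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_PART = re.compile(r"(\d+)([hms])")


def parse_duration(value: str) -> int:
    """Convert a simplified Go-style duration such as "3m" or "3m0s" to seconds."""
    if value == "0":
        return 0
    parts = _PART.findall(value)
    # Reject anything that is not a sequence of <number><unit> pairs, such as "3ms".
    if not parts or "".join(num + unit for num, unit in parts) != value:
        raise ValueError(f"unsupported duration: {value!r}")
    return sum(int(num) * _UNIT_SECONDS[unit] for num, unit in parts)


def check_grace_periods(kubelet_config: dict) -> tuple[int, int]:
    """Return (regular_pod_window, critical_pod_window) in seconds, or raise ValueError."""
    total = parse_duration(kubelet_config["shutdownGracePeriod"])
    critical = parse_duration(kubelet_config["shutdownGracePeriodCriticalPods"])
    if critical >= total:
        raise ValueError(
            "shutdownGracePeriodCriticalPods must be less than shutdownGracePeriod")
    return total - critical, critical


if __name__ == "__main__":
    # Values copied by hand from the kubeletConfig stanza of kubelet-gns.yaml.
    kubelet_config = {"shutdownGracePeriod": "3m", "shutdownGracePeriodCriticalPods": "2m"}
    regular, critical = check_grace_periods(kubelet_config)
    print(f"regular pods: {regular}s, critical pods: {critical}s")
```

With the example values `3m` and `2m`, the sketch reports a 60-second window for regular pods followed by a 120-second window for critical pods.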

.Verification

. Verify the grace period configuration on a node by viewing the kubelet logs or the `kubelet.conf` file.
+
[NOTE]
====
Ensure that the grace period values in the kubelet logs or in the `kubelet.conf` file match the values set in the `KubeletConfig` CR.
====

.. To view the logs by using the command line, run the following command, replacing `<node_name>` with the name of the node:
+
[source,terminal]
----
$ oc adm node-logs <node_name> -u kubelet
----
+
.Example output
[source,terminal]
----
Sep 12 22:13:46
ci-ln-qv5pvzk-72292-xvkd9-worker-a-dmbr4
hyperkube[22317]: I0912 22:13:46.687472
22317 nodeshutdown_manager_linux.go:134]
"Creating node shutdown manager"
shutdownGracePeriodRequested="3m0s" <1>
shutdownGracePeriodCriticalPods="2m0s"
shutdownGracePeriodByPodPriority=[
{Priority:0
ShutdownGracePeriodSeconds:1200}
{Priority:2000000000
ShutdownGracePeriodSeconds:600}]
...
----
+
<1> Ensure that the log messages for `shutdownGracePeriodRequested` and `shutdownGracePeriodCriticalPods` match the values set in the `KubeletConfig` CR.

.. To view the grace period configuration in the `kubelet.conf` file on a node, run the following commands to start a debug session on the node and display the file:
+
[source,terminal]
----
$ oc debug node/<node_name>
----
+
[source,terminal]
----
$ chroot /host
----
+
[source,terminal]
----
$ cat /etc/kubernetes/kubelet.conf
----
+
.Example output
[source,terminal]
----
...
"memorySwap": {},
"containerLogMaxSize": "50Mi",
"logging": {
  "flushFrequency": 0,
  "verbosity": 0,
  "options": {
    "json": {
      "infoBufferSize": "0"
    }
  }
},
"shutdownGracePeriod": "10m0s", <1>
"shutdownGracePeriodCriticalPods": "3m0s"
}
----
+
<1> Ensure that the values for `shutdownGracePeriod` and `shutdownGracePeriodCriticalPods` match the values set in the `KubeletConfig` CR.

. During a graceful node shutdown, you can verify that a pod was gracefully shut down by running the following command, replacing `<pod_name>` with the name of the pod:
+
[source,terminal]
----
$ oc describe pod <pod_name>
----
+
.Example output
[source,terminal]
----
Reason: Terminated
Message: Pod was terminated in response to imminent node shutdown.
----
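
To compare the `kubelet.conf` values with the `KubeletConfig` CR without reading the durations by eye, you can normalize the expected values into the same form that appears in the example output, such as `10m0s`. The following Python sketch assumes that you copied the JSON content of `/etc/kubernetes/kubelet.conf` from the debug session into a string. The helpers `go_duration_string` and `mismatched_fields` are hypothetical, and the formatting rule mirrors Go's `time.Duration.String()` for whole, non-negative seconds only.

```python
import json


def go_duration_string(seconds: int) -> str:
    """Format whole, non-negative seconds like Go's time.Duration.String(), e.g. 180 -> "3m0s"."""
    if seconds == 0:
        return "0s"
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}h{minutes}m{secs}s"
    if minutes:
        return f"{minutes}m{secs}s"
    return f"{secs}s"


def mismatched_fields(kubelet_conf_text: str, expected_seconds: dict) -> dict:
    """Return {field: (actual, expected)} for grace-period fields that do not match."""
    conf = json.loads(kubelet_conf_text)
    mismatches = {}
    for field, seconds in expected_seconds.items():
        expected = go_duration_string(seconds)
        actual = conf.get(field)
        if actual != expected:
            mismatches[field] = (actual, expected)
    return mismatches


if __name__ == "__main__":
    # Hypothetical excerpt of /etc/kubernetes/kubelet.conf copied from the debug session.
    sample = '{"shutdownGracePeriod": "3m0s", "shutdownGracePeriodCriticalPods": "2m0s"}'
    # Expected values: 3m and 2m from the KubeletConfig CR, expressed in seconds.
    expected = {"shutdownGracePeriod": 180, "shutdownGracePeriodCriticalPods": 120}
    print(mismatched_fields(sample, expected) or "kubelet.conf matches the KubeletConfig CR")
```

An empty result means both fields match; otherwise, each entry shows the value found in the file next to the value derived from the CR.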
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
:_content-type: ASSEMBLY
[id="nodes-nodes-graceful-shutdown"]
= Managing graceful node shutdown
include::_attributes/common-attributes.adoc[]
:context: nodes-nodes-graceful-shutdown

toc::[]

Graceful node shutdown enables the kubelet to delay forcible eviction of pods during a node shutdown. When you configure a graceful node shutdown, you can define a time period for pods to complete running workloads before shutting down. This grace period minimizes interruption to critical workloads during unexpected node shutdown events. Using priority classes, you can also specify the order of pod shutdown.

// concept topic: how it works
include::modules/nodes-nodes-cluster-timeout-graceful-shutdown.adoc[leveloffset=+1]

// procedure topic: configuring Graceful node shutdown
include::modules/nodes-nodes-configuring-graceful-shutdown.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources

* xref:../../nodes/pods/nodes-pods-priority.adoc#nodes-pods-priority[Understanding pod priority]
