@@ -5,25 +5,27 @@ weight: 10
 ---

 <!-- overview -->
+
 In a Kubernetes cluster, a {{< glossary_tooltip text="node" term_id="node" >}}
-can be shutdown in a planned graceful way or unexpectedly because of reasons such
+can be shut down in a planned graceful way or unexpectedly because of reasons such
 as a power outage or something else external. A node shutdown could lead to workload
 failure if the node is not drained before the shutdown. A node shutdown can be
 either **graceful** or **non-graceful**.

 <!-- body -->
+
 ## Graceful node shutdown {#graceful-node-shutdown}

 {{< feature-state feature_gate_name="GracefulNodeShutdown" >}}

 The kubelet attempts to detect node system shutdown and terminates pods running on the node.

-Kubelet ensures that pods follow the normal
+kubelet ensures that pods follow the normal
 [pod termination process](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
 during the node shutdown. During node shutdown, the kubelet does not accept new
 Pods (even if those Pods are already bound to the node).

-The Graceful node shutdown feature depends on systemd since it takes advantage of
+The graceful node shutdown feature depends on systemd since it takes advantage of
 [systemd inhibitor locks](https://www.freedesktop.org/wiki/Software/systemd/inhibit/) to
 delay the node shutdown with a given duration.

@@ -32,12 +34,12 @@ Graceful node shutdown is controlled with the `GracefulNodeShutdown`
 enabled by default in 1.21.

 Note that by default, both configuration options described below,
-`shutdownGracePeriod` and `shutdownGracePeriodCriticalPods` are set to zero,
+`shutdownGracePeriod` and `shutdownGracePeriodCriticalPods`, are set to zero,
 thus not activating the graceful node shutdown functionality.
-To activate the feature, the two kubelet config settings should be configured appropriately and
+To activate the feature, both options should be configured appropriately and
 set to non-zero values.

-Once systemd detects or notifies node shutdown, the kubelet sets a `NotReady` condition on
+Once systemd detects or is notified of a node shutdown, the kubelet sets a `NotReady` condition on
 the Node, with the `reason` set to `"node is shutting down"`. The kube-scheduler honors this condition
 and does not schedule any Pods onto the affected node; other third-party schedulers are
 expected to follow the same logic. This means that new Pods won't be scheduled onto that node
@@ -48,26 +50,29 @@ node shutdown has been detected, so that even Pods with a
 {{< glossary_tooltip text="toleration" term_id="toleration" >}} for
 `node.kubernetes.io/not-ready:NoSchedule` do not start there.

-At the same time when kubelet is setting that condition on its Node via the API,
+When kubelet is setting that condition on its Node via the API,
 the kubelet also begins terminating any Pods that are running locally.

 During a graceful shutdown, kubelet terminates pods in two phases:

 1. Terminate regular pods running on the node.
-2. Terminate [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
+1. Terminate [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
    running on the node.

-Graceful node shutdown feature is configured with two
+The graceful node shutdown feature is configured with two
 [`KubeletConfiguration`](/docs/tasks/administer-cluster/kubelet-config-file/) options:

-* `shutdownGracePeriod`:
-  * Specifies the total duration that the node should delay the shutdown by. This is the total
-    grace period for pod termination for both regular and
-    [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical).
-* `shutdownGracePeriodCriticalPods`:
-  * Specifies the duration used to terminate
-    [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
-    during a node shutdown. This value should be less than `shutdownGracePeriod`.
+- `shutdownGracePeriod`:
+
+  Specifies the total duration that the node should delay the shutdown by. This is the total
+  grace period for pod termination for both regular and
+  [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical).
+
+- `shutdownGracePeriodCriticalPods`:
+
+  Specifies the duration used to terminate
+  [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
+  during a node shutdown. This value should be less than `shutdownGracePeriod`.

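As a minimal sketch of the two options described in this hunk, a kubelet configuration file could set them as follows. The durations are illustrative assumptions, not recommended values. With these values the node shutdown is delayed by 30 seconds in total; the final 10 seconds are reserved for critical pods, which leaves the first 20 seconds for regular pods.

```yaml
# Illustrative values only; choose durations that suit your workloads.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time the node shutdown is delayed (regular + critical pods).
shutdownGracePeriod: 30s
# Portion of the total reserved for critical pods; must be less than shutdownGracePeriod.
shutdownGracePeriodCriticalPods: 10s
```
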
 {{< note >}}

@@ -122,22 +127,22 @@ Assuming the following custom pod
 [priority classes](/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass)
 in a cluster,

-| Pod priority class name| Pod priority class value|
-| ------------------------- | ------------------------|
-| `custom-class-a` | 100000 |
-| `custom-class-b` | 10000 |
-| `custom-class-c` | 1000 |
-| `regular/unset` | 0 |
+| Pod priority class name | Pod priority class value |
+| ----------------------- | ------------------------ |
+| `custom-class-a`        | 100000                   |
+| `custom-class-b`        | 10000                    |
+| `custom-class-c`        | 1000                     |
+| `regular/unset`         | 0                        |

 Within the [kubelet configuration](/docs/reference/config-api/kubelet-config.v1beta1/)
 the settings for `shutdownGracePeriodByPodPriority` could look like:

-| Pod priority class value| Shutdown period|
-| ------------------------| ---------------|
-| 100000 | 10 seconds |
-| 10000 | 180 seconds |
-| 1000 | 120 seconds |
-| 0 | 60 seconds |
+| Pod priority class value | Shutdown period |
+| ------------------------ | --------------- |
+| 100000                   | 10 seconds      |
+| 10000                    | 180 seconds     |
+| 1000                     | 120 seconds     |
+| 0                        | 60 seconds      |

 The corresponding kubelet config YAML configuration would be:

@@ -154,18 +159,18 @@ shutdownGracePeriodByPodPriority:
` ` `

 The above table implies that any pod with `priority` value >= 100000 will get
-just 10 seconds to stop, any pod with value >= 10000 and < 100000 will get 180
-seconds to stop, any pod with value >= 1000 and < 10000 will get 120 seconds to stop.
-Finally, all other pods will get 60 seconds to stop.
+just 10 seconds to shut down, any pod with value >= 10000 and < 100000 will get 180
+seconds to shut down, any pod with value >= 1000 and < 10000 will get 120 seconds to shut down.
+Finally, all other pods will get 60 seconds to shut down.

 One doesn't have to specify values corresponding to all of the classes. For
 example, you could instead use these settings:

-|Pod priority class value| Shutdown period|
-|------------------------| ---------------|
-| 100000 | 300 seconds |
-| 1000 | 120 seconds |
-| 0 | 60 seconds |
+| Pod priority class value | Shutdown period |
+| ------------------------ | --------------- |
+| 100000                   | 300 seconds     |
+| 1000                     | 120 seconds     |
+| 0                        | 60 seconds      |

 In the above case, the pods with `custom-class-b` will go into the same bucket
 as `custom-class-c` for shutdown.
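
For reference, the reduced three-row table in this hunk might be expressed in the kubelet configuration along the following lines. This is a sketch: it assumes the `priority` and `shutdownGracePeriodSeconds` field names of the v1beta1 `KubeletConfiguration` API, and the values are copied from the table. A pod in `custom-class-b` (priority 10000) has no entry of its own, so it falls under the `1000` entry and gets 120 seconds.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriodByPodPriority:
  # Pods with priority >= 100000
  - priority: 100000
    shutdownGracePeriodSeconds: 300
  # Pods with priority >= 1000 and < 100000 (includes custom-class-b and custom-class-c)
  - priority: 1000
    shutdownGracePeriodSeconds: 120
  # All remaining pods
  - priority: 0
    shutdownGracePeriodSeconds: 60
```
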
@@ -225,14 +230,16 @@ on a different node.
 During a non-graceful shutdown, Pods are terminated in the two phases:

 1. Force delete the Pods that do not have matching `out-of-service` tolerations.
-2. Immediately perform detach volume operation for such pods.
+1. Immediately perform detach volume operation for such pods.

 {{< note >}}
+
 - Before adding the taint `node.kubernetes.io/out-of-service`, it should be verified
   that the node is already in shutdown or power off state (not in the middle of restarting).
 - The user is required to manually remove the out-of-service taint after the pods are
   moved to a new node and the user has checked that the shutdown node has been
   recovered since the user was the one who originally added the taint.
+
 {{< /note >}}
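
One way to picture the taint discussed in the note is as it would appear in the `spec` of the affected Node object. This is only a sketch: the taint key comes from the text in this hunk, while the value `nodeshutdown` and the `NoExecute` effect are assumptions chosen for illustration.

```yaml
# Fragment of a Node object; only the taint key is taken from the text above.
spec:
  taints:
    - key: node.kubernetes.io/out-of-service
      value: nodeshutdown   # assumed value, for illustration
      effect: NoExecute     # assumed effect, for illustration
```
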

 ### Forced storage detach on timeout {#storage-force-detach-on-timeout}
@@ -256,39 +263,41 @@ its associated
 [VolumeAttachment](/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/)
 deleted.

-After this setting has been applied, unhealthy pods still attached to a volumes must be recovered
+After this setting has been applied, unhealthy pods still attached to volumes must be recovered
 via the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure mentioned above.

 {{< note >}}
+
 - Caution must be taken while using the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure.
 - Deviation from the steps documented above can result in data corruption.
-{{< /note >}}

+{{< /note >}}

 ## Windows Graceful node shutdown {#windows-graceful-node-shutdown}

 {{< feature-state feature_gate_name="WindowsGracefulNodeShutdown" >}}

-The Windows graceful node shutdown feature depends on kubelet running as a Windows service,
-it will then have a registered [service control handler](https://learn.microsoft.com/en-us/windows/win32/services/service-control-handler-function)
-to delay the presshutdown event with a given duration.
+The Windows graceful node shutdown feature depends on kubelet running as a Windows service,
+it will then have a registered [service control handler](https://learn.microsoft.com/en-us/windows/win32/services/service-control-handler-function)
+to delay the preshutdown event with a given duration.

-Windows graceful node shutdown is controlled with the `WindowsGracefulNodeShutdown`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+Windows graceful node shutdown is controlled with the `WindowsGracefulNodeShutdown`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
 which is introduced in 1.32 as an alpha feature.

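As a hedged sketch, the feature gate named in this hunk could be switched on through the `featureGates` map of a kubelet configuration file, as shown below. Whether it also has to be combined with other settings on a given Windows node is not covered by this diff.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Alpha in 1.32 according to the text above.
  WindowsGracefulNodeShutdown: true
```
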
 Windows graceful node shutdown can not be cancelled.

-If Kubelet is not running as a Windows service, it will not be able to set and monitor
+If kubelet is not running as a Windows service, it will not be able to set and monitor
 the [Preshutdown](https://learn.microsoft.com/en-us/windows/win32/api/winsvc/ns-winsvc-service_preshutdown_info) event,
 the node will have to go through the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure mentioned above.

-In the case where the Windows graceful node shutdown feature is enabled, but the kubelet is not
-running as a Windows service, the kubelet will continue running instead of failing. However,
+In the case where the Windows graceful node shutdown feature is enabled, but the kubelet is not
+running as a Windows service, the kubelet will continue running instead of failing. However,
 it will log an error indicating that it needs to be run as a Windows service.

 ## {{% heading "whatsnext" %}}

 Learn more about the following:
-* Blog: [Non-Graceful Node Shutdown](/blog/2023/08/16/kubernetes-1-28-non-graceful-node-shutdown-ga/).
-* Cluster Architecture: [Nodes](/docs/concepts/architecture/nodes/).
+
+- Blog: [Non-Graceful Node Shutdown](/blog/2023/08/16/kubernetes-1-28-non-graceful-node-shutdown-ga/).
+- Cluster Architecture: [Nodes](/docs/concepts/architecture/nodes/).