@@ -5,25 +5,27 @@ weight: 10
55---
66
77<!-- overview -->
8+
89In a Kubernetes cluster, a {{< glossary_tooltip text="node" term_id="node" >}}
9- can be shutdown in a planned graceful way or unexpectedly because of reasons such
10+ can be shut down in a planned graceful way or unexpectedly because of reasons such
1011as a power outage or something else external. A node shutdown could lead to workload
1112failure if the node is not drained before the shutdown. A node shutdown can be
1213either **graceful** or **non-graceful**.
1314
1415<!-- body -->
16+
1517## Graceful node shutdown {#graceful-node-shutdown}
1618
1719{{< feature-state feature_gate_name="GracefulNodeShutdown" >}}
1820
1921The kubelet attempts to detect node system shutdown and terminates pods running on the node.
2022
21- Kubelet ensures that pods follow the normal
23+ kubelet ensures that pods follow the normal
2224[pod termination process](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
2325during the node shutdown. During node shutdown, the kubelet does not accept new
2426Pods (even if those Pods are already bound to the node).
2527
26- The Graceful node shutdown feature depends on systemd since it takes advantage of
28+ The graceful node shutdown feature depends on systemd since it takes advantage of
2729[systemd inhibitor locks](https://www.freedesktop.org/wiki/Software/systemd/inhibit/) to
2830delay the node shutdown with a given duration.
2931
@@ -32,12 +34,12 @@ Graceful node shutdown is controlled with the `GracefulNodeShutdown`
3234enabled by default in 1.21.
3335
3436Note that by default, both configuration options described below,
35- `shutdownGracePeriod` and `shutdownGracePeriodCriticalPods` are set to zero,
37+ `shutdownGracePeriod` and `shutdownGracePeriodCriticalPods`, are set to zero,
3638thus not activating the graceful node shutdown functionality.
37- To activate the feature, the two kubelet config settings should be configured appropriately and
39+ To activate the feature, both options should be configured appropriately and
3840set to non-zero values.
3941
40- Once systemd detects or notifies node shutdown, the kubelet sets a `NotReady` condition on
42+ Once systemd detects or is notified of a node shutdown, the kubelet sets a `NotReady` condition on
4143the Node, with the `reason` set to `"node is shutting down"`. The kube-scheduler honors this condition
4244and does not schedule any Pods onto the affected node; other third-party schedulers are
4345expected to follow the same logic. This means that new Pods won't be scheduled onto that node
@@ -48,26 +50,29 @@ node shutdown has been detected, so that even Pods with a
4850{{< glossary_tooltip text="toleration" term_id="toleration" >}} for
4951`node.kubernetes.io/not-ready:NoSchedule` do not start there.
5052
51- At the same time when kubelet is setting that condition on its Node via the API,
53+ When kubelet is setting that condition on its Node via the API,
5254the kubelet also begins terminating any Pods that are running locally.
5355
5456During a graceful shutdown, kubelet terminates pods in two phases:
5557
56581. Terminate regular pods running on the node.
57- 2. Terminate [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
59+ 1. Terminate [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
5860 running on the node.
5961
60- Graceful node shutdown feature is configured with two
62+ The graceful node shutdown feature is configured with two
6163[`KubeletConfiguration`](/docs/tasks/administer-cluster/kubelet-config-file/) options:
6264
63- * `shutdownGracePeriod`:
64- * Specifies the total duration that the node should delay the shutdown by. This is the total
65- grace period for pod termination for both regular and
66- [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical).
67- * `shutdownGracePeriodCriticalPods`:
68- * Specifies the duration used to terminate
69- [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
70- during a node shutdown. This value should be less than `shutdownGracePeriod`.
65+ - `shutdownGracePeriod`:
66+
67+ Specifies the total duration that the node should delay the shutdown by. This is the total
68+ grace period for pod termination for both regular and
69+ [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical).
70+
71+ - `shutdownGracePeriodCriticalPods`:
72+
73+ Specifies the duration used to terminate
74+ [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical)
75+ during a node shutdown. This value should be less than `shutdownGracePeriod`.
7176
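As a minimal sketch of the two options described above, a kubelet configuration file could set them as follows. The durations `30s` and `10s` are illustrative values chosen for this example, not recommendations; with these values, regular pods would get the first 20 seconds and critical pods the final 10 seconds of the total delay.

```yaml
# Illustrative kubelet configuration fragment (values are assumptions for this example).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time the node shutdown is delayed, covering regular and critical pods.
shutdownGracePeriod: 30s
# Portion of the total reserved for critical pods; should be less than shutdownGracePeriod.
shutdownGracePeriodCriticalPods: 10s
```

Both values must be non-zero for the feature to take effect, since the defaults of zero leave graceful node shutdown inactive.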
7277{{< note >}}
7378
@@ -122,22 +127,22 @@ Assuming the following custom pod
122127[priority classes](/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass)
123128in a cluster,
124129
125- | Pod priority class name| Pod priority class value|
126- | ------------------------- | ------------------------|
127- | `custom-class-a` | 100000 |
128- | `custom-class-b` | 10000 |
129- | `custom-class-c` | 1000 |
130- | `regular/unset` | 0 |
130+ | Pod priority class name | Pod priority class value |
131+ | ----------------------- | ------------------------ |
132+ | `custom-class-a` | 100000 |
133+ | `custom-class-b` | 10000 |
134+ | `custom-class-c` | 1000 |
135+ | `regular/unset` | 0 |
131136
132137Within the [kubelet configuration](/docs/reference/config-api/kubelet-config.v1beta1/)
133138the settings for `shutdownGracePeriodByPodPriority` could look like:
134139
135- | Pod priority class value| Shutdown period|
136- | ------------------------| ---------------|
137- | 100000 | 10 seconds |
138- | 10000 | 180 seconds |
139- | 1000 | 120 seconds |
140- | 0 | 60 seconds |
140+ | Pod priority class value | Shutdown period |
141+ | ------------------------ | --------------- |
142+ | 100000 | 10 seconds |
143+ | 10000 | 180 seconds |
144+ | 1000 | 120 seconds |
145+ | 0 | 60 seconds |
141146
142147The corresponding kubelet config YAML configuration would be:
143148
@@ -154,18 +159,18 @@ shutdownGracePeriodByPodPriority:
154159```
155160
156161The above table implies that any pod with `priority` value >= 100000 will get
157- just 10 seconds to stop, any pod with value >= 10000 and < 100000 will get 180
158- seconds to stop, any pod with value >= 1000 and < 10000 will get 120 seconds to stop.
159- Finally, all other pods will get 60 seconds to stop.
162+ just 10 seconds to shut down, any pod with value >= 10000 and < 100000 will get 180
163+ seconds to shut down, any pod with value >= 1000 and < 10000 will get 120 seconds to shut down.
164+ Finally, all other pods will get 60 seconds to shut down.
160165
161166One doesn't have to specify values corresponding to all of the classes. For
162167example, you could instead use these settings:
163168
164- |Pod priority class value| Shutdown period|
165- |------------------------| ---------------|
166- | 100000 | 300 seconds |
167- | 1000 | 120 seconds |
168- | 0 | 60 seconds |
169+ | Pod priority class value | Shutdown period |
170+ | ------------------------ | --------------- |
171+ | 100000 | 300 seconds |
172+ | 1000 | 120 seconds |
173+ | 0 | 60 seconds |
169174
170175In the above case, the pods with `custom-class-b` will go into the same bucket
171176as `custom-class-c` for shutdown.
@@ -225,14 +230,16 @@ on a different node.
225230During a non-graceful shutdown, Pods are terminated in the two phases:
226231
2272321. Force delete the Pods that do not have matching `out-of-service` tolerations.
228- 2. Immediately perform detach volume operation for such pods.
233+ 1. Immediately perform detach volume operation for such pods.
229234
230235{{< note >}}
236+
231237- Before adding the taint `node.kubernetes.io/out-of-service`, it should be verified
232238 that the node is already in shutdown or power off state (not in the middle of restarting).
233239- The user is required to manually remove the out-of-service taint after the pods are
234240 moved to a new node and the user has checked that the shutdown node has been
235241 recovered since the user was the one who originally added the taint.
242+
236243{{< /note >}}
237244
238245### Forced storage detach on timeout {#storage-force-detach-on-timeout}
@@ -256,39 +263,41 @@ its associated
256263[VolumeAttachment](/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/)
257264deleted.
258265
259- After this setting has been applied, unhealthy pods still attached to a volumes must be recovered
266+ After this setting has been applied, unhealthy pods still attached to volumes must be recovered
260267via the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure mentioned above.
261268
262269{{< note >}}
270+
263271- Caution must be taken while using the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure.
264272- Deviation from the steps documented above can result in data corruption.
265- {{< /note >}}
266273
274+ {{< /note >}}
267275
268276## Windows Graceful node shutdown {#windows-graceful-node-shutdown}
269277
270278{{< feature-state feature_gate_name="WindowsGracefulNodeShutdown" >}}
271279
272- The Windows graceful node shutdown feature depends on kubelet running as a Windows service,
273- it will then have a registered [service control handler](https://learn.microsoft.com/en-us/windows/win32/services/service-control-handler-function)
274- to delay the presshutdown event with a given duration.
280+ The Windows graceful node shutdown feature depends on kubelet running as a Windows service,
281+ it will then have a registered [service control handler](https://learn.microsoft.com/en-us/windows/win32/services/service-control-handler-function)
282+ to delay the preshutdown event with a given duration.
275283
276- Windows graceful node shutdown is controlled with the `WindowsGracefulNodeShutdown`
277- [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
284+ Windows graceful node shutdown is controlled with the `WindowsGracefulNodeShutdown`
285+ [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
278286which is introduced in 1.32 as an alpha feature.
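As a hedged sketch, the feature gate could be enabled through the `featureGates` map of the kubelet configuration file on a Windows node. The grace period settings shown are an assumption: this example presumes the Windows feature reads the same `shutdownGracePeriod` and `shutdownGracePeriodCriticalPods` options as the Linux feature, and the durations are illustrative only.

```yaml
# Illustrative kubelet configuration for a Windows node (kubelet running as a Windows service).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Alpha in 1.32, so it must be enabled explicitly.
  WindowsGracefulNodeShutdown: true
# Assumption: the same grace period options apply on Windows; values are examples.
shutdownGracePeriod: 30s
shutdownGracePeriodCriticalPods: 10s
```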
279287
280288Windows graceful node shutdown can not be cancelled.
281289
282- If Kubelet is not running as a Windows service, it will not be able to set and monitor
290+ If kubelet is not running as a Windows service, it will not be able to set and monitor
283291the [Preshutdown](https://learn.microsoft.com/en-us/windows/win32/api/winsvc/ns-winsvc-service_preshutdown_info) event,
284292the node will have to go through the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure mentioned above.
285293
286- In the case where the Windows graceful node shutdown feature is enabled, but the kubelet is not
287- running as a Windows service, the kubelet will continue running instead of failing. However,
294+ In the case where the Windows graceful node shutdown feature is enabled, but the kubelet is not
295+ running as a Windows service, the kubelet will continue running instead of failing. However,
288296it will log an error indicating that it needs to be run as a Windows service.
289297
290298## {{% heading "whatsnext" %}}
291299
292300Learn more about the following:
293- * Blog: [Non-Graceful Node Shutdown](/blog/2023/08/16/kubernetes-1-28-non-graceful-node-shutdown-ga/).
294- * Cluster Architecture: [Nodes](/docs/concepts/architecture/nodes/).
301+
302+ - Blog: [Non-Graceful Node Shutdown](/blog/2023/08/16/kubernetes-1-28-non-graceful-node-shutdown-ga/).
303+ - Cluster Architecture: [Nodes](/docs/concepts/architecture/nodes/).