---
layout: blog
title: "Kubernetes 1.28: Non-Graceful Node Shutdown Moves to GA"
date: 2023-08-15T10:00:00-08:00
slug: kubernetes-1-28-non-graceful-node-shutdown-GA
---

**Authors:** Xing Yang (VMware) and Ashutosh Kumar (Elastic)

The Kubernetes Non-Graceful Node Shutdown feature is now GA in Kubernetes v1.28.
It was introduced as
[alpha](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown)
in Kubernetes v1.24 and promoted to
[beta](https://kubernetes.io/blog/2022/12/16/kubernetes-1-26-non-graceful-node-shutdown-beta/)
in Kubernetes v1.26.
This feature allows stateful workloads to restart on a different node if the
original node is shut down unexpectedly or ends up in a non-recoverable state,
such as a hardware failure or an unresponsive OS.

## What is a Non-Graceful Node Shutdown

In a Kubernetes cluster, a node can be shut down in a planned, graceful way, or
unexpectedly because of reasons such as a power outage or another external event.
A node shutdown could lead to workload failure if the node is not drained
before the shutdown. A node shutdown can be either graceful or non-graceful.

The [Graceful Node Shutdown](https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/)
feature allows the kubelet to detect a node shutdown event, properly terminate the pods,
and release resources before the actual shutdown.

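Graceful node shutdown is configured through the kubelet configuration. As a
minimal sketch, the two relevant fields are `shutdownGracePeriod` (the total
time the kubelet delays the shutdown) and `shutdownGracePeriodCriticalPods`
(the slice of that time reserved for critical pods); the values below are
illustrative, not recommendations:

```yaml
# KubeletConfiguration fragment (illustrative values).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total duration the node delays shutdown to let pods terminate.
shutdownGracePeriod: 30s
# Portion of shutdownGracePeriod reserved for terminating critical pods.
shutdownGracePeriodCriticalPods: 10s
```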
When a node is shut down but not detected by the kubelet's Node Shutdown Manager,
it becomes a non-graceful node shutdown.
A non-graceful node shutdown is usually not a problem for stateless apps; however,
it is a problem for stateful apps.
A stateful application cannot function properly if its pods are stuck on the
shut-down node and are not restarting on a running node.

In the case of a non-graceful node shutdown, you can manually add an `out-of-service` taint on the Node.

```shell
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
```

This taint triggers pods on the node to be forcefully deleted if there are no
matching tolerations on the pods. Persistent volumes attached to the shut-down node
will be detached, and new pods will be created successfully on a different running
node.

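Conversely, a pod that should survive the force deletion needs a toleration
matching the taint. A minimal sketch of such a toleration in a pod spec
(matching the exact key, value, and effect used above):

```yaml
# Pod spec fragment: tolerate the out-of-service taint so this pod is
# not force-deleted when the taint is applied to its node.
tolerations:
- key: node.kubernetes.io/out-of-service
  operator: Equal
  value: nodeshutdown
  effect: NoExecute
```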
**Note:** Before applying the out-of-service taint, you must verify that the node is
already in a shutdown or powered-off state (not in the middle of restarting).

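No single kubectl command can prove a machine is powered off, but as a quick
sanity check you can confirm the node has stopped reporting status; a
shut-down node's `Ready` condition goes to `Unknown` once the kubelet stops
posting updates:

```shell
# Print the node's Ready condition status; expect "Unknown" for a node
# whose kubelet has stopped reporting. This only shows the kubelet is
# unreachable -- still verify out-of-band (cloud console, IPMI, etc.)
# that the machine is actually off.
kubectl get node <node-name> \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
```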
Once all the workload pods that are linked to the out-of-service node have been
moved to a new running node, and the shut-down node has been recovered, you
should remove that taint from the affected node.

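Removing the taint uses the same `kubectl taint` command with a trailing `-`:

```shell
# Remove the out-of-service taint once the node has recovered.
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
```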
## What’s new in stable

With the promotion of the Non-Graceful Node Shutdown feature to stable, the
feature gate `NodeOutOfServiceVolumeDetach` is locked to true on
`kube-controller-manager` and cannot be disabled.

The metrics `force_delete_pods_total` and `force_delete_pod_errors_total` in the
Pod GC Controller are enhanced to account for all forceful pod deletions.
A reason is added to the metric to indicate whether the pod is forcefully deleted
because it is terminated, orphaned, terminating with the `out-of-service` taint,
or terminating and unscheduled.

A `reason` is also added to the metric `attachdetach_controller_forced_detaches`
in the Attach Detach Controller to indicate whether the force detach is caused by
the `out-of-service` taint or a timeout.

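How you scrape these metrics depends on your monitoring setup. As one hedged
example (assuming you are on a control-plane host, the kube-controller-manager
serves its default secure port 10257, and `TOKEN` holds a bearer token
authorized to read `/metrics`), you could filter for the counters above:

```shell
# Illustrative only: scrape kube-controller-manager metrics and filter
# for the forced pod deletion and forced detach counters.
curl -sk -H "Authorization: Bearer ${TOKEN}" https://127.0.0.1:10257/metrics |
  grep -E 'force_delete_pods_total|force_delete_pod_errors_total|attachdetach_controller_forced_detaches'
```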
## What’s next?

This feature requires a user to manually add a taint to the node to trigger
workload failover, and to remove the taint after the node is recovered.
In the future, we plan to find ways to automatically detect and fence nodes
that are shut down or failed, and to automatically fail workloads over to another node.

## How can I learn more?

Check out the additional
[documentation on this feature](https://kubernetes.io/docs/concepts/architecture/nodes/#non-graceful-node-shutdown).

## How to get involved?

We offer a huge thank you to all the contributors who helped with the design,
implementation, and review of this feature and helped move it from alpha through beta to stable:

* Michelle Au ([msau42](https://github.com/msau42))
* Derek Carr ([derekwaynecarr](https://github.com/derekwaynecarr))
* Danielle Endocrimes ([endocrimes](https://github.com/endocrimes))
* Baofa Fan ([carlory](https://github.com/carlory))
* Tim Hockin ([thockin](https://github.com/thockin))
* Ashutosh Kumar ([sonasingh46](https://github.com/sonasingh46))
* Hemant Kumar ([gnufied](https://github.com/gnufied))
* Yuiko Mouri ([YuikoTakada](https://github.com/YuikoTakada))
* Mrunal Patel ([mrunalp](https://github.com/mrunalp))
* David Porter ([bobbypage](https://github.com/bobbypage))
* Yassine Tijani ([yastij](https://github.com/yastij))
* Jing Xu ([jingxu97](https://github.com/jingxu97))
* Xing Yang ([xing-yang](https://github.com/xing-yang))

This feature is a collaboration between SIG Storage and SIG Node.
For those interested in getting involved with the design and development of any
part of the Kubernetes Storage system, join the Kubernetes Storage Special
Interest Group (SIG).
For those interested in getting involved with the design and development of the
components that support the controlled interactions between pods and host
resources, join the Kubernetes Node SIG.
