|
| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: "Kubernetes 1.28: Non-Graceful Node Shutdown Moves to GA" |
| 4 | +date: 2023-08-15T10:00:00-08:00 |
| 5 | +slug: kubernetes-1-28-non-graceful-node-shutdown-GA |
| 6 | +--- |
| 7 | + |
| 8 | +**Authors:** Xing Yang (VMware) and Ashutosh Kumar (Elastic) |
| 9 | + |
| 10 | +The Kubernetes Non-Graceful Node Shutdown feature is now GA in Kubernetes v1.28. |
| 11 | +It was introduced as |
| 12 | +[alpha](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown) |
| 13 | +in Kubernetes v1.24, and promoted to |
| 14 | +[beta](https://kubernetes.io/blog/2022/12/16/kubernetes-1-26-non-graceful-node-shutdown-beta/) |
| 15 | +in Kubernetes v1.26. |
| 16 | +This feature allows stateful workloads to restart on a different node if the |
| 17 | +original node is shut down unexpectedly or ends up in a non-recoverable state, |
| 18 | +such as a hardware failure or an unresponsive OS. |
| 19 | + |
| 20 | +## What is a Non-Graceful Node Shutdown |
| 21 | + |
| 22 | +In a Kubernetes cluster, a node can be shut down in a planned, graceful way or |
| 23 | +unexpectedly, for reasons such as a power outage or another external event. |
| 24 | +A node shutdown could lead to workload failure if the node is not drained |
| 25 | +before the shutdown. A node shutdown can be either graceful or non-graceful. |
| 26 | + |
| 27 | +The [Graceful Node Shutdown](https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/) |
| 28 | +feature allows the kubelet to detect a node shutdown event, properly terminate the pods, |
| 29 | +and release resources before the actual shutdown. |
| 30 | + |
| 31 | +When a node is shut down but the shutdown is not detected by the kubelet's Node Shutdown Manager, |
| 32 | +it becomes a non-graceful node shutdown. |
| 33 | +A non-graceful node shutdown is usually not a problem for stateless apps; however, |
| 34 | +it is a problem for stateful apps. |
| 35 | +A stateful application cannot function properly if its pods are stuck on the |
| 36 | +shutdown node and are not restarting on a running node. |
| 37 | + |
| 38 | +In the case of a non-graceful node shutdown, you can manually add an `out-of-service` taint on the Node. |
| 39 | + |
| 40 | +``` |
| 41 | +kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute |
| 42 | +``` |
| 43 | + |
| 44 | +This taint triggers pods on the node to be forcefully deleted if there are no |
| 45 | +matching tolerations on the pods. Persistent volumes attached to the shutdown node |
| 46 | +will be detached, and new pods will be created successfully on a different running |
| 47 | +node. |
| 48 | + |
| 49 | +**Note:** Before applying the `out-of-service` taint, you must verify that the node is |
| 50 | +already in a shutdown or powered-off state (not in the middle of restarting). |
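As a rough pre-check, the node's `Ready` condition as seen by the control plane can be read with `kubectl`. The sketch below is only illustrative: the function name `node_ready_status` is hypothetical, and a status of `Unknown` or `False` only means the kubelet has stopped reporting, not that the machine is powered off. Confirming the actual power state still needs an out-of-band check (for example, a cloud console or a management controller).

```shell
# Hypothetical helper: print the Ready condition status of a node
# ("True", "False", or "Unknown") as reported by the API server.
node_ready_status() {
  node="$1"
  kubectl get node "$node" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Usage sketch: replace <node-name> with the real node name.
# node_ready_status <node-name>
```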
| 51 | + |
| 52 | +Once all the workload pods linked to the out-of-service node have moved to |
| 53 | +a new running node and the shutdown node has recovered, you should remove the |
| 54 | +taint from the affected node. |
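Removing the taint uses the same `kubectl taint` syntax with a trailing `-`. A minimal sketch, assuming the taint was applied with the `nodeshutdown` value and `NoExecute` effect shown earlier; the function names `pods_on_node` and `remove_out_of_service_taint` are hypothetical:

```shell
# Hypothetical helper: list pods that the API server still reports as
# bound to the given node, across all namespaces.
pods_on_node() {
  node="$1"
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$node"
}

# Hypothetical helper: remove the out-of-service taint from a recovered node.
# The trailing "-" asks kubectl to remove the taint instead of adding it.
remove_out_of_service_taint() {
  node="$1"
  kubectl taint nodes "$node" \
    node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
}

# Usage sketch: replace <node-name> with the recovered node.
# pods_on_node <node-name>
# remove_out_of_service_taint <node-name>
```

Checking which pods are still bound to the node first is a cautious step, not a requirement stated by the feature itself.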
| 55 | + |
| 56 | +## What’s new in stable |
| 57 | + |
| 58 | +With the promotion of the Non-Graceful Node Shutdown feature to stable, the |
| 59 | +feature gate `NodeOutOfServiceVolumeDetach` is locked to true on |
| 60 | +`kube-controller-manager` and cannot be disabled. |
| 61 | + |
| 62 | +Metrics `force_delete_pods_total` and `force_delete_pod_errors_total` in the |
| 63 | +Pod GC Controller are enhanced to account for all forceful pod deletions. |
| 64 | +A "reason" is added to these metrics to indicate whether the pod is forcefully deleted |
| 65 | +because it is terminated, orphaned, terminating with the `out-of-service` taint, |
| 66 | +or terminating and unscheduled. |
| 67 | + |
| 68 | +A "reason" is also added to the metric `attachdetach_controller_forced_detaches` |
| 69 | +in the Attach Detach Controller to indicate whether the force detach is caused by |
| 70 | +the `out-of-service` taint or a timeout. |
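One way to inspect these metrics is to filter a saved copy of the `kube-controller-manager` metrics output. The sketch below assumes the Prometheus text output has already been saved to a file; how to scrape it depends on the cluster setup and is not covered here. The function name `shutdown_metrics` and the file name `kcm-metrics.txt` are hypothetical, and the filter matches the metric names mentioned above as substrings, since the exported names may carry a component prefix.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and keep
# only sample lines for the forced-deletion and forced-detach metrics.
shutdown_metrics() {
  grep -v '^#' |
    grep -E 'force_delete_pods_total|force_delete_pod_errors_total|attachdetach_controller_forced_detaches'
}

# Usage sketch, assuming the metrics were saved to kcm-metrics.txt:
# shutdown_metrics < kcm-metrics.txt
```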
| 71 | + |
| 72 | +## What’s next? |
| 73 | + |
| 74 | +This feature requires a user to manually add a taint to the node to trigger |
| 75 | +workload failover and to remove the taint after the node has recovered. |
| 76 | +In the future, we plan to find ways to automatically detect and fence nodes |
| 77 | +that are shut down or failed and automatically fail over workloads to another node. |
| 78 | + |
| 79 | +## How can I learn more? |
| 80 | + |
| 81 | +Check out additional documentation on this feature |
| 82 | +[here](https://kubernetes.io/docs/concepts/architecture/nodes/#non-graceful-node-shutdown). |
| 83 | + |
| 84 | +## How to get involved? |
| 85 | + |
| 86 | +We offer a huge thank you to all the contributors who helped with the design, |
| 87 | +implementation, and review of this feature and helped move it from alpha, through beta, to stable: |
| 88 | + |
| 89 | +* Michelle Au ([msau42](https://github.com/msau42)) |
| 90 | +* Derek Carr ([derekwaynecarr](https://github.com/derekwaynecarr)) |
| 91 | +* Danielle Endocrimes ([endocrimes](https://github.com/endocrimes)) |
| 92 | +* Baofa Fan ([carlory](https://github.com/carlory)) |
| 93 | +* Tim Hockin ([thockin](https://github.com/thockin)) |
| 94 | +* Ashutosh Kumar ([sonasingh46](https://github.com/sonasingh46)) |
| 95 | +* Hemant Kumar ([gnufied](https://github.com/gnufied)) |
| 96 | +* Yuiko Mouri ([YuikoTakada](https://github.com/YuikoTakada)) |
| 97 | +* Mrunal Patel ([mrunalp](https://github.com/mrunalp)) |
| 98 | +* David Porter ([bobbypage](https://github.com/bobbypage)) |
| 99 | +* Yassine Tijani ([yastij](https://github.com/yastij)) |
| 100 | +* Jing Xu ([jingxu97](https://github.com/jingxu97)) |
| 101 | +* Xing Yang ([xing-yang](https://github.com/xing-yang)) |
| 102 | + |
| 103 | +This feature is a collaboration between SIG Storage and SIG Node. |
| 104 | +For those interested in getting involved with the design and development of any |
| 105 | +part of the Kubernetes Storage system, join the Kubernetes Storage Special |
| 106 | +Interest Group (SIG). |
| 107 | +For those interested in getting involved with the design and development of the |
| 108 | +components that support the controlled interactions between pods and host |
| 109 | +resources, join the Kubernetes Node SIG. |