@@ -847,89 +847,6 @@ Message: Pod was terminated in response to imminent node shutdown.
```
{{< /note >}}

- <!--
- ## Non Graceful node shutdown {#non-graceful-node-shutdown}
- -->
- ## 节点非体面关闭 {#non-graceful-node-shutdown}
-
- {{< feature-state state="beta" for_k8s_version="v1.26" >}}
-
- <!--
- A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
- either because the command does not trigger the inhibitor locks mechanism used by
- kubelet or because of a user error, i.e., the ShutdownGracePeriod and
- ShutdownGracePeriodCriticalPods are not configured properly. Please refer to above
- section [Graceful Node Shutdown](#graceful-node-shutdown) for more details.
- -->
- 节点关闭的操作可能无法被 kubelet 的节点关闭管理器检测到,
- 是因为该命令不会触发 kubelet 所使用的抑制锁定机制,或者是因为用户错误的原因,
- 即 ShutdownGracePeriod 和 ShutdownGracePeriodCriticalPods 配置不正确。
- 请参考以上[节点体面关闭](#graceful-node-shutdown)部分了解更多详细信息。
-
- <!--
- When a node is shutdown but not detected by kubelet's Node Shutdown Manager, the pods
- that are part of a StatefulSet will be stuck in terminating status on
- the shutdown node and cannot move to a new running node. This is because kubelet on
- the shutdown node is not available to delete the pods so the StatefulSet cannot
- create a new pod with the same name. If there are volumes used by the pods, the
- VolumeAttachments will not be deleted from the original shutdown node so the volumes
- used by these pods cannot be attached to a new running node. As a result, the
- application running on the StatefulSet cannot function properly. If the original
- shutdown node comes up, the pods will be deleted by kubelet and new pods will be
- created on a different running node. If the original shutdown node does not come up,
- these pods will be stuck in terminating status on the shutdown node forever.
- -->
- 当某节点关闭但 kubelet 的节点关闭管理器未检测到这一事件时,
- 在那个已关闭节点上、属于 StatefulSet 的 Pod 将停滞于终止状态,并且不能移动到新的运行节点上。
- 这是因为已关闭节点上的 kubelet 已不存在,亦无法删除 Pod,
- 因此 StatefulSet 无法创建同名的新 Pod。
- 如果 Pod 使用了卷,则 VolumeAttachments 不会从原来的已关闭节点上删除,
- 因此这些 Pod 所使用的卷也无法挂接到新的运行节点上。
- 所以,那些以 StatefulSet 形式运行的应用无法正常工作。
- 如果原来的已关闭节点被恢复,kubelet 将删除 Pod,新的 Pod 将在不同的运行节点上被创建。
- 如果原来的已关闭节点没有被恢复,那些在已关闭节点上的 Pod 将永远滞留在终止状态。
-
- <!--
- To mitigate the above situation, a user can manually add the taint `node.kubernetes.io/out-of-service` with either `NoExecute`
- or `NoSchedule` effect to a Node marking it out-of-service.
- If the `NodeOutOfServiceVolumeDetach` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
- is enabled on `kube-controller-manager`, and a Node is marked out-of-service with this taint, the
- pods on the node will be forcefully deleted if there are no matching tolerations on it and volume
- detach operations for the pods terminating on the node will happen immediately. This allows the
- Pods on the out-of-service node to recover quickly on a different node.
- -->
- 为了缓解上述情况,用户可以手动将具有 `NoExecute` 或 `NoSchedule` 效果的
- `node.kubernetes.io/out-of-service` 污点添加到节点上,标记其无法提供服务。
- 如果在 `kube-controller-manager` 上启用了 `NodeOutOfServiceVolumeDetach`
- [特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/),
- 并且节点被通过污点标记为无法提供服务,如果节点上的 Pod 没有设置对应的容忍度,
- 那么这样的 Pod 将被强制删除,并且会立即对该节点上正在终止的 Pod 执行卷分离操作。
- 这样就允许那些在无法提供服务节点上的 Pod 能在其他节点上快速恢复。
-
- <!--
- During a non-graceful shutdown, Pods are terminated in the two phases:
-
- 1. Force delete the Pods that do not have matching `out-of-service` tolerations.
- 2. Immediately perform detach volume operation for such pods.
- -->
- 在非体面关闭期间,Pod 分两个阶段终止:
- 1. 强制删除没有匹配的 `out-of-service` 容忍度的 Pod。
- 2. 立即对此类 Pod 执行分离卷操作。
-
- {{< note >}}
- <!--
- - Before adding the taint `node.kubernetes.io/out-of-service`, it should be verified
-   that the node is already in shutdown or power off state (not in the middle of
-   restarting).
- - The user is required to manually remove the out-of-service taint after the pods are
-   moved to a new node and the user has checked that the shutdown node has been
-   recovered since the user was the one who originally added the taint.
- -->
- - 在添加 `node.kubernetes.io/out-of-service` 污点之前,应该验证节点已经处于关闭或断电状态(而不是在重新启动中)。
- - 将 Pod 移动到新节点后,用户需要手动移除停止服务的污点,并且用户要检查关闭节点是否已恢复,因为该用户是最初添加污点的用户。
- {{< /note >}}
-
-
<!--
### Pod Priority based graceful node shutdown {#pod-priority-graceful-node-shutdown}
-->
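The section being relocated above names misconfigured `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` as one reason the kubelet can miss a shutdown. As a minimal sketch of checking the values a kubelet is actually running with (the node name `worker-1` is a placeholder, and both the kubelet `configz` debug endpoint and `jq` are assumed to be available):

```shell
# Placeholder node name; substitute a node from `kubectl get nodes`.
NODE=worker-1

# Read the kubelet's live configuration through the API server proxy and show
# the graceful-shutdown settings it is actually using.
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/configz" \
  | jq '.kubeletconfig | {shutdownGracePeriod, shutdownGracePeriodCriticalPods}'
```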
@@ -1091,6 +1008,91 @@ are emitted under the kubelet subsystem to monitor node shutdowns.
kubelet 子系统中会生成 `graceful_shutdown_start_time_seconds` 和
`graceful_shutdown_end_time_seconds` 度量指标以便监视节点关闭行为。

+ <!--
+ ## Non Graceful node shutdown {#non-graceful-node-shutdown}
+ -->
+ ## 节点非体面关闭 {#non-graceful-node-shutdown}
+
+ {{< feature-state state="beta" for_k8s_version="v1.26" >}}
+
+ <!--
+ A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
+ either because the command does not trigger the inhibitor locks mechanism used by
+ kubelet or because of a user error, i.e., the ShutdownGracePeriod and
+ ShutdownGracePeriodCriticalPods are not configured properly. Please refer to above
+ section [Graceful Node Shutdown](#graceful-node-shutdown) for more details.
+ -->
+ 节点关闭的操作可能无法被 kubelet 的节点关闭管理器检测到,
+ 是因为该命令不会触发 kubelet 所使用的抑制锁定机制,或者是因为用户错误的原因,
+ 即 ShutdownGracePeriod 和 ShutdownGracePeriodCriticalPods 配置不正确。
+ 请参考以上[节点体面关闭](#graceful-node-shutdown)部分了解更多详细信息。
+
+ <!--
+ When a node is shutdown but not detected by kubelet's Node Shutdown Manager, the pods
+ that are part of a StatefulSet will be stuck in terminating status on
+ the shutdown node and cannot move to a new running node. This is because kubelet on
+ the shutdown node is not available to delete the pods so the StatefulSet cannot
+ create a new pod with the same name. If there are volumes used by the pods, the
+ VolumeAttachments will not be deleted from the original shutdown node so the volumes
+ used by these pods cannot be attached to a new running node. As a result, the
+ application running on the StatefulSet cannot function properly. If the original
+ shutdown node comes up, the pods will be deleted by kubelet and new pods will be
+ created on a different running node. If the original shutdown node does not come up,
+ these pods will be stuck in terminating status on the shutdown node forever.
+ -->
+ 当某节点关闭但 kubelet 的节点关闭管理器未检测到这一事件时,
+ 在那个已关闭节点上、属于 StatefulSet 的 Pod 将停滞于终止状态,并且不能移动到新的运行节点上。
+ 这是因为已关闭节点上的 kubelet 已不存在,亦无法删除 Pod,
+ 因此 StatefulSet 无法创建同名的新 Pod。
+ 如果 Pod 使用了卷,则 VolumeAttachments 不会从原来的已关闭节点上删除,
+ 因此这些 Pod 所使用的卷也无法挂接到新的运行节点上。
+ 所以,那些以 StatefulSet 形式运行的应用无法正常工作。
+ 如果原来的已关闭节点被恢复,kubelet 将删除 Pod,新的 Pod 将在不同的运行节点上被创建。
+ 如果原来的已关闭节点没有被恢复,那些在已关闭节点上的 Pod 将永远滞留在终止状态。
+
+ <!--
+ To mitigate the above situation, a user can manually add the taint `node.kubernetes.io/out-of-service` with either `NoExecute`
+ or `NoSchedule` effect to a Node marking it out-of-service.
+ If the `NodeOutOfServiceVolumeDetach` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+ is enabled on `kube-controller-manager`, and a Node is marked out-of-service with this taint, the
+ pods on the node will be forcefully deleted if there are no matching tolerations on it and volume
+ detach operations for the pods terminating on the node will happen immediately. This allows the
+ Pods on the out-of-service node to recover quickly on a different node.
+ -->
+ 为了缓解上述情况,用户可以手动将具有 `NoExecute` 或 `NoSchedule` 效果的
+ `node.kubernetes.io/out-of-service` 污点添加到节点上,标记其无法提供服务。
+ 如果在 `kube-controller-manager` 上启用了 `NodeOutOfServiceVolumeDetach`
+ [特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/),
+ 并且节点被通过污点标记为无法提供服务,如果节点上的 Pod 没有设置对应的容忍度,
+ 那么这样的 Pod 将被强制删除,并且会立即对该节点上正在终止的 Pod 执行卷分离操作。
+ 这样就允许那些在无法提供服务节点上的 Pod 能在其他节点上快速恢复。
+
+ <!--
+ During a non-graceful shutdown, Pods are terminated in the two phases:
+
+ 1. Force delete the Pods that do not have matching `out-of-service` tolerations.
+ 2. Immediately perform detach volume operation for such pods.
+ -->
+ 在非体面关闭期间,Pod 分两个阶段终止:
+
+ 1. 强制删除没有匹配的 `out-of-service` 容忍度的 Pod。
+ 2. 立即对此类 Pod 执行分离卷操作。
+
+ {{< note >}}
+ <!--
+ - Before adding the taint `node.kubernetes.io/out-of-service`, it should be verified
+   that the node is already in shutdown or power off state (not in the middle of
+   restarting).
+ - The user is required to manually remove the out-of-service taint after the pods are
+   moved to a new node and the user has checked that the shutdown node has been
+   recovered since the user was the one who originally added the taint.
+ -->
+ - 在添加 `node.kubernetes.io/out-of-service` 污点之前,
+   应该验证节点已经处于关闭或断电状态(而不是在重新启动中)。
+ - 将 Pod 移动到新节点后,用户需要手动移除停止服务的污点,
+   并且用户要检查关闭节点是否已恢复,因为该用户是最初添加污点的用户。
+ {{< /note >}}
+
<!--
## Swap memory management {#swap-memory}
-->
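The relocated section describes manually tainting a node that you have already confirmed, out of band, is shut down or powered off. A hedged sketch of that step, assuming `worker-1` is a placeholder node name and `nodeshutdown` is an arbitrary taint value (only the key and the effect matter):

```shell
# Placeholder name of the node that is confirmed shut down / powered off.
NODE=worker-1

# Mark the node out of service. With the NoExecute effect, Pods without a
# matching toleration are force deleted and their volumes detached.
kubectl taint nodes "${NODE}" node.kubernetes.io/out-of-service=nodeshutdown:NoExecute

# The controller-side behavior depends on the NodeOutOfServiceVolumeDetach
# feature gate on kube-controller-manager (beta, enabled by default in v1.26);
# it only needs to be set explicitly if it was previously turned off, e.g.:
#   kube-controller-manager ... --feature-gates=NodeOutOfServiceVolumeDetach=true
```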
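A sketch of observing the two termination phases described above, plus the manual cleanup called out in the note, using the same placeholder node name:

```shell
NODE=worker-1   # placeholder node name

# Phase 1: Pods on the shut-down node without a matching out-of-service
# toleration are force deleted; they should disappear from this listing.
kubectl get pods --all-namespaces --field-selector "spec.nodeName=${NODE}"

# Phase 2: volume detach happens immediately, so VolumeAttachments that
# reference the shut-down node should also go away.
kubectl get volumeattachments | grep "${NODE}"

# Once the Pods are running on another node and the original node is verified
# as recovered, remove the taint manually (the trailing '-' removes it).
kubectl taint nodes "${NODE}" node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
```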
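The leading context of this hunk mentions the `graceful_shutdown_start_time_seconds` and `graceful_shutdown_end_time_seconds` metrics emitted under the kubelet subsystem. Assuming you can reach the kubelet metrics endpoint through the API server proxy, they can be inspected like this:

```shell
NODE=worker-1   # placeholder node name

# Scrape the kubelet metrics endpoint via the API server proxy and pick out
# the graceful node shutdown timestamps.
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/metrics" | grep graceful_shutdown
```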