Commit 06c75a9

[zh] sync 2023-01-06-unhealthy-pod-eviction-policy-for-pdb.md
1 parent 8dd83e6 commit 06c75a9

1 file changed

+198
-0
lines changed

@@ -0,0 +1,198 @@
1+
---
2+
layout: blog
3+
title: "Kubernetes 1.26: Eviction policy for unhealthy pods guarded by PodDisruptionBudgets"
4+
date: 2023-01-06
5+
slug: "unhealthy-pod-eviction-policy-for-pdbs"
6+
---
7+
<!--
8+
layout: blog
9+
title: "Kubernetes 1.26: Eviction policy for unhealthy pods guarded by PodDisruptionBudgets"
10+
date: 2023-01-06
11+
slug: "unhealthy-pod-eviction-policy-for-pdbs"
12+
-->
13+
14+
<!--
15+
**Authors:** Filip Křepinský (Red Hat), Morten Torkildsen (Google), Ravi Gudimetla (Apple)
16+
-->
17+
**Authors:** Filip Křepinský (Red Hat), Morten Torkildsen (Google), Ravi Gudimetla (Apple)
18+
19+
**Translator:** Michael Yao (DaoCloud)
20+
21+
<!--
22+
Ensuring that disruptions to your applications do not affect their availability isn't a simple
23+
task. Last month's release of Kubernetes v1.26 lets you specify an _unhealthy pod eviction policy_
24+
for [PodDisruptionBudgets](/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets) (PDBs)
25+
to help you maintain that availability during node management operations.
26+
In this article, we will dive deeper into what modifications were introduced for PDBs to
27+
give application owners greater flexibility in managing disruptions.
28+
-->
29+
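Ensuring that disruptions to your applications do not affect their availability isn't a simple task.
30+
Last month's release of Kubernetes v1.26 lets you specify an **unhealthy pod eviction policy** for
31+
[PodDisruptionBudgets](/zh-cn/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets) (PDBs)
32+
to help you maintain that availability during node management operations. This article looks at the changes made to PDBs to give application owners greater flexibility in managing disruptions.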
33+
34+
<!--
35+
## What problems does this solve?
36+
37+
API-initiated eviction of pods respects PodDisruptionBudgets (PDBs). This means that a requested [voluntary disruption](https://kubernetes.io/docs/concepts/scheduling-eviction/#pod-disruption)
38+
via an eviction to a Pod, should not disrupt a guarded application and `.status.currentHealthy` of a PDB should not fall
39+
below `.status.desiredHealthy`. Running pods that are [Unhealthy](/docs/tasks/run-application/configure-pdb/#healthiness-of-a-pod)
40+
do not count towards the PDB status, but eviction of these is only possible in case the application
41+
is not disrupted. This helps disrupted or not yet started application to achieve availability
42+
as soon as possible without additional downtime that would be caused by evictions.
43+
-->
44+
## What problems does this solve? {#what-problem-does-this-solve}
45+
46+
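API-initiated eviction of pods respects PodDisruptionBudgets (PDBs). This means that a
47+
[voluntary disruption](/zh-cn/docs/concepts/scheduling-eviction/#pod-disruption) requested by evicting a Pod should not disrupt
48+
a guarded application, and the `.status.currentHealthy` of a PDB should not fall below `.status.desiredHealthy`.
49+
Running pods that are [Unhealthy](/zh-cn/docs/tasks/run-application/configure-pdb/#healthiness-of-a-pod)
50+
do not count towards the PDB status, and they can be evicted only when the application is not disrupted.
51+
This helps a disrupted or not-yet-started application become available as soon as possible, without the additional downtime that evictions would cause.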
52+
53+
<!--
54+
Unfortunately, this poses a problem for cluster administrators that would like to drain nodes
55+
without any manual interventions. Misbehaving applications with pods in `CrashLoopBackOff`
56+
state (due to a bug or misconfiguration) or pods that are simply failing to become ready
57+
make this task much harder. Any eviction request will fail due to violation of a PDB,
58+
when all pods of an application are unhealthy. Draining of a node cannot make any progress
59+
in that case.
60+
-->
61+
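Unfortunately, this poses a problem for cluster administrators who would like to drain nodes without any manual intervention.
62+
Misbehaving applications with pods in the `CrashLoopBackOff` state (due to a bug or misconfiguration), or pods that simply fail to become ready,
63+
make this task much harder. When all pods of an application are unhealthy, every eviction request fails because it would violate the PDB.
64+
In that case, draining a node cannot make any progress.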
65+
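To make the stalled drain concrete, here is an illustrative sketch of what the status of a PDB might look like in that situation. It assumes a hypothetical application with three replicas guarded by a PDB with `maxUnavailable: 1`, where all three pods are running but none is ready. The numbers are made up for illustration and are not captured from a real cluster.

```yaml
# Hypothetical PDB status excerpt (values are illustrative, not real output).
status:
  expectedPods: 3        # pods selected by the PDB
  desiredHealthy: 2      # expectedPods minus maxUnavailable (3 - 1)
  currentHealthy: 0      # no selected pod is currently healthy
  disruptionsAllowed: 0  # currentHealthy is below desiredHealthy
```

With `disruptionsAllowed` at zero, eviction requests against these pods are rejected, which is why the drain cannot move forward.
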
66+
<!--
67+
On the other hand there are users that depend on the existing behavior, in order to:
68+
- prevent data-loss that would be caused by deleting pods that are guarding an underlying resource or storage
69+
- achieve the best availability possible for their application
70+
-->
71+
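On the other hand, some users depend on the existing behavior in order to:
72+
73+
- prevent data loss that would be caused by deleting pods that guard an underlying resource or storage
74+
- achieve the best possible availability for their application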
75+
76+
<!--
77+
Kubernetes 1.26 introduced a new experimental field to the PodDisruptionBudget API: `.spec.unhealthyPodEvictionPolicy`.
78+
When enabled, this field lets you support both of those requirements.
79+
-->
80+
Kubernetes 1.26 introduced a new experimental field to the PodDisruptionBudget API:
81+
`.spec.unhealthyPodEvictionPolicy`. When enabled, this field lets you support both of those requirements.
82+
83+
<!--
84+
## How does it work?
85+
86+
API-initiated eviction is the process that triggers graceful pod termination.
87+
The process can be initiated either by calling the API directly,
88+
by using a `kubectl drain` command, or other actors in the cluster.
89+
During this process every pod removal is consulted with appropriate PDBs,
90+
to ensure that a sufficient number of pods is always running in the cluster.
91+
-->
92+
## How does it work? {#how-does-it-work}
93+
94+
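API-initiated eviction is the process that triggers graceful pod termination.
95+
The process can be initiated by calling the API directly, by using the `kubectl drain` command, or by other actors in the cluster.
96+
During this process, every pod removal is checked against the appropriate PDBs to ensure that a sufficient number of pods is always running in the cluster.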
97+
98+
<!--
99+
The following policies allow PDB authors to have greater control over how the process deals with unhealthy pods.
100+
101+
There are two policies `IfHealthyBudget` and `AlwaysAllow` to choose from.
102+
103+
The former, `IfHealthyBudget`, follows the existing behavior to achieve the best availability
104+
that you get by default. Unhealthy pods can be disrupted only if their application
105+
has a minimum available `.status.desiredHealthy` number of pods.
106+
-->
107+
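The following policies give PDB authors greater control over how the process deals with unhealthy pods.
108+
109+
There are two policies to choose from: `IfHealthyBudget` and `AlwaysAllow`.
110+
111+
The former, `IfHealthyBudget`, follows the existing behavior to achieve the best availability that you get by default.
112+
Unhealthy pods can be disrupted only if their application has at least the minimum `.status.desiredHealthy` number of pods available.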
113+
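As a sketch of opting into the existing behavior explicitly, a PDB could name the `IfHealthyBudget` policy in its spec. The PDB name, the label, and the `minAvailable` value below are hypothetical and chosen only for illustration; the example also assumes the feature gate for this field is enabled.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: database-pdb            # hypothetical name
spec:
  selector:
    matchLabels:
      app: database             # hypothetical label
  minAvailable: 2               # illustrative value
  # Unhealthy pods may be evicted only while the application
  # still has at least .status.desiredHealthy healthy pods.
  unhealthyPodEvictionPolicy: IfHealthyBudget
```
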
114+
<!--
115+
By setting the `spec.unhealthyPodEvictionPolicy` field of your PDB to `AlwaysAllow`,
116+
you are choosing the best effort availability for your application.
117+
With this policy it is always possible to evict unhealthy pods.
118+
This will make it easier to maintain and upgrade your clusters.
119+
120+
We think that `AlwaysAllow` will often be a better choice, but for some critical workloads you may
121+
still prefer to protect even unhealthy Pods from node drains or other forms of API-initiated
122+
eviction.
123+
-->
124+
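By setting the `spec.unhealthyPodEvictionPolicy` field of your PDB to `AlwaysAllow`,
125+
you are choosing best-effort availability for your application. With this policy it is always possible to evict unhealthy pods.
126+
This makes it easier to maintain and upgrade your clusters.
127+
128+
We think that `AlwaysAllow` will often be a better choice, but for some critical workloads
129+
you may still prefer to protect even unhealthy Pods from node drains or other forms of API-initiated eviction.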
130+
131+
<!--
132+
## How do I use it?
133+
134+
This is an alpha feature, which means you have to enable the `PDBUnhealthyPodEvictionPolicy`
135+
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/),
136+
with the command line argument `--feature-gates=PDBUnhealthyPodEvictionPolicy=true`
137+
to the kube-apiserver.
138+
-->
139+
## How do I use it? {#how-do-i-use-it}
140+
141+
This is an alpha feature, which means you have to enable the `PDBUnhealthyPodEvictionPolicy`
142+
[feature gate](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)
143+
by passing the command line argument `--feature-gates=PDBUnhealthyPodEvictionPolicy=true` to the kube-apiserver.
144+
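How you pass this flag depends on how your control plane is deployed. As one possible sketch, assuming the kube-apiserver runs as a static Pod (the layout kubeadm uses; this is an assumption, not something the article states), the flag would be added to the container's command in the Pod manifest. Only the relevant fragment is shown, and the other flags are elided.

```yaml
# Fragment of a kube-apiserver static Pod manifest (assumed kubeadm-style layout).
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # ... existing flags stay unchanged ...
    - --feature-gates=PDBUnhealthyPodEvictionPolicy=true
```

If a `--feature-gates` flag is already present, the new gate belongs in its comma-separated list rather than in a second flag.
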
145+
<!--
146+
Here's an example. Assume that you've enabled the feature gate in your cluster, and that you
147+
already defined a Deployment that runs a plain webserver. You labelled the Pods for that
148+
Deployment with `app: nginx`.
149+
You want to limit avoidable disruption, and you know that best effort availability is
150+
sufficient for this app.
151+
You decide to allow evictions even if those webserver pods are unhealthy.
152+
You create a PDB to guard this application, with the `AlwaysAllow` policy for evicting
153+
unhealthy pods:
154+
-->
155+
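Here's an example. Assume that you've enabled the feature gate in your cluster, and that you have already defined a Deployment that runs a plain webserver.
156+
You labelled the Pods for that Deployment with `app: nginx`.
157+
You want to limit avoidable disruption, and you know that best-effort availability is sufficient for this app.
158+
You decide to allow evictions even if those webserver pods are unhealthy.
159+
You create a PDB to guard this application, with the `AlwaysAllow` policy for evicting unhealthy pods: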
160+
161+
```yaml
162+
apiVersion: policy/v1
163+
kind: PodDisruptionBudget
164+
metadata:
165+
  name: nginx-pdb
166+
spec:
167+
  selector:
168+
    matchLabels:
169+
      app: nginx
170+
  maxUnavailable: 1
171+
  unhealthyPodEvictionPolicy: AlwaysAllow
172+
```
173+
174+
<!--
175+
## How can I learn more?
176+
177+
- Read the KEP: [Unhealthy Pod Eviction Policy for PDBs](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3017-pod-healthy-policy-for-pdb)
178+
- Read the documentation: [Unhealthy Pod Eviction Policy](/docs/tasks/run-application/configure-pdb/#unhealthy-pod-eviction-policy) for PodDisruptionBudgets
179+
- Review the Kubernetes documentation for [PodDisruptionBudgets](/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets), [draining of Nodes](/docs/tasks/administer-cluster/safely-drain-node/) and [evictions](/docs/concepts/scheduling-eviction/api-eviction/)
180+
-->
181+
## How can I learn more? {#how-can-i-learn-more}
182+
183+
- Read the KEP: [Unhealthy Pod Eviction Policy for PDBs](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3017-pod-healthy-policy-for-pdb)
184+
- Read the documentation:
185+
  [Unhealthy Pod Eviction Policy](/zh-cn/docs/tasks/run-application/configure-pdb/#unhealthy-pod-eviction-policy) for PodDisruptionBudgets
186+
- Review the Kubernetes documentation for [PodDisruptionBudgets](/zh-cn/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets),
187+
  [draining of Nodes](/zh-cn/docs/tasks/administer-cluster/safely-drain-node/) and [evictions](/zh-cn/docs/concepts/scheduling-eviction/api-eviction/)
188+
189+
<!--
190+
## How do I get involved?
191+
192+
If you have any feedback, please reach out to us in the [#sig-apps](https://kubernetes.slack.com/archives/C18NZM5K9) channel on Slack (visit https://slack.k8s.io/ for an invitation if you need one), or on the SIG Apps mailing list: [email protected]
193+
-->
194+
## How do I get involved? {#how-do-i-get-involved}
195+
196+
If you have any feedback, please reach out to us in the [#sig-apps](https://kubernetes.slack.com/archives/C18NZM5K9) channel on Slack
197+
(visit https://slack.k8s.io/ for an invitation if you need one), or on the SIG Apps mailing list:
198+
[[email protected]](https://groups.google.com/g/kubernetes-sig-apps)
