Commit 96d5c1f

Merge pull request #48285 from windsonsea/jobspo
[zh] Add 2024-08-19-pod-failure-policy-for-jobs-goes-ga.md
2 parents f39d8ca + 1233332 commit 96d5c1f

1 file changed: 355 additions and 0 deletions
@@ -0,0 +1,355 @@
---
layout: blog
title: "Kubernetes 1.31: Pod Failure Policy for Jobs Goes GA"
date: 2024-08-19
slug: kubernetes-1-31-pod-failure-policy-for-jobs-goes-ga
author: >
  [Michał Woźniak](https://github.com/mimowo) (Google),
  [Shannon Kularathna](https://github.com/shannonxtreme) (Google)
translator: >
  [Michael Yao](https://github.com/windsonsea) (DaoCloud)
---

This post describes _Pod failure policy_, which graduates to stable in Kubernetes
1.31, and how to use it in your Jobs.

## About Pod failure policy

When you run workloads on Kubernetes, Pods might fail for a variety of reasons.
Ideally, workloads like Jobs should be able to ignore transient, retriable
failures and continue running to completion.

To allow for these transient failures, Kubernetes Jobs include the `backoffLimit`
field, which lets you specify the number of Pod failures that you're willing to tolerate
during Job execution. However, if you set a large value for the `backoffLimit` field
and rely solely on this field, you might notice unnecessary increases in operating
costs as Pods restart excessively until the `backoffLimit` is met.

This becomes particularly problematic when running large-scale Jobs with
thousands of long-running Pods across thousands of nodes.

The Pod failure policy extends the backoff limit mechanism to help you reduce
costs in the following ways:

- Gives you control to fail the Job as soon as a non-retriable Pod failure occurs.
- Allows you to ignore retriable errors without increasing the `backoffLimit` field.

For example, you can use a Pod failure policy to run your workload on more affordable spot machines
by ignoring Pod failures caused by
[graceful node shutdown](/docs/concepts/cluster-administration/node-shutdown/#graceful-node-shutdown).

The policy allows you to distinguish between retriable and non-retriable Pod
failures based on container exit codes or Pod conditions in a failed Pod.

## How it works

You specify a Pod failure policy in the Job specification, represented as a list
of rules.

For each rule you define _match requirements_ based on one of the following properties:

- Container exit codes: the `onExitCodes` property.
- Pod conditions: the `onPodConditions` property.
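
As a rough sketch of the two forms, the fragment below shows one rule of each kind. The container name `main` and the exit codes are illustrative assumptions, not values from this post. `containerName` and `status` are optional fields of the Job API, and every rule also carries an `action`.

```yaml
podFailurePolicy:
  rules:
  # Match on the exit code of a specific container.
  - action: FailJob
    onExitCodes:
      containerName: main      # optional; hypothetical container name
      operator: In             # In or NotIn
      values: [1, 2]           # illustrative exit codes
  # Match on a condition present in the failed Pod's status.
  - action: Ignore
    onPodConditions:
    - type: DisruptionTarget
      status: "True"           # optional; defaults to "True"
```

When `containerName` is omitted, the exit code requirement applies to all containers in the Pod.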

Additionally, for each rule, you specify one of the following actions to take
when a Pod matches the rule:

- `Ignore`: Do not count the failure towards the `backoffLimit` or `backoffLimitPerIndex`.
- `FailJob`: Fail the entire Job and terminate all running Pods.
- `FailIndex`: Fail the index corresponding to the failed Pod.
  This action works with the [Backoff limit per index](/docs/concepts/workloads/controllers/job/#backoff-limit-per-index) feature.
- `Count`: Count the failure towards the `backoffLimit` or `backoffLimitPerIndex`.
  This is the default behavior.
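
To illustrate `FailIndex`, here is a minimal sketch of an Indexed Job spec fragment that gives up on an index as soon as its Pod exits with a particular code. The exit code 42 and the numeric limits are assumptions chosen for illustration, and the Pod template is omitted for brevity.

```yaml
spec:
  completionMode: Indexed      # FailIndex applies to Indexed Jobs
  completions: 10              # illustrative
  parallelism: 5               # illustrative
  backoffLimitPerIndex: 2      # retries allowed for each index
  podFailurePolicy:
    rules:
    - action: FailIndex        # stop retrying this index; other indexes continue
      onExitCodes:
        operator: In
        values: [42]           # hypothetical non-retriable exit code
```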

When Pod failures occur in a running Job, Kubernetes matches the
failed Pod status against the list of Pod failure policy rules, in the specified
order, and takes the corresponding action for the first matched rule.

Note that when specifying the Pod failure policy, you must also set the Job's
Pod template with `restartPolicy: Never`. This prevents race conditions between
the kubelet and the Job controller when counting Pod failures.
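
Because the first matching rule wins, rule order matters. The following is a hedged sketch of a complete Job manifest; the Job name, container name, image, command, and exit code are placeholders rather than anything taken from this post.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job                    # hypothetical name
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: Ignore                   # evaluated first
      onPodConditions:
      - type: DisruptionTarget
    - action: FailJob                  # evaluated only if the rule above did not match
      onExitCodes:
        operator: In
        values: [42]                   # illustrative exit code
  template:
    spec:
      restartPolicy: Never             # required when podFailurePolicy is set
      containers:
      - name: main                     # hypothetical container
        image: registry.example/worker:latest   # placeholder image
        command: ["/bin/run-task"]     # placeholder command
```

With this ordering, a failed Pod that both carries a matching Pod condition and has a container that exited with code 42 matches the first rule, so the failure is ignored instead of failing the Job.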

### Kubernetes-initiated Pod disruptions

To allow matching Pod failure policy rules against failures caused by
disruptions initiated by Kubernetes, this feature introduces the `DisruptionTarget`
Pod condition.

Kubernetes adds this condition to any Pod, regardless of whether it's managed by
a Job controller, that fails because of a retriable
[disruption scenario](/docs/concepts/workloads/pods/disruptions/#pod-disruption-conditions).
The `DisruptionTarget` condition contains one of the following reasons, each of
which corresponds to one of these disruption scenarios:

- `PreemptionByKubeScheduler`: [Preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption)
  by `kube-scheduler` to accommodate a new Pod that has a higher priority.
- `DeletionByTaintManager`: the Pod is due to be deleted by
  `kube-controller-manager` due to a `NoExecute` [taint](/docs/concepts/scheduling-eviction/taint-and-toleration/)
  that the Pod doesn't tolerate.
- `EvictionByEvictionAPI`: the Pod is due to be deleted by an
  [API-initiated eviction](/docs/concepts/scheduling-eviction/api-eviction/).
- `DeletionByPodGC`: the Pod is bound to a node that no longer exists, and is due to
  be deleted by [Pod garbage collection](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection).
- `TerminationByKubelet`: the Pod was terminated by
  [graceful node shutdown](/docs/concepts/cluster-administration/node-shutdown/#graceful-node-shutdown),
  [node pressure eviction](/docs/concepts/scheduling-eviction/node-pressure-eviction/),
  or preemption for [system critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/).

In all other disruption scenarios, like eviction due to exceeding
[Pod container limits](/docs/concepts/configuration/manage-resources-containers/),
Pods don't receive the `DisruptionTarget` condition because the disruptions were
likely caused by the Pod and would reoccur on retry.
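
For orientation, a failed Pod that was preempted by the scheduler might report a status along these lines. This is a sketch only: the `reason` value is one of the reasons defined for this condition, while the `message` text is an assumption made up for illustration.

```yaml
status:
  phase: Failed
  conditions:
  - type: DisruptionTarget
    status: "True"
    reason: PreemptionByKubeScheduler
    # Illustrative message; the real text is set by the component that disrupted the Pod.
    message: "preempted to accommodate a higher priority Pod"
```

A rule with `onPodConditions` matches on the condition `type` and `status`, so a single `DisruptionTarget` rule covers every reason.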

### Example

The Pod failure policy snippet below demonstrates an example use:

```yaml
podFailurePolicy:
  rules:
  - action: Ignore
    onPodConditions:
    - type: DisruptionTarget
  - action: FailJob
    onPodConditions:
    - type: ConfigIssue
  - action: FailJob
    onExitCodes:
      operator: In
      values: [ 42 ]
```

In this example, the Pod failure policy does the following:

- Ignores any failed Pods that have the built-in `DisruptionTarget`
  condition. These Pods don't count towards Job backoff limits.
- Fails the Job if any failed Pods have the custom user-supplied
  `ConfigIssue` condition, which was added either by a custom controller or webhook.
- Fails the Job if any containers exited with the exit code 42.
- Counts all other Pod failures towards the default `backoffLimit` (or
  `backoffLimitPerIndex` if used).
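
When a `FailJob` rule fires, the outcome is recorded in the Job's status. The sketch below shows roughly what the resulting condition could look like for the exit code rule; the Pod name `example-job-abcde`, the container name `main`, and the message wording are assumptions for illustration, and the exact text may differ between releases.

```yaml
status:
  conditions:
  - type: Failed
    status: "True"
    reason: PodFailurePolicy
    # Illustrative message; index 2 refers to the third rule in the rules list.
    message: "Container main for pod default/example-job-abcde failed with exit code 42 matching FailJob rule at index 2"
```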

## Learn more

- For a hands-on guide to using Pod failure policy, see
  [Handling retriable and non-retriable pod failures with Pod failure policy](/docs/tasks/job/pod-failure-policy/)
- Read the documentation for
  [Pod failure policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy) and
  [Backoff limit per index](/docs/concepts/workloads/controllers/job/#backoff-limit-per-index)
- Read the documentation for
  [Pod disruption conditions](/docs/concepts/workloads/pods/disruptions/#pod-disruption-conditions)
- Read the KEP for [Pod failure policy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3329-retriable-and-non-retriable-failures)

## Related work

Based on the concepts introduced by Pod failure policy, the following additional work is in progress:

- JobSet integration: [Configurable Failure Policy API](https://github.com/kubernetes-sigs/jobset/issues/262)
- [Pod failure policy extension to add more granular failure reasons](https://github.com/kubernetes/enhancements/issues/4443)
- Support for Pod failure policy via JobSet in [Kubeflow Training v2](https://github.com/kubeflow/training-operator/pull/2171)
- Proposal: [Disrupted Pods should be removed from endpoints](https://docs.google.com/document/d/1t25jgO_-LRHhjRXf4KJ5xY_t8BZYdapv7MDAxVGY6R8)

## Get involved

This work was sponsored by the
[batch working group](https://github.com/kubernetes/community/tree/master/wg-batch)
in close collaboration with the
[SIG Apps](https://github.com/kubernetes/community/tree/master/sig-apps),
[SIG Node](https://github.com/kubernetes/community/tree/master/sig-node),
and [SIG Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling)
communities.

If you are interested in working on new features in this space, we recommend
subscribing to our [Slack](https://kubernetes.slack.com/messages/wg-batch)
channel and attending the regular community meetings.

## Acknowledgments

I would love to thank everyone who was involved in this project over the years -
it's been a journey and a joint community effort! The list below is
my best-effort attempt to remember and recognize people who made an impact.
Thank you!

- [Aldo Culquicondor](https://github.com/alculquicondor/) for guidance and reviews throughout the process
- [Jordan Liggitt](https://github.com/liggitt) for KEP and API reviews
- [David Eads](https://github.com/deads2k) for API reviews
- [Maciej Szulik](https://github.com/soltysh) for KEP reviews from SIG Apps PoV
- [Clayton Coleman](https://github.com/smarterclayton) for guidance and SIG Node reviews
- [Sergey Kanzhelev](https://github.com/SergeyKanzhelev) for KEP reviews from SIG Node PoV
- [Dawn Chen](https://github.com/dchen1107) for KEP reviews from SIG Node PoV
- [Daniel Smith](https://github.com/lavalamp) for reviews from SIG API machinery PoV
- [Antoine Pelisse](https://github.com/apelisse) for reviews from SIG API machinery PoV
- [John Belamaric](https://github.com/johnbelamaric) for PRR reviews
- [Filip Křepinský](https://github.com/atiratree) for thorough reviews from SIG Apps PoV and bug-fixing
- [David Porter](https://github.com/bobbypage) for thorough reviews from SIG Node PoV
- [Jensen Lo](https://github.com/jensentanlo) for early requirements discussions, testing and reporting issues
- [Daniel Vega-Myhre](https://github.com/danielvegamyhre) for advancing JobSet integration and reporting issues
- [Abdullah Gharaibeh](https://github.com/ahg-g) for early design discussions and guidance
- [Antonio Ojea](https://github.com/aojea) for test reviews
- [Yuki Iwai](https://github.com/tenzen-y) for reviews and aligning implementation of the closely related Job features
- [Kevin Hannon](https://github.com/kannon92) for reviews and aligning implementation of the closely related Job features
- [Tim Bannister](https://github.com/sftim) for docs reviews
- [Shannon Kularathna](https://github.com/shannonxtreme) for docs reviews
- [Paola Cortés](https://github.com/cortespao) for docs reviews
