Commit eb5f941

Merge pull request #42613 from my-git9/blog-job-update
[zh-cn] sync blog: 2023-08-21-job-update-post.md
2 parents 0d841ef + 4b1a579 commit eb5f941

1 file changed: +373 -0 lines changed
@@ -0,0 +1,373 @@
---
layout: blog
title: "Kubernetes 1.28: Improved failure handling for Jobs"
date: 2023-08-21
slug: kubernetes-1-28-jobapi-update
---

**Authors:** Kevin Hannon (G-Research), Michał Woźniak (Google)

**Translator:** Xin Li (Daocloud)

This blog discusses two new features in Kubernetes 1.28 that improve Jobs for batch
users: [Pod replacement policy](/docs/concepts/workloads/controllers/job/#pod-replacement-policy)
and [Backoff limit per index](/docs/concepts/workloads/controllers/job/#backoff-limit-per-index).

These features continue the effort started by the
[Pod failure policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy)
to improve the handling of Pod failures in a Job.

## Pod replacement policy {#pod-replacement-policy}

By default, when a Pod enters a terminating state (e.g. due to preemption or
eviction), Kubernetes immediately creates a replacement Pod, so both Pods are running
at the same time. In API terms, a Pod is considered terminating when it has a
`deletionTimestamp` and its phase is `Pending` or `Running`.
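
To make that API-level definition concrete, the abbreviated sketch below shows what a terminating Pod looks like; the Pod name and timestamp are made up for illustration and are not taken from the original post.

```yaml
# Hypothetical, abbreviated Pod object that counts as "terminating":
# a deletionTimestamp is set, but the phase is still Pending or Running.
apiVersion: v1
kind: Pod
metadata:
  name: myjob-0-abc12                        # made-up name of a Pod owned by a Job
  deletionTimestamp: "2023-08-21T10:00:00Z"  # set once deletion has been requested
status:
  phase: Running                             # not yet Failed or Succeeded, so still terminating
```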

Having two Pods running at the same time is a problem for
some popular machine learning frameworks, such as
TensorFlow and [JAX](https://jax.readthedocs.io/en/latest/), which require at most one Pod running at a time
for a given index.
TensorFlow gives the following error if two Pods run for a given index:

```
/job:worker/task:4: Duplicate task registration with task_name=/job:worker/replica:0/task:4
```

See this [issue](https://github.com/kubernetes/kubernetes/issues/115844) for more details.

Creating the replacement Pod before the previous one fully terminates can also
cause problems in clusters with scarce resources or with tight budgets, such as:
* cluster resources can be difficult to obtain for Pods pending to be scheduled,
  as Kubernetes might take a long time to find available nodes until the existing
  Pods are fully terminated.
* if the cluster autoscaler is enabled, the replacement Pods might produce undesired
  scale-ups.

### How can you use it? {#pod-replacement-policy-how-to-use}

This is an alpha feature, which you can enable by turning on the `JobPodReplacementPolicy`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) in
your cluster.

Once the feature is enabled in your cluster, you can use it by creating a new Job that specifies a
`podReplacementPolicy` field as shown here:

```yaml
kind: Job
metadata:
  name: new
  ...
spec:
  podReplacementPolicy: Failed
  ...
```

In that Job, the Pods are only replaced once they reach the `Failed` phase,
and not while they are terminating.

Additionally, you can inspect the `.status.terminating` field of a Job. The value
of the field is the number of Pods owned by the Job that are currently terminating.

```shell
kubectl get jobs/myjob -o=jsonpath='{.status.terminating}'
```

```
3 # three Pods are terminating and have not yet reached the Failed phase
```

This can be particularly useful for external queueing controllers, such as
[Kueue](https://github.com/kubernetes-sigs/kueue), that track quota
from running Pods of a Job until the resources are reclaimed from
the currently terminating Job.

Note that `podReplacementPolicy: Failed` is the default when using a custom
[Pod failure policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy).
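
As a small illustration of that default (this manifest is only a sketch and is not part of the original post; the Job name, container, and image are placeholders), a Job that defines a `podFailurePolicy` behaves as if `podReplacementPolicy: Failed` had been spelled out:

```yaml
# Hypothetical sketch: because spec.podFailurePolicy is set, this Job uses
# podReplacementPolicy: Failed by default, even though the field is not written out.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-pod-failure-policy        # placeholder name
spec:
  # podReplacementPolicy: Failed           # implied default once podFailurePolicy is used
  podFailurePolicy:
    rules:
    - action: Ignore                       # do not count Pod disruptions against backoffLimit
      onPodConditions:
      - type: DisruptionTarget
  template:
    spec:
      restartPolicy: Never                 # required when using podFailurePolicy
      containers:
      - name: main                         # placeholder container
        image: busybox                     # placeholder image
        command: ["sh", "-c", "sleep 10"]
```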

## Backoff limit per index {#backoff-limit-per-index}

By default, Pod failures for [Indexed Jobs](/docs/concepts/workloads/controllers/job/#completion-mode)
are counted towards the global limit of retries, represented by `.spec.backoffLimit`.
This means that if there is a consistently failing index, it is restarted
repeatedly until it exhausts the limit. Once the limit is reached, the entire
Job is marked failed and some indexes may never even be started.

This is problematic for use cases where you want to handle Pod failures for
every index independently. For example, if you use Indexed Jobs for running
integration tests, each index might correspond to a testing suite. In that case,
you may want to account for possible flaky tests by allowing 1 or 2 retries per
suite. There might be some buggy suites, making the corresponding
indexes fail consistently. In that case you may prefer to limit retries for
the buggy suites, while still allowing other suites to complete.

The feature allows you to:
* complete execution of all indexes, despite some indexes failing.
* better utilize the computational resources by avoiding unnecessary retries of consistently failing indexes.

### How can you use it? {#backoff-limit-per-index-how-to-use}

This is an alpha feature, which you can enable by turning on the
`JobBackoffLimitPerIndex`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
in your cluster.
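
Both alpha features described in this post are enabled the same way, through the standard `--feature-gates=<Name>=<bool>` flag on the control plane components. The snippet below is only a sketch, assuming a kubeadm-managed control plane; adapt it to however your cluster configures the kube-apiserver and kube-controller-manager.

```yaml
# Sketch only: enable the two alpha gates from this post on a kubeadm-managed
# control plane. The Job API fields are validated by the kube-apiserver, and the
# behavior is implemented by the Job controller in the kube-controller-manager.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: "JobPodReplacementPolicy=true,JobBackoffLimitPerIndex=true"
controllerManager:
  extraArgs:
    feature-gates: "JobPodReplacementPolicy=true,JobBackoffLimitPerIndex=true"
```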

Once the feature is enabled in your cluster, you can create an Indexed Job with the
`.spec.backoffLimitPerIndex` field specified.

#### Example

The following example demonstrates how to use this feature to make sure the
Job executes all indexes (provided there is no other reason for the early Job
termination, such as reaching the `activeDeadlineSeconds` timeout, or being
manually deleted by the user), and that the number of failures is controlled per index.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-backoff-limit-per-index-execute-all
spec:
  completions: 8
  parallelism: 2
  completionMode: Indexed
  backoffLimitPerIndex: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: example # this example container returns an error, and fails, when it is run as the second or third index in any Job (even after a retry)
        image: python
        command:
        - python3
        - -c
        - |
          import os, sys, time
          id = int(os.environ.get("JOB_COMPLETION_INDEX"))
          if id == 1 or id == 2:
            sys.exit(1)
          time.sleep(1)
```

Now, inspect the Pods after the Job is finished:

```sh
kubectl get pods -l job-name=job-backoff-limit-per-index-execute-all
```

This returns output similar to:

```
NAME                                              READY   STATUS      RESTARTS   AGE
job-backoff-limit-per-index-execute-all-0-b26vc   0/1     Completed   0          49s
job-backoff-limit-per-index-execute-all-1-6j5gd   0/1     Error       0          49s
job-backoff-limit-per-index-execute-all-1-6wd82   0/1     Error       0          37s
job-backoff-limit-per-index-execute-all-2-c66hg   0/1     Error       0          32s
job-backoff-limit-per-index-execute-all-2-nf982   0/1     Error       0          43s
job-backoff-limit-per-index-execute-all-3-cxmhf   0/1     Completed   0          33s
job-backoff-limit-per-index-execute-all-4-9q6kq   0/1     Completed   0          28s
job-backoff-limit-per-index-execute-all-5-z9hqf   0/1     Completed   0          28s
job-backoff-limit-per-index-execute-all-6-tbkr8   0/1     Completed   0          23s
job-backoff-limit-per-index-execute-all-7-hxjsq   0/1     Completed   0          22s
```

Additionally, you can take a look at the status for that Job:

```sh
kubectl get jobs job-backoff-limit-per-index-execute-all -o yaml
```

The output ends with a `status` similar to:

```yaml
status:
  completedIndexes: 0,3-7
  failedIndexes: 1,2
  succeeded: 6
  failed: 4
  conditions:
  - message: Job has failed indexes
    reason: FailedIndexes
    status: "True"
    type: Failed
```

Here, indexes `1` and `2` were each retried once. After the second failure
of each, the specified `.spec.backoffLimitPerIndex` was exceeded, so
the retries were stopped. For comparison, if the per-index backoff were disabled,
the buggy indexes would retry until the global `backoffLimit` was exceeded,
and then the entire Job would be marked failed before some of the higher
indexes were started.
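
If you also want to cap how many indexes may fail before the whole Job is marked failed, the Job API in 1.28 pairs `backoffLimitPerIndex` with a `.spec.maxFailedIndexes` field (see the user-facing documentation linked below). The manifest here is a minimal sketch of that combination, with a placeholder name and container, not an example from the original post:

```yaml
# Minimal sketch: tolerate at most 1 retry per index, and mark the whole Job
# failed as soon as more than 2 distinct indexes have failed.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-backoff-limit-per-index-capped   # placeholder name
spec:
  completions: 8
  parallelism: 2
  completionMode: Indexed
  backoffLimitPerIndex: 1
  maxFailedIndexes: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main                           # placeholder container
        image: python
        command: ["python3", "-c", "import time; time.sleep(1)"]
```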

## How can you learn more? {#how-can-you-learn-more}

- Read the user-facing documentation for [Pod replacement policy](/docs/concepts/workloads/controllers/job/#pod-replacement-policy),
  [Backoff limit per index](/docs/concepts/workloads/controllers/job/#backoff-limit-per-index), and
  [Pod failure policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy)
- Read the KEPs for [Pod Replacement Policy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3939-allow-replacement-when-fully-terminated),
  [Backoff limit per index](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs), and
  [Pod failure policy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3329-retriable-and-non-retriable-failures).

## Getting Involved {#getting-involved}

These features were sponsored by [SIG Apps](https://github.com/kubernetes/community/tree/master/sig-apps). Batch use cases are actively
being improved for Kubernetes users in the
[batch working group](https://github.com/kubernetes/community/tree/master/wg-batch).
Working groups are relatively short-lived initiatives focused on specific goals.
The goal of WG Batch is to improve the experience for batch workload users, offer support for
batch processing use cases, and enhance the
Job API for common use cases. If that interests you, please join the working
group either by subscribing to our
[mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch) or on
[Slack](https://kubernetes.slack.com/messages/wg-batch).

## Acknowledgments {#acknowledgments}

As with any Kubernetes feature, multiple people contributed to getting this
done, from testing and filing bugs to reviewing code.

We would not have been able to achieve either of these features without Aldo
Culquicondor (Google) providing excellent domain knowledge and expertise
throughout the Kubernetes ecosystem.
