Commit 7a34c31

Merge pull request #40831 from Zhuzhenghao/1.27/job
[zh] resync 1.27 job
2 parents: b6f2790 + 284d90b


content/zh-cn/docs/concepts/workloads/controllers/job.md

Lines changed: 116 additions & 54 deletions
@@ -94,22 +94,22 @@ Check on the status of the Job with `kubectl`:
 
 {{< tabs name="Check status of Job" >}}
 {{< tab name="kubectl describe job pi" codelang="bash" >}}
-Name: pi
-Namespace: default
-Selector: controller-uid=0cd26dd5-88a2-4a5f-a203-ea19a1d5d578
-Labels: controller-uid=0cd26dd5-88a2-4a5f-a203-ea19a1d5d578
-job-name=pi
-Annotations: batch.kubernetes.io/job-tracking:
-Parallelism: 1
-Completions: 1
-Completion Mode: NonIndexed
-Start Time: Fri, 28 Oct 2022 13:05:18 +0530
-Completed At: Fri, 28 Oct 2022 13:05:21 +0530
-Duration: 3s
-Pods Statuses: 0 Active / 1 Succeeded / 0 Failed
+Name: pi
+Namespace: default
+Selector: batch.kubernetes.io/controller-uid=c9948307-e56d-4b5d-8302-ae2d7b7da67c
+Labels: batch.kubernetes.io/controller-uid=c9948307-e56d-4b5d-8302-ae2d7b7da67c
+batch.kubernetes.io/job-name=pi
+...
+Annotations: batch.kubernetes.io/job-tracking: ""
+Parallelism: 1
+Completions: 1
+Start Time: Mon, 02 Dec 2019 15:20:11 +0200
+Completed At: Mon, 02 Dec 2019 15:21:16 +0200
+Duration: 65s
+Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
 Pod Template:
-Labels: controller-uid=0cd26dd5-88a2-4a5f-a203-ea19a1d5d578
-job-name=pi
+Labels: batch.kubernetes.io/controller-uid=c9948307-e56d-4b5d-8302-ae2d7b7da67c
+batch.kubernetes.io/job-name=pi
 Containers:
 pi:
 Image: perl:5.34.0
@@ -133,15 +133,13 @@ Events:
 apiVersion: batch/v1
 kind: Job
 metadata:
-annotations:
-batch.kubernetes.io/job-tracking: ""
-kubectl.kubernetes.io/last-applied-configuration: |
-{"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"pi","namespace":"default"},"spec":{"backoffLimit":4,"template":{"spec":{"containers":[{"command":["perl","-Mbignum=bpi","-wle","print bpi(2000)"],"image":"perl:5.34.0","name":"pi"}],"restartPolicy":"Never"}}}}
+annotations: batch.kubernetes.io/job-tracking: ""
+...
 creationTimestamp: "2022-11-10T17:53:53Z"
 generation: 1
 labels:
-controller-uid: 204fb678-040b-497f-9266-35ffa8716d14
-job-name: pi
+batch.kubernetes.io/controller-uid: 863452e6-270d-420e-9b94-53a54146c223
+batch.kubernetes.io/job-name: pi
 name: pi
 namespace: default
 resourceVersion: "4751"
@@ -153,14 +151,14 @@ spec:
 parallelism: 1
 selector:
 matchLabels:
-controller-uid: 204fb678-040b-497f-9266-35ffa8716d14
+batch.kubernetes.io/controller-uid: 863452e6-270d-420e-9b94-53a54146c223
 suspend: false
 template:
 metadata:
 creationTimestamp: null
 labels:
-controller-uid: 204fb678-040b-497f-9266-35ffa8716d14
-job-name: pi
+batch.kubernetes.io/controller-uid: 863452e6-270d-420e-9b94-53a54146c223
+batch.kubernetes.io/job-name: pi
 spec:
 containers:
 - command:
@@ -197,7 +195,7 @@ To list all the Pods that belong to a Job in a machine readable form, you can us
 要以机器可读的方式列举隶属于某 Job 的全部 Pod,你可以使用类似下面这条命令:
 
 ```shell
-pods=$(kubectl get pods --selector=job-name=pi --output=jsonpath='{.items[*].metadata.name}')
+pods=$(kubectl get pods --selector=batch.kubernetes.io/job-name=pi --output=jsonpath='{.items[*].metadata.name}')
 echo $pods
 ```
 
@@ -225,6 +223,15 @@ View the standard output of one of the pods:
 kubectl logs $pods
 ```
 
+<!--
+Another way to view the logs of a Job:
+-->
+另外一种查看 Job 日志的方法:
+
+```shell
+kubectl logs jobs/pi
+```
+
 <!--
 The output is similar to this:
 -->
@@ -262,6 +269,15 @@ Job 的名字必须是合法的 [DNS 子域名](/zh-cn/docs/concepts/overview/wo
 
 Job 配置还需要一个 [`.spec`](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status)
 
+<!--
+### Job Labels
+-->
+### Job 标签
+
+<!--
+Job labels will have `batch.kubernetes.io/` prefix for `job-name` and `controller-uid`.
+-->
+Job 标签将为 `job-name` 和 `controller-uid` 加上 `batch.kubernetes.io/` 前缀。
 <!--
 ### Pod Template
 -->
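For quick reference while reading the hunk above, here is a minimal sketch of using the prefixed label to select a Job's Pods; it assumes the `pi` Job from the earlier examples and simply mirrors the selector change made elsewhere in this diff:

```shell
# Minimal sketch: list the Pods that belong to the "pi" Job using the
# batch.kubernetes.io/ prefixed label introduced by this resync.
kubectl get pods -l batch.kubernetes.io/job-name=pi
```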
@@ -1058,7 +1074,7 @@ Job 被恢复执行时,Pod 创建操作立即被重启执行。
 -->
 ### 可变调度指令 {#mutable-scheduling-directives}
 
-{{< feature-state for_k8s_version="v1.23" state="beta" >}}
+{{< feature-state for_k8s_version="v1.27" state="stable" >}}
 
 {{< note >}}
 <!--
@@ -1102,9 +1118,10 @@ been unsuspended before.
 
 <!--
 The fields in a Job's pod template that can be updated are node affinity, node selector,
-tolerations, labels and annotations.
+tolerations, labels, annotations and [scheduling gates](/docs/concepts/scheduling-eviction/pod-scheduling-readiness/).
 -->
-Job 的 Pod 模板中可以更新的字段是节点亲和性、节点选择器、容忍、标签和注解。
+Job 的 Pod 模板中可以更新的字段是节点亲和性、节点选择器、容忍、标签、注解和
+[调度门控](/zh-cn/docs/concepts/scheduling-eviction/pod-scheduling-readiness/)。
 
 <!--
 ### Specifying your own Pod selector
11811198
spec:
11821199
selector:
11831200
matchLabels:
1184-
controller-uid: a8f3d00d-c6d2-11e5-9f87-42010af00002
1201+
batch.kubernetes.io/controller-uid: a8f3d00d-c6d2-11e5-9f87-42010af00002
11851202
...
11861203
```
11871204

11881205
<!--
11891206
Then you create a new Job with name `new` and you explicitly specify the same selector.
1190-
Since the existing Pods have label `controller-uid=a8f3d00d-c6d2-11e5-9f87-42010af00002`,
1207+
Since the existing Pods have label `batch.kubernetes.io/controller-uid=a8f3d00d-c6d2-11e5-9f87-42010af00002`,
11911208
they are controlled by Job `new` as well.
11921209
11931210
You need to specify `manualSelector: true` in the new Job since you are not using
11941211
the selector that the system normally generates for you automatically.
11951212
-->
11961213
接下来你会创建名为 `new` 的新 Job,并显式地为其设置相同的选择算符。
1197-
由于现有 Pod 都具有标签 `controller-uid=a8f3d00d-c6d2-11e5-9f87-42010af00002`
1214+
由于现有 Pod 都具有标签
1215+
`batch.kubernetes.io/controller-uid=a8f3d00d-c6d2-11e5-9f87-42010af00002`
11981216
它们也会被名为 `new` 的 Job 所控制。
11991217

12001218
你需要在新 Job 中设置 `manualSelector: true`
@@ -1209,7 +1227,7 @@ spec:
 manualSelector: true
 selector:
 matchLabels:
-controller-uid: a8f3d00d-c6d2-11e5-9f87-42010af00002
+batch.kubernetes.io/controller-uid: a8f3d00d-c6d2-11e5-9f87-42010af00002
 ...
 ```
 
@@ -1223,14 +1241,14 @@ mismatch.
 是在告诉系统你知道自己在干什么并要求系统允许这种不匹配的存在。
 
 <!--
-### Pod failure policy {#pod-failure-policy}
+### Pod failure policy {#pod-failure-policy}
 -->
 ### Pod 失效策略 {#pod-failure-policy}
 
 {{< feature-state for_k8s_version="v1.26" state="beta" >}}
 
 {{< note >}}
-<!--
+<!--
 You can only configure a Pod failure policy for a Job if you have the
 `JobPodFailurePolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
 enabled in your cluster. Additionally, it is recommended
@@ -1247,23 +1265,23 @@ available in Kubernetes {{< skew currentVersion >}}.
 这两个特性门控都是在 Kubernetes {{< skew currentVersion >}} 中提供的。
 {{< /note >}}
 
-<!--
+<!--
 A Pod failure policy, defined with the `.spec.podFailurePolicy` field, enables
 your cluster to handle Pod failures based on the container exit codes and the
-Pod conditions.
+Pod conditions.
 -->
 Pod 失效策略使用 `.spec.podFailurePolicy` 字段来定义,
 它能让你的集群根据容器的退出码和 Pod 状况来处理 Pod 失效事件。
 
-<!--
+<!--
 In some situations, you may want to have a better control when handling Pod
 failures than the control provided by the [Pod backoff failure policy](#pod-backoff-failure-policy),
-which is based on the Job's `.spec.backoffLimit`. These are some examples of use cases:
+which is based on the Job's `.spec.backoffLimit`. These are some examples of use cases:
 -->
 在某些情况下,你可能希望更好地控制 Pod 失效的处理方式,
 而不是仅限于 [Pod 回退失效策略](#pod-backoff-failure-policy)所提供的控制能力,
 后者是基于 Job 的 `.spec.backoffLimit` 实现的。以下是一些使用场景:
-<!--
+<!--
 * To optimize costs of running workloads by avoiding unnecessary Pod restarts,
 you can terminate a Job as soon as one of its Pods fails with an exit code
 indicating a software bug.
@@ -1281,30 +1299,30 @@ which is based on the Job's `.spec.backoffLimit`. These are some examples of use
 或基于{{< glossary_tooltip text="污点" term_id="taint" >}}的驱逐),
 这样这些失效就不会被计入 `.spec.backoffLimit` 的重试限制中。
 
-<!--
+<!--
 You can configure a Pod failure policy, in the `.spec.podFailurePolicy` field,
 to meet the above use cases. This policy can handle Pod failures based on the
-container exit codes and the Pod conditions.
+container exit codes and the Pod conditions.
 -->
 你可以在 `.spec.podFailurePolicy` 字段中配置 Pod 失效策略,以满足上述使用场景。
 该策略可以根据容器退出码和 Pod 状况来处理 Pod 失效。
 
-<!--
-Here is a manifest for a Job that defines a `podFailurePolicy`:
+<!--
+Here is a manifest for a Job that defines a `podFailurePolicy`:
 -->
 下面是一个定义了 `podFailurePolicy` 的 Job 的清单:
 
-{{< codenew file="controllers/job-pod-failure-policy-example.yaml" >}}
+{{< codenew file="/controllers/job-pod-failure-policy-example.yaml" >}}
 
-<!--
+<!--
 In the example above, the first rule of the Pod failure policy specifies that
 the Job should be marked failed if the `main` container fails with the 42 exit
-code. The following are the rules for the `main` container specifically:
+code. The following are the rules for the `main` container specifically:
 -->
 在上面的示例中,Pod 失效策略的第一条规则规定如果 `main` 容器失败并且退出码为 42,
 Job 将被标记为失败。以下是 `main` 容器的具体规则:
 
-<!--
+<!--
 - an exit code of 0 means that the container succeeded
 - an exit code of 42 means that the **entire Job** failed
 - any other exit code represents that the container failed, and hence the entire
@@ -1318,34 +1336,34 @@ Job 将被标记为失败。以下是 `main` 容器的具体规则:
 如果等于 `backoffLimit` 所设置的次数,则代表 **整个 Job** 失效。
 
 {{< note >}}
-<!--
+<!--
 Because the Pod template specifies a `restartPolicy: Never`,
-the kubelet does not restart the `main` container in that particular Pod.
+the kubelet does not restart the `main` container in that particular Pod.
 -->
 因为 Pod 模板中指定了 `restartPolicy: Never`,
 所以 kubelet 将不会重启 Pod 中的 `main` 容器。
 {{< /note >}}
 
-<!--
+<!--
 The second rule of the Pod failure policy, specifying the `Ignore` action for
 failed Pods with condition `DisruptionTarget` excludes Pod disruptions from
-being counted towards the `.spec.backoffLimit` limit of retries.
+being counted towards the `.spec.backoffLimit` limit of retries.
 -->
 Pod 失效策略的第二条规则,
 指定对于状况为 `DisruptionTarget` 的失效 Pod 采取 `Ignore` 操作,
 统计 `.spec.backoffLimit` 重试次数限制时不考虑 Pod 因干扰而发生的异常。
 
 {{< note >}}
-<!--
+<!--
 If the Job failed, either by the Pod failure policy or Pod backoff
 failure policy, and the Job is running multiple Pods, Kubernetes terminates all
-the Pods in that Job that are still Pending or Running.
+the Pods in that Job that are still Pending or Running.
 -->
 如果根据 Pod 失效策略或 Pod 回退失效策略判定 Pod 已经失效,
 并且 Job 正在运行多个 Pod,Kubernetes 将终止该 Job 中仍处于 Pending 或 Running 的所有 Pod。
 {{< /note >}}
 
-<!--
+<!--
 These are some requirements and semantics of the API:
 - if you want to use a `.spec.podFailurePolicy` field for a Job, you must
 also define that Job's pod template with `.spec.restartPolicy` set to `Never`.
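Because the `codenew` include referenced earlier in this diff is not expanded here, the following is an illustrative manifest, a sketch rather than a copy of `controllers/job-pod-failure-policy-example.yaml`, wired the way the surrounding prose describes: exit code 42 from the `main` container fails the whole Job, and `DisruptionTarget` failures are ignored when counting retries:

```yaml
# Sketch of a Job whose podFailurePolicy matches the rules discussed above.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-example
spec:
  completions: 12
  parallelism: 3
  backoffLimit: 6
  template:
    spec:
      restartPolicy: Never          # required when .spec.podFailurePolicy is used
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash", "-c", "echo 'Hello world!' && sleep 5 && exit 42"]
  podFailurePolicy:
    rules:
    - action: FailJob               # exit code 42 from "main" fails the entire Job
      onExitCodes:
        containerName: main         # optional; restricts the rule to this container
        operator: In
        values: [42]
    - action: Ignore                # disruptions do not count against backoffLimit
      onPodConditions:
      - type: DisruptionTarget
```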
@@ -1382,6 +1400,26 @@ These are some requirements and semantics of the API:
 - `Ignore`:表示 `.spec.backoffLimit` 的计数器不应该增加,应该创建一个替换的 Pod。
 - `Count`:表示 Pod 应该以默认方式处理。`.spec.backoffLimit` 的计数器应该增加。
 
+{{< note >}}
+<!--
+When you use a `podFailurePolicy`, the job controller only matches Pods in the
+`Failed` phase. Pods with a deletion timestamp that are not in a terminal phase
+(`Failed` or `Succeeded`) are considered still terminating. This implies that
+terminating pods retain a [tracking finalizer](#job-tracking-with-finalizers)
+until they reach a terminal phase.
+Since Kubernetes 1.27, Kubelet transitions deleted pods to a terminal phase
+(see: [Pod Phase](/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase)). This
+ensures that deleted pods have their finalizers removed by the Job controller.
+-->
+当你使用 `podFailurePolicy` 时,Job 控制器只匹配处于 `Failed` 阶段的 Pod。
+具有删除时间戳但不处于终止阶段(`Failed` 或 `Succeeded`)的 Pod 被视为仍在终止中。
+这意味着终止中的 Pod 会保留一个[跟踪 Finalizer](#job-tracking-with-finalizers)
+直到到达终止阶段。
+从 Kubernetes 1.27 开始,kubelet 将删除的 Pod 转换到终止阶段
+(参阅 [Pod 阶段](/zh-cn/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase))。
+这确保已删除的 Pod 的 Finalizer 被 Job 控制器移除。
+{{< /note >}}
+
 <!--
 ### Job tracking with finalizers
 -->
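To observe the behaviour the new note describes, here is a small sketch for checking whether a terminating Pod still carries the Job tracking finalizer; `<pod-name>` is a placeholder for one of the Job's Pods:

```shell
# Sketch: while the Pod is terminating but not yet Failed/Succeeded, the
# batch.kubernetes.io/job-tracking finalizer should still appear here.
kubectl get pod <pod-name> -o jsonpath='{.metadata.finalizers}'
```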
@@ -1435,6 +1473,30 @@ are tracked using Pod finalizers.
 你**不**应该给 Job 手动添加或删除该注解。
 取而代之的是你可以重新创建 Job 以确保使用 Pod Finalizer 跟踪这些 Job。
 
+<!--
+### Elastic Indexed Jobs
+-->
+### 弹性索引 Job {#elastic-indexed-job}
+
+{{< feature-state for_k8s_version="v1.27" state="beta" >}}
+
+<!--
+You can scale Indexed Jobs up or down by mutating both `.spec.parallelism`
+and `.spec.completions` together such that `.spec.parallelism == .spec.completions`.
+When the `ElasticIndexedJob` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+on the [API server](/docs/reference/command-line-tools-reference/kube-apiserver/)
+is disabled, `.spec.completions` is immutable.
+
+Use cases for elastic Indexed Jobs include batch workloads which require
+scaling an indexed Job, such as MPI, Horovod, Ray, and PyTorch training jobs.
+-->
+你可以通过同时改变 `.spec.parallelism` 和 `.spec.completions` 来扩大或缩小带索引 Job,
+从而满足 `.spec.parallelism == .spec.completions`。
+当 [API 服务器](/zh-cn/docs/reference/command-line-tools-reference/kube-apiserver/)
+上的 `ElasticIndexedJob` 特性门控被禁用时,`.spec.completions` 是不可变的。
+弹性索引 Job 的使用场景包括需要扩展索引 Job 的批处理工作负载,例如 MPI、Horovod、Ray
+和 PyTorch 训练作业。
+
 <!--
 ## Alternatives
 
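Here is a hedged sketch of the scaling operation the new section describes; the Job name `my-indexed-job` is hypothetical, and both fields are changed in a single patch so they stay equal:

```shell
# Sketch: grow an Indexed Job from 3 to 5 completion indexes by mutating
# .spec.parallelism and .spec.completions together.
kubectl patch job/my-indexed-job --type=merge \
  -p '{"spec": {"parallelism": 5, "completions": 5}}'
```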