Commit 7187bda

Merge pull request #27765 from tengqm/zh-sync-concepts-8
[zh] Resync concepts section (8)
2 parents db776ac + dbdbf49 commit 7187bda

1 file changed

+234
-27
lines changed
  • content/zh/docs/concepts/workloads/controllers

content/zh/docs/concepts/workloads/controllers/job.md

Lines changed: 234 additions & 27 deletions
@@ -25,19 +25,20 @@ weight: 50
2525
A Job creates one or more Pods and will continue to retry execution of the Pods until a specified number of them successfully terminate.
2626
As pods successfully complete, the Job tracks the successful completions. When a specified number
2727
of successful completions is reached, the task (ie, Job) is complete. Deleting a Job will clean up
28-
the Pods it created.
28+
the Pods it created. Suspending a Job will delete its active Pods until the Job
29+
is resumed again.
2930
3031
A simple case is to create one Job object in order to reliably run one Pod to completion.
3132
The Job object will start a new Pod if the first Pod fails or is deleted (for example
3233
due to a node hardware failure or a node reboot).
3334
3435
You can also use a Job to run multiple Pods in parallel.
3536
-->
36-
3737
Job 会创建一个或者多个 Pods,并将继续重试 Pods 的执行,直到指定数量的 Pods 成功终止。
3838
随着 Pods 成功结束,Job 跟踪记录成功完成的 Pods 个数。
3939
当数量达到指定的成功个数阈值时,任务(即 Job)结束。
4040
删除 Job 的操作会清除所创建的全部 Pods。
41+
挂起 Job 的操作会删除 Job 的所有活跃 Pod,直到 Job 被再次恢复执行。
4142

4243
一种简单的使用场景下,你会创建一个 Job 对象以便以一种可靠的方式运行某 Pod 直到完成。
4344
当第一个 Pod 失败或者被删除(比如因为节点硬件失效或者重启)时,Job
@@ -224,8 +225,8 @@ There are three main types of task suitable to run as a Job:
224225
- the Job is complete as soon as its Pod terminates successfully.
225226
1. Parallel Jobs with a *fixed completion count*:
226227
- specify a non-zero positive value for `.spec.completions`.
227-
- the Job represents the overall task, and is complete when there is one successful Pod for each value in the range 1 to `.spec.completions`.
228-
- **not implemented yet:** Each Pod is passed a different index in the range 1 to `.spec.completions`.
228+
- the Job represents the overall task, and is complete when there are `.spec.completions` successful Pods.
229+
- when using `.spec.completionMode="Indexed"`, each Pod gets a different index in the range 0 to `.spec.completions-1`.
229230
1. Parallel Jobs with a *work queue*:
230231
- do not specify `.spec.completions`, default to `.spec.parallelism`.
231232
- the Pods must coordinate amongst themselves or an external service to determine what each should work on. For example, a Pod might fetch a batch of up to N items from the work queue.
@@ -234,21 +235,21 @@ There are three main types of task suitable to run as a Job:
234235
- once at least one Pod has terminated with success and all Pods are terminated, then the Job is completed with success.
235236
- once any Pod has exited with success, no other Pod should still be doing any work for this task or writing any output. They should all be in the process of exiting.
236237
-->
237-
1. 非并行 Job
238-
- 通常只启动一个 Pod,除非该 Pod 失败
239-
- 当 Pod 成功终止时,立即视 Job 为完成状态
240-
1. 具有 *确定完成计数* 的并行 Job
241-
- `.spec.completions` 字段设置为非 0 的正数值
242-
- Job 用来代表整个任务,当对应于 1 和 `.spec.completions` 之间的每个整数都存在
243-
一个成功的 Pod 时,Job 被视为完成
244-
- **尚未实现**:每个 Pod 收到一个介于 1 `spec.completions` 之间的不同索引值
245-
1.*工作队列* 的并行 Job
246-
- 不设置 `spec.completions`,默认值为 `.spec.parallelism`
238+
1. 非并行 Job
239+
- 通常只启动一个 Pod,除非该 Pod 失败
240+
- 当 Pod 成功终止时,立即视 Job 为完成状态
241+
1. 具有 *确定完成计数* 的并行 Job
242+
- `.spec.completions` 字段设置为非 0 的正数值
243+
- Job 用来代表整个任务,当成功的 Pod 个数达到 `.spec.completions` 时,Job 被视为完成。
244+
- 当使用 `.spec.completionMode="Indexed"` 时,每个 Pod 都会获得一个不同的
245+
索引值,介于 0 和 `.spec.completions-1` 之间。
246+
1. 带 *工作队列* 的并行 Job
247+
- 不设置 `.spec.completions`,默认值为 `.spec.parallelism`
247248
- 多个 Pod 之间必须相互协调,或者借助外部服务确定每个 Pod 要处理哪个工作条目。
248249
例如,任一 Pod 都可以从工作队列中取走最多 N 个工作条目。
249-
- 每个 Pod 都可以独立确定是否其它 Pod 都已完成,进而确定 Job 是否完成
250-
- 当 Job 中 _任何_ Pod 成功终止,不再创建新 Pod
251-
- 一旦至少 1 个 Pod 成功完成,并且所有 Pod 都已终止,即可宣告 Job 成功完成
250+
- 每个 Pod 都可以独立确定是否其它 Pod 都已完成,进而确定 Job 是否完成
251+
- 当 Job 中 _任何_ Pod 成功终止,不再创建新 Pod
252+
- 一旦至少 1 个 Pod 成功完成,并且所有 Pod 都已终止,即可宣告 Job 成功完成
252253
- 一旦任何 Pod 成功退出,任何其它 Pod 都不应再对此任务执行任何操作或生成任何输出。
253254
所有 Pod 都应启动退出过程。
254255

@@ -314,6 +315,59 @@ parallelism, for a variety of reasons:
314315
- Job 控制器可能会因为之前同一 Job 中 Pod 失效次数过多而压制新 Pod 的创建。
315316
- 当 Pod 处于体面终止进程中,需要一定时间才能停止。
316317

318+
<!--
319+
### Completion mode
320+
-->
321+
### 完成模式 {#completion-mode}
322+
323+
{{< feature-state for_k8s_version="v1.21" state="alpha" >}}
324+
325+
{{< note >}}
326+
<!--
327+
To be able to create Indexed Jobs, make sure to enable the `IndexedJob`
328+
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
329+
on the [API server](/docs/reference/command-line-tools-reference/kube-apiserver/)
330+
and the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/).
331+
-->
332+
若想创建带索引的 Job(Indexed Job),请确保
333+
[API 服务器](/zh/docs/reference/command-line-tools-reference/kube-apiserver/)
334+
和[控制器管理器](/zh/docs/reference/command-line-tools-reference/kube-controller-manager/)
335+
上的
336+
[特性门控](/zh/docs/reference/command-line-tools-reference/feature-gates/)
337+
`IndexedJob` 被启用。
338+
{{< /note >}}
339+
340+
<!--
341+
Jobs with _fixed completion count_ - that is, jobs that have non null
342+
`.spec.completions` - can have a completion mode that is specified in `.spec.completionMode`:
343+
-->
344+
带有 *确定完成计数* 的 Job,即 `.spec.completions` 不为 null 的 Job,
345+
都可以在其 `.spec.completionMode` 中设置完成模式:
346+
347+
<!--
348+
- `NonIndexed` (default): the Job is considered complete when there have been
349+
`.spec.completions` successfully completed Pods. In other words, each Pod
350+
completion is homologous to each other. Note that Jobs that have null
351+
`.spec.completions` are implicitly `NonIndexed`.
352+
- `Indexed`: the Pods of a Job get an associated completion index from 0 to
353+
`.spec.completions-1`, available in the annotation `batch.kubernetes.io/job-completion-index`.
354+
The Job is considered complete when there is one successfully completed Pod
355+
for each index. For more information about how to use this mode, see
356+
[Indexed Job for Parallel Processing with Static Work Assignment](/docs/tasks/job/indexed-parallel-processing-static/).
357+
Note that, although rare, more than one Pod could be started for the same
358+
index, but only one of them will count towards the completion count.
359+
-->
360+
- `NonIndexed` (默认值):当成功完成的 Pod 个数达到 `.spec.completions`
361+
设值时认为 Job 已经完成。换言之,每个 Pod 完成事件都是独立无关且同质的。
362+
要注意的是,当 `.spec.completions` 取值为 null 时,Job 被隐式处理为 `NonIndexed`。
363+
- `Indexed`:Job 的 Pod 会获得对应的完成索引,取值为 0 到 `.spec.completions-1`,
364+
存放在注解 `batch.kubernetes.io/job-completion-index` 中。
365+
当每个索引都对应一个成功完成的 Pod 时,Job 被认为是已完成的。
366+
关于如何使用这种模式的更多信息,可参阅
367+
[用带索引的 Job 执行基于静态任务分配的并行处理](/zh/docs/tasks/job/indexed-parallel-processing-static/)。
368+
需要注意的是,对同一索引值可能被启动的 Pod 不止一个,尽管这种情况很少发生。
369+
这时,只有一个会被记入完成计数中。
370+
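
The two completion modes described above can be compared with a small model. This is only an illustrative sketch of the documented semantics, not the Job controller's code; the function name `job_is_complete` and its arguments are hypothetical.

```python
from typing import Iterable, Optional


def job_is_complete(
    completion_mode: str,
    completions: int,
    succeeded_indexes: Iterable[Optional[int]],
) -> bool:
    """Toy model of Job completion (hypothetical helper, not a Kubernetes API).

    succeeded_indexes holds one entry per successfully completed Pod:
    its completion index in Indexed mode, or None in NonIndexed mode.
    """
    succeeded = list(succeeded_indexes)
    if completion_mode == "NonIndexed":
        # Every successful Pod counts the same; only the total matters.
        return len(succeeded) >= completions
    if completion_mode == "Indexed":
        # Each index in 0..completions-1 needs one successful Pod.
        # Several Pods for the same index count only once.
        distinct = {i for i in succeeded if i is not None and 0 <= i < completions}
        return len(distinct) == completions
    raise ValueError(f"unknown completion mode: {completion_mode}")
```

In this model, three successful Pods with indexes 0, 0 and 1 do not complete an Indexed Job with `completions: 3`, because index 2 is still missing.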
317371
<!--
318372
## Handling Pod and container failures
319373
@@ -631,12 +685,12 @@ The pattern names are also links to examples and more detailed description.
631685
下面是对这些权衡的汇总,列 2 到 4 对应上面的权衡比较。
632686
模式的名称对应了相关示例和更详细描述的链接。
633687

634-
| 模式 | 单个 Job 对象 | Pods 数少于工作条目数? | 直接使用应用无需修改? | 在 Kube 1.1 上可用?|
635-
| ----- |:-------------:|:-----------------------:|:---------------------:|:-------------------:|
636-
| [Job 模版扩展](/zh/docs/tasks/job/parallel-processing-expansion/) | | | ✓ | ✓ |
637-
| [每工作条目一 Pod 的队列](/zh/docs/tasks/job/coarse-parallel-processing-work-queue/) | ✓ | | 有时 | ✓ |
638-
| [Pod 数量可变的队列](/zh/docs/tasks/job/fine-parallel-processing-work-queue/) | ✓ | ✓ | | ✓ |
639-
| 静态工作分派的单个 Job | ✓ | | | |
688+
| 模式 | 单个 Job 对象 | Pods 数少于工作条目数? | 直接使用应用无需修改? |
689+
| ----- |:-------------:|:-----------------------:|:---------------------:|
690+
| [每工作条目一 Pod 的队列](/zh/docs/tasks/job/coarse-parallel-processing-work-queue/) | ✓ | | 有时 |
691+
| [Pod 数量可变的队列](/zh/docs/tasks/job/fine-parallel-processing-work-queue/) | ✓ | ✓ | |
692+
| [静态任务分派的带索引的 Job](/zh/docs/tasks/job/indexed-parallel-processing-static) | ✓ | | ✓ |
693+
| [Job 模版扩展](/zh/docs/tasks/job/parallel-processing-expansion/) | | | ✓ |
640694

641695
<!--
642696
When you specify completions with `.spec.completions`, each Pod created by the Job controller
@@ -659,14 +713,169 @@ Here, `W` is the number of work items.
659713

660714
| 模式 | `.spec.completions` | `.spec.parallelism` |
661715
| ----- |:-------------------:|:--------------------:|
662-
| [Job 模版扩展](/zh/docs/tasks/job/parallel-processing-expansion/) | 1 | 应该为 1 |
663716
| [每工作条目一 Pod 的队列](/zh/docs/tasks/job/coarse-parallel-processing-work-queue/) | W | 任意值 |
664717
| [Pod 个数可变的队列](/zh/docs/tasks/job/fine-parallel-processing-work-queue/) | 1 | 任意值 |
665-
| 基于静态工作分派的单一 Job | W | 任意值 |
718+
| [静态任务分派的带索引的 Job](/zh/docs/tasks/job/indexed-parallel-processing-static) | W | 任意值 |
719+
| [Job 模版扩展](/zh/docs/tasks/job/parallel-processing-expansion/) | 1 | 应该为 1 |
666720

667721
<!--
668722
## Advanced usage
669723

724+
### Suspending a Job
725+
-->
726+
## 高级用法 {#advanced-usage}
727+
728+
### 挂起 Job {#suspending-a-job}
729+
730+
{{< feature-state for_k8s_version="v1.21" state="alpha" >}}
731+
732+
{{< note >}}
733+
<!--
734+
Suspending Jobs is available in Kubernetes versions 1.21 and above. You must
735+
enable the `SuspendJob` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
736+
on the [API server](/docs/reference/command-line-tools-reference/kube-apiserver/)
737+
and the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/)
738+
in order to use this feature.
739+
-->
740+
在 Kubernetes 1.21 及更高版本中可以执行挂起(Suspending)Job 的操作。
741+
你必须在
742+
[API 服务器](/zh/docs/reference/command-line-tools-reference/kube-apiserver/)
743+
和[控制器管理器](/zh/docs/reference/command-line-tools-reference/kube-controller-manager/)
744+
上启用 `SuspendJob` 这一
745+
[特性门控](/zh/docs/reference/command-line-tools-reference/feature-gates/)
746+
才能执行此操作。
747+
{{< /note >}}
748+
749+
<!--
750+
When a Job is created, the Job controller will immediately begin creating Pods
751+
to satisfy the Job's requirements and will continue to do so until the Job is
752+
complete. However, you may want to temporarily suspend a Job's execution and
753+
resume it later. To suspend a Job, you can update the `.spec.suspend` field of
754+
the Job to true; later, when you want to resume it again, update it to false.
755+
Creating a Job with `.spec.suspend` set to true will create it in the suspended
756+
state.
757+
-->
758+
Job 被创建时,Job 控制器会马上开始执行 Pod 创建操作以满足 Job 的需求,
759+
并持续执行此操作直到 Job 完成为止。
760+
不过你可能想要暂时挂起 Job 执行,之后再恢复其执行。
761+
要挂起一个 Job,你可以将 Job 的 `.spec.suspend` 字段更新为 true。
762+
之后,当你希望恢复其执行时,将其更新为 false。
763+
创建一个 `.spec.suspend` 被设置为 true 的 Job 本质上会将其创建为被挂起状态。
764+
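
Toggling `.spec.suspend` amounts to a very small patch to the Job object. The sketch below only builds that patch body as JSON; the helper name `suspend_patch` is hypothetical. One plausible way to apply the result, assuming a Job named `myjob`, is to pass it to `kubectl patch job/myjob --type=strategic --patch '<json>'`.

```python
import json


def suspend_patch(suspend: bool) -> str:
    """Build a patch body that sets .spec.suspend (hypothetical helper)."""
    return json.dumps({"spec": {"suspend": suspend}})


# Patch body to suspend the Job; use suspend_patch(False) to resume it.
print(suspend_patch(True))
```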
765+
<!--
766+
When a Job is resumed from suspension, its `.status.startTime` field will be
767+
reset to the current time. This means that the `.spec.activeDeadlineSeconds`
768+
timer will be stopped and reset when a Job is suspended and resumed.
769+
-->
770+
当 Job 被从挂起状态恢复执行时,其 `.status.startTime` 字段会被重置为
771+
当前的时间。这意味着 `.spec.activeDeadlineSeconds` 计时器会在 Job 挂起时
772+
被停止,并在 Job 恢复执行时复位。
773+
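
The effect on the `.spec.activeDeadlineSeconds` timer can be illustrated with a toy model. The class `JobClock` is hypothetical, and representing the stopped timer as `None` is an assumption of this sketch rather than a statement about how the controller stores it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class JobClock:
    """Toy model of a Job's deadline timer; not the Job controller's real code."""

    def __init__(self, active_deadline_seconds: int, created_at: datetime):
        self.deadline = timedelta(seconds=active_deadline_seconds)
        # Models .status.startTime; None here means the timer is stopped.
        self.start_time: Optional[datetime] = created_at

    def suspend(self) -> None:
        # Suspending the Job stops the timer.
        self.start_time = None

    def resume(self, now: datetime) -> None:
        # Resuming resets startTime to the current time, restarting the timer.
        self.start_time = now

    def deadline_exceeded(self, now: datetime) -> bool:
        if self.start_time is None:
            return False
        return now - self.start_time >= self.deadline
```

With a 100 second deadline, a Job that is suspended after 50 seconds and resumed much later gets a fresh 100 seconds from the moment it resumes in this model.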
774+
<!--
775+
Remember that suspending a Job will delete all active Pods. When the Job is
776+
suspended, your [Pods will be terminated](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
777+
with a SIGTERM signal. The Pod's graceful termination period will be honored and
778+
your Pod must handle this signal in this period. This may involve saving
779+
progress for later or undoing changes. Pods terminated this way will not count
780+
towards the Job's `completions` count.
781+
-->
782+
要记住的是,挂起 Job 会删除其所有活跃的 Pod。当 Job 被挂起时,你的 Pod 会
783+
收到 SIGTERM 信号而被[终止](/zh/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)。
784+
Pod 的体面终止期限会被考虑,不过 Pod 自身也必须在此期限之内处理完信号。
785+
处理逻辑可能包括保存进度以便将来恢复,或者取消已经做出的变更等等。
786+
Pod 以这种形式终止时,不会被记入 Job 的 `completions` 计数。
787+
788+
<!--
789+
An example Job definition in the suspended state can be like so:
790+
-->
791+
处于被挂起状态的 Job 的定义示例可能是这样子:
792+
793+
```shell
794+
kubectl get job myjob -o yaml
795+
```
796+
797+
```yaml
798+
apiVersion: batch/v1
799+
kind: Job
800+
metadata:
801+
  name: myjob
802+
spec:
803+
  suspend: true
804+
  parallelism: 1
805+
  completions: 5
806+
  template:
807+
    spec:
808+
      ...
809+
```
810+
811+
<!--
812+
The Job's status can be used to determine if a Job is suspended or has been
813+
suspended in the past:
814+
-->
815+
Job 的 `status` 可以用来确定 Job 是否被挂起,或者曾经被挂起。
816+
817+
```shell
818+
kubectl get jobs/myjob -o yaml
819+
```
820+
821+
```yaml
822+
apiVersion: batch/v1
823+
kind: Job
824+
# .metadata and .spec omitted
825+
status:
826+
  conditions:
827+
  - lastProbeTime: "2021-02-05T13:14:33Z"
828+
    lastTransitionTime: "2021-02-05T13:14:33Z"
829+
    status: "True"
830+
    type: Suspended
831+
  startTime: "2021-02-05T13:13:48Z"
832+
```
833+
834+
<!--
835+
The Job condition of type "Suspended" with status "True" means the Job is
836+
suspended; the `lastTransitionTime` field can be used to determine how long the
837+
Job has been suspended for. If the status of that condition is "False", then the
838+
Job was previously suspended and is now running. If such a condition does not
839+
exist in the Job's status, the Job has never been stopped.
840+
841+
Events are also created when the Job is suspended and resumed:
842+
-->
843+
Job 的 "Suspended" 类型的状况在状态值为 "True" 时意味着 Job 正被
844+
挂起;`lastTransitionTime` 字段可被用来确定 Job 被挂起的时长。
845+
如果此状况字段的取值为 "False",则 Job 之前被挂起且现在在运行。
846+
如果 "Suspended" 状况在 `status` 字段中不存在,则意味着 Job 从未
847+
被停止执行。
848+
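
The three cases above can be written as a small lookup over the Job's status. This is a sketch that assumes the Job has already been loaded into a Python dict (for example from `kubectl get jobs/myjob -o json`); the function name `suspension_state` and its return strings are hypothetical.

```python
def suspension_state(job: dict) -> str:
    """Classify a Job by its "Suspended" condition (hypothetical helper)."""
    conditions = job.get("status", {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == "Suspended":
            # "True": currently suspended; otherwise it was suspended before
            # and is running again.
            return "suspended" if cond.get("status") == "True" else "resumed"
    # No such condition: the Job has never been suspended.
    return "never suspended"
```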
849+
当 Job 被挂起和恢复执行时,也会生成事件:
850+
851+
```shell
852+
kubectl describe jobs/myjob
853+
```
854+
855+
```
856+
Name: myjob
857+
...
858+
Events:
859+
Type Reason Age From Message
860+
---- ------ ---- ---- -------
861+
Normal SuccessfulCreate 12m job-controller Created pod: myjob-hlrpl
862+
Normal SuccessfulDelete 11m job-controller Deleted pod: myjob-hlrpl
863+
Normal Suspended 11m job-controller Job suspended
864+
Normal SuccessfulCreate 3s job-controller Created pod: myjob-jvb44
865+
Normal Resumed 3s job-controller Job resumed
866+
```
867+
868+
<!--
869+
The last four events, particularly the "Suspended" and "Resumed" events, are
870+
directly a result of toggling the `.spec.suspend` field. In the time between
871+
these two events, we see that no Pods were created, but Pod creation restarted
872+
as soon as the Job was resumed.
873+
-->
874+
最后四个事件,特别是 "Suspended" 和 "Resumed" 事件,都是因为 `.spec.suspend`
875+
字段值被改来改去造成的。在这两个事件之间,我们看到没有 Pod 被创建,不过当
876+
Job 被恢复执行时,Pod 创建操作立即被重启执行。
877+
878+
<!--
670879
### Specifying your own Pod selector {#specifying-your-own-pod-selector}
671880
672881
Normally, when you create a Job object, you do not specify `.spec.selector`.
@@ -676,8 +885,6 @@ It picks a selector value that will not overlap with any other jobs.
676885
However, in some cases, you might need to override this automatically set selector.
677886
To do this, you can specify the `.spec.selector` of the Job.
678887
-->
679-
## 高级用法 {#advanced-usage}
680-
681888
### 指定你自己的 Pod 选择算符 {#specifying-your-own-pod-selector}
682889
683890
通常,当你创建一个 Job 对象时,你不会设置 `.spec.selector`。
