Commit 105d90a

KEP-3998: move section to before Job termination and cleanup
Signed-off-by: Yuki Iwai <[email protected]>
1 parent 92a0032 commit 105d90a

1 file changed: content/en/docs/concepts/workloads/controllers/job.md

Lines changed: 57 additions & 57 deletions

@@ -550,6 +550,63 @@ terminating Pods only once these Pods reach the terminal `Failed` phase. This be
 to `podReplacementPolicy: Failed`. For more information, see [Pod replacement policy](#pod-replacement-policy).
 {{< /note >}}
 
+## Success policy {#success-policy}
+
+{{< feature-state feature_gate_name="JobSuccessPolicy" >}}
+
+{{< note >}}
+You can only configure a success policy for an Indexed Job if you have the
+`JobSuccessPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+enabled in your cluster.
+{{< /note >}}
+
+When you run an Indexed Job, a success policy defined in the `.spec.successPolicy` field
+lets you define when the Job can be declared as succeeded, based on the set of succeeded indexes.
+
+In some situations, you may want finer control over handling Pod
+successes than the control provided by `.spec.completions`.
+Some example use cases:
+
+* To reduce the cost of running workloads by avoiding unnecessary Pods,
+  you can terminate a Job as soon as one of its Pods succeeds.
+* To consider only a leader index when determining the success or failure of a Job
+  in batch workloads such as MPI and PyTorch.
+
+You can configure a success policy in the `.spec.successPolicy` field
+to meet the use cases above. The policy declares Job success based on the
+succeeded indexes. After the Job meets the success policy, the Job controller
+terminates the lingering Pods.
+
+When you specify only `.spec.successPolicy.rules[*].succeededIndexes`,
+the Job is marked as succeeded once all indexes specified in `succeededIndexes` have succeeded.
+`succeededIndexes` must contain only indexes in the range 0 to `.spec.completions-1` and
+must not contain duplicate indexes. It is written as a comma-separated list of individual indexes and intervals;
+an interval is represented by its first and last index, separated by a hyphen.
+For example, to specify indexes 1, 3, 4, 5, and 7, set `succeededIndexes` to `1,3-5,7`.
+
+When you specify only `.spec.successPolicy.rules[*].succeededCount`,
+the Job is marked as succeeded once the number of succeeded indexes reaches `succeededCount`.
+
+When you specify both `succeededIndexes` and `succeededCount`,
+the Job is marked as succeeded once the number of succeeded indexes from the set specified in `succeededIndexes`
+reaches `succeededCount`.
+
+When you specify multiple rules in `.spec.successPolicy.rules`,
+the rules are evaluated in order. Once the Job meets a rule, the remaining rules are ignored.
+
+Here is a manifest for a Job with a `successPolicy`:
+
+{{% code_sample file="/controllers/job-success-policy-example.yaml" %}}
+
+In the example above, the success policy rule specifies that
+the Job should be marked as succeeded, and its lingering Pods terminated,
+as soon as any one of the indexes 0, 1, or 2 succeeds.
+
+{{< note >}}
+If you specify both a success policy and a terminating policy such as `.spec.backoffLimit` or `.spec.podFailurePolicy`,
+and the Job meets both, the terminating policy takes precedence and the success policy is ignored.
+{{< /note >}}
+
 ## Job termination and cleanup
 
 When a Job completes, no more Pods are created, but the Pods are [usually](#pod-backoff-failure-policy) not deleted either.
@@ -1050,63 +1107,6 @@ after the operation: the built-in Job controller and the external controller
 indicated by the field value.
 {{< /warning >}}
 
-### Success policy {#success-policy}
-
-{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
-
-{{< note >}}
-You can only configure a success policy for an Indexed Job if you have the
-`JobSuccessPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-enabled in your cluster.
-{{< /note >}}
-
-When you run an Indexed Job, a success policy defined in the `.spec.successPolicy` field
-lets you define when the Job can be declared as succeeded, based on the set of succeeded indexes.
-
-In some situations, you may want finer control over handling Pod
-successes than the control provided by `.spec.completions`.
-Some example use cases:
-
-* To reduce the cost of running workloads by avoiding unnecessary Pods,
-  you can terminate a Job as soon as one of its Pods succeeds.
-* To consider only a leader index when determining the success or failure of a Job
-  in batch workloads such as MPI and PyTorch.
-
-You can configure a success policy in the `.spec.successPolicy` field
-to meet the use cases above. The policy declares Job success based on the
-succeeded indexes. After the Job meets the success policy, the Job controller
-terminates the lingering Pods.
-
-When you specify only `.spec.successPolicy.rules[*].succeededIndexes`,
-the Job is marked as succeeded once all indexes specified in `succeededIndexes` have succeeded.
-`succeededIndexes` must contain only indexes in the range 0 to `.spec.completions-1` and
-must not contain duplicate indexes. It is written as a comma-separated list of individual indexes and intervals;
-an interval is represented by its first and last index, separated by a hyphen.
-For example, to specify indexes 1, 3, 4, 5, and 7, set `succeededIndexes` to `1,3-5,7`.
-
-When you specify only `.spec.successPolicy.rules[*].succeededCount`,
-the Job is marked as succeeded once the number of succeeded indexes reaches `succeededCount`.
-
-When you specify both `succeededIndexes` and `succeededCount`,
-the Job is marked as succeeded once the number of succeeded indexes from the set specified in `succeededIndexes`
-reaches `succeededCount`.
-
-When you specify multiple rules in `.spec.successPolicy.rules`,
-the rules are evaluated in order. Once the Job meets a rule, the remaining rules are ignored.
-
-Here is a manifest for a Job with a `successPolicy`:
-
-{{% code_sample file="/controllers/job-success-policy-example.yaml" %}}
-
-In the example above, the success policy rule specifies that
-the Job should be marked as succeeded, and its lingering Pods terminated,
-as soon as any one of the indexes 0, 1, or 2 succeeds.
-
-{{< note >}}
-If you specify both a success policy and a terminating policy such as `.spec.backoffLimit` or `.spec.podFailurePolicy`,
-and the Job meets both, the terminating policy takes precedence and the success policy is ignored.
-{{< /note >}}
-
 ## Alternatives
 
 ### Bare Pods
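
The `succeededIndexes` format described in the moved section (individual indexes and hyphenated intervals separated by commas, every index between 0 and `.spec.completions-1`, no duplicates) can be illustrated with a small Python sketch. `parse_succeeded_indexes` is a hypothetical helper written only for this illustration; it is not the Job controller's or API server's code, and the real validation may enforce additional rules that are not modeled here.

```python
def parse_succeeded_indexes(value: str, completions: int) -> list[int]:
    """Expand a compressed index list such as "1,3-5,7" into [1, 3, 4, 5, 7].

    Hypothetical helper: it mirrors the format described in the docs,
    not the actual Kubernetes implementation.
    """
    indexes: list[int] = []
    seen: set[int] = set()
    for part in value.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty element in index list")
        if "-" in part:
            # An interval is written as its first and last index.
            first_str, last_str = part.split("-", 1)
            first, last = int(first_str), int(last_str)
            if first > last:
                raise ValueError(f"interval {part!r} is reversed")
            block = range(first, last + 1)
        else:
            single = int(part)
            block = range(single, single + 1)
        for index in block:
            # Every index must lie within 0 .. completions-1.
            if not 0 <= index <= completions - 1:
                raise ValueError(f"index {index} outside 0..{completions - 1}")
            # Duplicates, including overlapping intervals, are rejected.
            if index in seen:
                raise ValueError(f"duplicate index {index}")
            seen.add(index)
            indexes.append(index)
    return indexes


print(parse_succeeded_indexes("1,3-5,7", completions=10))  # [1, 3, 4, 5, 7]
```

The sketch returns a list rather than a set so that duplicates are reported explicitly instead of being collapsed silently.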

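The rule semantics in the same section can be sketched in the same way. This Python model is an illustration under stated assumptions, not Kubernetes code: `SuccessPolicyRule`, `rule_matches`, `first_matching_rule`, and `job_outcome` are hypothetical names, succeeded indexes are modeled as a plain set of integers, and the `failure_policy_met` flag stands in for a Job that has hit `.spec.backoffLimit` or a `.spec.podFailurePolicy` rule. Normal completion through `.spec.completions` is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SuccessPolicyRule:
    """Hypothetical model of one entry in .spec.successPolicy.rules."""
    succeeded_indexes: Optional[frozenset[int]] = None  # parsed succeededIndexes
    succeeded_count: Optional[int] = None  # succeededCount


def rule_matches(rule: SuccessPolicyRule, succeeded: set[int]) -> bool:
    """Return True if the set of succeeded indexes satisfies one rule."""
    if rule.succeeded_indexes is not None and rule.succeeded_count is not None:
        # Both fields: count only succeeded indexes listed in succeededIndexes.
        return len(rule.succeeded_indexes & succeeded) >= rule.succeeded_count
    if rule.succeeded_indexes is not None:
        # Only succeededIndexes: every listed index must have succeeded.
        return rule.succeeded_indexes <= succeeded
    if rule.succeeded_count is not None:
        # Only succeededCount: any succeeded index counts toward the threshold.
        return len(succeeded) >= rule.succeeded_count
    return False  # An empty rule never matches in this sketch.


def first_matching_rule(rules: list[SuccessPolicyRule], succeeded: set[int]) -> Optional[int]:
    """Evaluate rules in order and return the position of the first match."""
    for position, rule in enumerate(rules):
        if rule_matches(rule, succeeded):
            return position  # Remaining rules are not evaluated.
    return None


def job_outcome(rules: list[SuccessPolicyRule], succeeded: set[int], failure_policy_met: bool) -> str:
    """Terminating policies take precedence over the success policy."""
    if failure_policy_met:
        return "failed"
    if first_matching_rule(rules, succeeded) is not None:
        return "succeeded"  # The controller would also terminate lingering Pods.
    return "running"  # Not decided by the success policy alone.


rules = [SuccessPolicyRule(succeeded_indexes=frozenset({0, 1, 2}), succeeded_count=1)]
print(job_outcome(rules, {2}, failure_policy_met=False))  # succeeded
print(job_outcome(rules, {5, 6}, failure_policy_met=False))  # running
print(job_outcome(rules, {2}, failure_policy_met=True))  # failed
```

Keeping `first_matching_rule` separate from `job_outcome` makes the "first match wins" ordering explicit, even though any match leads to the same outcome in this simplified model.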