@@ -550,6 +550,63 @@ terminating Pods only once these Pods reach the terminal `Failed` phase. This be
to `podReplacementPolicy: Failed`. For more information, see [Pod replacement policy](#pod-replacement-policy).
{{< /note >}}

+ ## Success policy {#success-policy}
+
+ {{< feature-state feature_gate_name="JobSuccessPolicy" >}}
+
+ {{< note >}}
+ You can only configure a success policy for an Indexed Job if you have the
+ `JobSuccessPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+ enabled in your cluster.
+ {{< /note >}}
+
+ When you run an Indexed Job, a success policy, defined in the `.spec.successPolicy` field,
+ lets you define when the Job can be declared succeeded, based on its succeeded Pods.
+
+ In some situations, you may want finer control over how Pod successes are handled
+ than `.spec.completions` alone provides.
+ Here are some example use cases:
+
+ * To reduce the cost of running workloads by avoiding unnecessary Pods,
+   you can terminate a Job as soon as one of its Pods succeeds.
+ * In batch workloads such as MPI and PyTorch, you may care only about a leader index
+   when determining the success or failure of a Job.
+
+ You can configure a success policy, in the `.spec.successPolicy` field,
+ to meet the above use cases. This policy can handle Job successes based on the
+ succeeded Pods. After the Job meets the success policy, the Job controller
+ terminates the lingering Pods.
+
+ When you specify only `.spec.successPolicy.rules[*].succeededIndexes`,
+ the Job is marked as succeeded once all indexes listed in `succeededIndexes` have succeeded.
+ The `succeededIndexes` value must list indexes between 0 and `.spec.completions-1` and
+ must not contain duplicate indexes. It is written as a comma-separated list of
+ individual indexes and intervals, where each interval is given by its first and last
+ element, separated by a hyphen.
+ For example, to specify 1, 3, 4, 5 and 7, set `succeededIndexes` to `1,3-5,7`.
+
+ When you specify only `.spec.successPolicy.rules[*].succeededCount`,
+ the Job is marked as succeeded once the number of succeeded indexes reaches `succeededCount`.
+
+ When you specify both `succeededIndexes` and `succeededCount`,
+ the Job is marked as succeeded once the number of succeeded indexes from
+ `succeededIndexes` reaches `succeededCount`.
+
+ Note that when you specify multiple rules in `.spec.successPolicy.rules`,
+ the rules are evaluated in order. Once the Job meets a rule, the remaining rules are ignored.
+
+ Here is a manifest for a Job with `successPolicy`:
+
+ {{% code_sample file="/controllers/job-success-policy-example.yaml" %}}
+
+ In the example above, the rule of the success policy specifies that
+ the Job should be marked succeeded, and its lingering Pods terminated,
+ if any one of the indexes 0, 1, and 2 succeeds.
+
+ {{< note >}}
+ When you specify both a success policy and terminating policies such as `.spec.backoffLimit` and `.spec.podFailurePolicy`,
+ once the Job meets both policies, the terminating policies are respected and the success policy is ignored.
+ {{< /note >}}
+
## Job termination and cleanup

When a Job completes, no more Pods are created, but the Pods are [usually](#pod-backoff-failure-policy) not deleted either.
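
The rule semantics described in the added "Success policy" section can be sketched in Python. This is only an illustration of the documented behaviour under simplifying assumptions, not the Job controller's implementation: the helper names `parse_indexes`, `rule_met`, and `job_succeeded` are hypothetical, rules are modelled as plain dictionaries, and no validation (duplicates, range checks against `.spec.completions`) is performed.

```python
from __future__ import annotations


def parse_indexes(spec: str) -> set[int]:
    """Expand an interval string such as "1,3-5,7" into a set of indexes."""
    indexes: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # An interval is written as "first-last", both ends inclusive.
            first, last = part.split("-", 1)
            indexes.update(range(int(first), int(last) + 1))
        else:
            indexes.add(int(part))
    return indexes


def rule_met(rule: dict, succeeded: set[int]) -> bool:
    """Check one rule against the set of succeeded completion indexes."""
    wanted = rule.get("succeededIndexes")
    count = rule.get("succeededCount")
    if wanted is not None:
        listed = parse_indexes(wanted)
        hits = succeeded & listed
        if count is None:
            # Only succeededIndexes: every listed index must have succeeded.
            return hits == listed
        # Both fields: enough of the listed indexes must have succeeded.
        return len(hits) >= count
    if count is not None:
        # Only succeededCount: any indexes count towards the total.
        return len(succeeded) >= count
    return False


def job_succeeded(rules: list[dict], succeeded: set[int]) -> bool:
    """Rules are checked in order; the first rule that is met decides."""
    return any(rule_met(rule, succeeded) for rule in rules)


rules = [{"succeededIndexes": "0-2", "succeededCount": 1}]
print(sorted(parse_indexes("1,3-5,7")))  # [1, 3, 4, 5, 7]
print(job_succeeded(rules, {2}))         # True
print(job_succeeded(rules, {5}))         # False
```

The `rules` value here mirrors the rule described for the example manifest (one of indexes 0, 1, and 2 succeeding); the manifest file itself is not shown in this diff, so its exact contents are an assumption.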
@@ -1050,63 +1107,6 @@ after the operation: the built-in Job controller and the external controller
indicated by the field value.
{{< /warning >}}

- ### Success policy {#success-policy}
-
- {{< feature-state for_k8s_version="v1.29" state="alpha" >}}
-
- {{< note >}}
- You can only configure a success policy for an Indexed Job if you have the
- `JobSuccessPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
- enabled in your cluster.
- {{< /note >}}
-
- When you run an Indexed Job, a success policy, defined in the `.spec.successPolicy` field,
- lets you define when the Job can be declared succeeded, based on its succeeded Pods.
-
- In some situations, you may want finer control over how Pod successes are handled
- than `.spec.completions` alone provides.
- Here are some example use cases:
-
- * To reduce the cost of running workloads by avoiding unnecessary Pods,
-   you can terminate a Job as soon as one of its Pods succeeds.
- * In batch workloads such as MPI and PyTorch, you may care only about a leader index
-   when determining the success or failure of a Job.
-
- You can configure a success policy, in the `.spec.successPolicy` field,
- to meet the above use cases. This policy can handle Job successes based on the
- succeeded Pods. After the Job meets the success policy, the Job controller
- terminates the lingering Pods.
-
- When you specify only `.spec.successPolicy.rules[*].succeededIndexes`,
- the Job is marked as succeeded once all indexes listed in `succeededIndexes` have succeeded.
- The `succeededIndexes` value must list indexes between 0 and `.spec.completions-1` and
- must not contain duplicate indexes. It is written as a comma-separated list of
- individual indexes and intervals, where each interval is given by its first and last
- element, separated by a hyphen.
- For example, to specify 1, 3, 4, 5 and 7, set `succeededIndexes` to `1,3-5,7`.
-
- When you specify only `.spec.successPolicy.rules[*].succeededCount`,
- the Job is marked as succeeded once the number of succeeded indexes reaches `succeededCount`.
-
- When you specify both `succeededIndexes` and `succeededCount`,
- the Job is marked as succeeded once the number of succeeded indexes from
- `succeededIndexes` reaches `succeededCount`.
-
- Note that when you specify multiple rules in `.spec.successPolicy.rules`,
- the rules are evaluated in order. Once the Job meets a rule, the remaining rules are ignored.
-
- Here is a manifest for a Job with `successPolicy`:
-
- {{% code_sample file="/controllers/job-success-policy-example.yaml" %}}
-
- In the example above, the rule of the success policy specifies that
- the Job should be marked succeeded, and its lingering Pods terminated,
- if any one of the indexes 0, 1, and 2 succeeds.
-
- {{< note >}}
- When you specify both a success policy and terminating policies such as `.spec.backoffLimit` and `.spec.podFailurePolicy`,
- once the Job meets both policies, the terminating policies are respected and the success policy is ignored.
- {{< /note >}}
-
## Alternatives

### Bare Pods