@@ -1050,6 +1050,63 @@ after the operation: the built-in Job controller and the external controller
1050
1050
indicated by the field value.
1051
1051
{{< /warning >}}
1052
1052
1053
+ ### Success policy {#success-policy}
1054
+
1055
+ {{< feature-state for_k8s_version="v1.29" state="alpha" >}}
1056
+
1057
+ {{< note >}}
1058
+ You can only configure a success policy for an Indexed Job if you have the
1059
+ `JobSuccessPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
1060
+ enabled in your cluster.
1061
+ {{< /note >}}
1062
+
1063
+ When you run an Indexed Job, a success policy, defined in the `.spec.successPolicy` field,
1064
+ lets you define when the Job can be declared succeeded, based on the Pods that have succeeded.
1065
+
1066
+ In some situations, you may want finer control over how Pod successes are handled
1067
+ than `.spec.completions` alone provides.
1068
+ Here are some example use cases:
1069
+
1070
+ * To reduce the cost of running workloads by not running Pods unnecessarily,
1071
+ you can terminate a Job as soon as one of its Pods succeeds.
1072
+ * When only a leader index determines the success or failure of a Job,
1073
+ as in batch workloads such as MPI and PyTorch.
1074
+
1075
+ You can configure a success policy in the `.spec.successPolicy` field
1076
+ to meet the above use cases. The policy determines Job success based on the
1077
+ Pods that have succeeded. After the Job meets the success policy, the Job controller
1078
+ terminates the lingering Pods.
1079
+
1080
+ When you specify only `.spec.successPolicy.rules[*].succeededIndexes`,
1081
+ the Job is marked as succeeded once all indexes listed in `succeededIndexes` have succeeded.
1082
+ Each index in `succeededIndexes` must be between 0 and `.spec.completions-1`, and
1083
+ the list must not contain duplicate indexes. The value is a comma-separated list in which
1084
+ a run of consecutive indexes is written as its first and last elements joined by a hyphen.
1085
+ For example, to specify indexes 1, 3, 4, 5, and 7, set `succeededIndexes` to `1,3-5,7`.
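
The interval notation can be sketched with a fragment of a Job `spec`. This is only an illustration: the `completions` value of 8 is an assumption chosen so that index 7 is in range, and the Pod template is omitted.

```yaml
# Fragment of a Job spec (Pod template omitted); all values are illustrative assumptions.
spec:
  completions: 8            # valid completion indexes are 0 through 7
  completionMode: Indexed   # a success policy applies only to Indexed Jobs
  successPolicy:
    rules:
    - succeededIndexes: "1,3-5,7"   # indexes 1, 3, 4, 5, and 7 must all succeed
```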
1086
+
1087
+ When you specify only `.spec.successPolicy.rules[*].succeededCount`,
1088
+ the Job is marked as succeeded once the number of succeeded indexes reaches `succeededCount`.
1089
+
1090
+ When you specify both `succeededIndexes` and `succeededCount`,
1091
+ the Job is marked as succeeded once the number of succeeded indexes among those listed in `succeededIndexes`
1092
+ reaches `succeededCount`.
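
As a hedged sketch of combining the two fields in one rule, the fragment below uses assumed values (8 completions, a count of 2) and omits the Pod template.

```yaml
# Fragment of a Job spec (Pod template omitted); all values are illustrative assumptions.
spec:
  completions: 8
  completionMode: Indexed
  successPolicy:
    rules:
    - succeededIndexes: "1,3-5,7"   # only these indexes count toward the rule
      succeededCount: 2             # met once any 2 of indexes 1, 3, 4, 5, 7 succeed
```

With these assumed values, successes of indexes 0, 2, or 6 would not count toward the rule, because they are not listed in `succeededIndexes`.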
1093
+
1094
+ Note that when you specify multiple rules in `.spec.successPolicy.rules`,
1095
+ the rules are evaluated in order. Once the Job meets a rule, the remaining rules are ignored.
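
A policy with more than one rule might look like the following fragment. The values are assumptions for illustration (index 0 as a leader, a count of 6), and the Pod template is omitted.

```yaml
# Fragment of a Job spec (Pod template omitted); all values are illustrative assumptions.
spec:
  completions: 8
  completionMode: Indexed
  successPolicy:
    rules:
    - succeededIndexes: "0"   # first rule: the assumed leader index 0 succeeded
    - succeededCount: 6       # second rule: any 6 indexes succeeded
```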
1096
+
1097
+ Here is a manifest for a Job with a `successPolicy`:
1098
+
1099
+ {{% code_sample file="/controllers/job-success-policy-example.yaml" %}}
1100
+
1101
+ In the example above, the success policy rule specifies that
1102
+ the Job is marked as succeeded, and its lingering Pods are terminated,
1103
+ once any one of the indexes 0, 1, and 2 has succeeded.
1104
+
1105
+ {{< note >}}
1106
+ When you specify both a success policy and terminating policies such as `.spec.backoffLimit` and `.spec.podFailurePolicy`,
1107
+ and the Job meets both, the terminating policies take precedence and the success policy is ignored.
1108
+ {{< /note >}}
1109
+
1053
1110
## Alternatives
1054
1111
1055
1112
### Bare Pods