@@ -550,6 +550,62 @@ terminating Pods only once these Pods reach the terminal `Failed` phase. This be
to `podReplacementPolicy: Failed`. For more information, see [Pod replacement policy](#pod-replacement-policy).
{{< /note >}}

+ ## Success policy {#success-policy}
+
+ {{< feature-state feature_gate_name="JobSuccessPolicy" >}}
+
+ {{< note >}}
+ You can only configure a success policy for an Indexed Job if you have the
+ `JobSuccessPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+ enabled in your cluster.
+ {{< /note >}}
+
+ When creating an Indexed Job, you can use `.spec.successPolicy` to define when the Job
+ can be declared as succeeded, based on the Pods that succeeded.
+
+ By default, a Job succeeds when the number of succeeded Pods equals `.spec.completions`.
+ These are some situations where you might want additional control for declaring a Job succeeded:
+
+ * When running simulations with different parameters,
+   you might not need all the simulations to succeed for the overall Job to be successful.
+ * When following a leader-worker pattern, only the success of the leader determines the success or
+   failure of a Job. Examples of this are frameworks like MPI and PyTorch.
+
+ You can configure a success policy, in the `.spec.successPolicy` field,
+ to meet the above use cases. This policy can handle Job success based on the
+ succeeded Pods. After the Job meets the success policy, the job controller terminates the lingering Pods.
+ A success policy is defined by rules. Each rule can take one of the following forms:
+
+ * When you specify the `succeededIndexes` only,
+   once all indexes specified in the `succeededIndexes` succeed, the job controller marks the Job as succeeded.
+   The `succeededIndexes` must be a list of intervals between 0 and `.spec.completions-1`.
+ * When you specify the `succeededCount` only,
+   once the number of succeeded indexes reaches the `succeededCount`, the job controller marks the Job as succeeded.
+ * When you specify both `succeededIndexes` and `succeededCount`,
+   once the number of succeeded indexes from the subset of indexes specified in the `succeededIndexes` reaches the `succeededCount`,
+   the job controller marks the Job as succeeded.
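+
+ As a rough illustration of the three rule forms above, the following fragment of `.spec` lists one rule of each kind.
+ It is only a sketch: the index and count values are made up, and it assumes an Indexed Job
+ with `.spec.completions` set to 10.
+
+ ```yaml
+ # Illustrative values only; assumes completionMode: Indexed and completions: 10.
+ successPolicy:
+   rules:
+   # Form 1: succeededIndexes only. Met once indexes 0, 2, and 3 have all succeeded.
+   - succeededIndexes: "0,2-3"
+   # Form 2: succeededCount only. Met once any 5 indexes have succeeded.
+   - succeededCount: 5
+   # Form 3: both fields. Met once 2 of the indexes 4 through 7 have succeeded.
+   - succeededIndexes: "4-7"
+     succeededCount: 2
+ ```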
+
+ Note that when you specify multiple rules in `.spec.successPolicy.rules`,
+ the job controller evaluates the rules in order. Once the Job meets a rule, the job controller ignores the remaining rules.
+
591
+ Here is a manifest for a Job with `successPolicy` :
592
+
593
+ {{% code_sample file="/controllers/job-success-policy.yaml" %}}
594
+
+ In the example above, the success policy rule specifies that the job controller
+ should mark the Job as succeeded and terminate the lingering Pods
+ once any one of the indexes 0, 2, and 3 succeeds.
+ A Job that meets the success policy gets the `SuccessCriteriaMet` condition.
+ After the removal of the lingering Pods is issued, the Job gets the `Complete` condition.
+
+ Note that `succeededIndexes` is written as a comma-separated list of indexes and intervals.
+ Each interval is represented by the first and last element of the series, separated by a hyphen.
+
+ {{< note >}}
+ When you specify both a success policy and a terminating policy such as `.spec.backoffLimit` or `.spec.podFailurePolicy`,
+ once the Job meets either policy, the job controller respects the terminating policy and ignores the success policy.
+ {{< /note >}}
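+
+ For instance, a Job that combines a success policy with `.spec.backoffLimit` might look like the sketch below.
+ The Job name, image, command, and numeric values are hypothetical and chosen only for illustration.
+
+ ```yaml
+ apiVersion: batch/v1
+ kind: Job
+ metadata:
+   name: success-policy-with-backoff   # hypothetical name
+ spec:
+   completionMode: Indexed             # a success policy requires an Indexed Job
+   completions: 5
+   parallelism: 5
+   backoffLimit: 3                     # terminating policy
+   successPolicy:
+     rules:
+     - succeededIndexes: "0"           # for example, index 0 acts as the leader
+   template:
+     spec:
+       restartPolicy: Never
+       containers:
+       - name: main
+         image: busybox:1.36           # hypothetical image
+         command: ["sh", "-c", "echo index $JOB_COMPLETION_INDEX"]
+ ```
+
+ In this sketch, if the backoff limit is exceeded before index 0 succeeds, the Job is expected to fail
+ regardless of the success policy, as the note above describes.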
+
## Job termination and cleanup

When a Job completes, no more Pods are created, but the Pods are [usually](#pod-backoff-failure-policy) not deleted either.