@@ -436,12 +436,22 @@ kubectl get -o yaml job job-backoff-limit-per-index-example
  succeeded: 5          # 1 succeeded pod for each of 5 succeeded indexes
  failed: 10            # 2 failed pods (1 retry) for each of 5 failed indexes
  conditions:
+ - message: Job has failed indexes
+   reason: FailedIndexes
+   status: "True"
+   type: FailureTarget
  - message: Job has failed indexes
    reason: FailedIndexes
    status: "True"
    type: Failed
```

+ The Job controller adds the `FailureTarget` Job condition to trigger
+ [Job termination and cleanup](#job-termination-and-cleanup). When all of the
+ Job Pods are terminated, the Job controller adds the `Failed` condition
+ with the same values for `reason` and `message` as the `FailureTarget` Job
+ condition. For details, see [Termination of Job Pods](#termination-of-job-pods).
+
Additionally, you may want to use the per-index backoff along with a
[pod failure policy](#pod-failure-policy). When using
per-index backoff, there is a new `FailIndex` action available which allows you to
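For illustration, here is a minimal sketch of a Job that combines `.spec.backoffLimitPerIndex` with a pod failure policy rule using the `FailIndex` action; the Job name, image, command, and exit code are hypothetical:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-failindex-sketch        # hypothetical name
spec:
  completions: 4
  parallelism: 2
  completionMode: Indexed           # per-index backoff requires an Indexed Job
  backoffLimitPerIndex: 1           # at most 1 retry for each index
  maxFailedIndexes: 2               # fail the whole Job once 2 indexes have failed
  podFailurePolicy:
    rules:
    - action: FailIndex             # fail the index immediately, skipping retries
      onExitCodes:
        containerName: main
        operator: In
        values: [42]                # illustrative non-retriable exit code
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash", "-c", "exit 42"]
```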
@@ -541,6 +551,11 @@ terminating Pods only once these Pods reach the terminal `Failed` phase. This be
to `podReplacementPolicy: Failed`. For more information, see [Pod replacement policy](#pod-replacement-policy).
{{< /note >}}

+ When you use a `podFailurePolicy`, and the Job fails due to a Pod
+ matching the rule with the `FailJob` action, the Job controller triggers
+ the Job termination process by adding the `FailureTarget` condition.
+ For more details, see [Job termination and cleanup](#job-termination-and-cleanup).
+
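For illustration, a minimal sketch of a Job whose pod failure policy uses the `FailJob` action, which triggers the termination process described above; the Job name, image, command, and exit code are hypothetical:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-failjob-sketch           # hypothetical name
spec:
  completions: 4
  parallelism: 2
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: FailJob                # a matching Pod failure fails the whole Job
      onExitCodes:
        containerName: main
        operator: In
        values: [42]                 # illustrative exit code for a non-retriable error
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash", "-c", "exit 42"]
```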
## Success policy {#success-policy}

{{< feature-state feature_gate_name="JobSuccessPolicy" >}}
@@ -647,6 +662,70 @@ there is no automatic Job restart once the Job status is `type: Failed`.
That is, the Job termination mechanisms activated with `.spec.activeDeadlineSeconds`
and `.spec.backoffLimit` result in a permanent Job failure that requires manual intervention to resolve.

+ ### Terminal Job conditions
+
+ A Job has two possible terminal states, each of which has a corresponding Job
+ condition:
+ * Succeeded: Job condition `Complete`
+ * Failed: Job condition `Failed`
+
+ Jobs fail for the following reasons:
+ - The number of Pod failures exceeded the specified `.spec.backoffLimit` in the Job
+   specification. For details, see [Pod backoff failure policy](#pod-backoff-failure-policy).
+ - The Job runtime exceeded the specified `.spec.activeDeadlineSeconds`.
+ - An indexed Job that used `.spec.backoffLimitPerIndex` has failed indexes.
+   For details, see [Backoff limit per index](#backoff-limit-per-index).
+ - The number of failed indexes in the Job exceeded the specified
+   `.spec.maxFailedIndexes`. For details, see [Backoff limit per index](#backoff-limit-per-index).
+ - A failed Pod matches a rule in `.spec.podFailurePolicy` that has the `FailJob`
+   action. For details about how Pod failure policy rules might affect failure
+   evaluation, see [Pod failure policy](#pod-failure-policy).
+
+ Jobs succeed for the following reasons:
+ - The number of succeeded Pods reached the specified `.spec.completions`.
+ - The criteria specified in `.spec.successPolicy` are met. For details, see
+   [Success policy](#success-policy).
+
+ In Kubernetes v1.31 and later, the Job controller delays the addition of the
+ terminal conditions, `Failed` or `Complete`, until all of the Job Pods are terminated.
+
+ In Kubernetes v1.30 and earlier, the Job controller added the `Complete` or the
+ `Failed` Job terminal conditions as soon as the Job termination process was
+ triggered and all Pod finalizers were removed. However, some Pods would still
+ be running or terminating at the moment that the terminal condition was added.
+
+ In Kubernetes v1.31 and later, the controller only adds the Job terminal conditions
+ _after_ all of the Pods are terminated. You can enable this behavior by using the
+ `JobManagedBy` or the `JobPodReplacementPolicy` (enabled by default)
+ [feature gates](/docs/reference/command-line-tools-reference/feature-gates/).
+
+ ### Termination of Job pods
+
+ The Job controller adds the `FailureTarget` condition or the `SuccessCriteriaMet`
+ condition to the Job to trigger Pod termination after a Job meets either the
+ success or failure criteria.
+
+ Factors like `terminationGracePeriodSeconds` might increase the amount of time
+ from the moment that the Job controller adds the `FailureTarget` condition or the
+ `SuccessCriteriaMet` condition to the moment that all of the Job Pods terminate
+ and the Job controller adds a [terminal condition](#terminal-job-conditions)
+ (`Failed` or `Complete`).
+
+ You can use the `FailureTarget` or the `SuccessCriteriaMet` condition to evaluate
+ whether the Job has failed or succeeded without having to wait for the controller
+ to add a terminal condition.
+
+ For example, you might want to decide when to create a replacement Job
+ that replaces a failed Job. If you replace the failed Job when the `FailureTarget`
+ condition appears, your replacement Job runs sooner, but could result in Pods
+ from the failed and the replacement Job running at the same time, using
+ extra compute resources.
+
+ Alternatively, if your cluster has limited resource capacity, you could choose to
+ wait until the `Failed` condition appears on the Job, which would delay your
+ replacement Job but would ensure that you conserve resources by waiting
+ until all of the failed Pods are removed.
+
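For illustration, an abbreviated, hypothetical Job status showing the ordering described above for the success path: the `SuccessCriteriaMet` condition appears as soon as the success criteria are met, and the terminal `Complete` condition is added only after all of the Job's Pods have terminated (`reason` and `message` fields omitted):

```yaml
status:
  conditions:
  - type: SuccessCriteriaMet   # added as soon as the Job meets its success criteria
    status: "True"
  - type: Complete             # added only after all of the Job's Pods have terminated
    status: "True"
```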
## Clean up finished jobs automatically

Finished Jobs are usually no longer needed in the system. Keeping them around in