@@ -438,15 +438,21 @@ kubectl get -o yaml job job-backoff-limit-per-index-example
   succeeded: 5                  # 1 succeeded pod for each of 5 succeeded indexes
   failed: 10                    # 2 failed pods (1 retry) for each of 5 failed indexes
   conditions:
+  - message: Job has failed indexes
+    reason: FailedIndexes
+    status: "True"
+    type: FailureTarget
   - message: Job has failed indexes
     reason: FailedIndexes
     status: "True"
     type: Failed
 ```

-Note that, since v1.31, you will also observe in the status the `FailureTarget`
-Job condition, with the same `reason` and `message` as for the the `Failed`
-condition (see also [Job termination and cleanup](#job-termination-and-cleanup)).
+The Job controller adds the `FailureTarget` Job condition to trigger
+[Job termination and cleanup](#job-termination-and-cleanup). The
+`Failed` condition has the same values for `reason` and `message` as the
+`FailureTarget` Job condition, but is only added once all of the Job's Pods
+are terminated; for details, see [Termination of Job pods](#termination-of-job-pods).
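+
+For example, you could watch for each of these conditions with `kubectl wait`;
+this is only an illustrative sketch, and the timeout value is arbitrary:
+
+```shell
+# Returns as soon as the Job controller has decided that the Job will fail,
+# even if some of its Pods are still terminating.
+kubectl wait --for=condition=FailureTarget job/job-backoff-limit-per-index-example --timeout=120s
+
+# Returns only after all Pods have terminated and the terminal condition is added.
+kubectl wait --for=condition=Failed job/job-backoff-limit-per-index-example --timeout=120s
+```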

 Additionally, you may want to use the per-index backoff along with a
 [pod failure policy](#pod-failure-policy). When using
@@ -560,7 +566,7 @@ to `podReplacementPolicy: Failed`. For more information, see [Pod replacement po
 When you use the `podFailurePolicy`, and the Job fails due to the pod
 matching the rule with the `FailJob` action, then the Job controller triggers
 the Job termination process by adding the `FailureTarget` condition.
-See [Job termination and cleanup](#job-termination-and-cleanup) for more details.
+For more details, see [Job termination and cleanup](#job-termination-and-cleanup).
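+
+For reference, a minimal sketch of such a rule is shown below; the container
+name `main` and the exit code `42` are assumptions for this illustration:
+
+```yaml
+podFailurePolicy:
+  rules:
+  - action: FailJob        # a matching Pod failure marks the whole Job as failed
+    onExitCodes:
+      containerName: main
+      operator: In
+      values: [42]
+```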

 ## Success policy {#success-policy}

@@ -670,42 +676,64 @@ and `.spec.backoffLimit` result in a permanent Job failure that requires manual

 ### Terminal Job conditions

-A Job has two possible terminal states, it ends up either succeeded, or failed,
-and these states are reflected by the presence of the Job conditions `Complete`
-or `Failed`, respectively.
+A Job has two possible terminal states, each of which has a corresponding Job
+condition:
+* Succeeded: Job condition `Complete`
+* Failed: Job condition `Failed`
+
+The possible reasons for a Job failure:
+- The number of Pod failures exceeded the specified `.spec.backoffLimit` in the Job
+  specification. For details, see [Pod backoff failure policy](#pod-backoff-failure-policy).
+- The Job runtime exceeded the specified `.spec.activeDeadlineSeconds`.
+- An indexed Job that used `.spec.backoffLimitPerIndex` has failed indexes.
+  For details, see [Backoff limit per index](#backoff-limit-per-index).
+- The number of failed indexes in the Job exceeded the specified
+  `.spec.maxFailedIndexes`. For details, see [Backoff limit per index](#backoff-limit-per-index).
+- A failed Pod matches a rule in `.spec.podFailurePolicy` that has the `FailJob`
+  action. For details about how Pod failure policy rules might affect failure
+  evaluation, see [Pod failure policy](#pod-failure-policy).
+
+The possible reasons for a Job success:
+- The number of succeeded Pods reached the specified `.spec.completions`.
+- The criteria specified in `.spec.successPolicy` are met. For details, see
+  [Success policy](#success-policy).
+
+In Kubernetes v1.31 and later, the Job controller delays the addition of the
+terminal conditions, `Failed` or `Complete`, until all of the Job's Pods are terminated.
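+
+As an illustration, a trimmed `status.conditions` excerpt for a Job that succeeded
+might look like the following; the `reason` values are examples and other
+condition fields are omitted:
+
+```yaml
+status:
+  conditions:
+  - type: SuccessCriteriaMet   # added as soon as the success criteria are met
+    status: "True"
+    reason: CompletionsReached
+  - type: Complete             # added only after all of the Job's Pods have terminated
+    status: "True"
+    reason: CompletionsReached
+```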

-The failure scenarios encompass:
-- the `.spec.backoffLimit`
-- the `.spec.activeDeadlineSeconds` is exceeded
-- the `.spec.backoffLimitPerIndex` is exceeded (see [Backoff limit per index](#backoff-limit-per-index))
-- the Pod matches the Job Pod Failure Policy rule with the `FailJob` action (see more [Pod failure policy](#pod-failure-policy))
+{{< note >}}
+In Kubernetes v1.30 and earlier, Job terminal conditions were added when the Job
+termination process was triggered and all Pod finalizers were removed, but some
+Pods could still be running or terminating at that point in time.

-The success scenarios encompass:
-- the `.spec.completions` is reached
-- the criteria specified by the Job Success Policy are met (see more [Success policy](#success-policy))
+This change in behavior is activated by the `JobManagedBy` or the
+`JobPodReplacementPolicy` (enabled by default)
+[feature gates](/docs/reference/command-line-tools-reference/feature-gates/).
+{{< /note >}}

 ### Termination of Job pods

-Prior to v1.31 the Job terminal conditions are added when the Job termination
-process is triggered, and all Pod finalizers are removed, but some pods may
-still remain running at that point in time.
+The Job controller adds the `FailureTarget` condition or the `SuccessCriteriaMet`
+condition to the Job to trigger Pod termination after a Job meets either the
+success or failure criteria.

-Since v1.31, when you enable either the `JobManagedBy` or
-`JobPodReplacementPolicy` (enabled by default)
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/), the
-Job controller awaits for termination of all pods before adding a condition
-indicating that the Job is finished (either `Complete` or `Failed`).
+Factors like `terminationGracePeriodSeconds` might increase the amount of time
+from the moment that the Job controller adds the `FailureTarget` condition or the
+`SuccessCriteriaMet` condition to the moment that all of the Job Pods terminate
+and the Job controller adds a [terminal condition](#terminal-job-conditions)
+(`Failed` or `Complete`).
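+
+As a sketch, the grace period is set in the Job's Pod template; the value of
+`120` seconds below is an arbitrary example:
+
+```yaml
+spec:
+  template:
+    spec:
+      # A longer grace period widens the gap between the FailureTarget or
+      # SuccessCriteriaMet condition and the terminal Failed or Complete condition.
+      terminationGracePeriodSeconds: 120
+```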

-Note that, the process of terminating all pods may take a substantial amount
-of time, depending on a Pod's `terminationGracePeriodSeconds` (see
-[Pod termination](#docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)),
-and thus adding the terminal Job condition, even if the fate of the Job is
-already determined.
+You can use the `FailureTarget` or the `SuccessCriteriaMet` condition to evaluate
+whether the Job has failed or succeeded without having to wait for the controller
+to add a terminal condition.
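+
+For example, the following command (using a hypothetical Job name `myjob`) prints
+the type and status of every condition currently set on the Job, so you can check
+for `FailureTarget` or `SuccessCriteriaMet` before the terminal condition appears:
+
+```shell
+kubectl get job myjob -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
+```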

-If you want to know the fate of the Job as soon as determined you can use,
-since v1.31, the `FailureTarget` and `SuccessCriteriaMet` conditions, which
-cover all scenarios in which Job controller triggers the Job termination process
-(see [Terminal Job conditions](#terminal-job-conditions)).
+{{< note >}}
+For example, you can use the `FailureTarget` condition to quickly decide whether
+to create a replacement Job, but it could result in Pods from the failing and
+replacement Jobs running at the same time for a while. Thus, if your cluster
+capacity is limited, you may prefer to wait for the `Failed` condition before
+creating the replacement Job.
+{{< /note >}}

 ## Clean up finished jobs automatically

@@ -1111,13 +1139,6 @@ status:
   terminating: 3 # three Pods are terminating and have not yet reached the Failed phase
 ```

-{{< note >}}
-Since v1.31, when you enable the `JobPodReplacementPolicy`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-(enabled by default), the Job controller awaits for termination of all pods
-before marking a Job as terminal (see [Termination of Job Pods](#termination-of-job-pods)).
-{{< /note >}}
-
 ### Delegation of managing a Job object to external controller

 {{< feature-state feature_gate_name="JobManagedBy" >}}
@@ -1162,13 +1183,6 @@ after the operation: the built-in Job controller and the external controller
 indicated by the field value.
 {{< /warning >}}

-{{< note >}}
-Since v1.31, when you enable the `JobManagedBy`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/),
-the Job controller awaits for termination of all pods before marking a Job as
-terminal (see [Termination of Job Pods](#termination-of-job-pods)).
-{{< /note >}}
-
 ## Alternatives

 ### Bare Pods