
Commit 7bf346a

mimowo committed
Address review remarks
Co-authored-by: Filip Křepinský <[email protected]>
Co-authored-by: Shannon Kularathna <[email protected]>
Co-authored-by: Tim Bannister <[email protected]>
1 parent 67fe8ed commit 7bf346a

File tree: 2 files changed (+69 −50 lines)

content/en/docs/concepts/workloads/controllers/job.md

Lines changed: 60 additions & 46 deletions
@@ -438,15 +438,21 @@ kubectl get -o yaml job job-backoff-limit-per-index-example
   succeeded: 5 # 1 succeeded pod for each of 5 succeeded indexes
   failed: 10 # 2 failed pods (1 retry) for each of 5 failed indexes
   conditions:
+  - message: Job has failed indexes
+    reason: FailedIndexes
+    status: "True"
+    type: FailureTarget
   - message: Job has failed indexes
     reason: FailedIndexes
     status: "True"
     type: Failed
 ```
 
-Note that, since v1.31, you will also observe in the status the `FailureTarget`
-Job condition, with the same `reason` and `message` as for the the `Failed`
-condition (see also [Job termination and cleanup](#job-termination-and-cleanup)).
+The Job controller adds the `FailureTarget` Job condition to trigger
+[Job termination and cleanup](#job-termination-and-cleanup). The
+`Failed` condition has the same values for `reason` and `message` as the
+`FailureTarget` Job condition, but is added to the Job at the moment all Pods
+are terminated; for details see [Termination of Job pods](#termination-of-job-pods).
 
 Additionally, you may want to use the per-index backoff along with a
 [pod failure policy](#pod-failure-policy). When using
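For context, here is a minimal sketch of a Job spec that could produce a status like the one in the hunk above. The `batch/v1` fields are the ones this section documents; the name mirrors the `kubectl` command above, while the image and the failing command are assumptions for illustration:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-backoff-limit-per-index-example
spec:
  completions: 10
  parallelism: 3
  completionMode: Indexed     # backoffLimitPerIndex only works for Indexed Jobs
  backoffLimitPerIndex: 1     # allow 1 retry per index before marking that index failed
  maxFailedIndexes: 5         # fail the whole Job early if more than 5 indexes fail
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: example
        image: python:3.12    # assumed image; any image works
        command:              # fail the odd indexes, so 5 indexes fail and 5 succeed
        - python3
        - -c
        - |
          import os, sys
          sys.exit(int(os.environ["JOB_COMPLETION_INDEX"]) % 2)
```

With one retry per failed index, this yields the `succeeded: 5` / `failed: 10` counts shown in the status excerpt.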
@@ -560,7 +566,7 @@ to `podReplacementPolicy: Failed`. For more information, see [Pod replacement po
 When you use the `podFailurePolicy`, and the Job fails due to the pod
 matching the rule with the `FailJob` action, then the Job controller triggers
 the Job termination process by adding the `FailureTarget` condition.
-See [Job termination and cleanup](#job-termination-and-cleanup) for more details.
+For more details, see [Job termination and cleanup](#job-termination-and-cleanup).
 
 ## Success policy {#success-policy}
 
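To make the `FailJob` rule discussed in this hunk concrete, here is a hedged sketch of a Job whose `podFailurePolicy` triggers that action; the container name, exit code, image, and command are assumptions:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-failjob   # name borrowed from the task page changed below
spec:
  completions: 8
  parallelism: 2
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: FailJob          # fail the whole Job, skipping the remaining backoff retries
      onExitCodes:
        containerName: main    # assumed container name
        operator: In
        values: [42]           # assumed exit code signalling a non-retriable error
  template:
    spec:
      restartPolicy: Never     # required when using podFailurePolicy
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash", "-c", "echo 'non-retriable failure' && exit 42"]
```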
@@ -670,42 +676,64 @@ and `.spec.backoffLimit` result in a permanent Job failure that requires manual
 
 ### Terminal Job conditions
 
-A Job has two possible terminal states, it ends up either succeeded, or failed,
-and these states are reflected by the presence of the Job conditions `Complete`
-or `Failed`, respectively.
+A Job has two possible terminal states, each of which has a corresponding Job
+condition:
+* Succeeded: Job condition `Complete`
+* Failed: Job condition `Failed`
+
+The possible reasons for a Job failure:
+- The number of Pod failures exceeded the specified `.spec.backoffLimit` in the Job
+  specification. For details, see [Pod backoff failure policy](#pod-backoff-failure-policy).
+- The Job runtime exceeded the specified `.spec.activeDeadlineSeconds`.
+- An Indexed Job that used `.spec.backoffLimitPerIndex` has failed indexes.
+  For details, see [Backoff limit per index](#backoff-limit-per-index).
+- The number of failed indexes in the Job exceeded the specified
+  `.spec.maxFailedIndexes`. For details, see [Backoff limit per index](#backoff-limit-per-index).
+- A failed Pod matches a rule in `.spec.podFailurePolicy` that has the `FailJob`
+  action. For details about how Pod failure policy rules might affect failure
+  evaluation, see [Pod failure policy](#pod-failure-policy).
+
+The possible reasons for a Job success:
+- The number of succeeded Pods reached the specified `.spec.completions`.
+- The criteria specified in `.spec.successPolicy` are met. For details, see
+  [Success policy](#success-policy).
+
+In Kubernetes v1.31 and later, the Job controller delays the addition of the
+terminal conditions, `Failed` or `Complete`, until all of the Job's Pods are terminated.
 
-The failure scenarios encompass:
-- the `.spec.backoffLimit`
-- the `.spec.activeDeadlineSeconds` is exceeded
-- the `.spec.backoffLimitPerIndex` is exceeded (see [Backoff limit per index](#backoff-limit-per-index))
-- the Pod matches the Job Pod Failure Policy rule with the `FailJob` action (see more [Pod failure policy](#pod-failure-policy))
+{{< note >}}
+In Kubernetes v1.30 and earlier, Job terminal conditions were added when the Job
+termination process was triggered and all Pod finalizers were removed, but some
+Pods could still be running or terminating at that point in time.
 
-The success scenarios encompass:
-- the `.spec.completions` is reached
-- the criteria specified by the Job Success Policy are met (see more [Success policy](#success-policy))
+This change in behavior is activated by enabling either the `JobManagedBy` or the
+`JobPodReplacementPolicy` (enabled by default)
+[feature gates](/docs/reference/command-line-tools-reference/feature-gates/).
+{{< /note >}}
 
 ### Termination of Job pods
 
-Prior to v1.31 the Job terminal conditions are added when the Job termination
-process is triggered, and all Pod finalizers are removed, but some pods may
-still remain running at that point in time.
+The Job controller adds the `FailureTarget` condition or the `SuccessCriteriaMet`
+condition to the Job to trigger Pod termination after a Job meets either the
+success or failure criteria.
 
-Since v1.31, when you enable either the `JobManagedBy` or
-`JobPodReplacementPolicy` (enabled by default)
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/), the
-Job controller awaits for termination of all pods before adding a condition
-indicating that the Job is finished (either `Complete` or `Failed`).
+Factors like `terminationGracePeriodSeconds` might increase the amount of time
+from the moment that the Job controller adds the `FailureTarget` condition or the
+`SuccessCriteriaMet` condition to the moment that all of the Job Pods terminate
+and the Job controller adds a [terminal condition](#terminal-job-conditions)
+(`Failed` or `Complete`).
 
-Note that, the process of terminating all pods may take a substantial amount
-of time, depending on a Pod's `terminationGracePeriodSeconds` (see
-[Pod termination](#docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)),
-and thus adding the terminal Job condition, even if the fate of the Job is
-already determined.
+You can use the `FailureTarget` or the `SuccessCriteriaMet` condition to evaluate
+whether the Job has failed or succeeded without having to wait for the controller
+to add a terminal condition.
 
-If you want to know the fate of the Job as soon as determined you can use,
-since v1.31, the `FailureTarget` and `SuccessCriteriaMet` conditions, which
-cover all scenarios in which Job controller triggers the Job termination process
-(see [Terminal Job conditions](#terminal-job-conditions)).
+{{< note >}}
+For example, you can use the `FailureTarget` condition to quickly decide whether
+to create a replacement Job, but it could result in Pods from the failing and
+replacement Jobs running at the same time for a while. Thus, if your cluster
+capacity is limited, you may prefer to wait for the `Failed` condition before
+creating the replacement Job.
+{{< /note >}}
 
 ## Clean up finished jobs automatically
 
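As an illustration of the two-phase flow the new text describes, here is a hypothetical status excerpt for a Job that met its success criteria while some Pods were still terminating. The condition types are the ones named above; the `reason` values and ordering are assumptions:

```yaml
status:
  conditions:
  # added as soon as the Job controller determines that the Job succeeded;
  # some Pods may still be terminating at this point
  - type: SuccessCriteriaMet
    status: "True"
    reason: CompletionsReached   # assumed reason value
  # added only after every Pod of the Job has terminated
  - type: Complete
    status: "True"
    reason: CompletionsReached
```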
@@ -1111,13 +1139,6 @@ status:
   terminating: 3 # three Pods are terminating and have not yet reached the Failed phase
 ```
 
-{{< note >}}
-Since v1.31, when you enable the `JobPodReplacementPolicy`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-(enabled by default), the Job controller awaits for termination of all pods
-before marking a Job as terminal (see [Termination of Job Pods](#termination-of-job-pods)).
-{{< /note >}}
-
 ### Delegation of managing a Job object to external controller
 
 {{< feature-state feature_gate_name="JobManagedBy" >}}
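For context, the `terminating` count in the hunk above is reported as part of Pod replacement tracking. A minimal sketch (name, image, and command are assumptions) of a Job that only creates replacements once old Pods reach the `Failed` phase:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pod-replacement-example   # hypothetical name
spec:
  podReplacementPolicy: Failed    # don't start replacements while old Pods are still terminating
  completions: 3
  parallelism: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox:1.36       # assumed image
        command: ["sh", "-c", "sleep 3600"]
```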
@@ -1162,13 +1183,6 @@ after the operation: the built-in Job controller and the external controller
 indicated by the field value.
 {{< /warning >}}
 
-{{< note >}}
-Since v1.31, when you enable the `JobManagedBy`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/),
-the Job controller awaits for termination of all pods before marking a Job as
-terminal (see [Termination of Job Pods](#termination-of-job-pods)).
-{{< /note >}}
-
 ## Alternatives
 
 ### Bare Pods
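To make the delegation mechanism in this hunk concrete, here is a hedged sketch of handing a Job over to an external controller via `spec.managedBy`; the controller name, Job name, image, and command are hypothetical:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: delegated-job   # hypothetical name
spec:
  # any value other than kubernetes.io/job-controller tells the built-in
  # controller to skip this Job; an external controller must reconcile it
  managedBy: example.com/custom-job-controller
  completions: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox:1.36
        command: ["sh", "-c", "exit 0"]
```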

content/en/docs/tasks/job/pod-failure-policy.md

Lines changed: 9 additions & 4 deletions
@@ -53,10 +53,15 @@ After around 30s the entire Job should be terminated. Inspect the status of the
 kubectl get jobs -l job-name=job-pod-failure-policy-failjob -o yaml
 ```
 
-In the Job status, see a job `Failed` condition with the field `reason`
-equal `PodFailurePolicy`. Additionally, the `message` field contains a
-more detailed information about the Job termination, such as:
-`Container main for pod default/job-pod-failure-policy-failjob-8ckj8 failed with exit code 42 matching FailJob rule at index 0`.
+In the Job status, the following conditions display:
+- `FailureTarget` condition: has a `reason` field set to `PodFailurePolicy` and
+  a `message` field with more information about the termination, like
+  `Container main for pod default/job-pod-failure-policy-failjob-8ckj8 failed with exit code 42 matching FailJob rule at index 0`.
+  The Job controller adds this condition as soon as the Job is considered a failure.
+  For details, see [Termination of Job Pods](/docs/concepts/workloads/controllers/job/#termination-of-job-pods).
+- `Failed` condition: same `reason` and `message` as the `FailureTarget`
+  condition. The Job controller adds this condition after all of the Job's Pods
+  are terminated.
 
 For comparison, if the Pod failure policy was disabled it would take 6 retries
 of the Pod, taking at least 2 minutes.
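To visualize the two conditions the rewritten passage lists, here is a hypothetical excerpt of the inspected Job's status; the types, `reason`, and `message` come from the text above, while the exact formatting is an assumption:

```yaml
status:
  conditions:
  - type: FailureTarget   # added as soon as the FailJob rule matches
    status: "True"
    reason: PodFailurePolicy
    message: Container main for pod default/job-pod-failure-policy-failjob-8ckj8
      failed with exit code 42 matching FailJob rule at index 0
  - type: Failed          # added once all of the Job's Pods are terminated
    status: "True"
    reason: PodFailurePolicy
    message: Container main for pod default/job-pod-failure-policy-failjob-8ckj8
      failed with exit code 42 matching FailJob rule at index 0
```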
