
Commit 1fab6ea

Remarks to the Job update blogpost

Co-authored-by: Rey Lejano <[email protected]>
Co-authored-by: Maciej Szulik <[email protected]>
Co-authored-by: Tim Bannister <[email protected]>
Co-authored-by: Aldo Culquicondor <[email protected]>
Co-authored-by: Paola Cortés <[email protected]>

1 parent 35565df, commit 1fab6ea

1 file changed: 61 additions, 44 deletions

content/en/blog/_posts/2023-07-27-job-update-post.md renamed to content/en/blog/_posts/2023-08-21-job-update-post.md

@@ -1,42 +1,47 @@
 ---
 layout: blog
-title: "Kubernetes 1.28: New Job features"
-date: 2023-08-15
+title: "Kubernetes 1.28: Improved failure handling for Jobs"
+date: 2023-08-21
 slug: kubernetes-1-28-jobapi-update
 ---
 
 **Authors:** Kevin Hannon (G-Research), Michał Woźniak (Google)
 
 This blog discusses two new features in Kubernetes 1.28 to improve Jobs for batch
-users: [PodReplacementPolicy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3939-allow-replacement-when-fully-terminated)
-and [BackoffLimitPerIndex](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs).
+users: [Pod replacement policy](/docs/concepts/workloads/controllers/job/#pod-replacement-policy)
+and [Backoff limit per index](/docs/concepts/workloads/controllers/job/#backoff-limit-per-index).
 
-## Pod Replacement Policy
+These features continue the effort started by the
+[Pod failure policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy)
+to improve the handling of Pod failures in a Job.
 
-### What problem does this solve?
+## Pod replacement policy {#pod-replacement-policy}
 
 By default, when a pod enters a terminating state (e.g. due to preemption or
-eviction), a replacement pod is created immediately, and both pods are running
-at the same time.
+eviction), Kubernetes immediately creates a replacement Pod. Therefore, both Pods are running
+at the same time. In API terms, a pod is considered terminating when it has a
+`deletionTimestamp` and is in the `Pending` or `Running` phase.
 
-This is problematic for some popular machine learning frameworks, such as
-TensorFlow and [JAX](https://jax.readthedocs.io/en/latest/), which require at most one pod running at the same time,
+The scenario when two Pods are running at a given time is problematic for
+some popular machine learning frameworks, such as
+TensorFlow and [JAX](https://jax.readthedocs.io/en/latest/), which require at most one Pod running at the same time
 for a given index (see more details in the [issue](https://github.com/kubernetes/kubernetes/issues/115844)).
 
 Creating the replacement Pod before the previous one fully terminates can also
-cause problems in clusters with scarce resources or with tight budgets. These
-resources can be difficult to obtain so pods can take a long time to find
-resources and they may only be able to find nodes until the existing pods are
-fully terminated. Further, if cluster autoscaler is enabled, the replacement
-Pods might produce undesired scale ups.
+cause problems in clusters with scarce resources or with tight budgets, such as:
+* cluster resources can be difficult to obtain for Pods pending to be scheduled,
+  as Kubernetes might take a long time to find available nodes until the existing
+  Pods are fully terminated;
+* if the cluster autoscaler is enabled, the replacement Pods might produce undesired
+  scale-ups.
 
-### How can I use it
+### How can you use it? {#pod-replacement-policy-how-to-use}
 
-This is an alpha feature, which you can enable by enabling the `JobPodReplacementPolicy`
+This is an alpha feature, which you can enable by turning on the `JobPodReplacementPolicy`
 [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) in
 your cluster.
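
How you enable a feature gate depends on how your cluster is deployed; as a
minimal sketch (assuming direct control over the control plane flags, which a
managed cluster may not give you), it could look like:

```sh
# Sketch only: pass the gate to the components that need it.
kube-apiserver --feature-gates=JobPodReplacementPolicy=true ...
kube-controller-manager --feature-gates=JobPodReplacementPolicy=true ...
```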
 
-Once the feature is enabled you can use it by creating a new Job, which specifies
+Once the feature is enabled in your cluster, you can use it by creating a new Job that specifies a
 `podReplacementPolicy` field as shown here:
 
 ```yaml
@@ -49,6 +54,9 @@ spec:
 ...
 ```
 
+In that Job, the Pods would only be replaced once they reached the `Failed` phase,
+and not when they are terminating.
+
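Because the manifest itself is collapsed in this diff, here is a minimal,
self-contained sketch of a Job using the field (the name, image, and command
are illustrative, not taken from the post):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pod-replacement-demo     # illustrative name
spec:
  podReplacementPolicy: Failed   # create a replacement only once a Pod reaches the Failed phase
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox:1.36      # illustrative image
        command: ["sleep", "60"]
```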
 Additionally, you can inspect the `.status.terminating` field of a Job. The value
 of the field is the number of Pods owned by the Job that are currently terminating.
 
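The post follows with a `status` example, largely collapsed in this diff; as an
illustrative sketch (values assumed), the counter sits alongside the usual Job
status fields:

```yaml
status:
  active: 2
  terminating: 1   # a Pod with a deletionTimestamp that has not yet reached a terminal phase
```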
@@ -64,50 +72,49 @@ status:
 ```
 
 This can be particularly useful for external queueing controllers, such as
-[Kueue](https://github.com/kubernetes-sigs/kueue), that would calculate the
-quota and suspend the start of a new Job until the resources are reclaimed from
+[Kueue](https://github.com/kubernetes-sigs/kueue), that track quota
+from running Pods of a Job until the resources are reclaimed from
 the currently terminating Job.
 
-### How can I learn more?
-
-- Read the KEP: [PodReplacementPolicy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3939-allow-replacement-when-fully-terminated)
-
-## Job Backoff Limit per Index
+Note that `podReplacementPolicy: Failed` is the default when using a custom
+[Pod failure policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy).
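As a sketch of that interaction (rule and values illustrative): a Job that sets a
custom `podFailurePolicy` behaves as if `podReplacementPolicy: Failed` were also
set, since that is the default in this combination:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pod-failure-policy-demo   # illustrative name
spec:
  podFailurePolicy:               # with this set, podReplacementPolicy defaults to Failed
    rules:
    - action: Ignore              # don't count Pods disrupted by the cluster
      onPodConditions:
      - type: DisruptionTarget
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox:1.36       # illustrative image
        command: ["sleep", "60"]
```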
 
-### What problem does this solve?
+## Backoff limit per index {#backoff-limit-per-index}
 
-By default, pod failures for [Indexed Jobs](/docs/concepts/workloads/controllers/job/#completion-mode)
+By default, Pod failures for [Indexed Jobs](/docs/concepts/workloads/controllers/job/#completion-mode)
 are counted towards the global limit of retries, represented by `.spec.backoffLimit`.
 This means that if there is a consistently failing index, it is restarted
-repeatedly until it exhausts the limit. Once the limit is exceeded the entire
+repeatedly until it exhausts the limit. Once the limit is reached, the entire
 Job is marked failed and some indexes may never even be started.
 
-This is problematic for use cases where you want to handle pod failures for
+This is problematic for use cases where you want to handle Pod failures for
 every index independently. For example, if you use Indexed Jobs for running
 integration tests where each index corresponds to a testing suite. In that case,
 you may want to account for possible flaky tests allowing for 1 or 2 retries per
-suite. Additionally, there might be some buggy suites, making the corresponding
-indexes fail consistently. In that case you may prefer to terminate retries for
-that indexes, yet allowing other suites to complete.
+suite. There might be some buggy suites, making the corresponding
+indexes fail consistently. In that case you may prefer to limit retries for
+the buggy suites, while allowing other suites to complete.
 
 The feature allows you to:
-* complete execution of all indexes, despite some indexes failing,
+* complete execution of all indexes, despite some indexes failing.
 * better utilize the computational resources by avoiding unnecessary retries of consistently failing indexes.
 
-### How to use it?
+### How can you use it? {#backoff-limit-per-index-how-to-use}
 
-This is an alpha feature, which you can enable by enabling the
+This is an alpha feature, which you can enable by turning on the
 `JobBackoffLimitPerIndex`
 [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
 in your cluster.
 
-Once the feature is enabled, you can create an Indexed Job with the
+Once the feature is enabled in your cluster, you can create an Indexed Job with the
 `.spec.backoffLimitPerIndex` field specified.
 
 #### Example
 
 The following example demonstrates how to use this feature to make sure the
-Job executes all indexes, and the number of failures is controller per index.
+Job executes all indexes (provided there is no other reason for early Job
+termination, such as reaching the `activeDeadlineSeconds` timeout, or being
+manually deleted by the user), and the number of failures is controlled per index.
 
 ```yaml
 apiVersion: batch/v1
@@ -136,7 +143,7 @@ spec:
 time.sleep(1)
 ```
 
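The example manifest is mostly collapsed in this diff. A minimal, self-contained
sketch of such an Indexed Job (the failing-index script is illustrative, not the
post's exact one) might look like:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-backoff-limit-per-index-execute-all
spec:
  completions: 8
  parallelism: 2
  completionMode: Indexed    # backoffLimitPerIndex only applies to Indexed Jobs
  backoffLimitPerIndex: 1    # retry each failing index at most once
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: python:3.11   # illustrative image
        command:             # illustrative payload: index 1 always fails
        - python3
        - -c
        - |
          import os, sys
          sys.exit(1 if os.environ["JOB_COMPLETION_INDEX"] == "1" else 0)
```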
-Now, inspect the pods after the job is finished:
+Now, inspect the Pods after the job is finished:
 
 ```sh
 kubectl get pods -l job-name=job-backoff-limit-per-index-execute-all
@@ -157,13 +164,13 @@ job-backoff-limit-per-index-execute-all-6-tbkr8 0/1 Completed 0
 job-backoff-limit-per-index-execute-all-7-hxjsq 0/1 Completed 0 22s
 ```
 
-Additionally, let's take a look at the job status:
+Additionally, you can take a look at the status for that Job:
 
 ```sh
 kubectl get jobs job-backoff-limit-per-index-execute-all -o yaml
 ```
 
-Returns output similar to this:
+The output ends with a `status` similar to:
 
 ```yaml
 status:
@@ -185,19 +192,29 @@ then the buggy indexes would retry until the global `backoffLimit` was exceeded,
 and then the entire Job would be marked failed, before some of the higher
 indexes are started.
 
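For reference, the per-index outcome surfaces in that collapsed `status` block;
an illustrative sketch (values assumed, matching the one-failing-index example):

```yaml
status:
  completedIndexes: 0,2-7
  failedIndexes: "1"   # retries exhausted for index 1 only
  succeeded: 7
  failed: 2            # the original attempt plus one retry for index 1
```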
-### Getting Involved
+## How can you learn more?
+
+- Read the user-facing documentation for [Pod replacement policy](/docs/concepts/workloads/controllers/job/#pod-replacement-policy),
+  [Backoff limit per index](/docs/concepts/workloads/controllers/job/#backoff-limit-per-index), and
+  [Pod failure policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy)
+- Read the KEPs for [Pod Replacement Policy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3939-allow-replacement-when-fully-terminated),
+  [Backoff limit per index](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs), and
+  [Pod failure policy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3329-retriable-and-non-retriable-failures).
+
+## Getting Involved
 
-These features were sponsored under the domain of SIG Apps. Batch is actively
+These features were sponsored by [SIG Apps](https://github.com/kubernetes/community/tree/master/sig-apps). Batch use cases are actively
 being improved for Kubernetes users in the
 [batch working group](https://github.com/kubernetes/community/tree/master/wg-batch).
 Working groups are relatively short-lived initiatives focused on specific goals.
-In the case of Batch, the goal is to improve/support batch users and enhance the
+The goal of WG Batch is to improve the experience for batch workload users, offer support for
+batch processing use cases, and enhance the
 Job API for common use cases. If that interests you, please join the working
 group either by subscribing to our
 [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch) or on
 [Slack](https://kubernetes.slack.com/messages/wg-batch).
 
-### Acknowledgments
+## Acknowledgments
 
 As with any Kubernetes feature, multiple people contributed to getting this
 done, from testing and filing bugs to reviewing code.
