---
layout: blog
title: "Kubernetes v1.34: Pod Replacement Policy for Jobs Goes GA"
date: 2025-0X-XX
draft: true
slug: kubernetes-v1-34-pod-replacement-policy-for-jobs-goes-ga
author: >
  [Dejan Zele Pejchev](https://github.com/dejanzele) (G-Research)
---

In Kubernetes v1.34, the _Pod Replacement Policy_ feature reaches general availability (GA).
This blog post describes the Pod Replacement Policy feature and how to use it in your Jobs.

## About Pod Replacement Policy

By default, the Job controller immediately recreates Pods as soon as they fail or begin terminating (when they have a deletion timestamp).

As a result, while some Pods are terminating, the total number of running Pods for a Job can temporarily exceed the specified parallelism.
For Indexed Jobs, this can even mean multiple Pods running for the same index at the same time.

This behavior works fine for many workloads, but it can cause problems in certain cases.

For example, popular machine learning frameworks like TensorFlow and
[JAX](https://jax.readthedocs.io/en/latest/) expect exactly one Pod per worker index.
If two Pods run at the same time, you might encounter errors such as:
```
/job:worker/task:4: Duplicate task registration with task_name=/job:worker/replica:0/task:4
```

Additionally, starting replacement Pods before the old ones fully terminate can lead to:
- Scheduling delays by kube-scheduler, as the nodes remain occupied.
- Unnecessary cluster scale-ups to accommodate the replacement Pods.
- Temporary bypassing of quota checks by workload orchestrators like [Kueue](https://kueue.sigs.k8s.io/).

The _Pod Replacement Policy_ feature gives you control over when Kubernetes replaces terminating Pods, helping you avoid these issues.

## How Pod Replacement Policy works

The feature introduces a new Job-level field, `podReplacementPolicy`, which controls when Kubernetes replaces terminating Pods.
You can choose one of two policies:
- `TerminatingOrFailed` (default): replaces Pods as soon as they start terminating.
- `Failed`: replaces Pods only after they fully terminate and transition to the `Failed` phase.

Setting the policy to `Failed` ensures that a new Pod is only created after the previous one has completely terminated.

For Jobs with a Pod Failure Policy, the default `podReplacementPolicy` is `Failed`, and no other value is allowed.
See [Pod Failure Policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy) to learn more about Pod Failure Policies for Jobs.
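
For illustration, here is a minimal sketch of a Job that uses both features together. The `podFailurePolicy` rule shown (failing the Job on exit code 42) is a hypothetical example; with `podFailurePolicy` set, `Failed` is both the default and the only allowed replacement policy, so spelling it out is optional:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job-with-failure-policy  # hypothetical name
spec:
  # Must be Failed when podFailurePolicy is set (also the default then).
  podReplacementPolicy: Failed
  podFailurePolicy:
    rules:
    - action: FailJob
      onExitCodes:
        operator: In
        values: [42]
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: your-image
```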

You can check how many Pods are currently terminating by inspecting the Job’s `.status.terminating` field:

```sh
kubectl get job myjob -o=jsonpath='{.status.terminating}'
```

## Example

Here’s a simple Job spec that ensures Pods are replaced only after they terminate completely:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  podReplacementPolicy: Failed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: your-image
```

With this setting, Kubernetes won’t launch a replacement Pod while the previous Pod is still terminating.
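
To observe the policy in action, one option is to delete a running Pod of the Job and watch the terminating count before the replacement appears. A sketch, assuming a Job named `example-job` is already running:

```sh
# Delete one of the Job's Pods without waiting for it to finish terminating.
kubectl delete pod -l job-name=example-job --wait=false

# While the old Pod is terminating, the Job reports it here; with
# podReplacementPolicy: Failed, no replacement is created until this drops to 0.
kubectl get job example-job -o=jsonpath='{.status.terminating}'
```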

## How can you learn more?

- Read the user-facing documentation for [Pod Replacement Policy](/docs/concepts/workloads/controllers/job/#pod-replacement-policy),
  [Backoff Limit per Index](/docs/concepts/workloads/controllers/job/#backoff-limit-per-index), and
  [Pod Failure Policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy).
- Read the KEPs for [Pod Replacement Policy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3939-allow-replacement-when-fully-terminated),
  [Backoff Limit per Index](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs), and
  [Pod Failure Policy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3329-retriable-and-non-retriable-failures).

## Acknowledgments

As with any Kubernetes feature, multiple people contributed to getting this
done, from testing and filing bugs to reviewing code.

As this feature moves to stable after two years, we would like to thank the following people:
* [Kevin Hannon](https://github.com/kannon92) - for writing the KEP and the initial implementation.
* [Michał Woźniak](https://github.com/mimowo) - for guidance, mentorship, and reviews.
* [Aldo Culquicondor](https://github.com/alculquicondor) - for guidance, mentorship, and reviews.
* [Maciej Szulik](https://github.com/soltysh) - for guidance, mentorship, and reviews.
* [Dejan Zele Pejchev](https://github.com/dejanzele) - for taking over the feature and promoting it from Alpha through Beta to GA.

## Get involved

This work was sponsored by the Kubernetes
[batch working group](https://github.com/kubernetes/community/tree/master/wg-batch)
in close collaboration with the
[SIG Apps](https://github.com/kubernetes/community/tree/master/sig-apps) community.

If you are interested in working on new features in this space, we recommend
subscribing to our [Slack](https://kubernetes.slack.com/messages/wg-batch)
channel and attending the regular community meetings.
