 By default, when a pod enters a terminating state (e.g. due to preemption or
-eviction), a replacement pod is created immediately, and both pods are running
-at the same time.
+eviction), Kubernetes immediately creates a replacement Pod. Therefore, both Pods are running
+at the same time. In API terms, a pod is considered terminating when it has a
+`deletionTimestamp` and it has a phase `Pending` or `Running`.

-This is problematic for some popular machine learning frameworks, such as
-TensorFlow and [JAX](https://jax.readthedocs.io/en/latest/), which require at most one pod running at the same time,
+The scenario when two Pods are running at a given time is problematic for
+some popular machine learning frameworks, such as
+TensorFlow and [JAX](https://jax.readthedocs.io/en/latest/), which require at most one Pod running at the same time,
 for a given index (see more details in the [issue](https://github.com/kubernetes/kubernetes/issues/115844)).

 Creating the replacement Pod before the previous one fully terminates can also
-cause problems in clusters with scarce resources or with tight budgets. These
-resources can be difficult to obtain so pods can take a long time to find
-resources and they may only be able to find nodes until the existing pods are
-fully terminated. Further, if cluster autoscaler is enabled, the replacement
-Pods might produce undesired scale ups.
+cause problems in clusters with scarce resources or with tight budgets, such as:
+* cluster resources can be difficult to obtain for Pods pending to be scheduled,
+as Kubernetes might take a long time to find available nodes until the existing
+Pods are fully terminated.
+* if cluster autoscaler is enabled, the replacement Pods might produce undesired
+scale ups.

-### How can I use it
+### How can you use it? {#pod-replacement-policy-how-to-use}

-This is an alpha feature, which you can enable by enabling the`JobPodReplacementPolicy`
+This is an alpha feature, which you can enable by turning on`JobPodReplacementPolicy`
 [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) in
 your cluster.

-Once the feature is enabled you can use it by creating a new Job, which specifies
+Once the feature is enabled in your cluster, you can use it by creating a new Job that specifies a
 `podReplacementPolicy` field as shown here:

 ```yaml
@@ -49,6 +54,9 @@ spec:
 ...
 ```

+In that Job, the Pods would only be replaced once they reached the `Failed` phase,
+and not when they are terminating.
+
 Additionally, you can inspect the `.status.terminating` field of a Job. The value
 of the field is the number of Pods owned by the Job that are currently terminating.

@@ -64,50 +72,49 @@ status:
 ```

 This can be particularly useful for external queueing controllers, such as
-[Kueue](https://github.com/kubernetes-sigs/kueue), that would calculate the
-quota and suspend the start of a new Job until the resources are reclaimed from
+[Kueue](https://github.com/kubernetes-sigs/kueue), that tracks quota
+from running Pods of a Job until the resources are reclaimed from
 the currently terminating Job.

-### How can I learn more?
-
-- Read the KEP: [PodReplacementPolicy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3939-allow-replacement-when-fully-terminated)
-
-## Job Backoff Limit per Index
+Note that the `podReplacementPolicy: Failed` is the default when using a custom
- Read the KEPs for [Pod Replacement Policy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3939-allow-replacement-when-fully-terminated),
+[Backoff limit per index](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs), and