---
layout: blog
title: "Kubernetes 1.28: Updates to the Job API"
date: 2023-07-27
slug: kubernetes-1-28-jobapi-update
---

**Authors:** Kevin Hannon (G-Research), Michał Woźniak (Google)

This blog discusses two features to improve Jobs for batch users: PodReplacementPolicy and JobBackoffLimitPerIndex.

Both features were requested by users of the Job API to enhance their experience.

## Pod Replacement Policy

### What problem does this solve?

Many common machine learning frameworks, such as TensorFlow and JAX, require a unique pod per index. Currently, if a pod enters a terminating state (due to preemption, eviction or other external factors), a replacement pod is created but it immediately fails to start.

Having a replacement Pod before the previous one fully terminates can also cause problems in clusters with scarce resources or with tight budgets. These resources can be difficult to obtain, so a replacement Pod may take a long time to be scheduled, and it may only find a node once the existing Pods have fully terminated. If the cluster autoscaler is enabled, the early replacement Pods might produce undesired scale-ups.

On the other hand, if a replacement Pod is not created immediately, the Job status would show that the number of active Pods doesn't match the desired parallelism. To provide better visibility, the Job status gains a new field that tracks the number of Pods that are currently terminating.

This new field can also be used by queueing controllers, such as Kueue, to track the number of terminating Pods when calculating quotas.

### How can I use it?

This is an alpha feature, which means you have to enable the `JobPodReplacementPolicy`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
by passing the command line argument `--feature-gates=JobPodReplacementPolicy=true`
to the kube-apiserver.
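
Once the feature gate is enabled, you can opt in by setting the new field in the Job spec, as in the manifest sketch below.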

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: new
  ...
spec:
  podReplacementPolicy: Failed
  ...
```

`podReplacementPolicy` can take either `Failed` or `TerminatingOrFailed` (the default). In cases where `podFailurePolicy` is set, you can only use `Failed`.
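
For instance, a Job that defines a Pod failure policy has to opt in to `Failed`. Here is a minimal sketch of such a Job, assuming the relevant feature gates are enabled; the name, image, and exit-code rule are illustrative assumptions only:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-pod-failure-policy   # hypothetical name
spec:
  podReplacementPolicy: Failed    # the only value allowed together with podFailurePolicy
  podFailurePolicy:
    rules:
    - action: FailJob             # fail the whole Job when a container exits with code 42
      onExitCodes:
        operator: In
        values: [42]              # illustrative exit code
  template:
    spec:
      restartPolicy: Never        # Pod failure policies require restartPolicy: Never
      containers:
      - name: main
        image: registry.example/app:latest   # placeholder image
```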

This feature adds two things: a new `terminating` field in the Job status and a new API field in the Job spec called `podReplacementPolicy`.

The Job controller uses the `parallelism` field in the Job API to determine the number of Pods that it expects to be active (not finished). If there is a mismatch between the number of active Pods and the desired parallelism, and a Pod has not finished, the Job controller would normally assume that the Pod has failed and recreate it. In cases where `Failed` is specified, the Job controller instead waits for the Pod to be fully terminated before creating a replacement, rather than reacting as soon as the Pod is marked for deletion (`deletionTimestamp != nil`).
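
For illustration, while such a Job has Pods that are shutting down, its status might look roughly like the following sketch (the counts are made up for the example):

```yaml
# Hypothetical status snippet for a Job whose two Pods are currently shutting down.
status:
  terminating: 2   # Pods with a deletionTimestamp that have not yet fully terminated
  failed: 0
  succeeded: 0
```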

### How can I learn more?

- Read the KEP: [PodReplacementPolicy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3939-allow-replacement-when-fully-terminated)

## JobBackoffLimitPerIndex
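
Backoff limit per index is the second alpha feature mentioned above, gated behind the `JobBackoffLimitPerIndex` feature gate. It lets an Indexed Job limit retries for each index independently instead of sharing a single Job-wide `backoffLimit`. As a rough sketch of how this might be configured (the name, image, and numeric values are illustrative assumptions):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-backoff-limit-per-index   # hypothetical name
spec:
  completions: 8
  parallelism: 2
  completionMode: Indexed     # per-index backoff only applies to Indexed Jobs
  backoffLimitPerIndex: 1     # each index is retried at most once
  maxFailedIndexes: 3         # fail the whole Job once more than 3 indexes have failed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: registry.example/worker:latest   # placeholder image
```

With a configuration like this, an index that keeps failing stops being retried and is recorded in the Job status as failed, while the remaining indexes continue to run.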

### Getting Involved

These features were sponsored under the domain of SIG Apps. Batch workloads are actively being improved for Kubernetes users in the batch working group.
Working groups are relatively short-lived initiatives focused on specific goals. In the case of the batch working group, the goal is to improve support for batch users and enhance the Job API for common use cases. If that interests you, please join the working group either by subscribing to our [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch) or on [Slack](https://kubernetes.slack.com/messages/wg-batch).

### Acknowledgments

As with any Kubernetes feature, multiple people contributed to getting this
done, from testing and filing bugs to reviewing code.

We would not have been able to achieve either of these features without Aldo Culquicondor (Google) providing excellent domain knowledge and expertise throughout the Kubernetes ecosystem.