| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: "Kubernetes 1.33: Job's SuccessPolicy Goes GA" |
| 4 | +date: 2025-04-23 |
| 5 | +draft: true |
| 6 | +slug: kubernetes-1-33-jobs-success-policy-goes-ga |
| 7 | +authors: > |
| 8 | + [Yuki Iwai](https://github.com/tenzen-y) (CyberAgent, Inc) |
| 9 | +--- |
| 10 | + |
| 11 | +On behalf of the Kubernetes project, I'm pleased to announce that Job _success policy_ has graduated to General Availability (GA) as part of the v1.33 release. |
| 12 | + |
| 13 | +## About Job's Success Policy |
| 14 | + |
| 15 | +In batch workloads, you might want to use leader-follower patterns like [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface), |
| 16 | +in which the leader controls the execution, including the followers' lifecycle. |
| 17 | + |
| 18 | +In this case, you might want to mark the Job as succeeded |
| 19 | +even if some of the indexes failed. Unfortunately, a leader-follower Kubernetes Job that doesn't use a success policy would, in most cases, require **all** Pods to finish successfully |
| 20 | +for that Job to reach an overall succeeded state. |
| 21 | + |
| 22 | +For Kubernetes Jobs, the API allows you to specify the early exit criteria using the `.spec.successPolicy` |
| 23 | +field (you can only use the `.spec.successPolicy` field for an [indexed Job](/docs/concepts/workloads/controllers/job/#completion-mode)). |
| 24 | +The field describes a set of rules, each using a list of succeeded indexes for the Job, a minimum required number of succeeded indexes, or both. |
| 25 | + |
| 26 | +This newly stable field is especially valuable for scientific simulation, AI/ML and High-Performance Computing (HPC) batch workloads. |
| 27 | +Users in these areas often run numerous experiments and may only need a specific number to complete successfully, rather than requiring all of them to succeed. |
| 28 | +In the leader-follower case, the failure of the leader index is the only relevant Job exit criterion, and the outcomes for individual follower Pods are handled |
| 29 | +only indirectly via the status of the leader index. |
| 30 | +Moreover, followers do not know when they can terminate themselves. |
| 31 | + |
| 32 | +After a Job meets any rule of its success policy, the Job is marked as succeeded, and all Pods are terminated, including the running ones. |
| 33 | + |
| 34 | +## How it works |
| 35 | + |
| 36 | +The following excerpt from a Job manifest, using `.successPolicy.rules[0].succeededCount`, shows an example of |
| 37 | +using a custom success policy: |
| 38 | + |
| 39 | +```yaml |
| 40 | +  parallelism: 10 |
| 41 | +  completions: 10 |
| 42 | +  completionMode: Indexed |
| 43 | +  successPolicy: |
| 44 | +    rules: |
| 45 | +    - succeededCount: 1 |
| 46 | +``` |
| 47 | + |
| 48 | +Here, the Job is marked as succeeded as soon as any one index succeeds, regardless of its index number. |
| 49 | +Additionally, you can restrict which indexes count toward `succeededCount` by also setting `.successPolicy.rules[0].succeededIndexes`, |
| 50 | +as shown below: |
| 51 | + |
| 52 | +```yaml |
| 53 | +parallelism: 10 |
| 54 | +completions: 10 |
| 55 | +completionMode: Indexed |
| 56 | +successPolicy: |
| 57 | +  rules: |
| 58 | +  - succeededIndexes: "0" # index of the leader Pod (the field is a string) |
| 59 | +    succeededCount: 1 |
| 60 | +``` |
| 61 | + |
| 62 | +This example shows that the Job will be marked as succeeded once a Pod with a specific index (Pod index 0) has succeeded. |
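
To make the rule semantics above concrete, here is a minimal Python sketch of how a set of rules could be evaluated against the set of succeeded indexes. It is an illustration only, not the Job controller's implementation: the function names `parse_indexes`, `rule_met`, and `success_policy_met` are hypothetical, and the sketch assumes `succeededIndexes` is a comma-separated string of indexes and ranges such as `"0,2-3"`.

```python
def parse_indexes(spec):
    """Expand an index list such as "0,2-3" into a set of ints (assumed format)."""
    indexes = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            indexes.update(range(int(first), int(last) + 1))
        else:
            indexes.add(int(part))
    return indexes


def rule_met(rule, succeeded):
    """Return True if a single success policy rule is satisfied.

    rule: dict with optional "succeededIndexes" and/or "succeededCount".
    succeeded: set of index numbers whose Pods have succeeded.
    """
    if "succeededIndexes" in rule:
        wanted = parse_indexes(str(rule["succeededIndexes"]))
        # Only indexes listed in the rule count toward the threshold.
        matched = succeeded & wanted
        # Without succeededCount, every listed index has to succeed.
        required = rule.get("succeededCount", len(wanted))
    else:
        matched = succeeded
        required = rule["succeededCount"]
    return len(matched) >= required


def success_policy_met(rules, succeeded):
    """A Job meets its success policy when any one rule is satisfied."""
    return any(rule_met(rule, set(succeeded)) for rule in rules)
```

With the second manifest excerpt, `success_policy_met([{"succeededIndexes": "0", "succeededCount": 1}], {0, 3})` is true because the leader index 0 has succeeded, while the same rule evaluated against `{1, 2, 3}` is not.
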
| 63 | + |
| 64 | +Once the Job either satisfies one of the `successPolicy` rules, or achieves its `Complete` criteria based on `.spec.completions`, |
| 65 | +the Job controller within kube-controller-manager adds the `SuccessCriteriaMet` condition to the Job status. |
| 66 | +After that, the Job controller initiates cleanup and termination of Pods for Jobs with the `SuccessCriteriaMet` condition. |
| 67 | +Finally, the Job gets the `Complete` condition once the Job controller has finished cleanup and termination. |
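
A client that watches a Job can use this ordering to tell "success criteria met, Pods still being cleaned up" apart from "fully complete". The Python sketch below shows one way to do that over a list shaped like `.status.conditions`. The helper names and the stage labels (`running`, `terminating-pods`, `complete`) are hypothetical, and the sketch deliberately ignores failure conditions.

```python
def condition_is_true(conditions, condition_type):
    """Return True if a condition of the given type is present with status "True"."""
    return any(
        c.get("type") == condition_type and c.get("status") == "True"
        for c in conditions
    )


def job_stage(conditions):
    """Map Job conditions to a coarse stage label (labels are illustrative only)."""
    if condition_is_true(conditions, "Complete"):
        # Cleanup and termination of Pods have finished.
        return "complete"
    if condition_is_true(conditions, "SuccessCriteriaMet"):
        # The Job has succeeded, but remaining Pods are still being terminated.
        return "terminating-pods"
    return "running"
```

For example, a Job whose only true condition is `SuccessCriteriaMet` maps to `terminating-pods`, and it moves to `complete` once the `Complete` condition is added.
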
| 68 | + |
| 69 | +## Learn more |
| 70 | + |
| 71 | +- Read the documentation for |
| 72 | + [success policy](/docs/concepts/workloads/controllers/job/#success-policy). |
| 73 | +- Read the KEP for the [Job success/completion policy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3998-job-success-completion-policy) |
| 74 | + |
| 75 | +## Get involved |
| 76 | + |
| 77 | +This work was led by the Kubernetes |
| 78 | +[batch working group](https://github.com/kubernetes/community/tree/master/wg-batch) |
| 79 | +in close collaboration with the |
| 80 | +[SIG Apps](https://github.com/kubernetes/community/tree/master/sig-apps) community. |
| 81 | + |
| 82 | +If you are interested in working on new features in this space, I recommend |
| 83 | +subscribing to our [Slack](https://kubernetes.slack.com/messages/wg-batch) |
| 84 | +channel and attending the regular community meetings. |