Commit 084d633

Merge pull request kubernetes#5168 from dejanzele/kep-3939/promote-to-ga
KEP-3939: Job Pod Replacement Policy; promote to GA
2 parents 2e2f530 + c4a1b9d commit 084d633

3 files changed: +32 -12 lines changed

Lines changed: 3 additions & 1 deletion
@@ -1,5 +1,7 @@
 kep-number: 3939
-alpha:
+alpha:
   approver: "@wojtek-t"
 beta:
+  approver: "@wojtek-t"
+stable:
   approver: "@wojtek-t"

keps/sig-apps/3939-allow-replacement-when-fully-terminated/README.md

Lines changed: 24 additions & 6 deletions
@@ -545,6 +545,9 @@ The following scenarios related to [tracking the terminating pods](#tracking-the
 - `FailureTarget` is added when backoffLimitCount is exceeded, or activeDeadlineSeconds timeout is exceeded
 - `SuccessCriteriaMet` is added when the `completions` are satisfied
 
+The `integration` tests are implemented in <https://github.com/kubernetes/kubernetes/blob/v1.31.0/test/integration/job/job_test.go>.
+Most relevant test is `TestJobPodReplacementPolicy`.
+
 ##### e2e tests
 
 Generally the only tests that are useful for this feature are when `PodReplacementPolicy: Failed`.
@@ -568,6 +571,15 @@ An e2e test can verify that deletion will not trigger a new pod creation until t
 
 If `podReplacementPolicy: TerminatingOrFailed` is specified we would test that pod creation happens closely after deletion.
 
+The `e2e` tests are implemented in <https://github.com/kubernetes/kubernetes/blob/v1.31.0/test/e2e/apps/job.go>.
+
+Test grid:
+
+- [`gce`](https://testgrid.k8s.io/sig-apps#gce)
+```
+Kubernetes e2e suite.[It] [sig-apps] Job should recreate pods only after they have failed if pod replacement policy is set to Failed
+```
+
 <!--
 This question should be filled when targeting a release.
 For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
@@ -600,7 +612,7 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 - Address reviews and bug reports from Beta users
 - Allow Job API clients tracking the number of the terminating pods until all
 the resources are released (see [tracking the terminating pods](#tracking-the-terminating-pods)).
-Also, link provide links for the relevant integration tests in the KEP.
+Also, provide links for the relevant integration tests in the KEP.
 - Lock the `JobPodReplacementPolicy` feature-gate to true
 
 #### Deprecation
@@ -966,7 +978,7 @@ In beta, we will add a new metric `job_pods_creation_total`.
 
 In [Risks and Mitigations](#risks-and-mitigations) we discuss the interaction with [3329-retriable-and-non-retriable-failures](https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/3329-retriable-and-non-retriable-failures/README.md).
 We will have to guard against cases if `PodFailurePolicy` is off while this feature is on.
-`PodFailurePolicy` is in beta and is enabled by default but we should guard against cases where `PodDisruptionCondition` is turned off.
+`PodFailurePolicy` is in stable and is locked to `true` by default but we should guard against cases where `PodDisruptionCondition` is turned off.
 
 #### Does this feature depend on any specific services running in the cluster?
 
@@ -993,7 +1005,7 @@ No
 
 #### Will enabling / using this feature result in increasing size or count of the existing API objects?
 
-For Job API, we are adding a enum field named `PodReplacementPolicy` which takes
+For Job API, we are adding an enum field named `PodReplacementPolicy` which takes
 either a `TerminatingOrFailed` or `Failed`
 
 - API type(s): enum
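
For readers unfamiliar with the field this hunk describes, the following is a minimal sketch of a Job manifest that sets `podReplacementPolicy`. The Job name, image, command, and counts are hypothetical and chosen only for illustration; they are not part of this commit.

```yaml
# Illustrative manifest only; name, image, command, and counts are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  completions: 3
  parallelism: 3
  backoffLimit: 6
  # Failed: create a replacement only after the old pod has reached
  # the Failed phase, i.e. it is fully terminated.
  # TerminatingOrFailed (the other accepted value): create a replacement
  # as soon as the old pod is terminating or has failed.
  podReplacementPolicy: Failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sh", "-c", "sleep 30"]
```

With `Failed`, deleting a running pod of this Job should not produce a replacement until the deleted pod has finished terminating, which is the behavior the e2e test named in this commit exercises.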
@@ -1067,9 +1079,7 @@ There are no other failure modes.
 
 #### What steps should be taken if SLOs are not being met to determine the problem?
 
-One could disable this feature.
-
-Or if one wants to keep the feature on and they could suspend the jobs that are using this feature.
+If one wants to keep the feature on and they could suspend the jobs that are using this feature.
 Setting `Suspend:True` in your JobSpec will halt the execution of all jobs.
 
 ## Implementation History
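
The mitigation described in the hunk above (suspending Jobs now that the feature can no longer be disabled) corresponds to the `suspend` field of the Job spec. A minimal sketch, assuming a Job named `example-job` (a hypothetical name, not taken from the KEP):

```yaml
# Fragment of a Job manifest; only the fields relevant to suspension are shown.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job   # hypothetical name
spec:
  # While true, the Job controller creates no pods for this Job and
  # terminates its active pods; set it back to false to resume.
  suspend: true
```

The field is set per Job, so each Job that uses `podReplacementPolicy` would need to be suspended individually.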
@@ -1078,6 +1088,14 @@ Setting `Suspend:True` in your JobSpec will halt the execution of all jobs.
 - 2023-05-19: KEP Merged.
 - 2023-07-16: Alpha PRs merged.
 - 2023-09-29: KEP marked for beta promotion.
+- 2023-10-24: Merged bugfix [Fix tracking of terminating Pods when nothing else changes](https://github.com/kubernetes/kubernetes/pull/121342)
+- 2023-10-24: Merged adding a metric required for beta promotion [feat: add job_pods_creation_total metric](https://github.com/kubernetes/kubernetes/pull/121481)
+- 2023-10-27: Merged [Switch feature flag to beta for pod replacement policy and add e2e test #121491](https://github.com/kubernetes/kubernetes/pull/121491)
+- 2024-06-11: [v1.31] Merged [Count terminating pods when deleting active pods for failed jobs #125175](https://github.com/kubernetes/kubernetes/pull/125175)
+- 2024-07-12: [v1.31] Merged [Delay setting terminal Job conditions until all pods are terminal #125510](https://github.com/kubernetes/kubernetes/pull/125510)
+
+This feature was promoted to beta in v1.29, but important updates were implemented in v1.31.
+For additional info, check the PRs linked above with the tag `[v1.31]`.
 
 ## Drawbacks
 
keps/sig-apps/3939-allow-replacement-when-fully-terminated/kep.yaml

Lines changed: 5 additions & 5 deletions
@@ -2,7 +2,7 @@ title: Allow Replacement of Pods in a Job when fully terminating
 kep-number: 3939
 authors:
   - "@kannon92"
-  - "@dejanzele"
+  - "@dejanzele"
   - "@alculquicondor"
 owning-sig: sig-apps
 participating-sigs:
@@ -19,18 +19,18 @@ see-also:
   - "/keps/sig-apps/3329-retriable-and-non-retriable-failures"
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: beta
+stage: stable
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.29"
+latest-milestone: "v1.33"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.28"
   beta: "v1.29"
-  stable: ""
+  stable: "v1.33"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
@@ -43,4 +43,4 @@ disable-supported: true
 
 # The following PRR answers are required at beta release
 metrics:
-  - job_pod_creation
+  - job_pods_creation_total
