keps/sig-apps/3939-allow-replacement-when-fully-terminated/README.md
Lines changed: 24 additions & 6 deletions
@@ -545,6 +545,9 @@ The following scenarios related to [tracking the terminating pods](#tracking-the
 - `FailureTarget` is added when backoffLimitCount is exceeded, or activeDeadlineSeconds timeout is exceeded
 - `SuccessCriteriaMet` is added when the `completions` are satisfied
 
+The `integration` tests are implemented in <https://github.com/kubernetes/kubernetes/blob/v1.31.0/test/integration/job/job_test.go>.
+Most relevant test is `TestJobPodReplacementPolicy`.
+
 ##### e2e tests
 
 Generally the only tests that are useful for this feature are when `PodReplacementPolicy: Failed`.
@@ -568,6 +571,15 @@ An e2e test can verify that deletion will not trigger a new pod creation until t
 
 If `podReplacementPolicy: TerminatingOrFailed` is specified we would test that pod creation happens closely after deletion.
 
+The `e2e` tests are implemented in <https://github.com/kubernetes/kubernetes/blob/v1.31.0/test/e2e/apps/job.go>.
+
+Test grid:
+
+- [`gce`](https://testgrid.k8s.io/sig-apps#gce)
+```
+Kubernetes e2e suite.[It] [sig-apps] Job should recreate pods only after they have failed if pod replacement policy is set to Failed
+```
+
 <!--
 This question should be filled when targeting a release.
 For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
@@ -600,7 +612,7 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 - Address reviews and bug reports from Beta users
 - Allow Job API clients tracking the number of the terminating pods until all
 the resources are released (see [tracking the terminating pods](#tracking-the-terminating-pods)).
-Also, link provide links for the relevant integration tests in the KEP.
+Also, provide links for the relevant integration tests in the KEP.
 - Lock the `JobPodReplacementPolicy` feature-gate to true
 
 #### Deprecation
@@ -966,7 +978,7 @@ In beta, we will add a new metric `job_pods_creation_total`.
 
 In [Risks and Mitigations](#risks-and-mitigations) we discuss the interaction with [3329-retriable-and-non-retriable-failures](https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/3329-retriable-and-non-retriable-failures/README.md).
 We will have to guard against cases if `PodFailurePolicy` is off while this feature is on.
-`PodFailurePolicy`is in beta and is enabled by default but we should guard against cases where `PodDisruptionCondition` is turned off.
+`PodFailurePolicy`is in stable and is locked to `true` by default but we should guard against cases where `PodDisruptionCondition` is turned off.
 
 #### Does this feature depend on any specific services running in the cluster?
 
@@ -993,7 +1005,7 @@ No
 
 #### Will enabling / using this feature result in increasing size or count of the existing API objects?
 
-For Job API, we are adding a enum field named `PodReplacementPolicy` which takes
+For Job API, we are adding an enum field named `PodReplacementPolicy` which takes
 either a `TerminatingOrFailed` or `Failed`
 
 - API type(s): enum
@@ -1067,9 +1079,7 @@ There are no other failure modes.
 
 #### What steps should be taken if SLOs are not being met to determine the problem?
 
-One could disable this feature.
-
-Or if one wants to keep the feature on and they could suspend the jobs that are using this feature.
+If one wants to keep the feature on and they could suspend the jobs that are using this feature.
 Setting `Suspend:True` in your JobSpec will halt the execution of all jobs.
 
 ## Implementation History
@@ -1078,6 +1088,14 @@ Setting `Suspend:True` in your JobSpec will halt the execution of all jobs.
 - 2023-05-19: KEP Merged.
 - 2023-07-16: Alpha PRs merged.
 - 2023-09-29: KEP marked for beta promotion.
+- 2023-10-24: Merged bugfix [Fix tracking of terminating Pods when nothing else changes](https://github.com/kubernetes/kubernetes/pull/121342)
+- 2023-10-24: Merged adding a metric required for beta promotion [feat: add job_pods_creation_total metric](https://github.com/kubernetes/kubernetes/pull/121481)
+- 2023-10-27: Merged [Switch feature flag to beta for pod replacement policy and add e2e test #121491](https://github.com/kubernetes/kubernetes/pull/121491)
+- 2024-06-11: [v1.31] Merged [Count terminating pods when deleting active pods for failed jobs #125175](https://github.com/kubernetes/kubernetes/pull/125175)
+- 2024-07-12: [v1.31] Merged [Delay setting terminal Job conditions until all pods are terminal #125510](https://github.com/kubernetes/kubernetes/pull/125510)
+
+This feature was promoted to beta in v1.29, but important updates were implemented in v1.31.
+For additional info, check the PRs linked above with the tag `[v1.31]`.
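
Reviewer note: the e2e expectations quoted in this diff (with `Failed`, pods are recreated only after they have failed; with `TerminatingOrFailed`, creation follows closely after deletion) can be illustrated with a small model. This is a minimal sketch, not the Job controller's code: the helper `podsToCreate` and its parameters are hypothetical names introduced here, and the model assumes that a terminating pod keeps occupying its slot under `Failed`.

```go
package main

import "fmt"

// PodReplacementPolicy mirrors the two enum values named in the KEP text.
type PodReplacementPolicy string

const (
	TerminatingOrFailed PodReplacementPolicy = "TerminatingOrFailed"
	Failed              PodReplacementPolicy = "Failed"
)

// podsToCreate is a hypothetical helper for illustration only.
// It returns how many replacement pods a simplified controller would create,
// given the desired parallelism and the current active and terminating pods.
func podsToCreate(policy PodReplacementPolicy, parallelism, active, terminating int) int {
	occupied := active
	if policy == Failed {
		// Assumption: under Failed, a terminating pod still holds its slot
		// until it is fully terminal, so no replacement is created yet.
		occupied += terminating
	}
	if n := parallelism - occupied; n > 0 {
		return n
	}
	return 0
}

func main() {
	// One pod running, two pods deleted but still terminating, parallelism 3.
	fmt.Println(podsToCreate(TerminatingOrFailed, 3, 1, 2)) // replacements created right away
	fmt.Println(podsToCreate(Failed, 3, 1, 2))              // replacements deferred
}
```

In this model the only difference between the two policies is whether terminating pods count toward the occupied slots, which is the behaviour the linked e2e test name describes.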