Commit e6475c7

mimowo and andreyvelich authored
Update "KEP-4368 Job API managed-by mechanism" targeting Beta in 1.32 (#4856)
* KEP-4368 Job managed-by field update for Beta
* Review remarks and other updates
* Update keps/sig-apps/4368-support-managed-by-for-batch-jobs/README.md

Co-authored-by: Andrey Velichkevich <[email protected]>

---------

Co-authored-by: Andrey Velichkevich <[email protected]>
1 parent 2c50c98 commit e6475c7

3 files changed: +66 -33 lines changed

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 4368
 alpha:
   approver: "@wojtek-t"
+beta:
+  approver: "@wojtek-t"

keps/sig-apps/4368-support-managed-by-for-batch-jobs/README.md

Lines changed: 62 additions & 32 deletions
@@ -49,6 +49,7 @@
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
+- [Skip reconciliation in the event handler](#skip-reconciliation-in-the-event-handler)
 - [Reserved controller name value](#reserved-controller-name-value)
 - [Defaulting of the for newly created jobs](#defaulting-of-the-for-newly-created-jobs)
 - [Alternative names for field](#alternative-names-for-field)
@@ -99,8 +100,8 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [x] (R) Production readiness review completed
 - [x] (R) Production readiness review approved
 - [x] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 <!--
 **Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
@@ -207,8 +208,15 @@ a blocker.
 
 It would also complicate debuggability of the feature.
 
-We decide to keep the field immutable, at least for [Alpha](#alpha), we will
-re-evaluate the decision for [Beta](#beta).
+Also, we already observe the adoption of the mechanism in other batch projects,
+such as:
+- [JobSet](https://github.com/kubernetes-sigs/jobset/blob/665bc42e0a33a0ebdf7fc09b2b6ae5d88eb7d33c/api/jobset/v1alpha2/jobset_types.go#L121-L133)
+- [Kubeflow Training Operator](https://github.com/kubeflow/training-operator/blob/da11d1116c29322c481d0b8f174df8d6f05004aa/pkg/apis/kubeflow.org/v1/common_types.go#L238-L239).
+
+For now, these projects follow the decision taken in core Kubernetes to make
+the field immutable, avoiding the complexity of supporting mutability.
+
+Altogether, we decide to keep the field immutable.
 
 #### Use for MultiKueue
 
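The hunk above settles on keeping `managedBy` immutable. As a rough illustration only, an update check enforcing that could look like the Go sketch below. `JobSpec`, `validateManagedByUpdate`, and the error message are hypothetical stand-ins, not the actual kube-apiserver validation code, and treating "set", "clear", and "change" as equally forbidden is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// JobSpec is a hypothetical, trimmed-down stand-in for the batch/v1 JobSpec;
// only the field relevant to this sketch is modelled.
type JobSpec struct {
	ManagedBy *string // nil when spec.managedBy is unset
}

// errManagedByImmutable is a hypothetical error used only in this sketch.
var errManagedByImmutable = errors.New("spec.managedBy: field is immutable")

// validateManagedByUpdate rejects any update that sets, clears, or changes
// spec.managedBy.
func validateManagedByUpdate(oldSpec, newSpec JobSpec) error {
	if !equalOptional(oldSpec.ManagedBy, newSpec.ManagedBy) {
		return errManagedByImmutable
	}
	return nil
}

// equalOptional treats two nil pointers as equal and otherwise compares the
// pointed-to values.
func equalOptional(a, b *string) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

func strPtr(s string) *string { return &s }

func main() {
	custom := JobSpec{ManagedBy: strPtr("example.com/custom-controller")}
	fmt.Println(validateManagedByUpdate(custom, custom))    // <nil>: value unchanged
	fmt.Println(validateManagedByUpdate(JobSpec{}, custom)) // rejected: field set on update
}
```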
@@ -358,17 +366,15 @@ We skip synchronization of the Jobs with the "managedBy" field, if it has any
 different value than `kubernetes.io/job-controller`. When the synchronization is skipped,
 the name of the controller managing the Job object is logged.
 
-We leave the particular place at which the synchronization is skipped as
-implementation detail which can be determined during the implementation phase,
-however, two candidate places are:
-1. inside `syncJob` function
-2. inside `enqueueSyncJobInternal` function
+We skip the reconciliation inside the `syncJob` function
+(see [here](https://github.com/kubernetes/kubernetes/blob/15d08bf7c8813b0533dc147a03d9f42aae735ecd/pkg/controller/job/job_controller.go#L819-L822)).
 
-Note that, if we skip inside `enqueueSyncJobInternal` we may save on some memory
-needed to needlessly enqueue the Job keys.
+For [GA](#ga), we will re-evaluate also skipping the reconciliation within
+`enqueueSyncJobInternal` for optimal performance. See the discussion in
+[Skip reconciliation in the event handler](#skip-reconciliation-in-the-event-handler).
 
-There is no validation for the values of the field beyond that of standard
-permitted field values.
+There is no validation of the field value beyond its format, as described
+in the [API](#API) comment above.
 
 #### Job status validation
 
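With this change the skip happens in `syncJob`. A minimal Go sketch of that check follows, assuming a trimmed-down `Job` type whose `ManagedBy` pointer is nil when the field is unset. The names `isManagedByBuiltInController` and `syncJob` here are illustrative and do not reproduce the code in `job_controller.go`. The feature-gate parameter reflects the integration-test scenario in which the built-in controller still reconciles Jobs with a custom value while `JobManagedBy` is disabled.

```go
package main

import "fmt"

// jobControllerName is the reserved value for the built-in controller,
// as named in the KEP text.
const jobControllerName = "kubernetes.io/job-controller"

// Job is a hypothetical, minimal stand-in for batch/v1 Job.
type Job struct {
	Namespace, Name string
	ManagedBy       *string // nil when spec.managedBy is unset
}

// isManagedByBuiltInController reports whether the built-in Job controller
// should reconcile the Job. With the JobManagedBy feature gate disabled the
// field is ignored, so every Job is reconciled.
func isManagedByBuiltInController(job Job, featureGateEnabled bool) bool {
	if !featureGateEnabled || job.ManagedBy == nil {
		return true
	}
	return *job.ManagedBy == jobControllerName
}

// syncJob is a hypothetical reduction of the controller's sync function.
// The check runs at sync time, against the object currently read for the key.
func syncJob(job Job, featureGateEnabled bool) (reconciled bool) {
	if !isManagedByBuiltInController(job, featureGateEnabled) {
		// As described above: log the managing controller and skip the Job.
		fmt.Printf("skipping Job %s/%s, managed by %q\n", job.Namespace, job.Name, *job.ManagedBy)
		return false
	}
	// The regular reconciliation would run here.
	return true
}

func main() {
	custom := "example.com/custom-controller"
	fmt.Println(syncJob(Job{Namespace: "default", Name: "plain"}, true))
	fmt.Println(syncJob(Job{Namespace: "default", Name: "external", ManagedBy: &custom}, true))
	fmt.Println(syncJob(Job{Namespace: "default", Name: "gate-off", ManagedBy: &custom}, false))
}
```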
@@ -393,7 +399,7 @@ For that we plan to follow the approach described [below](#terminating-pods-and-
 which extend the scope of the interim `FailureTarget` and `SuccessCriteriaMet`
 conditions. We will also validate that the transition to `Failed` or `Complete`
 condition is preceded by adding the `FailureTarget` or `SuccessCriteriaMet`
-condition, respecively.
+condition, respectively.
 
 Additionally, we are going to introduce a validation rule that the count of
 ready `status.ready` pods is lower than or equal to the number of active `status.active`
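The two status validation rules in this hunk could be sketched in Go as follows. `JobStatus` is a hypothetical simplification (conditions reduced to the set of types whose status is `True`), and requiring the interim condition to be present in the new status whenever a terminal condition is added is an assumed reading of "preceded by", not the actual apiserver logic.

```go
package main

import (
	"errors"
	"fmt"
)

// Condition types named in the KEP text.
const (
	condFailureTarget      = "FailureTarget"
	condSuccessCriteriaMet = "SuccessCriteriaMet"
	condFailed             = "Failed"
	condComplete           = "Complete"
)

// JobStatus is a hypothetical simplification of batch/v1 JobStatus:
// Conditions holds the types of conditions whose status is True.
type JobStatus struct {
	Conditions map[string]bool
	Active     int32
	Ready      *int32 // status.ready is optional
}

// terminalPrereqs pairs each terminal condition with the interim condition
// that must accompany it.
var terminalPrereqs = []struct{ terminal, interim string }{
	{condFailed, condFailureTarget},
	{condComplete, condSuccessCriteriaMet},
}

// validateStatusUpdate sketches two rules: a newly added terminal condition
// requires its interim condition in the new status (an assumed reading of
// "preceded by"), and status.ready must not exceed status.active.
func validateStatusUpdate(oldStatus, newStatus JobStatus) error {
	for _, p := range terminalPrereqs {
		added := newStatus.Conditions[p.terminal] && !oldStatus.Conditions[p.terminal]
		if added && !newStatus.Conditions[p.interim] {
			return fmt.Errorf("adding %s requires the %s condition", p.terminal, p.interim)
		}
	}
	if newStatus.Ready != nil && *newStatus.Ready > newStatus.Active {
		return errors.New("status.ready must be lower than or equal to status.active")
	}
	return nil
}

func main() {
	running := JobStatus{Active: 1}
	failedEarly := JobStatus{Conditions: map[string]bool{condFailed: true}}
	failedOK := JobStatus{Conditions: map[string]bool{condFailureTarget: true, condFailed: true}}
	fmt.Println(validateStatusUpdate(running, failedEarly)) // rejected: FailureTarget missing
	fmt.Println(validateStatusUpdate(running, failedOK))    // <nil>
}
```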
@@ -480,21 +486,21 @@ The following scenarios related to [Terminating pods and terminal Job conditions
 ##### Integration tests
 
 The following scenarios are covered:
-- the Job controller reconciles jobs with the "managedBy" field equal to `kubernetes.io/job-controller`
-- the Job controller reconciles jobs without the "managedBy" field
+- the Job controller reconciles jobs with the "managedBy" field equal to `kubernetes.io/job-controller` ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2016))
+- the Job controller reconciles jobs without the "managedBy" field ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2000))
 - the Job controller does not reconcile a job with any other value of the "managedBy" field. In particular:
-  - it does not reset the status for a Job with `.spec.suspend=false`,
-  - it does not add the Suspended condition for a Job with `.spec.suspend=true`.
-- the Job controller reconciles jobs with custom "managedBy" field when the feature gate is disabled
+  - it does not reset the status for a Job with `.spec.suspend=false` ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2044)),
+  - it does not add the Suspended condition for a Job with `.spec.suspend=true` ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2059)).
+- the Job controller reconciles jobs with custom "managedBy" field when the feature gate is disabled ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2030))
 - the Job controller correctly handles re-enablement of the feature gate ([link](https://github.com/kubernetes/kubernetes/blob/169a952720ebd75fcbcb4f3f5cc64e82fdd3ec45/test/integration/job/job_test.go#L1691))
-- the `job_by_external_controller_total` metric is incremented when a new Job with custom "managedBy" is created
-- the `job_by_external_controller_total` metric is not incremented for a new Job without "managedBy" or with default value
-- the `job_by_external_controller_total` metric is not incremented for Job updates (regardless of the "managedBy")
+- the `job_by_external_controller_total` metric is incremented when a new Job with custom "managedBy" is created ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2044-L2058))
+- the `job_by_external_controller_total` metric is not incremented for a new Job without "managedBy" or with default value ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2000-L2029))
+- the `job_by_external_controller_total` metric is not incremented for Job updates (regardless of the "managedBy"); this is tested indirectly, since in [this test](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L2000-L2029) the Job controller updates the Job status
 
 The following scenarios related to [Terminating pods and terminal Job conditions](#terminating-pods-and-terminal-job-conditions) are covered:
-- `Failed` or `Complete` conditions are not added while there are still terminating pods
-- `FailureTarget` is added when backoffLimitCount is exceeded, or activeDeadlineSeconds timeout is exceeded
-- `SuccessCriteriaMet` is added when the `completions` are satisfied
+- `Failed` or `Complete` conditions are not added while there are still terminating pods ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L1183))
+- `FailureTarget` is added when backoffLimitCount is exceeded, or activeDeadlineSeconds timeout is exceeded ([link](https://github.com/kubernetes/kubernetes/blob/master/test/integration/job/job_test.go#L1253))
+- `SuccessCriteriaMet` is added when the `completions` are satisfied ([link](https://github.com/kubernetes/kubernetes/blob/856475e5fffe3d99c71606d6024f5ed93e37eebc/test/integration/job/job_test.go#L1355))
 
 During the implementation more scenarios might be covered.
 
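The three `job_by_external_controller_total` scenarios listed above amount to "count on creation, only for a custom value, never on update". A minimal Go sketch of that rule follows. The plain `atomic.Int64` counter stands in for a real metrics registry, and calling `onJobAdd` and `onJobUpdate` from informer add and update handlers is an assumption of this sketch, not a description of the actual controller code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const jobControllerName = "kubernetes.io/job-controller"

// Job is a hypothetical, minimal stand-in for batch/v1 Job.
type Job struct {
	ManagedBy *string // nil when spec.managedBy is unset
}

// externalControllerJobs stands in for job_by_external_controller_total;
// a real controller would register the counter with its metrics registry.
var externalControllerJobs atomic.Int64

// onJobAdd counts a newly created Job only if managedBy names a controller
// other than the built-in one.
func onJobAdd(job Job) {
	if job.ManagedBy != nil && *job.ManagedBy != jobControllerName {
		externalControllerJobs.Add(1)
	}
}

// onJobUpdate never touches the counter, whatever the value of managedBy.
func onJobUpdate(oldJob, newJob Job) {}

func main() {
	custom := "example.com/custom-controller"
	builtIn := jobControllerName
	onJobAdd(Job{ManagedBy: &custom})           // counted
	onJobAdd(Job{})                             // not counted: field unset
	onJobAdd(Job{ManagedBy: &builtIn})          // not counted: default value
	onJobUpdate(Job{}, Job{ManagedBy: &custom}) // not counted: update
	fmt.Println(externalControllerJobs.Load())  // 1
}
```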
@@ -541,8 +547,8 @@ Second Alpha (1.31):
 
 - Address reviews and bug reports from Beta users
 - Re-evaluate the ideas of improving debuggability (like [extended `kubectl`](#debuggability), [dedicated condition](#condition-to-indicated-job-is-skipped), or [events](#event-indicating-the-job-is-skipped))
-- Re-evaluate the support for mutability of the field
-- Asses the fragmentation of the ecosystem. Look for other implementations of a job controller and asses their conformance with k8s.
+- Re-evaluate the need to skip reconciliation in the event handlers to optimize performance
+- Assess the fragmentation of the ecosystem. Look for other implementations of a job controller and assess their conformance with k8s.
 - Lock the feature gate
 
 #### Deprecation
@@ -775,10 +781,10 @@ Describe manual testing that was done and the outcomes.
 Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
-The Upgrade->downgrade->upgrade testing will be done manually prior to release
-as Beta, with the following steps:
+The upgrade->downgrade->upgrade path was tested manually using the 1.31 release
+(Alpha), with the following steps:
 
-1. Start the cluster with the `JobManagedBy` enabled for api server and control-plane.
+1. Start the cluster with the `JobManagedBy` feature gate enabled for kube-apiserver and kube-controller-manager.
 
 Then, create two long-running Jobs:
 - `job-managed` with custom value of the "managedBy" field
@@ -788,13 +794,13 @@ Then, verify that:
 - the `job-managed` does not get status updates from built-in controller. Update the status manually and observe it is not reset by the built-in controller.
 - the `job-regular` starts making progress (creates pods and updates the status accordingly by the built-in controller)
 
-2. Simulate downgrade by disabling the feature for api server and control-plane.
+2. Simulate downgrade by disabling the feature for kube-apiserver and kube-controller-manager.
 
 Then, verify that:
 - the `job-managed` starts to make progress, the status is reset, and updated to some new values
 - the `job-regular` continues making progress
 
-3. Simulate upgrade by re-enabling the feature for api server and control-plane.
+3. Simulate upgrade by re-enabling the feature for kube-apiserver and kube-controller-manager.
 
 Then, verify that:
 - the `job-managed` stops getting status updates from the built-in controller. Update the status manually and observe it is not reset by the built-in controller.
@@ -1080,6 +1086,10 @@ N/A.
 - 2024-03-08 - Merged [Follow up fix to the job status update test](https://github.com/kubernetes/kubernetes/pull/123815)
 - 2024-03-11 - Merged [Adjust the Job field API comments and validation to the current state](https://github.com/kubernetes/kubernetes/pull/123792)
 - 2024-05-16 - Merged [Fix the comment for the Job managedBy field](https://github.com/kubernetes/kubernetes/pull/124793)
+- 2024-06-11 - Merged [Count terminating pods when deleting active pods for failed jobs](https://github.com/kubernetes/kubernetes/pull/1251753)
+- 2024-06-21 - Merged [Update the count of ready pods when deleting pods](https://github.com/kubernetes/kubernetes/pull/125546)
+- 2024-07-12 - Merged [Delay setting terminal Job conditions until all pods are terminal](https://github.com/kubernetes/kubernetes/pull/125510)
+- 2024-07-30 - Merged [Update the docs for JobManagedBy and JobPodReplacementPolicy related to pod termination](https://github.com/kubernetes/website/pull/46808)
 
 <!--
 Major milestones in the lifecycle of a KEP should be tracked in this section.
@@ -1100,6 +1110,26 @@ Why should this KEP _not_ be implemented?
 
 ## Alternatives
 
+### Skip reconciliation in the event handler
+
+We discussed skipping the reconciliation only within the
+[`enqueueSyncJobInternal`](https://github.com/kubernetes/kubernetes/blob/15d08bf7c8813b0533dc147a03d9f42aae735ecd/pkg/controller/job/job_controller.go#L575).
+
+However, it was noted that this would cause race conditions when a Job with the
+same name and namespace is re-created, but with the `managedBy` field set. The
+race condition was reproduced by the
+[TestManagedBy_RecreatedJob](https://github.com/kubernetes/kubernetes/blob/15d08bf7c8813b0533dc147a03d9f42aae735ecd/test/integration/job/job_test.go#L2229)
+integration test, which demonstrated the issue with such an implementation.
+
+Still, it is a potential improvement to skip the reconciliation inside
+`syncJob` and also skip queuing within the `enqueueSyncJobInternal` function for
+optimal performance (by saving memory and off-loading the reconciliation queue).
+
+**Reasons for discarding/deferring**
+
+Potentially a premature optimization which would complicate the code. We
+prefer to base the introduction of the optimization on users' feedback.
+
 ### Reserved controller name value
 
 We could also use just `job-controller` for the reserved value of the field
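The race described in the new "Skip reconciliation in the event handler" alternative stems from the work queue holding only `namespace/name` keys: the event handler decides based on the object carried by the event, while the worker later reads whatever object is cached under that key. The Go sketch below models one plausible interleaving under that assumption, using a hypothetical in-memory cache and FIFO queue rather than client-go: an event for the original Job enqueues the key, a Job with the same key is re-created with a custom `managedBy`, and its own event is filtered out by the handler. Only the sync-time check keeps the built-in controller from reconciling it.

```go
package main

import "fmt"

const jobControllerName = "kubernetes.io/job-controller"

// Job is a hypothetical, minimal stand-in for batch/v1 Job.
type Job struct {
	Key       string  // "namespace/name", the only thing the queue stores
	ManagedBy *string // nil when spec.managedBy is unset
}

func managedByBuiltIn(j Job) bool {
	return j.ManagedBy == nil || *j.ManagedBy == jobControllerName
}

// controller models an informer cache (latest object per key) and a FIFO
// work queue of keys; it is not the client-go machinery.
type controller struct {
	cache       map[string]Job
	queue       []string
	checkInSync bool  // whether syncJob re-checks managedBy
	reconciled  []Job // Jobs the built-in controller actually synced
}

// enqueue models skipping in the event handler: it decides using the object
// carried by the event, but only the key is queued.
func (c *controller) enqueue(eventObj Job) {
	if managedByBuiltIn(eventObj) {
		c.queue = append(c.queue, eventObj.Key)
	}
}

// processNext pops a key and reads whatever object is cached under it now.
func (c *controller) processNext() {
	key := c.queue[0]
	c.queue = c.queue[1:]
	job, ok := c.cache[key]
	if !ok {
		return
	}
	if c.checkInSync && !managedByBuiltIn(job) {
		return // skipped in syncJob
	}
	c.reconciled = append(c.reconciled, job)
}

// run replays one interleaving: an event for the original Job enqueues its
// key, then a Job with the same key is re-created with a custom managedBy
// before the worker picks the key up.
func run(checkInSync bool) []Job {
	custom := "example.com/custom-controller"
	c := &controller{cache: map[string]Job{}, checkInSync: checkInSync}
	c.enqueue(Job{Key: "default/job"}) // event for the original Job
	recreated := Job{Key: "default/job", ManagedBy: &custom}
	c.cache[recreated.Key] = recreated
	c.enqueue(recreated) // filtered out by the handler
	for len(c.queue) > 0 {
		c.processNext()
	}
	return c.reconciled
}

func main() {
	fmt.Println(len(run(false))) // 1: the externally managed Job was synced
	fmt.Println(len(run(true)))  // 0: the check in syncJob skips it
}
```

This is why the check lives in `syncJob`; filtering in `enqueueSyncJobInternal` as well would only be an optimization layered on top of it.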

keps/sig-apps/4368-support-managed-by-for-batch-jobs/kep.yaml

Lines changed: 2 additions & 1 deletion
@@ -18,7 +18,7 @@ see-also:
   - "https://github.com/kubernetes/enhancements/pull/4073" # closed PR
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
@@ -28,6 +28,7 @@ latest-milestone: "v1.31"
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.30"
+  beta: "v1.32"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
