Items marked with (R) are required *prior to targeting to a milestone / release*.

- - [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+ - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [x] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
@@ -47,7 +47,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
- [x] (R) Production readiness review completed
- [x] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+ - [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
[kubernetes.io]: https://kubernetes.io/
@@ -95,9 +95,9 @@ field based on the number of Pods that have the `Ready` condition.
### Risks and Mitigations

- An increase in Job status updates. To mitigate this, the job controller holds
- the Pod updates that happen in X ms before syncing a Job. X will be determined
- from experiments on integration tests, but we expect it to be between 500ms
- and 1s.
+ the Pod updates that happen in X ms before syncing a Job.
+ From experiments using integration tests, X=500ms was found to be a reasonable
+ value.
## Design Details

@@ -107,7 +107,7 @@ field based on the number of Pods that have the `Ready` condition.
type JobStatus struct {
...
Active int32
- Ready int32 // new field
+ Ready *int32 // new field
Succeeded int32
Failed int32
}
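The following Go sketch shows how the `ready` value could be computed from a Job's active Pods, and why the field is a pointer: `nil` means the count is not tracked, which stays distinguishable from a genuine count of 0. The types are hypothetical minimal stand-ins for the `k8s.io/api` types, and keeping the existing value untouched while the feature gate is disabled is an assumption drawn from the upgrade/rollback test in this KEP, not a description of the controller's code.

```go
package main

import "fmt"

// Hypothetical minimal stand-ins for the core/v1 and batch/v1 types.
type PodCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

type Pod struct {
	Name       string
	Conditions []PodCondition
}

type JobStatus struct {
	Active int32
	Ready  *int32 // nil: not tracked; non-nil: number of ready Pods
}

// podReady reports whether the Pod has a Ready condition with status True.
func podReady(p Pod) bool {
	for _, c := range p.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// syncReady recomputes status.Ready from the active Pods when gateEnabled
// is true. With the gate off it leaves Ready as it is (assumption).
func syncReady(status *JobStatus, active []Pod, gateEnabled bool) {
	status.Active = int32(len(active))
	if !gateEnabled {
		return
	}
	var ready int32
	for _, p := range active {
		if podReady(p) {
			ready++
		}
	}
	status.Ready = &ready
}

func main() {
	pods := []Pod{
		{Name: "a", Conditions: []PodCondition{{Type: "Ready", Status: "True"}}},
		{Name: "b", Conditions: []PodCondition{{Type: "Ready", Status: "False"}}},
		{Name: "c"}, // no conditions reported yet
	}

	var off JobStatus
	syncReady(&off, pods, false)
	fmt.Println("gate off, ready tracked:", off.Ready != nil)

	var on JobStatus
	syncReady(&on, pods, true)
	fmt.Println("gate on, active:", on.Active, "ready:", *on.Ready)
}
```

With the gate disabled, `Ready` stays `nil`; with it enabled, only Pod `a` counts as ready, so the sketch reports 3 active Pods and 1 ready Pod.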
@@ -131,12 +131,20 @@ pods that have the `Ready` condition.
#### Alpha

- Feature gate disabled by default.
- - Unit and integration tests passing.
+ - Unit and [integration] tests passing.
+
+ [integration]: https://testgrid.k8s.io/conformance-all#Conformance%20-%20GCE%20-%20master&include-filter-by-regex=sig-apps&include-filter-by-regex=Job&exclude-filter-by-regex=CronJob

#### Beta

- Feature gate enabled by default.
- - Existing E2E and conformance tests passing.
+ - Existing [E2E] and [conformance] tests passing.
+ - Scalability tests for Jobs of varying sizes, with parallelism up to 500, that track
+ the `job_sync_duration_seconds` metric. There should be no significant
+ degradation after enabling the feature gate.
+
+ [E2E]: https://testgrid.k8s.io/sig-apps#gce&include-filter-by-regex=apps%5C%5D%20Job
+ [Conformance]: https://testgrid.k8s.io/conformance-all#Conformance%20-%20GCE%20-%20master&include-filter-by-regex=sig-apps&include-filter-by-regex=Job&exclude-filter-by-regex=CronJob
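A Go sketch of one way to judge "no significant degradation": compare an estimated high quantile of `job_sync_duration_seconds` between runs with the feature gate off and on. It assumes the metric is exported as a cumulative histogram (finite buckets only, `+Inf` omitted) and interpolates linearly within the bucket containing the target rank, similar in spirit to PromQL's `histogram_quantile`. The bucket counts and the 10% tolerance are hypothetical; the KEP does not define a numeric threshold.

```go
package main

import "fmt"

// bucket is one cumulative histogram bucket: count observations were
// <= upperBound seconds. Only finite buckets are modelled.
type bucket struct {
	upperBound float64
	count      float64
}

// quantile estimates the q-quantile (0 < q <= 1) of a cumulative histogram
// whose buckets are sorted by upperBound, interpolating linearly inside the
// bucket where the target rank falls. The first bucket's lower bound is 0.
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) == 0 {
		return 0
	}
	rank := q * buckets[len(buckets)-1].count
	lower, below := 0.0, 0.0
	for _, b := range buckets {
		if b.count >= rank {
			if b.count == below {
				return b.upperBound // empty bucket: nothing to interpolate
			}
			return lower + (b.upperBound-lower)*(rank-below)/(b.count-below)
		}
		lower, below = b.upperBound, b.count
	}
	return lower
}

// degraded reports whether candidate exceeds baseline by more than
// tolerance, expressed as a fraction (0.10 means 10%).
func degraded(baseline, candidate, tolerance float64) bool {
	return candidate > baseline*(1+tolerance)
}

func main() {
	// Hypothetical bucket counts from two scalability runs.
	gateOff := []bucket{{0.1, 800}, {0.5, 990}, {1, 1000}}
	gateOn := []bucket{{0.1, 780}, {0.5, 985}, {1, 1000}}
	p99Off := quantile(0.99, gateOff)
	p99On := quantile(0.99, gateOn)
	fmt.Printf("p99 gate off: %.3fs, gate on: %.3fs, degraded: %v\n",
		p99Off, p99On, degraded(p99Off, p99On, 0.10))
}
```

A real scalability run would use bucket deltas over the test window rather than lifetime totals, so that earlier activity does not skew the comparison.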
#### GA

@@ -191,7 +199,7 @@ The Job controller will start populating the field again.

###### Are there any tests for feature enablement/disablement?

- Yes, there will be tests at unit and integration level.
+ Yes, there are tests at the unit and [integration] levels.
### Rollout, Upgrade and Rollback Planning

@@ -201,11 +209,21 @@ The field is only informative, it doesn't affect running workloads.

###### What specific metrics should inform a rollback?

- N/A
+ - An increase in `job_sync_duration_seconds`.
+ - A reduction in `job_sync_num`.
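The two rollback signals above can be evaluated from two scrapes of the controller's metrics taken some time apart. This Go sketch assumes `job_sync_num` is a monotonically increasing counter and that `job_sync_duration_seconds` is a histogram exposing the usual `_sum` and `_count` series; it compares the mean sync duration and the sync rate in a window after enabling the feature against a window before it. The struct names and the 20% tolerance are hypothetical, and counter resets (for example after a controller restart) are not handled.

```go
package main

import "fmt"

// scrape holds the raw metric values read at one point in time.
type scrape struct {
	syncNum  float64 // job_sync_num (counter)
	durSum   float64 // job_sync_duration_seconds_sum
	durCount float64 // job_sync_duration_seconds_count
}

// window is a pair of scrapes taken seconds apart.
type window struct {
	start, end scrape
	seconds    float64
}

// syncRate returns Job syncs per second over the window.
func syncRate(w window) float64 {
	return (w.end.syncNum - w.start.syncNum) / w.seconds
}

// meanSyncDuration returns the average sync duration in seconds over the window.
func meanSyncDuration(w window) float64 {
	n := w.end.durCount - w.start.durCount
	if n == 0 {
		return 0
	}
	return (w.end.durSum - w.start.durSum) / n
}

// rollbackSignals compares a window after enabling the feature with one
// before it. tolerance is the relative change treated as significant.
func rollbackSignals(before, after window, tolerance float64) (slower, fewerSyncs bool) {
	slower = meanSyncDuration(after) > meanSyncDuration(before)*(1+tolerance)
	fewerSyncs = syncRate(after) < syncRate(before)*(1-tolerance)
	return slower, fewerSyncs
}

func main() {
	before := window{
		start:   scrape{syncNum: 1000, durSum: 20, durCount: 1000},
		end:     scrape{syncNum: 1600, durSum: 32, durCount: 1600},
		seconds: 60,
	}
	after := window{
		start:   scrape{syncNum: 5000, durSum: 110, durCount: 5000},
		end:     scrape{syncNum: 5400, durSum: 122, durCount: 5400},
		seconds: 60,
	}
	slower, fewer := rollbackSignals(before, after, 0.20)
	fmt.Println("sync duration increased:", slower, "sync rate dropped:", fewer)
}
```

The comparison is only meaningful when both windows carry a similar Job workload; otherwise a lower `job_sync_num` rate may simply reflect fewer Pod and Job events.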
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

- N/A
+ A manual test will be performed, as follows:
+
+ 1. Create a cluster in 1.23.
+ 1. Upgrade to 1.24.
+ 1. Create a long-running Job A and ensure that its `ready` field is populated.
+ 1. Downgrade to 1.23.
+ 1. Verify that the `ready` field of Job A is not lost, but also not updated.
+ 1. Create a long-running Job B and ensure that its `ready` field is not populated.
+ 1. Upgrade to 1.24.
+ 1. Verify that the `ready` field of Jobs A and B is tracked again.
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

@@ -301,7 +319,8 @@ No change from existing behavior of the Job controller.

## Implementation History

- - 2021-08-19: Proposed KEP starting in beta status.
+ - 2021-08-19: Proposed KEP starting in alpha status, including full PRR questionnaire.
+ - 2022-01-05: Proposed graduation to beta.

## Drawbacks