@@ -94,25 +94,8 @@ field based on the number of Pods that have the `Ready` condition.

  ### Risks and Mitigations

- During upgrades, a cluster can have apiservers with version skew, or the
- administrator might decide to do a rollback. This can cause:
-
- - Loss of the new API field value
-
- This is acceptable for the first release. The value is only informative: the
- kubernetes control plane doesn't use the value to influence behavior.
-
- - Repeated Job status updates.
-
- If one apiserver populates the value and another apiserver (running an older
- version) drops the field, the job controller might try to update the field
- again, potentially causing subsequent updates. This can be mitigated by only
- updating the field if the job controller is already updating the status due
- to changes in other fields. This check is only necessary in the first release.
-
- For both problems, in the first release, the API documentation, can state that
- the field can remain at zero indefinitely even if pods have been Ready for a long
- time.
+ An increase in Job status updates. This is capped by the number of times Pods
+ reach the ready State, usually once in their lifetime.

  ## Design Details

@@ -134,43 +117,24 @@ The Job controller already lists the Pods to populate the `active`, `succeeded`
  and `failed` fields. To count `ready` pods, the job controller will filter the
  pods that have the `Ready` condition.

- In a first release, the Job controller counts the ready pods and updates the
- field if and only if:
- - The job controller is already updating other Job status fields.
- - The `JobReadyPods` feature gate is enabled.
-
- In the second release, the Job controller updates the field unconditionally.
-
  ### Test Plan

  - Unit and integration tests covering:
  - Count of ready pods.
- - Not producing updates in the cases described in the design.
+ - Feature gate disablement.
  - Verify passing existing E2E and conformance tests for Job.

  ### Graduation Criteria

  #### Alpha

- This KEP proposes to skip this stage, for the following reasons:
- - The added calculation is trivial.
- - It is acceptable to report .status.ready as zero in the first release, as
- the value is only informative.
+ - Feature gate disabled by default.
+ - Unit and integration tests passing.

  #### Beta

- - Ability to completely disable the feature, through a feature gate. The feature
- gate is enabled by default.
-
- In a first release:
-
- - The job controller only fills the field if there are other Job status updates.
- - Unit and integration tests.
-
- In a second release:
-
- - The job controller fills the field whenever the number of ready Pods changes.
- The feature can still be disabled through the feature gate.
+ - Feature gate enabled by default.
+ - Existing E2E and conformance tests passing.

  #### GA

@@ -189,8 +153,8 @@ No changes required for existing cluster to use the enhancement.

  The feature doesn't affect nodes.

- In the first release, a version skew between apiservers might cause the new field
- to remain at zero even if there are Pods ready.
+ In the first release, a version skew between apiservers might cause the new
+ field to remain at zero even if there are Pods ready.

  ## Production Readiness Review Questionnaire

@@ -200,7 +164,9 @@ to remain at zero even if there are Pods ready.

  - [x] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: JobReadyPods
- - Components depending on the feature gate: kube-controller-manager
+ - Components depending on the feature gate:
+ - kube-controller-manager
+ - kube-apiserver
  - [ ] Other
  - Describe the mechanism:
  - Will enabling / disabling the feature require downtime of the control
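
As a rough illustration of the counting step described in the Design Details hunk (filtering the Pods that have the `Ready` condition), here is a minimal Python sketch. It is not the Job controller's code, which is written in Go. The helper names `is_pod_ready` and `count_ready_pods` are hypothetical, and the Pods are assumed to be plain dictionaries shaped like the Pod API's `status.conditions` list.

```python
from typing import Any, Dict, List

# Assumption: a Pod is represented as a dict mirroring its JSON form,
# with conditions under pod["status"]["conditions"].
Pod = Dict[str, Any]


def is_pod_ready(pod: Pod) -> bool:
    """Return True if the Pod has a condition of type "Ready" with status "True"."""
    status = pod.get("status") or {}
    conditions = status.get("conditions") or []
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def count_ready_pods(pods: List[Pod]) -> int:
    """Count the Pods whose Ready condition is true (hypothetical helper)."""
    return sum(1 for pod in pods if is_pod_ready(pod))


# Illustrative data only: one ready Pod, one not ready, one with no conditions yet.
example_pods: List[Pod] = [
    {"status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    {"status": {}},
]

ready_count = count_ready_pods(example_pods)
```

Because the controller already lists the Pods to populate the other status counters, a count like this adds only one more pass over a list that is already in memory.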