@@ -32,13 +39,17 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
32
39
-[x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
33
40
-[x] (R) KEP approvers have approved the KEP status as `implementable`
34
41
-[x] (R) Design details are appropriately documented
35
-
-[x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
36
-
-[x] (R) Graduation criteria is in place
42
+
-[ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
43
+
-[ ] e2e Tests for all Beta API Operations (endpoints)
44
+
-[ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
45
+
-[ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
46
+
-[ ] (R) Graduation criteria is in place
47
+
-[ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
37
48
-[ ] (R) Production readiness review completed
38
-
-[ ] Production readiness review approved
49
+
-[ ] (R) Production readiness review approved
39
50
-[ ] "Implementation History" section is up-to-date for milestone
40
51
-[ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
41
-
-[ ] Supporting documentatione.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
52
+
-[ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
42
53
43
54
44
55
## Summary
@@ -122,7 +133,6 @@ API changes to Service:
122
133
Unit tests:
123
134
- unit tests for the ipvs and iptables rules
124
135
- unit tests for the validation
125
-
- unit tests for a new util in pkg/proxy
126
136
127
137
E2E tests:
128
138
- The default behavior for `ipMode` does not break any existing e2e tests
@@ -140,7 +150,8 @@ Adds new field `ipMode` to Service, which is used when `LoadBalancerIPMode` feat
140
150
141
151
### Upgrade / Downgrade Strategy
142
152
143
-
On upgrade, while the feature gate is disabled, nothing will change. Once the feature gate is enabled, all the previous LoadBalancer service will get an `ipMode` of `VIP`.
153
+
On upgrade, while the feature gate is disabled, nothing will change. Once the feature gate is enabled,
154
+
all existing LoadBalancer Services will get an `ipMode` of `VIP` from the defaulting function when they are fetched from kube-apiserver (xref https://github.com/kubernetes/kubernetes/pull/118895/files#r1248316868).
144
155
If `kube-proxy` was not yet upgraded: the field will simply be ignored.
145
156
If `kube-proxy` was upgraded, and the feature gate enabled, it will still behave as before if the `ipMode` is `VIP`, and will behave accordingly if the `ipMode` is `Proxy`.
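
The defaulting behavior described in this section can be illustrated with a minimal Python sketch. This is not the Kubernetes implementation (which is written in Go). The function name `default_ip_mode` and the dictionary layout are hypothetical, modelling each load balancer ingress entry as a plain dict, and the "only entries that carry an IP" condition is an assumption made for the example.

```python
def default_ip_mode(ingress_entries, gate_enabled):
    """Return copies of the ingress entries with a defaulted ipMode.

    Hypothetical model of the upgrade behavior: once the LoadBalancerIPMode
    gate is enabled, an entry without an ipMode is treated as "VIP", which
    corresponds to the behavior before the field existed.
    """
    defaulted = []
    for entry in ingress_entries:
        entry = dict(entry)  # copy so the caller's data is not mutated
        # Assumption for this sketch: only entries that carry an IP are defaulted.
        if gate_enabled and entry.get("ip") and entry.get("ipMode") is None:
            entry["ipMode"] = "VIP"
        defaulted.append(entry)
    return defaulted


# Two pre-existing entries: one without ipMode, one explicitly set to Proxy.
existing = [{"ip": "203.0.113.10"}, {"ip": "203.0.113.11", "ipMode": "Proxy"}]
print(default_ip_mode(existing, gate_enabled=False))
print(default_ip_mode(existing, gate_enabled=True))
```

With the gate disabled the entries come back unchanged. With it enabled, only the entry that had no `ipMode` gains `VIP`, and an explicit `Proxy` value is left alone.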
146
157
@@ -149,3 +160,284 @@ On downgrade, the feature gate will simply be disabled, and as long as `kube-pro
149
160
### Version Skew Strategy
150
161
151
162
Version skew from the control plane to `kube-proxy` should be trivial since `kube-proxy` will simply ignore the `ipMode` field.
163
+
164
+
## Production Readiness Review Questionnaire
165
+
166
+
### Feature Enablement and Rollback
167
+
168
+
###### How can this feature be enabled / disabled in a live cluster?
169
+
170
+
-[x] Feature gate (also fill in values in `kep.yaml`)
171
+
- Feature gate name: LoadBalancerIPMode
172
+
- Components depending on the feature gate: kube-proxy, kube-apiserver, cloud-controller-manager
173
+
174
+
###### Does enabling the feature change any default behavior?
175
+
176
+
No.
177
+
178
+
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
179
+
180
+
Yes, by disabling the feature gate. Disabling it in kube-proxy is necessary and sufficient to have a user-visible effect.
181
+
182
+
###### What happens if we reenable the feature if it was previously rolled back?
183
+
184
+
It works. kube-proxy will remove the forwarding rules for Services whose `ipMode` had been set to `Proxy`.
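
The enable, disable, and re-enable behavior from the answers in this section can be sketched as a small decision function. This is a simplified Python illustration under stated assumptions, not kube-proxy code: the name `ips_with_forwarding_rules` and the dict-based ingress entries are hypothetical, and an unset `ipMode` is treated as `VIP`.

```python
def ips_with_forwarding_rules(ingress_entries, gate_enabled):
    """Return the load balancer IPs for which node-local forwarding rules are kept.

    Hypothetical model: with the gate disabled the ipMode field is ignored and
    every IP is handled as before. With the gate enabled, entries whose ipMode
    is "Proxy" are skipped, so their forwarding rules are removed.
    """
    kept = []
    for entry in ingress_entries:
        ip = entry.get("ip")
        if not ip:
            continue  # entries without an IP are outside the scope of this sketch
        mode = entry.get("ipMode") or "VIP"  # assumption: unset behaves like VIP
        if gate_enabled and mode == "Proxy":
            continue
        kept.append(ip)
    return kept


ingress = [
    {"ip": "203.0.113.10", "ipMode": "VIP"},
    {"ip": "203.0.113.11", "ipMode": "Proxy"},
]
print(ips_with_forwarding_rules(ingress, gate_enabled=True))
print(ips_with_forwarding_rules(ingress, gate_enabled=False))
```

Toggling `gate_enabled` back to true after a rollback yields the same result as the first enablement, which is the sense in which re-enabling "works" in this model.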
185
+
186
+
###### Are there any tests for feature enablement/disablement?
187
+
188
+
Yes. It is tested by `TestUpdateServiceLoadBalancerStatus` in pkg/registry/core/service/storage/storage_test.go.
189
+
190
+
### Rollout, Upgrade and Rollback Planning
191
+
192
+
<!--
193
+
This section must be completed when targeting beta to a release.
194
+
-->
195
+
196
+
###### How can a rollout or rollback fail? Can it impact already running workloads?
197
+
198
+
<!--
199
+
Try to be as paranoid as possible - e.g., what if some components will restart
200
+
mid-rollout?
201
+
202
+
Be sure to consider highly-available clusters, where, for example,
203
+
feature flags will be enabled on some API servers and not others during the
204
+
rollout. Similarly, consider large clusters and how enablement/disablement
205
+
will rollout across nodes.
206
+
-->
207
+
208
+
###### What specific metrics should inform a rollback?
209
+
210
+
<!--
211
+
What signals should users be paying attention to when the feature is young
212
+
that might indicate a serious problem?
213
+
-->
214
+
215
+
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
216
+
217
+
<!--
218
+
Describe manual testing that was done and the outcomes.
219
+
Longer term, we may want to require automated upgrade/rollback tests, but we
220
+
are missing a bunch of machinery and tooling and can't do that now.
221
+
-->
222
+
223
+
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
224
+
225
+
<!--
226
+
Even if applying deprecation policies, they may still surprise some users.
227
+
-->
228
+
229
+
### Monitoring Requirements
230
+
231
+
<!--
232
+
This section must be completed when targeting beta to a release.
233
+
234
+
For GA, this section is required: approvers should be able to confirm the
235
+
previous answers based on experience in the field.
236
+
-->
237
+
238
+
###### How can an operator determine if the feature is in use by workloads?
239
+
240
+
<!--
241
+
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
242
+
checking if there are objects with field X set) may be a last resort. Avoid
243
+
logs or events for this purpose.
244
+
-->
245
+
246
+
###### How can someone using this feature know that it is working for their instance?
247
+
248
+
<!--
249
+
For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
250
+
for each individual pod.
251
+
Pick one or more of these and delete the rest.
252
+
Please describe all items visible to end users below with sufficient detail so that they can verify correct enablement
253
+
and operation of this feature.
254
+
Recall that end users cannot usually observe component logs or access metrics.
255
+
-->
256
+
257
+
-[ ] Events
258
+
- Event Reason:
259
+
-[ ] API .status
260
+
- Condition name:
261
+
- Other field:
262
+
-[ ] Other (treat as last resort)
263
+
- Details:
264
+
265
+
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
266
+
267
+
<!--
268
+
This is your opportunity to define what "normal" quality of service looks like
269
+
for a feature.
270
+
271
+
It's impossible to provide comprehensive guidance, but at the very
272
+
high level (needs more precise definitions) those may be things like:
273
+
- per-day percentage of API calls finishing with 5XX errors <= 1%
274
+
- 99% percentile over day of absolute value from (job creation time minus expected
275
+
job creation time) for cron job <= 10%
276
+
- 99.9% of /health requests per day finish with 200 code
277
+
278
+
These goals will help you determine what you need to measure (SLIs) in the next
279
+
question.
280
+
-->
281
+
282
+
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
283
+
284
+
<!--
285
+
Pick one or more of these and delete the rest.
286
+
-->
287
+
288
+
-[ ] Metrics
289
+
- Metric name:
290
+
-[Optional] Aggregation method:
291
+
- Components exposing the metric:
292
+
-[ ] Other (treat as last resort)
293
+
- Details:
294
+
295
+
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
296
+
297
+
<!--
298
+
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
299
+
implementation difficulties, etc.).
300
+
-->
301
+
302
+
### Dependencies
303
+
304
+
<!--
305
+
This section must be completed when targeting beta to a release.
306
+
-->
307
+
308
+
###### Does this feature depend on any specific services running in the cluster?
309
+
310
+
<!--
311
+
Think about both cluster-level services (e.g. metrics-server) as well
312
+
as node-level agents (e.g. specific version of CRI). Focus on external or
313
+
optional services that are needed. For example, if this feature depends on
314
+
a cloud provider API, or upon an external software-defined storage or network
315
+
control plane.
316
+
317
+
For each of these, fill in the following—thinking about running existing user workloads
318
+
and creating new ones, as well as about cluster-level services (e.g. DNS):
319
+
- [Dependency name]
320
+
- Usage description:
321
+
- Impact of its outage on the feature:
322
+
- Impact of its degraded performance or high-error rates on the feature:
323
+
-->
324
+
325
+
### Scalability
326
+
327
+
<!--
328
+
For alpha, this section is encouraged: reviewers should consider these questions
329
+
and attempt to answer them.
330
+
331
+
For beta, this section is required: reviewers must answer these questions.
332
+
333
+
For GA, this section is required: approvers should be able to confirm the
334
+
previous answers based on experience in the field.
335
+
-->
336
+
337
+
###### Will enabling / using this feature result in any new API calls?