@@ -234,13 +239,17 @@ tried this feature in Alpha, we would have time to fix issues.
234
239
235
240
### Upgrades/Downgrades
236
241
237
-
- Upgrades
238
-
When upgrading from a release without this feature, to a release with maxUnavailable, we will set maxUnavailable to 1. This would give users the same default
239
-
behavior they have to come to expect of in previous releases
240
-
- Downgrades
241
-
When downgrading from a release with this feature, to a release without maxUnavailable, there are two cases
242
-
-- if maxUnavailable is greater than 1 -- in this case user can see unexpected behavior(Find out what is the recommendation here(Warning, disable upgrade, drop field, etc? )
243
-
-- if maxUnavailable is less than equal to 1 -- in this case user wont see any difference in behavior
242
+
We will default the maxUnavailable field in StatefulSet to 1 for backward compatibility.
243
+
244
+
Downgrades
245
+
246
+
When downgrading from a release with this feature to a release without maxUnavailable, there are two cases:
247
+
- If maxUnavailable is greater than 1, there are two more cases:
248
+
- If you're rolling back to a release that doesn't have this field, there is no way to even discover it
249
+
- If you're just disabling the feature (either together with a downgrade to a release that has the field, or without a downgrade), the field should remain set
250
+
(unless someone explicitly deletes it later), but the controller should ignore it (and there shouldn't be a way to set it if the feature gate
251
+
is switched off).
252
+
- If maxUnavailable is less than or equal to 1, the user won't see any difference in behavior
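
The downgrade cases above boil down to a small decision about which value the controller honors. The following is a minimal sketch of that decision, not the actual controller code; the function name `effective_max_unavailable` and its parameters are hypothetical and exist only for illustration.

```python
from typing import Optional


def effective_max_unavailable(gate_enabled: bool, field_value: Optional[int]) -> int:
    """Return how many pods a rolling update may take down at once.

    Hypothetical helper: gate_enabled stands for the feature gate state,
    field_value for the stored maxUnavailable (None if unset).
    """
    # Gate off: the stored field stays in the object but is ignored,
    # so updates proceed one pod at a time as in earlier releases.
    if not gate_enabled:
        return 1
    # Gate on but field unset: fall back to the backward-compatible default.
    if field_value is None:
        return 1
    return field_value
```

With this shape, a StatefulSet that stored `maxUnavailable: 3` behaves as if the value were 1 while the gate is off, and regains the stored value once the gate is switched back on.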
244
253
245
254
### Tests
246
255
@@ -254,11 +263,126 @@ tried this feature in Alpha, we would have time to fix issues.
254
263
- maxUnavailable greater than 1 with partition and staged pods greater than maxUnavailable
255
264
- maxUnavailable greater than 1 with partition and maxUnavailable greater than replicas
256
265
266
+
## Test Plan
267
+
For `Alpha`, unit tests and e2e tests will be added to test functionality both
268
+
with the feature flag enabled and disabled. Defaults will be verified so that users
269
+
who do not set this flag are not surprised.
270
+
271
+
257
272
## Graduation Criteria
258
273
259
-
- Alpha: Initial support for maxUnavailable in StatefulSets added. Disabled by default.
260
-
- Beta: Enabled by default with default value of 1.
274
+
- Alpha: Initial support for maxUnavailable in StatefulSets added. Disabled by default, with a default value of 1.
275
+
- Beta: Enabled by default with a default value of 1, with upgrade and downgrade tested at least manually.
276
+
277
+
278
+
## Production Readiness Review Questionnaire
279
+
280
+
### Feature Enablement and Rollback
281
+
282
+
###### How can this feature be enabled / disabled in a live cluster?
283
+
284
+
- [x] Feature gate (also fill in values in `kep.yaml`)
285
+
- Feature gate name: MaxUnavailableStatefulSet
286
+
- Components depending on the feature gate: kube-apiserver and kube-controller-manager
287
+
288
+
###### Does enabling the feature change any default behavior?
289
+
290
+
No, the default behavior remains the same.
291
+
292
+
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
293
+
294
+
Yes, this feature can be disabled. Once disabled, all existing StatefulSets will
295
+
revert to the old behavior, where a rolling update proceeds one pod at a time.
296
+
297
+
###### What happens if we reenable the feature if it was previously rolled back?
298
+
299
+
We will restore the desired behavior for StatefulSets for which the maxUnavailable field wasn't deleted after
300
+
the feature gate was disabled.
301
+
302
+
###### Are there any tests for feature enablement/disablement?
303
+
Yes, there are unit tests which make sure the field is correctly dropped
304
+
when the feature gate is enabled or disabled.
305
+
306
+
### Rollout, Upgrade and Rollback Planning
307
+
308
+
###### How can a rollout or rollback fail? Can it impact already running workloads?
309
+
310
+
A rollout or rollback of this feature can fail if there is a bug which causes the kube-apiserver or
311
+
the kube-controller-manager to start crashing when the feature flag is enabled.
312
+
313
+
314
+
Yes, it can impact already running workloads.
315
+
316
+
If a rolling update is in progress for a StatefulSet while this feature is being enabled in kube-apiserver
317
+
and kube-controller-manager, the StatefulSet controller can run into corner cases where it will take longer
318
+
for the controller to converge. This will only happen if after enabling the feature, the customer also sets
319
+
maxUnavailable to a number greater than 1, but the invariants and the logic will ensure that there are never more than
320
+
maxUnavailable pods with the same identity and never more than maxUnavailable being deleted.
321
+
322
+
###### What specific metrics should inform a rollback?
323
+
TODO when we reach Beta
261
324
325
+
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
326
+
Will be tested when graduating to Beta.
327
+
328
+
329
+
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
330
+
No
331
+
332
+
### Monitoring Requirements
333
+
334
+
###### How can an operator determine if the feature is in use by workloads?
335
+
If their StatefulSet rollingUpdate section has the field maxUnavailable specified with
336
+
a value different than 1.
337
+
The command below shows the maxUnavailable value:
338
+
```
339
+
kubectl get statefulsets -o yaml | grep maxUnavailable
340
+
```
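
The `grep` above prints every occurrence, including the default. As a hedged alternative, the sketch below filters the JSON form of the same listing (`kubectl get statefulsets -o json`) down to StatefulSets whose value differs from 1. The function name is hypothetical; the field path `spec.updateStrategy.rollingUpdate.maxUnavailable` is the StatefulSet rollingUpdate section mentioned above.

```python
import json


def statefulsets_using_max_unavailable(kubectl_json: str) -> list:
    """Return (name, maxUnavailable) pairs for StatefulSets with a value other than 1.

    Hypothetical helper: kubectl_json is the text printed by
    `kubectl get statefulsets -o json`.
    """
    doc = json.loads(kubectl_json)
    in_use = []
    for item in doc.get("items", []):
        strategy = (item.get("spec") or {}).get("updateStrategy") or {}
        rolling = strategy.get("rollingUpdate") or {}
        value = rolling.get("maxUnavailable")
        # Unset or 1 means the pre-existing one-pod-at-a-time behavior.
        if value is not None and value != 1:
            name = (item.get("metadata") or {}).get("name", "<unnamed>")
            in_use.append((name, value))
    return in_use
```

The value may be an integer or a percentage string, so both are reported as-is.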
341
+
342
+
###### How can someone using this feature know that it is working for their instance?
343
+
TODO when we reach Beta
344
+
345
+
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
346
+
347
+
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
348
+
349
+
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
350
+
351
+
### Dependencies
352
+
353
+
###### Does this feature depend on any specific services running in the cluster?
354
+
NA
355
+
356
+
### Scalability
357
+
358
+
###### Will enabling / using this feature result in any new API calls?
359
+
It doesn't make any extra API calls.
360
+
361
+
###### Will enabling / using this feature result in introducing new API types?
362
+
No
363
+
364
+
###### Will enabling / using this feature result in any new calls to the cloud provider?
365
+
No
366
+
367
+
368
+
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
369
+
A struct with three fields gets added to every StatefulSet object: a type discriminator, one 32-bit integer, and one field of type string.
370
+
The struct in question is IntOrString.
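
To make the size impact concrete, here is a rough Python model of an IntOrString-like value and how it could be turned into an absolute pod count. This is a sketch, not the Kubernetes implementation: the class and function are illustrative stand-ins, and rounding percentages up is an assumption made for this example rather than something stated in this document.

```python
from dataclasses import dataclass

INT = 0     # value is held in int_val
STRING = 1  # value is held in str_val, e.g. "25%"


@dataclass
class IntOrString:
    """Illustrative stand-in: a discriminator plus one integer and one string."""
    type: int = INT
    int_val: int = 0
    str_val: str = ""


def resolve_max_unavailable(value: IntOrString, replicas: int) -> int:
    """Hypothetical helper: convert the value to an absolute number of pods."""
    if value.type == INT:
        return value.int_val
    if not value.str_val.endswith("%"):
        raise ValueError(f"expected a percentage, got {value.str_val!r}")
    percent = int(value.str_val[:-1])
    # Assumption: round up, using integer arithmetic to avoid float error.
    return (percent * replicas + 99) // 100
```

Only one of `int_val` or `str_val` is meaningful for a given value, which is why the per-object size increase stays small.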
371
+
372
+
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
373
+
No
374
+
375
+
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
376
+
The controller-manager will see a negligible, almost unnoticeable increase in CPU usage.
377
+
378
+
### Troubleshooting
379
+
380
+
###### How does this feature react if the API server and/or etcd is unavailable?
381
+
The RollingUpdate will fail or will not be able to proceed if etcd or the apiserver is unavailable, and
382
+
hence this feature will not be usable either.