@@ -278,6 +278,12 @@ Planned tests:
- Pod failed due to an unhealthy device before the device plugin detected it. Pod status is still updated.
- Pod is in crash loop backoff due to an unhealthy device: pod status is updated to unhealthy.

+ For alpha rollout and rollback:
+
+ - Field is dropped on update when the feature gate is disabled
+ - Field is not populated after the feature gate is disabled
+ - Field is populated again when the feature gate is re-enabled
+
Test coverage will be listed once tests are implemented.

- <test>: <link to test coverage>
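The three alpha rollout and rollback cases above can be sketched as a small simulation. This is a minimal Python model, not Kubernetes code: `PodStatus`, `resource_health`, `api_update`, and `kubelet_sync` are hypothetical names, and the model assumes the API server drops the new field from every update request while the feature gate is disabled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PodStatus:
    # Hypothetical stand-in for the new health field in pod status.
    resource_health: Optional[str] = None


def api_update(new: PodStatus, gate_enabled: bool) -> PodStatus:
    """Hypothetical API server write path for a pod status update.

    Assumption: while the gate is disabled, the new field is dropped from
    the request, even if the request still carries an old value.
    """
    if not gate_enabled:
        new.resource_health = None
    return new


def kubelet_sync(previous: PodStatus, health: str, gate_enabled: bool) -> PodStatus:
    """Hypothetical kubelet status sync.

    Starts from the previous status and only sets the field while the gate
    is enabled, so a stale value can still ride along in the request.
    """
    updated = PodStatus(resource_health=previous.resource_health)
    if gate_enabled:
        updated.resource_health = health
    return updated


def rollout_rollback_scenario() -> List[Optional[str]]:
    """Enable -> disable -> re-enable, recording the stored field after each step."""
    observed: List[Optional[str]] = []
    stored = PodStatus()

    # Gate enabled: the kubelet populates the field and it is persisted.
    stored = api_update(kubelet_sync(stored, "Unhealthy", True), True)
    observed.append(stored.resource_health)

    # Gate disabled: the field is dropped on update and not repopulated.
    stored = api_update(kubelet_sync(stored, "Unhealthy", False), False)
    observed.append(stored.resource_health)

    # Gate re-enabled: the field is populated again.
    stored = api_update(kubelet_sync(stored, "Healthy", True), True)
    observed.append(stored.resource_health)
    return observed


print(rollout_rollback_scenario())  # ['Unhealthy', None, 'Healthy']
```

Modeling the kubelet as carrying the previous value forward while the gate is off is deliberate: it checks that the field is dropped by the write path itself, not merely left unset by the client.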
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

- Yes, with no side effect except of missing the new field in pod status.
+ Yes, with no side effect other than the new field missing from pod status. Pods whose status was
+ written while the feature was enabled keep the field, but it may be wiped on the next update
+ request and may also be ignored on reads.
+ Re-enabling the feature does not guarantee that values written before the feature was disabled
+ are preserved.
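The rollback answer above implies two possible outcomes after re-enablement, depending on whether the pod status was updated while the gate was off. The sketch below models both under two assumptions that are illustrative only: reads hide the stored field while the gate is disabled, and any update made while the gate is disabled wipes it. `PodStore` and `timeline` are hypothetical names.

```python
from typing import List, Optional


class PodStore:
    """Hypothetical storage model for one pod's new status field.

    Illustrative assumptions: reads hide the field while the gate is disabled,
    and any update made while the gate is disabled wipes it.
    """

    def __init__(self) -> None:
        self.gate_enabled = True
        self._stored: Optional[str] = None  # value as persisted in storage

    def update(self, value: Optional[str]) -> None:
        # While disabled, the write path drops the field from the request.
        self._stored = value if self.gate_enabled else None

    def read(self) -> Optional[str]:
        # While disabled, a persisted value may exist but is not returned.
        return self._stored if self.gate_enabled else None


def timeline(update_while_disabled: bool) -> List[Optional[str]]:
    """Write while enabled, roll back, optionally update, then re-enable."""
    store = PodStore()
    store.update("Unhealthy")  # written while the feature is enabled
    seen = [store.read()]
    store.gate_enabled = False  # roll back
    seen.append(store.read())
    if update_while_disabled:
        store.update("Healthy")  # any status update sent while the gate is off
    store.gate_enabled = True  # re-enable
    seen.append(store.read())
    return seen


print(timeline(update_while_disabled=False))  # ['Unhealthy', None, 'Unhealthy']
print(timeline(update_while_disabled=True))   # ['Unhealthy', None, None]
```

In the first run the old value reappears and may be stale; in the second it stays missing until the kubelet writes it again. Either way, values written before the feature was disabled cannot be relied on after re-enablement.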
###### What happens if we reenable the feature if it was previously rolled back?

- The pod status will be updated again.
+ The pod status will be updated again. Consistency is not guaranteed for values written before
+ the most recent enablement: they may be stale or missing until the pod status is updated again.

###### Are there any tests for feature enablement/disablement?

- Nothing is planned.
+ Yes, see the e2e tests section.
### Rollout, Upgrade and Rollback Planning

###### What specific metrics should inform a rollback?

- N/A
+ An increase in the API server error rate: `apiserver_request_total` filtered to non-`2xx` values of the `code` label.
+ API validation errors are the most likely indication of a problem.
+
+ Potential errors in the kubelet would most likely surface as error logs and as events on Pods.
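As a rough illustration of the rollback signal above, the function below computes the share of non-`2xx` requests between two snapshots of `apiserver_request_total`. In Prometheus this would usually be a ratio of `rate()` expressions; here the counters are plain Python dicts so the arithmetic is visible. The reduced label set `(verb, resource, code)` and all sample values are hypothetical, and counter resets are ignored.

```python
from typing import Dict, Tuple

# Counter snapshot keyed by a reduced, hypothetical label set: (verb, resource, code).
Snapshot = Dict[Tuple[str, str, str], float]


def non_2xx_ratio(before: Snapshot, after: Snapshot) -> float:
    """Share of requests between two snapshots whose `code` label is not 2xx.

    Assumes the counters did not reset between the snapshots.
    """
    total = 0.0
    errors = 0.0
    for (verb, resource, code), value in after.items():
        delta = value - before.get((verb, resource, code), 0.0)
        total += delta
        if not code.startswith("2"):
            errors += delta
    return errors / total if total else 0.0


before = {
    ("PATCH", "pods", "200"): 1000.0,
    ("PATCH", "pods", "422"): 2.0,  # 422 is returned for validation (Invalid) errors
}
after = {
    ("PATCH", "pods", "200"): 1090.0,
    ("PATCH", "pods", "422"): 12.0,
}
print(non_2xx_ratio(before, after))  # 0.1
```

A sustained rise in this ratio after enabling the feature, concentrated on pod status writes, is the pattern that would point toward a rollback.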
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

- N/A
+ There are two main error modes for this feature:
+ 1. The API server does not accept the new field, for example because the kubelet writes it in a format the API server rejects
+ 2. The kubelet fails while populating this field
+
+ The first error mode can be observed with the `apiserver_request_total` metric filtered to non-`2xx` values of the `code` label.
+
+ There is no good metric for the second error mode, because it is not clear in advance which part of the processing may fail.
+ The most likely indication of an error is an increased number of error events on the Pod.
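For the second error mode, a hedged sketch of how an operator might watch for "more error events on the Pod" is to compare Warning event counts per Pod between two time windows. Treating error events as Events of type `Warning` is an assumption here. Events are simplified dicts that mimic only the `type` and `involvedObject` fields of core/v1 Events; `warning_events_per_pod`, `pods_with_more_warnings`, `pod_event`, and the sample data are hypothetical, and in a real cluster the events would come from the Events API.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

PodKey = Tuple[str, str]  # (namespace, name)


def warning_events_per_pod(events: Iterable[Dict]) -> Counter:
    """Count Warning events per Pod.

    Events are simplified dicts that mimic only the `type` and
    `involvedObject` fields of core/v1 Event objects.
    """
    counts: Counter = Counter()
    for event in events:
        obj = event.get("involvedObject", {})
        if event.get("type") == "Warning" and obj.get("kind") == "Pod":
            counts[(obj.get("namespace", ""), obj.get("name", ""))] += 1
    return counts


def pods_with_more_warnings(baseline: Iterable[Dict], current: Iterable[Dict]) -> List[PodKey]:
    """Pods whose Warning event count grew from the baseline window to the current one."""
    before = warning_events_per_pod(baseline)
    after = warning_events_per_pod(current)
    # Counter returns 0 for pods that had no Warning events in the baseline.
    return sorted(key for key, count in after.items() if count > before[key])


def pod_event(event_type: str, name: str) -> Dict:
    # Hypothetical helper that builds a minimal Event-like dict for a Pod.
    return {"type": event_type, "involvedObject": {"kind": "Pod", "namespace": "default", "name": name}}


baseline = [pod_event("Warning", "gpu-job")]
current = [
    pod_event("Warning", "gpu-job"),
    pod_event("Warning", "gpu-job"),
    pod_event("Normal", "web"),
]
print(pods_with_more_warnings(baseline, current))  # [('default', 'gpu-job')]
```

Comparing windows rather than using a fixed threshold keeps Pods that are always noisy from being flagged, at the cost of needing a baseline period.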
### Dependencies