@@ -208,12 +208,14 @@ Additional tests will be added to this file to cover the garbage collection e2e.
- Configuration field added to the Kubelet (disabled by default)
- Feature supported by Kubelet Image Manager
- - Unit tests and e2e tests added
+ - Unit tests
- Add a metric `kubelet_image_garbage_collected_total` which tracks the number of images the kubelet is GC'ing through any mechanism.
#### Beta
- - Gather feedback from users
+ - Add e2e tests
+ - Document `kubelet_image_garbage_collected_total` (a step missed in alpha)
+ - Add a "reason" field to `kubelet_image_garbage_collected_total` to distinguish between GC reasons (space-based or time-based).
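
To illustrate what a per-reason counter looks like to a consumer, here is a minimal sketch of a counter keyed by a "reason" label, rendered in a Prometheus-style text form. It is not the kubelet's implementation. The `LabeledCounter` class is hypothetical, and the label value `"space"` is an assumption; the text here only names `"age"` explicitly.

```python
from collections import Counter

METRIC = "kubelet_image_garbage_collected_total"


class LabeledCounter:
    """Hypothetical stand-in for a counter with a single "reason" label."""

    def __init__(self, name):
        self.name = name
        self._values = Counter()

    def inc(self, reason, amount=1):
        # Counters are monotonic, so negative increments are rejected.
        if amount < 0:
            raise ValueError("counters only go up")
        self._values[reason] += amount

    def value(self, reason):
        # Unknown reasons read as 0 without creating a series.
        return self._values[reason]

    def total(self):
        # Sum over all reasons, i.e. what the unlabeled metric reported.
        return sum(self._values.values())

    def exposition(self):
        # One sample line per reason, sorted for stable output.
        return "\n".join(
            f'{self.name}{{reason="{reason}"}} {count}'
            for reason, count in sorted(self._values.items())
        )


gc_counter = LabeledCounter(METRIC)
gc_counter.inc("space", 3)  # "space" is an assumed label value
gc_counter.inc("age")
print(gc_counter.exposition())
```

Summing over all reasons recovers the single number the metric exposed before the label was added, so existing totals stay comparable.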
#### GA
@@ -276,8 +278,8 @@ removed, so no running workloads can be affected.
###### What specific metrics should inform a rollback?
- - `kubelet_image_garbage_collected_total` metric drastically (100x) increasing, indicating thrashing of the GC manager and
-   images being pulled.
+ - `kubelet_image_garbage_collected_total` metric drastically (100x) increasing, with the "reason" field being "age",
+   indicating thrashing of the GC manager and images being pulled.
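
The rollback signal above could be checked roughly as follows. This is a sketch under assumptions: the metric is scraped as Prometheus-style text with only a `reason` label, the helper names are hypothetical, and "100x" is read as the increase over one window compared with the increase over a baseline window of the same length. Counter resets (for example after a kubelet restart) are not handled.

```python
import re

# Assumes sample lines carry only a "reason" label; real scrapes may have more.
SAMPLE_RE = re.compile(
    r'^kubelet_image_garbage_collected_total\{reason="(?P<reason>[^"]*)"\}'
    r'\s+(?P<value>\S+)$'
)


def parse_samples(text):
    """Map each reason label to its counter value; skip non-matching lines."""
    samples = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            samples[match["reason"]] = float(match["value"])
    return samples


def age_gc_increase(earlier, later):
    """Growth of the reason="age" series between two scrapes."""
    before = parse_samples(earlier).get("age", 0.0)
    after = parse_samples(later).get("age", 0.0)
    return after - before


def suggests_rollback(baseline_increase, current_increase, factor=100.0):
    """True when age-based GC grew at least `factor` times the baseline."""
    if baseline_increase <= 0:
        # Assumption: without a positive baseline there is nothing to compare.
        return False
    return current_increase >= factor * baseline_increase


scrape_before = (
    'kubelet_image_garbage_collected_total{reason="age"} 10\n'
    'kubelet_image_garbage_collected_total{reason="space"} 4'
)
scrape_after = (
    'kubelet_image_garbage_collected_total{reason="age"} 510\n'
    'kubelet_image_garbage_collected_total{reason="space"} 4'
)
current = age_gc_increase(scrape_before, scrape_after)
print(suggests_rollback(baseline_increase=5, current_increase=current))
```

Filtering on `reason="age"` keeps ordinary disk-pressure GC from triggering the check, which is the point of adding the field.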
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
###### How can an operator determine if the feature is in use by workloads?
- Verify the Kubelet Configuration with the Kubelet's configz endpoint
- - Monitor the `kubelet_image_garbage_collected_total`, and expect a slight increase.
+ - Monitor the `kubelet_image_garbage_collected_total` metric, and expect some images to be removed for reason "age"
###### How can someone using this feature know that it is working for their instance?
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
- The eventual default value should increase the average `kubelet_image_garbage_collected_total` by no more than 10x
- - TODO: On what clusters?
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
2023-09-18: KEP opened, targeted at Alpha
+ 2024-01-22: KEP updated to Beta
## Drawbacks