Commit a1fa456

Merge pull request kubernetes#4211 from haircommander/max-image-gc
KEP-4210: Add ImageGCMaximumAge KEP
2 parents ecc58b9 + 9cabce4 commit a1fa456

3 files changed

+434
-0
lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
kep-number: 4210
alpha:
  approver: "@johnbelamaric"
Lines changed: 395 additions & 0 deletions
@@ -0,0 +1,395 @@
1+
# KEP-4210: ImageMaximumGCAge in Kubelet
2+
3+
<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit tests](#unit-tests)
    - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Beta](#beta)
    - [GA](#ga)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
<!-- /toc -->
37+
38+
## Release Signoff Checklist
39+
40+
Items marked with (R) are required *prior to targeting to a milestone / release*.
41+
42+
- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
43+
- [x] (R) KEP approvers have approved the KEP status as `implementable`
44+
- [x] (R) Design details are appropriately documented
45+
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
46+
- [x] e2e Tests for all Beta API Operations (endpoints)
47+
- [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
48+
- [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free
49+
- [x] (R) Graduation criteria is in place
50+
- [x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
51+
- [x] (R) Production readiness review completed
52+
- [x] (R) Production readiness review approved
53+
- [x] "Implementation History" section is up-to-date for milestone
54+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
55+
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
56+
57+
<!--
58+
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
59+
-->
60+
61+
[kubernetes.io]: https://kubernetes.io/
62+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
63+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
64+
[kubernetes/website]: https://git.k8s.io/website
65+
66+
## Summary
67+
68+
Add an option ImageMaximumGCAge, which allows an admin to specify a time after which unused images will be garbage collected
69+
by the Kubelet, regardless of disk usage, as well as an associated feature gate to toggle the behavior.
70+
71+
## Motivation
72+
73+
Currently, all image garbage collection in the Kubelet is triggered by disk usage going over a threshold (ImageGCHighThresholdPercent).
However, there are cases where additional conditions would be useful. One such condition is the maximum age of an image.
If an image is unused for a long time (the exact amount of time is to be decided, but on the order of weeks is what comes to mind),
then it is not likely to be used again.
77+
78+
One such case is clusters with automatic upgrades. If a cluster goes through an upgrade process,
it will likely have images cached from the old release (old kube-apiserver/etcd/etc). While these images would eventually get removed
through the disk usage condition, they will needlessly occupy disk space until then.
81+
82+
### Goals
83+
84+
- Introduce an option to the Kubelet ImageMaximumGCAge and a feature gate ImageMaximumGCAge
85+
86+
### Non-Goals
87+
88+
- Introduce other conditions for image garbage collection to trigger.
89+
- A WG was put together in SIG-Node to collect other GC use cases. While some other cases were identified, all seemed to be covered by this (see Alternatives)
90+
91+
92+
## Proposal
93+
94+
Kubelet has three different configuration fields for image garbage collection:
- ImageMinimumGCAge: the minimum age an unused image must reach before it qualifies for garbage collection
- ImageGCLowThresholdPercent: the disk usage below which garbage collection never runs; garbage collection frees space down to this level
- ImageGCHighThresholdPercent: the disk usage above which garbage collection always runs each GC period
98+
99+
Between each of these options, there is a common thread: image garbage collection only triggers when disk usage has reached a certain threshold.
In other words, there are no alternative conditions under which the Kubelet will begin collection. Luckily, this condition is the most important:
the primary goal of garbage collection is to ensure images don't clutter the disk and cause it to fill up needlessly.
However, having the Kubelet be purely reactive means that images will clutter the disk until the threshold is reached. While there aren't reported cases
where this causes issues, it is inefficient with disk space and can cause the Kubelet to scramble to free disk space when the threshold is met.
104+
105+
An additional approach is to define a way for an admin to request that images be cleaned up after they have been unused for a certain period of time.
This would reduce how often disk usage hits the threshold, and give an admin more flexibility in how garbage collection is defined.
107+
108+
The proposal of this KEP is to add an option to the KubeletConfiguration object that looks like:
109+
```
// ImageMaximumGCAge is the maximum age an image can be unused before it is garbage collected.
// The default of this field is 0, which disables it.
// +optional
ImageMaximumGCAge metav1.Duration
```
115+
116+
To begin, this option will default to 0, which will be interpreted as "disabled". In the future, a more reasonable default may be chosen.

This option will only be honored if the ImageMaximumGCAge feature gate is enabled for the Kubelet.
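
The interaction between the field and the feature gate can be sketched as follows. This is a minimal illustration under stated assumptions, not the Kubelet's code: `effectiveMaxAge` and its `gateEnabled` parameter are hypothetical names, a plain boolean stands in for the real feature gate lookup, and treating negative values as disabled is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// day is a convenience constant for the example values below.
const day = 24 * time.Hour

// effectiveMaxAge is a hypothetical helper. It returns the maximum unused age
// the GC should enforce, or 0 when age-based collection is off.
// gateEnabled stands in for the ImageMaximumGCAge feature gate.
// Assumption: non-positive values are treated as "disabled".
func effectiveMaxAge(gateEnabled bool, configured time.Duration) time.Duration {
	if !gateEnabled || configured <= 0 {
		return 0 // disabled: only disk-usage-based GC applies
	}
	return configured
}

func main() {
	fmt.Println(effectiveMaxAge(false, 14*day)) // gate off: the field is ignored
	fmt.Println(effectiveMaxAge(true, 0))       // default of 0: disabled
	fmt.Println(effectiveMaxAge(true, 14*day))  // gate on and value set: enforced
}
```

Keeping both checks in one place means the rest of the GC logic only has to compare against a single duration.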
119+
120+
### User Stories (Optional)
121+
122+
#### Story 1
123+
124+
- As a cluster admin, I would like my unused images to be garbage collected in a timely manner, and not occupy disk space forever.
125+
126+
### Notes/Constraints/Caveats (Optional)
127+
128+
<!--
129+
What are the caveats to the proposal?
130+
What are some important details that didn't come across above?
131+
Go in to as much detail as necessary here.
132+
This might be a good place to talk about core concepts and how they relate.
133+
-->
134+
135+
### Risks and Mitigations
136+
137+
- If set incorrectly, the ImageMaximumGCAge option could cause unneeded image pulls. For instance, if a CronJob ran
  once a week, but ImageMaximumGCAge was set to less than a week, that image would get pulled every week, causing needless
  traffic to the registry.
  - Proper documentation on this is the best way to mitigate this risk.
  - Defining a good default for this value will be similarly tricky.
- Reliability of image age
  - Good testing will mitigate/fix any errors.
- New, undiscovered races
  - If the maximum image GC age is set very low, will the Kubelet race with itself and remove the image right after pulling it?
  - A lower bound on ImageMaximumGCAge may need to be defined to prevent races like this.
- Runtime misbehavior
  - It's possible the runtime won't GC the image and the Kubelet will begin thrashing on the image.
  - Runtime maintainers should ensure this situation is avoided.
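
The first and third risks above can be made concrete with a small sketch. Everything here is illustrative: `repullsEveryRun`, `validateMaxAge`, and the 10-minute floor `hypotheticalMinMaxAge` are hypothetical, and this KEP does not define a minimum value.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const day = 24 * time.Hour

// hypotheticalMinMaxAge is an assumed floor chosen only for illustration;
// the KEP does not define one.
const hypotheticalMinMaxAge = 10 * time.Minute

// validateMaxAge rejects values that are set but below the assumed floor.
// Zero is accepted because it means "disabled".
func validateMaxAge(maxAge time.Duration) error {
	if maxAge < 0 {
		return errors.New("ImageMaximumGCAge must not be negative")
	}
	if maxAge != 0 && maxAge < hypotheticalMinMaxAge {
		return fmt.Errorf("ImageMaximumGCAge %v is below the minimum %v", maxAge, hypotheticalMinMaxAge)
	}
	return nil
}

// repullsEveryRun reports whether an image used once per runInterval would
// age out between runs, forcing a fresh pull on every run.
func repullsEveryRun(runInterval, maxAge time.Duration) bool {
	return maxAge > 0 && maxAge < runInterval
}

func main() {
	fmt.Println(repullsEveryRun(7*day, 5*day))  // weekly job, 5-day maximum age
	fmt.Println(repullsEveryRun(7*day, 14*day)) // weekly job, 14-day maximum age
	fmt.Println(validateMaxAge(time.Second))    // below the assumed floor
}
```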
150+
151+
152+
## Design Details
153+
154+
Add an option to the Kubelet configuration:
155+
```
// ImageMaximumGCAge is the maximum age an image can be unused before it is garbage collected.
// The default of this field is 0, which disables it.
// +optional
ImageMaximumGCAge metav1.Duration
```
161+
162+
This option will be wired down to the Kubelet's [image manager](https://github.com/kubernetes/kubernetes/blob/d5690f12b69a/pkg/kubelet/images/image_gc_manager.go),
163+
similarly to the other garbage collection fields.
164+
165+
The Kubelet's image manager already keeps track of the last time an image was used through the `lastUsed` field in the
[imageRecord](https://github.com/kubernetes/kubernetes/blob/d5690f12b69a/pkg/kubelet/images/image_gc_manager.go#L153) structure.
A comparison can therefore be made in the realImageGCManager's
[GarbageCollect](https://github.com/kubernetes/kubernetes/blob/d5690f12b69a/pkg/kubelet/images/image_gc_manager.go#L288) function to garbage collect
the images that have been unused for longer than the specified maximum age.
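
A minimal sketch of that comparison is shown below. It is not the Kubelet's implementation: the `imageRecord` type here is a simplified stand-in that keeps only the `lastUsed` field named above, and `agedOutImages`, the `inUse` map, and the image IDs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

const day = 24 * time.Hour

// epoch is an arbitrary fixed reference time so the example is deterministic.
var epoch = time.Date(2023, time.September, 18, 0, 0, 0, 0, time.UTC)

// imageRecord is a simplified stand-in for the Kubelet's record; only the
// lastUsed field is taken from the KEP text.
type imageRecord struct {
	lastUsed time.Time
}

// agedOutImages returns the IDs of images that have been unused for longer
// than maxAge at time now, sorted for stable output. inUse marks images still
// referenced by containers; those are never candidates. A non-positive maxAge
// disables the check.
func agedOutImages(records map[string]imageRecord, inUse map[string]bool, now time.Time, maxAge time.Duration) []string {
	var ids []string
	if maxAge <= 0 {
		return ids
	}
	for id, rec := range records {
		if inUse[id] {
			continue
		}
		if now.Sub(rec.lastUsed) > maxAge {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	records := map[string]imageRecord{
		"old-apiserver": {lastUsed: epoch.Add(-30 * day)},
		"weekly-job":    {lastUsed: epoch.Add(-6 * day)},
		"running-app":   {lastUsed: epoch.Add(-40 * day)},
	}
	inUse := map[string]bool{"running-app": true}
	fmt.Println(agedOutImages(records, inUse, epoch, 14*day))
}
```

The resulting IDs would then be handed to the same removal path the disk-usage-based collection already uses.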
170+
171+
Since the Kubelet does not own images, and can only request images be cleaned up, this cleaning should be considered "best effort".
172+
173+
Further, since the Kubelet's GC runs periodically every [5 minutes](https://github.com/kubernetes/kubernetes/blob/d5690f12b69a/pkg/kubelet/kubelet.go#L194),
ImageMaximumGCAge is not enforced precisely. An image could be GC'd up to 5 minutes after it has aged out.
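
The delay can be illustrated with a short calculation. This sketch assumes GC runs happen at fixed multiples of the period after some start time; `firstCollectingRun` and the sample times are hypothetical, and the real scheduling may drift.

```go
package main

import (
	"fmt"
	"time"
)

const (
	day      = 24 * time.Hour
	gcPeriod = 5 * time.Minute // GC interval cited in the text
)

// epoch is an arbitrary fixed start time for the example.
var epoch = time.Date(2023, time.September, 18, 0, 0, 0, 0, time.UTC)

// firstCollectingRun returns the first GC run at or after agedOut, assuming
// runs happen at start, start+period, start+2*period, ... and period > 0.
func firstCollectingRun(start, agedOut time.Time, period time.Duration) time.Time {
	if !agedOut.After(start) {
		return start
	}
	elapsed := agedOut.Sub(start)
	runs := int64((elapsed + period - 1) / period) // ceiling division
	return start.Add(time.Duration(runs) * period)
}

func main() {
	// Image last used one minute after the first GC run; maximum age of 14 days.
	lastUsed := epoch.Add(time.Minute)
	agedOut := lastUsed.Add(14 * day)
	run := firstCollectingRun(epoch, agedOut, gcPeriod)
	fmt.Println("aged out: ", agedOut)
	fmt.Println("collected:", run)
	fmt.Println("delay:    ", run.Sub(agedOut))
}
```

Under these assumptions the delay is always at least zero and strictly less than one GC period.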
175+
176+
Finally, how to handle Kubelet restarts still needs to be decided. The easiest approach would be to wait the full ImageMaximumGCAge after a restart before an image qualifies for GC,
but that would essentially disable the feature if the Kubelet restarts more frequently than ImageMaximumGCAge.
178+
179+
### Test Plan
180+
181+
[x] I/we understand the owners of the involved components may require updates to
182+
existing tests to make this code solid enough prior to committing the changes necessary
183+
to implement this enhancement.
184+
185+
##### Prerequisite testing updates
186+
187+
<!--
188+
Based on reviewers feedback describe what additional tests need to be added prior
189+
implementing this enhancement to ensure the enhancements have also solid foundations.
190+
-->
191+
192+
##### Unit tests
193+
194+
- `pkg/kubelet/images`: `2023-09-14` - `84.2`
195+
196+
Additional tests will be added to pkg/kubelet/images to unit test the new field and verify it works along with the other GC options.
197+
198+
##### e2e tests
199+
200+
- `test/e2e_node/garbage_collector_test.go`
201+
202+
Additional tests will be added to this file to cover the garbage collection e2e.
203+
204+
### Graduation Criteria
205+
206+
207+
#### Alpha
208+
209+
- Configuration field added to the Kubelet (disabled by default)
210+
- Feature supported by Kubelet Image Manager
211+
- Unit tests and e2e tests added
212+
- Add a metric `kubelet_image_garbage_collected_total` which tracks the number of images the kubelet is GC'ing through any mechanism.
213+
214+
#### Beta
215+
216+
- Gather feedback from users
217+
218+
#### GA
219+
220+
- Addition of conformance tests
221+
- Some examples of real-world usage
222+
- Allowing time for feedback
223+
224+
### Upgrade / Downgrade Strategy
225+
226+
This option is purely contained within the Kubelet, so the only concern is that the field is added to the configuration of a newer
Kubelet, and the Kubelet is then downgraded to a version that does not recognize it.
228+
229+
There's nothing the Kubernetes community can do to prevent this, and admins should ensure their configuration fields will function with
230+
the processes they run.
231+
232+
### Version Skew Strategy
233+
234+
Version skew is not a worry assuming the internal Kubelet changes are synchronized with the configuration changes.
235+
236+
## Production Readiness Review Questionnaire
237+
238+
239+
### Feature Enablement and Rollback
240+
241+
###### How can this feature be enabled / disabled in a live cluster?
242+
243+
- [x] Feature gate (also fill in values in `kep.yaml`)
244+
- Feature gate name: ImageMaximumGCAge
245+
- Components depending on the feature gate: kubelet
246+
- [ ] Other
247+
- Describe the mechanism:
248+
- Will enabling / disabling the feature require downtime of the control
249+
plane?
250+
- Will enabling / disabling the feature require downtime or reprovisioning
251+
of a node?
252+
253+
###### Does enabling the feature change any default behavior?
254+
255+
No
256+
257+
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
258+
259+
Yes, given a restart of the Kubelet.
260+
261+
###### What happens if we reenable the feature if it was previously rolled back?
262+
263+
- Nothing unexpected.
264+
265+
###### Are there any tests for feature enablement/disablement?
266+
267+
There will be a test to verify that, when the Kubelet configuration option is disabled, images are not GC'd early.
268+
269+
### Rollout, Upgrade and Rollback Planning
270+
271+
###### How can a rollout or rollback fail? Can it impact already running workloads?
272+
273+
- An invalid configuration is provided.
274+
- Even in the case where the ImageMaximumGCAge is set to 0, the Kubelet will only GC images when their corresponding containers are
275+
removed, so no running workloads can be affected.
276+
277+
###### What specific metrics should inform a rollback?
278+
279+
- `kubelet_image_garbage_collected_total` metric drastically (100x) increasing, indicating thrashing of the GC manager and
280+
images being pulled.
281+
282+
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
283+
284+
They will be; there should be no side effects.
285+
286+
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
287+
288+
No.
289+
290+
### Monitoring Requirements
291+
292+
###### How can an operator determine if the feature is in use by workloads?
293+
294+
- Verify the Kubelet Configuration with the Kubelet's configz endpoint
295+
- Monitor the `kubelet_image_garbage_collected_total` metric, and expect a slight increase.
296+
297+
###### How can someone using this feature know that it is working for their instance?
298+
299+
- [x] Other (treat as last resort)
300+
- `kubelet_image_garbage_collected_total` metric increases when an image ages out.
301+
302+
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
303+
304+
- The eventual default value should increase the average `kubelet_image_garbage_collected_total` by no more than 10x
305+
- TODO: On what clusters?
306+
307+
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
308+
309+
- [x] Metrics
310+
- Metric name: `kubelet_image_garbage_collected_total`
311+
- Components exposing the metric: Kubelet
312+
313+
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
314+
315+
- A metric for each different GC trigger (disk usage vs time based).
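
One way to picture a per-trigger metric is a counter keyed by the reason for removal. The sketch below is a toy in-memory version, not a proposal for the actual metric: `gcReason`, its two values, and `gcCounter` are hypothetical names, the map is not safe for concurrent use, and a real implementation would presumably use the Kubelet's metrics library with a label instead.

```go
package main

import "fmt"

// gcReason labels why an image was removed; both values are illustrative.
type gcReason string

const (
	reasonDiskUsage gcReason = "disk_usage"
	reasonMaxAge    gcReason = "max_age"
)

// gcCounter is a toy counter keyed by reason. It is not safe for concurrent use.
type gcCounter map[gcReason]int

// record counts one removed image for the given reason.
func (c gcCounter) record(r gcReason) { c[r]++ }

// total sums all reasons, which corresponds to what a single
// kubelet_image_garbage_collected_total counter would report.
func (c gcCounter) total() int {
	sum := 0
	for _, n := range c {
		sum += n
	}
	return sum
}

func main() {
	c := gcCounter{}
	c.record(reasonDiskUsage)
	c.record(reasonMaxAge)
	c.record(reasonMaxAge)
	fmt.Println(c[reasonDiskUsage], c[reasonMaxAge], c.total())
}
```

Splitting by reason would let an operator tell whether a jump in the total comes from the new age-based trigger or from ordinary disk pressure.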
316+
317+
### Dependencies
318+
319+
###### Does this feature depend on any specific services running in the cluster?
320+
321+
Just Kubelet
322+
323+
### Scalability
324+
325+
###### Will enabling / using this feature result in any new API calls?
326+
327+
- Kubelet will call `RemoveImage` on the CRI implementation when an image should be garbage collected,
  which could happen more frequently/faster.
329+
330+
###### Will enabling / using this feature result in introducing new API types?
331+
332+
No
333+
334+
###### Will enabling / using this feature result in any new calls to the cloud provider?
335+
336+
No
337+
338+
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
339+
340+
KubeletConfiguration will gain an additional int64
341+
342+
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
343+
344+
No
345+
346+
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
347+
348+
- Potentially, depending on the age chosen, more CPU could be used to do the image removal.
- More frequent image removal trades that extra work for free disk space.
350+
351+
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
352+
353+
- Not likely, it's intended to prevent resource exhaustion of disk
354+
355+
### Troubleshooting
356+
357+
###### How does this feature react if the API server and/or etcd is unavailable?
358+
359+
- N/A
360+
361+
###### What are other known failure modes?
362+
363+
- The Kubelet could thrash in an image pull/remove cycle if the value is set too low.
364+
365+
###### What steps should be taken if SLOs are not being met to determine the problem?
366+
367+
- Enforce a minimum allowed value for this field.
368+
369+
## Implementation History
370+
371+
372+
2023-09-18: KEP opened, targeted at Alpha
373+
374+
## Drawbacks
375+
376+
- It could be considered unnecessary, as disk-usage-based garbage collection already covers this use case, albeit more slowly.
377+
378+
## Alternatives
379+
380+
- Add a Kubelet garbage collection plugin system
  - Too complicated, probably won't be needed.
  - The Image GC WG, formed out of SIG-Node, brainstormed use cases:
    - Additional conditions for GC:
      - Removing "older" tags of the same image.
      - A "do not keep images" policy.
      - Image GC based on pod priority.
    - Only the last item is not covered here, and it was deemed not useful enough to warrant a generic solution.
- Delegate responsibility down to CRI
  - Would cause code duplication between CRI implementations; out of scope for this KEP.
- The Image GC WG worked to identify other conditions for GC:
  - The first two conditions listed above can be satisfied by this KEP, so we're not pursuing a more generic GC plugin mechanism.
392+
393+
## Infrastructure Needed (Optional)
394+
395+
N/A
