Commit c6850f8

Size memory backed volumes
1 parent 8d7447d commit c6850f8

2 files changed: +305 -0 lines changed

Lines changed: 263 additions & 0 deletions
@@ -0,0 +1,263 @@
# KEP-1967: Sizable memory backed volumes

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha -> Beta Graduation](#alpha---beta-graduation)
    - [Beta -> GA Graduation](#beta---ga-graduation)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)

## Summary

This KEP improves the portability of pod definitions that use memory backed emptyDir
volumes by sizing each memory backed emptyDir volume as the minimum of the pod allocatable
memory on a host and an optional, explicit, user provided value.

## Motivation

Kubernetes supports emptyDir volumes whose backing storage is memory (i.e. tmpfs).
The size of such a memory backed volume defaults to 50% of the memory on a Linux host.
Coupling the default volume size to the host that runs the pod makes
pod definitions less portable across node instance types and providers.

This impacts workloads that make heavy use of /dev/shm, as well as other use cases
oriented around memory backed volumes (AI/ML, etc.).

### Goals

- Size a memory backed volume to match the pod allocatable memory
- Enable a user to size the memory backed volume smaller than the pod allocatable memory

### Non-Goals

- Address memory chargeback of emptyDir volumes across container restarts

## Proposal

Define a new feature gate: `SizeMemoryBackedVolumes`.

If enabled, the `kubelet` will set up each memory backed volume with an
explicit, non-zero size computed as follows:

`min(nodeAllocatable[memory], podAllocatable[memory], emptyDir.sizeLimit)`

This is an improvement over present behavior because pods will see emptyDir memory
backed volumes sized by the memory they are actually allowed to use, rather than by
a heuristic based on the node that is executing the pod.

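The sizing rule above can be sketched in a few lines of Python. This is an illustration only, not the kubelet implementation: the function name `memory_backed_volume_size` and the choice to ignore bounds that are not specified (passed as `None`) are assumptions of the sketch.

```python
from typing import Optional

GiB = 1024 ** 3


def memory_backed_volume_size(
    node_allocatable: int,
    pod_allocatable: Optional[int] = None,
    size_limit: Optional[int] = None,
) -> int:
    """Return a volume size in bytes as the minimum of the known bounds.

    Bounds given as None are treated as "not specified" and skipped.
    That treatment is an assumption of this sketch.
    """
    bounds = [node_allocatable]
    if pod_allocatable is not None:
        bounds.append(pod_allocatable)
    if size_limit is not None:
        bounds.append(size_limit)
    return min(bounds)


# The pod may use 4 GiB and the user asked for 1 GiB: the explicit limit wins.
print(memory_backed_volume_size(16 * GiB, 4 * GiB, 1 * GiB))
# No explicit sizeLimit: the pod allocatable memory is the smallest bound.
print(memory_backed_volume_size(16 * GiB, 4 * GiB))
```

Taking the minimum means an explicit `emptyDir.sizeLimit` can only shrink the volume; it can never grow it past what the pod or node allows.
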
### Risks and Mitigations

The risks for this proposal are minimal.

The emptyDir volume will now be sized consistently with the pod level cgroup
memory limit. A container that writes to a memory backed volume is charged
for that write in memory accounting. If the container restarts, the charge
goes to the pod cgroup. Sizing the emptyDir volume to match the amount of
memory that can actually be charged to the pod avoids presenting a volume that
appears smaller or larger than the memory the pod can use.

## Design Details

With this design, the existing `emptyDir.sizeLimit` is used not only by
eviction heuristics but also to size the volume.
Because a user cannot write more to the volume than the pod cgroup
allows, enforcement of memory consumption does not materially change;
the volume is simply sized more consistently across node types.

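As a rough illustration of what "sizing the volume" means for a tmpfs, the sketch below builds the argument list for a `mount` command that uses the tmpfs `size=` option. How the kubelet actually performs the mount is not described in this KEP; the helper name `tmpfs_mount_command` and the target path are hypothetical.

```python
from typing import List


def tmpfs_mount_command(target: str, size_bytes: int) -> List[str]:
    """Build a `mount` argv for a tmpfs capped at size_bytes.

    The tmpfs `size=` option takes a byte count. The target path is
    supplied by the caller; nothing here is taken from the kubelet.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return ["mount", "-t", "tmpfs", "-o", f"size={size_bytes}", "tmpfs", target]


# Hypothetical volume directory, capped at 512 MiB.
print(" ".join(tmpfs_mount_command("/mnt/example-volume", 512 * 1024 ** 2)))
```

A tmpfs mounted without a `size=` option falls back to the kernel default of half of physical RAM, which is the node-dependent behavior this KEP moves away from.
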
### Test Plan

Node e2e testing will capture the following:

- verify that the emptyDir volume size matches `sizeLimit` (if specified), OR
- verify that the emptyDir volume size matches the pod available memory

To verify the pod available memory scenario, we will check that the
memory backed volume size is equivalent to the pod cgroup memory limit
or the node allocatable memory limit.

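A check of this kind could be sketched as follows, run from inside the container against the volume's mount path. This is not the node e2e test itself: the helper names and the one-page tolerance (tmpfs sizes are rounded to whole pages) are assumptions of the sketch.

```python
import os


def filesystem_size_bytes(path: str) -> int:
    """Return the total size of the filesystem that contains path."""
    stats = os.statvfs(path)
    return stats.f_frsize * stats.f_blocks


def volume_size_matches(path: str, expected_bytes: int,
                        tolerance_bytes: int = 4096) -> bool:
    """True if the filesystem size is within tolerance of expected_bytes."""
    return abs(filesystem_size_bytes(path) - expected_bytes) <= tolerance_bytes


# Example: report the size of the filesystem backing the root directory.
print(filesystem_size_bytes("/"))
```

In a test, `expected_bytes` would be the `sizeLimit` when one is specified, and otherwise the pod cgroup memory limit or the node allocatable memory.
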
### Graduation Criteria

#### Alpha -> Beta Graduation

- All feedback gathered from users of memory backed volumes (expected to be minimal)
- Adequate test signal quality for node e2e
- Tests are in Testgrid and linked in KEP

#### Beta -> GA Graduation

- Allow time for additional user feedback and bug reports

### Upgrade / Downgrade Strategy

Not applicable.

The `kubelet` will size the memory backed volume to match how writes
are charged. After a downgrade to a prior kubelet, the volume size
falls back to the Linux host default.

### Version Skew Strategy

The feature changes the operating environment presented to a pod,
so a pod will either get an accurate emptyDir volume size, or a
potentially inaccurate volume size based on node configuration.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

_This section must be completed when targeting alpha to a release._

* **How can this feature be enabled / disabled in a live cluster?**
  - [x] Feature gate (also fill in values in `kep.yaml`)
    - Feature gate name: SizeMemoryBackedVolumes
    - Components depending on the feature gate: kubelet
    - Will enabling / disabling the feature require downtime or reprovisioning
      of a node? No

* **Does enabling the feature change any default behavior?**
  Yes, the kubelet will size the emptyDir volume to match the precise
  amount of memory the pod is able to write, rather than oversizing or undersizing it.
  Prior behavior is node dependent, so pod authors had no mechanism
  to control it properly.

* **Can the feature be disabled once it has been enabled (i.e. can we roll back
  the enablement)?** Yes

* **What happens if we reenable the feature if it was previously rolled back?**
  While the feature is rolled back, pods that run on that node have memory backed
  volumes sized based on the Linux host default, and that sizing may not align with
  the memory actually available to an app. Once the feature is reenabled, the kubelet
  again sizes memory backed volumes as described in this KEP.

* **Are there any tests for feature enablement/disablement?**
  No, testing behavior with the feature disabled is dependent on node operating
  system configuration. The point of this KEP is to address that coupling.

### Rollout, Upgrade and Rollback Planning

* **How can a rollout fail? Can it impact already running workloads?**
  If a pod has more allocatable memory than the previous default of 50% of
  node instance memory used to size an emptyDir, the pod could potentially
  write more content to the emptyDir volume than before. This should have
  no impact on rollout of the cluster or workload. In practice, applications
  that did exhaust the size of the memory backed volume were not portable across
  instance types or would have had to handle running out of room in that volume.

* **What specific metrics should inform a rollback?**
  None.

* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
  I do not believe this is applicable.

* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
  fields of API types, flags, etc.?**
  Even if applying deprecation policies, they may still surprise some users.
  No.

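The first answer above can be made concrete with a small calculation. The sketch compares the previous default (50% of node memory) with the new sizing for the same pod on two node sizes. The numbers are illustrative, and for simplicity the sketch assumes node allocatable memory equals total node memory, which is generally not the case on a real node.

```python
GiB = 1024 ** 3


def compare_sizing(node_memory: int, pod_allocatable: int):
    """Return (legacy_size, new_size) in bytes for one pod on one node.

    legacy_size: previous default of half the node memory.
    new_size:    min(node allocatable, pod allocatable), with node
                 allocatable assumed equal to node_memory in this sketch.
    """
    legacy_size = node_memory // 2
    new_size = min(node_memory, pod_allocatable)
    return legacy_size, new_size


# The same pod (6 GiB allocatable) on a small and a large node.
for node_memory in (8 * GiB, 64 * GiB):
    legacy, new = compare_sizing(node_memory, 6 * GiB)
    print(f"node={node_memory // GiB}GiB "
          f"legacy={legacy // GiB}GiB new={new // GiB}GiB")
```

On the small node the volume grows from 4 GiB to 6 GiB, which is the case the answer describes; on the large node it shrinks from 32 GiB to 6 GiB, matching what the pod can actually be charged for.
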
### Monitoring Requirements

* **How can an operator determine if the feature is in use by workloads?**
  An operator can audit for pods that have an emptyDir whose medium is memory and
  that specify a size limit. It is not clear that tracking this is beneficial,
  because the feature only affects how well the kubelet enforces an existing API.

* **What are the SLIs (Service Level Indicators) an operator can use to determine
  the health of the service?**
  This does not seem relevant to this feature.

* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
  This does not seem relevant to this feature.

* **Are there any missing metrics that would be useful to have to improve observability
  of this feature?**
  No.

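The audit mentioned in the first answer could look like the sketch below, which scans a pod manifest that has already been loaded as a Python dictionary (for example from the JSON output of `kubectl get pods -o json`). The helper name is hypothetical; the fields it reads (`spec.volumes[].emptyDir.medium` and `sizeLimit`) are the existing pod API fields.

```python
from typing import Any, Dict, List


def memory_volumes_with_size_limit(pod: Dict[str, Any]) -> List[str]:
    """Return names of emptyDir volumes with medium Memory and a sizeLimit."""
    names = []
    for volume in pod.get("spec", {}).get("volumes", []):
        empty_dir = volume.get("emptyDir")
        if empty_dir is None:
            continue  # not an emptyDir volume
        if empty_dir.get("medium") == "Memory" and "sizeLimit" in empty_dir:
            names.append(volume["name"])
    return names


example_pod = {
    "metadata": {"name": "example"},
    "spec": {
        "volumes": [
            {"name": "shm", "emptyDir": {"medium": "Memory", "sizeLimit": "1Gi"}},
            {"name": "cache", "emptyDir": {"medium": "Memory"}},
            {"name": "scratch", "emptyDir": {}},
        ]
    },
}
print(memory_volumes_with_size_limit(example_pod))
```
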
### Dependencies

* **Does this feature depend on any specific services running in the cluster?**
  No

### Scalability

* **Will enabling / using this feature result in any new API calls?**
  No.

* **Will enabling / using this feature result in introducing new API types?**
  No

* **Will enabling / using this feature result in any new calls to the cloud
  provider?**
  No

* **Will enabling / using this feature result in increasing size or count of
  the existing API objects?**
  No

* **Will enabling / using this feature result in increasing time taken by any
  operations covered by [existing SLIs/SLOs]?**
  No

* **Will enabling / using this feature result in non-negligible increase of
  resource usage (CPU, RAM, disk, IO, ...) in any components?**
  No

### Troubleshooting

* **How does this feature react if the API server and/or etcd is unavailable?**
  No impact.

* **What are other known failure modes?**
  Not applicable.

* **What steps should be taken if SLOs are not being met to determine the problem?**
  Not applicable

## Implementation History

## Drawbacks

None.

This eliminates an unintentional coupling of pod and node.

## Alternatives

None.

## Infrastructure Needed (Optional)

None.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
title: Size memory backed volumes
kep-number: 1967
authors:
  - "@derekwaynecarr"
owning-sig: sig-node
participating-sigs:
  - sig-storage
status: implementable
creation-date: 2020-09-03
reviewers:
  - "@dashpole"
approvers:
  - "@dashpole"
prr-approvers:
  - johnbelamaric
see-also:
replaces:

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.20"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.20"
  beta: "v1.21"
  stable: "v1.22"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
  - name: SizeMemoryBackedVolumes
    components:
      - kubelet
disable-supported: true

# The following PRR answers are required at beta release
metrics:
