Commit 9d4e9c9

Merge pull request kubernetes#2055 from derekwaynecarr/downward-api-hugepages
Downward API support for HugePages
2 parents 5dcf841 + 029967e commit 9d4e9c9

2 files changed

+312
-0
lines changed

Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
1+
# KEP-1967: Downward API HugePages
2+
3+
<!-- toc -->
4+
- [Release Signoff Checklist](#release-signoff-checklist)
5+
- [Summary](#summary)
6+
- [Motivation](#motivation)
7+
- [Goals](#goals)
8+
- [Non-Goals](#non-goals)
9+
- [Proposal](#proposal)
10+
- [Risks and Mitigations](#risks-and-mitigations)
11+
- [Design Details](#design-details)
12+
- [Test Plan](#test-plan)
13+
- [Graduation Criteria](#graduation-criteria)
14+
- [Alpha](#alpha)
15+
- [Alpha -> Beta Graduation](#alpha---beta-graduation)
16+
- [Beta -> GA Graduation](#beta---ga-graduation)
17+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
18+
- [Version Skew Strategy](#version-skew-strategy)
19+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
20+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
21+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
22+
- [Monitoring Requirements](#monitoring-requirements)
23+
- [Dependencies](#dependencies)
24+
- [Scalability](#scalability)
25+
- [Troubleshooting](#troubleshooting)
26+
- [Implementation History](#implementation-history)
27+
- [Drawbacks](#drawbacks)
28+
- [Alternatives](#alternatives)
29+
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
30+
<!-- /toc -->
31+
32+
## Release Signoff Checklist
33+
34+
Items marked with (R) are required *prior to targeting to a milestone / release*.
35+
36+
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
37+
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
38+
- [ ] (R) Design details are appropriately documented
39+
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
40+
- [ ] (R) Graduation criteria is in place
41+
- [ ] (R) Production readiness review completed
42+
- [ ] Production readiness review approved
43+
- [ ] "Implementation History" section is up-to-date for milestone
44+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
45+
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
46+
47+
## Summary
48+
49+
This KEP exposes hugepages in the downward API.
50+
51+
## Motivation
52+
53+
Pods are unable to know their hugepage requests or limits via the downward API. HugePages
are a natively supported resource in Kubernetes and should be visible in the downward API,
consistent with other resources such as cpu, memory, and ephemeral-storage.
56+
57+
### Goals
58+
59+
- Add support for hugepage requests and limits for all page sizes in downward API
60+
61+
### Non-Goals
62+
63+
- Change any other aspect of hugepage support
64+
65+
## Proposal
66+
67+
Define a new feature gate: `DownwardAPIHugePages`.
68+
69+
When the feature gate is enabled, the `kube-apiserver` will allow pod specifications
to make use of hugepages in the downward API. The `kubelet` will add support for
hugepages in the downward API independent of the feature gate.
73+
74+
### Risks and Mitigations
75+
76+
The primary risk for this proposal is that it loosens validation for Pods.
77+
78+
The mitigation proposed is as follows:
79+
80+
- Add support for the new fields in `kubelet` by default. This is considered
81+
low risk as the code is inert when pods do not use the tokens, and the subsystem
82+
in the kubelet is localized.
83+
- The `kube-apiserver` will have the feature gate disabled by default for 2
84+
releases until we know all supported skew scenarios result in all kubelets having
85+
the supported code present.
86+
87+
When the gate is enabled, the `kube-apiserver` will permit the newly allowed
88+
values in all creation and update scenarios. When the gate is disabled, the
89+
new values are permitted only in updates of objects which already contain
90+
the new values. Use in creation or in updates of objects which do not
91+
already use the new values will fail validation.
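
The create and update rules above can be sketched as a small validation helper. This is a minimal illustration, not the actual `kube-apiserver` code: the function names (`usesHugePagesField`, `validateRefs`) and the flattened list of resource references are hypothetical, chosen only to show the gate-or-already-present rule.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// usesHugePagesField reports whether any downward API resource reference
// names a hugepages request or limit.
func usesHugePagesField(refs []string) bool {
	for _, ref := range refs {
		if strings.HasPrefix(ref, "requests.hugepages-") ||
			strings.HasPrefix(ref, "limits.hugepages-") {
			return true
		}
	}
	return false
}

// validateRefs applies the rule described above. oldRefs is nil on create.
// The new values are accepted when the gate is enabled, or when the
// existing object already uses them.
func validateRefs(gateEnabled bool, oldRefs, newRefs []string) error {
	if !usesHugePagesField(newRefs) {
		return nil // nothing gated is being used
	}
	if gateEnabled || usesHugePagesField(oldRefs) {
		return nil
	}
	return errors.New("hugepages in downward API requires the DownwardAPIHugePages feature gate")
}

func main() {
	refs := []string{"limits.hugepages-2Mi"}
	fmt.Println("create, gate off:", validateRefs(false, nil, refs))
	fmt.Println("update of existing user, gate off:", validateRefs(false, refs, refs))
	fmt.Println("create, gate on:", validateRefs(true, nil, refs))
}
```

Accepting updates to objects that already carry the values keeps previously admitted pods editable after the gate is turned off again.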
92+
93+
## Design Details
94+
95+
Add support for `requests.hugepages-<pagesize>` and `limits.hugepages-<pagesize>`
96+
to downward API consistent with cpu, memory, and ephemeral storage. Enable the
97+
support by default in the kubelet, but gate its usage by default in the `kube-apiserver`
98+
for 2 releases to ensure all nodes in the cluster have proper support.
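
As a rough illustration of the two field names, the sketch below splits a downward API resource name into its kind and page size. The helper `parseHugePagesField` is hypothetical and is not taken from the Kubernetes source; it only shows the `requests.hugepages-<pagesize>` and `limits.hugepages-<pagesize>` naming pattern.

```go
package main

import (
	"fmt"
	"strings"
)

// parseHugePagesField splits a resource name such as "limits.hugepages-2Mi"
// into its kind ("requests" or "limits") and its page size ("2Mi").
// ok is false for any name that does not follow the hugepages pattern.
func parseHugePagesField(resource string) (kind, pageSize string, ok bool) {
	for _, k := range []string{"requests", "limits"} {
		prefix := k + ".hugepages-"
		// Require a non-empty page size after the prefix.
		if strings.HasPrefix(resource, prefix) && len(resource) > len(prefix) {
			return k, strings.TrimPrefix(resource, prefix), true
		}
	}
	return "", "", false
}

func main() {
	for _, name := range []string{"limits.hugepages-2Mi", "requests.hugepages-1Gi", "limits.memory"} {
		kind, size, ok := parseHugePagesField(name)
		fmt.Printf("%s -> kind=%q pageSize=%q ok=%v\n", name, kind, size, ok)
	}
}
```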
99+
100+
It is important to remember that `hugepages-<pagesize>` is not a resource
101+
that is subject to overcommit. A pod must have a matching request and limit
102+
for an explicit `hugepages-<pagesize>` in order to consume hugepages. Absent
103+
an explicit request, no `hugepages-<pagesize>` is provided to a pod.
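
The no-overcommit rule can be sketched as a check that every hugepages resource has a request equal to its limit. This is a simplified illustration with hypothetical names (`resourceList`, `validateHugePages`); it models quantities as plain byte counts and assumes any defaulting of requests from limits has already been applied.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceList maps a resource name to a quantity in bytes (a simplification
// of the real quantity type).
type resourceList map[string]int64

// validateHugePages returns an error if any hugepages resource is missing a
// request or a limit, or if the two differ.
func validateHugePages(requests, limits resourceList) error {
	// Collect every hugepages resource named in either list.
	names := map[string]bool{}
	for name := range requests {
		if strings.HasPrefix(name, "hugepages-") {
			names[name] = true
		}
	}
	for name := range limits {
		if strings.HasPrefix(name, "hugepages-") {
			names[name] = true
		}
	}
	for name := range names {
		req, hasReq := requests[name]
		lim, hasLim := limits[name]
		if !hasReq || !hasLim || req != lim {
			return fmt.Errorf("%s: request must equal limit", name)
		}
	}
	return nil
}

func main() {
	const mi = int64(1) << 20
	// memory may be overcommitted (request < limit); hugepages may not.
	matched := validateHugePages(
		resourceList{"memory": 64 * mi, "hugepages-2Mi": 128 * mi},
		resourceList{"memory": 128 * mi, "hugepages-2Mi": 128 * mi},
	)
	mismatched := validateHugePages(
		resourceList{"hugepages-2Mi": 64 * mi},
		resourceList{"hugepages-2Mi": 128 * mi},
	)
	fmt.Println("matched:", matched)
	fmt.Println("mismatched:", mismatched)
}
```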
104+
105+
The `kube-apiserver` will not require a pod to make an explicit `hugepages-<pagesize>`
request in its pod spec in order to use the field in the downward API. The rationale
for this behavior is that pod templates for specific workload types may support
running with or without `hugepages-<pagesize>` made available to them, and as a result
they may include both memory and hugepages in the downward API in order to know how to adjust.
The `kubelet` will ensure that the downward API value projected into the container for
a specific `hugepages-<pagesize>` matches what is provided by its bounding pod
and/or container cgroup.
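
A minimal sketch of how a projected value could be resolved is shown below. It is not the kubelet implementation: the function `hugePagesFieldValue` is hypothetical, quantities are modeled as byte counts, and both the zero value for an absent limit and the round-up division by the divisor are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// hugePagesFieldValue returns the value to project for "hugepages-<pageSize>",
// expressed in units of divisor. A container with no explicit hugepages limit
// resolves to 0, reflecting that no hugepages are provided without a request.
func hugePagesFieldValue(limits map[string]int64, pageSize string, divisor int64) (int64, error) {
	if divisor <= 0 {
		return 0, errors.New("divisor must be positive")
	}
	bytes := limits["hugepages-"+pageSize] // zero when the key is absent
	// Round up so that a partial unit is not silently dropped (assumption).
	return (bytes + divisor - 1) / divisor, nil
}

func main() {
	const mi = int64(1) << 20
	limits := map[string]int64{"hugepages-2Mi": 1024 * mi}

	inMi, _ := hugePagesFieldValue(limits, "2Mi", mi)
	absent, _ := hugePagesFieldValue(limits, "1Gi", mi)
	fmt.Println("hugepages-2Mi in Mi:", inMi)
	fmt.Println("hugepages-1Gi in Mi:", absent)
}
```

A workload that runs with or without hugepages can then read both its memory and hugepages values and treat a hugepages value of 0 as "not provided".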
113+
114+
### Test Plan
115+
116+
Unit and e2e testing will be added consistent with other resources in downward API.
117+
118+
e2e testing will only function if a node in the cluster exposes hugepages; otherwise,
the tests will gracefully skip (as expected).
120+
121+
### Graduation Criteria
122+
123+
#### Alpha
124+
125+
- Feature gate is present and enforced in kube-apiserver
126+
- Validation logic is in-place in kube-apiserver
127+
- Kubelet has support for projecting the value in the pod
128+
- unit testing for downward API enhancement
129+
130+
#### Alpha -> Beta Graduation
131+
132+
- Added support in kube-apiserver protected by feature gate
133+
- Added support in kubelet for 2 releases.
134+
- e2e testing for hosts with hugepages enabled
135+
136+
#### Beta -> GA Graduation
137+
138+
- Enable support by default one release after kube-apiserver feature gate is enabled in beta.
139+
140+
### Upgrade / Downgrade Strategy
141+
142+
The kubelet will have the support for 2 releases before it is
enabled in the kube-apiserver. This ensures that pods cannot
be accepted by the platform when nodes do not have support.
145+
146+
### Version Skew Strategy
147+
148+
The kubelet will have the support for 2 releases before it is
enabled in the kube-apiserver. This ensures that pods cannot
be accepted by the platform when nodes do not have support.
151+
152+
## Production Readiness Review Questionnaire
153+
154+
### Feature Enablement and Rollback
155+
156+
_This section must be completed when targeting alpha to a release._
157+
158+
* **How can this feature be enabled / disabled in a live cluster?**
159+
- [x] Feature gate (also fill in values in `kep.yaml`)
160+
- Feature gate name: DownwardAPIHugePages
161+
- Components depending on the feature gate: kube-apiserver
162+
- Will enabling / disabling the feature require downtime or reprovisioning
163+
of a node? No
164+
165+
* **Does enabling the feature change any default behavior?**
166+
Yes, the kube-apiserver will admit pods that use the new downward API support.
167+
168+
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
169+
the enablement)?** Yes
170+
Only if no pods that use the feature were admitted.
171+
172+
* **What happens if we reenable the feature if it was previously rolled back?**
173+
Nothing. The kube-apiserver will again accept the new fields on new pods in admission.
174+
175+
* **Are there any tests for feature enablement/disablement?**
176+
No, this will be handled by coordinating support in the kubelet.
177+
178+
### Rollout, Upgrade and Rollback Planning
179+
180+
* **How can a rollout fail? Can it impact already running workloads?**
181+
If not all kubelets in a cluster have support for hugepages in the downward API
before the kube-apiserver accepts pods that use it, a node may not start with the
downward API information made available. This would impact the operating environment
for the application and not the cluster.
185+
186+
* **What specific metrics should inform a rollback?**
187+
None.
188+
189+
* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
190+
I do not believe this is applicable.
191+
192+
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
193+
fields of API types, flags, etc.?**
194+
Even if applying deprecation policies, they may still surprise some users.
195+
No, validation is loosened but coordinated across N-2 releases.
196+
197+
### Monitoring Requirements
198+
199+
* **How can an operator determine if the feature is in use by workloads?**
200+
An operator could audit pods that use the new downward API tokens.
201+
202+
* **What are the SLIs (Service Level Indicators) an operator can use to determine
203+
the health of the service?**
204+
This does not seem relevant to this feature.
205+
206+
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
207+
This does not seem relevant to this feature.
208+
209+
* **Are there any missing metrics that would be useful to have to improve observability
210+
of this feature?**
211+
No.
212+
213+
### Dependencies
214+
215+
* **Does this feature depend on any specific services running in the cluster?**
216+
No
217+
218+
### Scalability
219+
220+
* **Will enabling / using this feature result in any new API calls?**
221+
No.
222+
223+
* **Will enabling / using this feature result in introducing new API types?**
224+
No
225+
226+
* **Will enabling / using this feature result in any new calls to the cloud
227+
provider?**
228+
No
229+
230+
* **Will enabling / using this feature result in increasing size or count of
231+
the existing API objects?**
232+
No
233+
234+
* **Will enabling / using this feature result in increasing time taken by any
235+
operations covered by [existing SLIs/SLOs]?**
236+
No
237+
238+
* **Will enabling / using this feature result in non-negligible increase of
239+
resource usage (CPU, RAM, disk, IO, ...) in any components?**
240+
No
241+
242+
### Troubleshooting
243+
244+
* **How does this feature react if the API server and/or etcd is unavailable?**
245+
No impact.
246+
247+
* **What are other known failure modes?**
248+
Not applicable.
249+
250+
* **What steps should be taken if SLOs are not being met to determine the problem?**
251+
Not applicable
252+
253+
## Implementation History
254+
255+
## Drawbacks
256+
257+
None.
258+
259+
## Alternatives
260+
261+
None.
262+
263+
## Infrastructure Needed (Optional)
264+
265+
None.
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
title: Downward API HugePages
2+
kep-number: 2053
3+
authors:
4+
- "@derekwaynecarr"
5+
owning-sig: sig-node
6+
participating-sigs: []
7+
status: implementable
8+
creation-date: 2020-06-18
9+
reviewers:
10+
- "@dashpole"
11+
- "@sjenning"
12+
approvers:
13+
- "@dashpole"
14+
- "@sjenning"
15+
- "@dchen1107"
16+
prr-approvers:
17+
- "deads2k"
18+
- "johnbelamaric"
19+
- "wojtek-t"
20+
see-also:
21+
- "/keps/sig-node/20190129-hugepages.md"
22+
replaces: []
23+
24+
# The target maturity stage in the current dev cycle for this KEP.
25+
stage: alpha
26+
27+
# The most recent milestone for which work toward delivery of this KEP has been
28+
# done. This can be the current (upcoming) milestone, if it is being actively
29+
# worked on.
30+
latest-milestone: "v1.20"
31+
32+
# The milestone at which this feature was, or is targeted to be, at each stage.
33+
milestone:
34+
alpha: "v1.20"
35+
beta: "v1.21"
36+
stable: "v1.22"
37+
38+
# The following PRR answers are required at alpha release
39+
# List the feature gate name and the components for which it must be enabled
40+
feature-gates:
41+
- name: DownwardAPIHugePages
42+
components:
43+
- kube-apiserver
44+
disable-supported: true
45+
46+
metrics:
47+
- "N/A"
