Commit f4441d3

Allow hostNetwork pods to use user namespaces
1 parent cafbf08 commit f4441d3

2 files changed

+416
-0
lines changed

Lines changed: 379 additions & 0 deletions
@@ -0,0 +1,379 @@
1+
# KEP-5607: Allow HostNetwork Pods to Use User Namespaces
2+
3+
<!-- toc -->
4+
- [Release Signoff Checklist](#release-signoff-checklist)
5+
- [Summary](#summary)
6+
- [Motivation](#motivation)
7+
- [Goals](#goals)
8+
- [Non-Goals](#non-goals)
9+
- [Proposal](#proposal)
10+
- [User Stories (Optional)](#user-stories-optional)
11+
- [Story 1](#story-1)
12+
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
13+
- [Risks and Mitigations](#risks-and-mitigations)
14+
- [Design Details](#design-details)
15+
- [Test Plan](#test-plan)
16+
- [Prerequisite testing updates](#prerequisite-testing-updates)
17+
- [Unit tests](#unit-tests)
18+
- [Integration tests](#integration-tests)
19+
- [e2e tests](#e2e-tests)
20+
- [Graduation Criteria](#graduation-criteria)
21+
- [Alpha](#alpha)
22+
- [Beta](#beta)
23+
- [GA](#ga)
24+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
25+
- [Version Skew Strategy](#version-skew-strategy)
26+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
27+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
28+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
29+
- [Monitoring Requirements](#monitoring-requirements)
30+
- [Dependencies](#dependencies)
31+
- [Scalability](#scalability)
32+
- [Troubleshooting](#troubleshooting)
33+
- [Implementation History](#implementation-history)
34+
- [Drawbacks](#drawbacks)
35+
- [Alternatives](#alternatives)
36+
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
37+
<!-- /toc -->
38+
39+
## Release Signoff Checklist
40+
41+
<!--
42+
**ACTION REQUIRED:** In order to merge code into a release, there must be an
43+
issue in [kubernetes/enhancements] referencing this KEP and targeting a release
44+
milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
45+
of the targeted release**.
46+
47+
For enhancements that make changes to code or processes/procedures in core
48+
Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release
49+
Signoff checklist to be completed.
50+
51+
Check these off as they are completed for the Release Team to track. These
52+
checklist items _must_ be updated for the enhancement to be released.
53+
-->
54+
55+
Items marked with (R) are required *prior to targeting to a milestone / release*.
56+
57+
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
58+
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
59+
- [ ] (R) Design details are appropriately documented
60+
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
61+
- [ ] e2e Tests for all Beta API Operations (endpoints)
62+
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
63+
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
64+
- [ ] (R) Graduation criteria is in place
65+
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) within one minor version of promotion to GA
66+
- [ ] (R) Production readiness review completed
67+
- [ ] (R) Production readiness review approved
68+
- [ ] "Implementation History" section is up-to-date for milestone
69+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
70+
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
71+
72+
<!--
73+
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
74+
-->
75+
76+
[kubernetes.io]: https://kubernetes.io/
77+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
78+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
79+
[kubernetes/website]: https://git.k8s.io/website
80+
81+
## Summary
82+
83+
This KEP proposes introducing a new feature gate to allow Pods to have both `hostNetwork` enabled and user namespaces enabled (by setting `hostUsers: false`).
84+
85+
## Motivation
86+
87+
The primary motivation is to enhance the security of Kubernetes control plane components. Many control plane components, such as the `kube-apiserver` and `kube-controller-manager`, often run as static Pods and are configured with `hostNetwork: true` to bind to node ports or interact directly with the host's network stack.
88+
89+
Currently, a validation rule in the kube-apiserver prevents the combination of `hostNetwork: true` and `hostUsers: false`. This KEP aims to remove that barrier.
90+
91+
### Goals
92+
93+
* Introduce a new, separate alpha feature gate: `UserNamespacesHostNetworkSupport`.
94+
95+
* When this feature gate is enabled, modify the Pod validation logic to allow Pod specs where `spec.hostNetwork` is true and `spec.hostUsers` is false.
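
For illustration, the Pod shape this goal targets can be sketched in Go. The types below are hypothetical, simplified stand-ins for the real API types, and the Pod name is made up; only the JSON field names `hostNetwork` and `hostUsers` are taken from the Pod API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PodSpec is a simplified stand-in for the Kubernetes PodSpec type.
// Only the two fields relevant to this KEP are modelled.
type PodSpec struct {
	HostNetwork bool  `json:"hostNetwork,omitempty"`
	HostUsers   *bool `json:"hostUsers,omitempty"`
}

// Pod is a simplified stand-in for the Kubernetes Pod type.
type Pod struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   map[string]string `json:"metadata"`
	Spec       PodSpec           `json:"spec"`
}

// examplePod builds a Pod that shares the host network but runs in its
// own user namespace (hostUsers: false). The name is illustrative only.
func examplePod() Pod {
	hostUsers := false
	return Pod{
		APIVersion: "v1",
		Kind:       "Pod",
		Metadata:   map[string]string{"name": "hostnet-userns-example"},
		Spec:       PodSpec{HostNetwork: true, HostUsers: &hostUsers},
	}
}

func main() {
	// Print the manifest as JSON so the two fields are visible side by side.
	out, err := json.MarshalIndent(examplePod(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

`HostUsers` is a pointer so that an explicit `false` can be told apart from an unset field.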
96+
97+
### Non-Goals
98+
99+
* Adding this functionality to the existing `UserNamespacesSupport` feature gate. Because `UserNamespacesSupport` is nearing GA, it would be unwise to attach a new, unstable feature with external dependencies to it.
100+
101+
## Proposal
102+
103+
We propose the introduction of a new feature gate named `UserNamespacesHostNetworkSupport`.
104+
105+
When this feature gate is disabled (the default state), the kube-apiserver will maintain the current validation behavior, rejecting any Pod spec that includes both `spec.hostNetwork: true` and `spec.hostUsers: false`.
106+
107+
When the `UserNamespacesHostNetworkSupport` feature gate is enabled, we will relax this validation check.
108+
The kube-apiserver will accept such a Pod spec and pass it on to the kubelet.
109+
At this point, the responsibility for successfully creating and running the Pod shifts to the container runtime.
110+
If the container runtime stack (e.g., containerd with runc) does not support this combination, the Pod will remain stuck in the `ContainerCreating` state and report an error event, which is the expected behavior.
111+
112+
This change will primarily involve modifying the Pod validation function in `pkg/apis/core/validation/validation.go` to account for the state of the new feature gate.
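
The accept/reject behavior described above can be modelled as a small decision function. This is a minimal sketch under stated assumptions: `podSpec` and `validate` are hypothetical stand-ins rather than the real validation code, and the gate state is passed in as a plain boolean instead of being read from the feature gate machinery.

```go
package main

import (
	"errors"
	"fmt"
)

// podSpec is a hypothetical, simplified stand-in for the real PodSpec.
type podSpec struct {
	hostNetwork bool
	hostUsers   *bool // nil means unset, which behaves like hostUsers: true
}

// usesUserNamespace reports whether the Pod explicitly sets hostUsers: false.
func usesUserNamespace(s podSpec) bool {
	return s.hostUsers != nil && !*s.hostUsers
}

// validate rejects hostNetwork: true together with hostUsers: false
// unless the (modelled) UserNamespacesHostNetworkSupport gate is enabled.
func validate(s podSpec, gateEnabled bool) error {
	if usesUserNamespace(s) && s.hostNetwork && !gateEnabled {
		return errors.New("hostNetwork: Forbidden: when `hostUsers` is false")
	}
	return nil
}

func main() {
	hostUsers := false
	// Walk the gate x hostNetwork matrix for a Pod with hostUsers: false.
	for _, gate := range []bool{false, true} {
		for _, hostNet := range []bool{false, true} {
			err := validate(podSpec{hostNetwork: hostNet, hostUsers: &hostUsers}, gate)
			fmt.Printf("gate=%t hostNetwork=%t hostUsers=false accepted=%t\n", gate, hostNet, err == nil)
		}
	}
}
```

In this model, only the case with the gate disabled and `hostNetwork: true` is rejected; Pods that leave `hostUsers` unset are never affected.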
113+
114+
### User Stories (Optional)
115+
116+
#### Story 1
117+
As a cluster administrator, I want to enable user namespaces for my control plane static Pods (e.g., `kube-apiserver`, `kube-controller-manager`) to follow the principle of least privilege and reduce the attack surface. These Pods need to use `hostNetwork` to interact correctly with the cluster network. By enabling the new feature gate, I can add a critical layer of security isolation to these vital components without changing their networking model.
118+
119+
120+
### Notes/Constraints/Caveats (Optional)
121+
122+
### Risks and Mitigations
123+
124+
125+
## Design Details
126+
127+
The core design change is very simple: in the apiserver's Pod validation logic, locate the code block that prevents the `hostNetwork: true` and `hostUsers: false` combination, and wrap it in a conditional that only executes the validation if the `UserNamespacesHostNetworkSupport` feature gate is disabled.
128+
```go
func validateHostUsers(spec *core.PodSpec, fldPath *field.Path, opts PodValidationOptions) field.ErrorList {
	allErrs := field.ErrorList{}

	// ... existing validations ...

	// Note we already validated above spec.SecurityContext is not nil.
	if !utilfeature.DefaultFeatureGate.Enabled(features.UserNamespacesHostNetworkSupport) && spec.SecurityContext.HostNetwork {
		allErrs = append(allErrs, field.Forbidden(fldPath.Child("hostNetwork"), "when `hostUsers` is false"))
	}

	// ... existing validations ...

	return allErrs
}
```
145+
146+
### Test Plan
147+
148+
[ ] I/we understand the owners of the involved components may require updates to
149+
existing tests to make this code solid enough prior to committing the changes necessary
150+
to implement this enhancement.
151+
152+
##### Prerequisite testing updates
153+
154+
##### Unit tests
155+
156+
- `pkg/apis/core/validation`: `2025-10-03` - `85.1%`
157+
158+
##### Integration tests
159+
160+
##### e2e tests
161+
162+
- Add e2e tests to ensure that pods with the combination of `hostNetwork: true` and `hostUsers: false` can run properly.
163+
164+
### Graduation Criteria
165+
166+
#### Alpha
167+
168+
- The `UserNamespacesHostNetworkSupport` feature gate is implemented and disabled by default.
169+
170+
#### Beta
171+
172+
- At least one high-level container runtime (e.g., containerd) and one low-level container runtime (e.g., runc) have shipped official releases that support enabling `hostNetwork` and user namespaces together.
173+
- e2e tests cover Pods that set both `hostNetwork: true` and `hostUsers: false`.
174+
175+
#### GA
176+
177+
- The feature has been stable in Beta for at least 2 Kubernetes releases.
178+
- Multiple major container runtimes support the feature.
179+
180+
181+
### Upgrade / Downgrade Strategy
182+
183+
Upgrade: After upgrading to a version that supports this KEP, the `UserNamespacesHostNetworkSupport` feature gate can be enabled at any time.
184+
185+
Downgrade: If downgrading to a version that does not support this KEP, the kube-apiserver will revert to strict validation. Pods already running with this combination will be unaffected, but new or updated Pod requests attempting to use this combination will be rejected.
186+
187+
### Version Skew Strategy
188+
189+
A newer kube-apiserver with this feature enabled will accept such a Pod.
190+
191+
An older kubelet will still get the Pod definition from the kube-apiserver.
192+
It will attempt to create the Pod, and the success or failure will depend on the version of the container runtime it is using.
193+
194+
## Production Readiness Review Questionnaire
195+
196+
### Feature Enablement and Rollback
197+
198+
###### How can this feature be enabled / disabled in a live cluster?
199+
200+
- [ ] Feature gate (also fill in values in `kep.yaml`)
201+
- Feature gate name: `UserNamespacesHostNetworkSupport`
202+
- Components depending on the feature gate: `kube-apiserver`
203+
- [ ] Other
204+
- Describe the mechanism:
205+
- Will enabling / disabling the feature require downtime of the control
206+
plane?
207+
- Will enabling / disabling the feature require downtime or reprovisioning
208+
of a node?
209+
210+
###### Does enabling the feature change any default behavior?
211+
No. The behavior only changes when a user explicitly sets both `hostNetwork: true` and `hostUsers: false` in a Pod spec.
212+
The behavior of all existing Pods is unaffected.
213+
214+
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
215+
216+
Yes. It can be disabled by setting the feature gate to false and restarting the kube-apiserver.
217+
This restores the old validation logic.
218+
It will not affect any Pods already running with this combination but will prevent new ones from being created.
219+
220+
###### What happens if we reenable the feature if it was previously rolled back?
221+
The kube-apiserver will once again begin to accept the combination of `hostNetwork: true` and `hostUsers: false`.
222+
This is a stateless change, and reenabling is safe.
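
The rollback and re-enable answers above can be illustrated with a toy model. This is a sketch only: `fakeAPIServer` is a hypothetical in-memory stand-in that checks the combination at create time, and it does not model updates, the kubelet, or the container runtime.

```go
package main

import (
	"errors"
	"fmt"
)

// pod is a hypothetical, simplified Pod with only the relevant fields.
type pod struct {
	name        string
	hostNetwork bool
	hostUsers   bool
}

// fakeAPIServer is an in-memory stand-in for the kube-apiserver.
// gateEnabled models the UserNamespacesHostNetworkSupport feature gate.
type fakeAPIServer struct {
	gateEnabled bool
	stored      []pod
}

// create validates the Pod against the current gate state and stores it.
// Pods that were stored earlier are never re-validated.
func (s *fakeAPIServer) create(p pod) error {
	if p.hostNetwork && !p.hostUsers && !s.gateEnabled {
		return errors.New("hostNetwork is forbidden when hostUsers is false")
	}
	s.stored = append(s.stored, p)
	return nil
}

func main() {
	srv := &fakeAPIServer{gateEnabled: true}
	fmt.Println("gate on, create a:", srv.create(pod{name: "a", hostNetwork: true, hostUsers: false}))

	srv.gateEnabled = false // roll back the enablement
	fmt.Println("gate off, create b:", srv.create(pod{name: "b", hostNetwork: true, hostUsers: false}))
	fmt.Println("stored after rollback:", len(srv.stored))

	srv.gateEnabled = true // re-enable
	fmt.Println("gate on again, create b:", srv.create(pod{name: "b", hostNetwork: true, hostUsers: false}))
	fmt.Println("stored after re-enable:", len(srv.stored))
}
```

Because the gate is consulted only when a request is validated, toggling it changes which new requests are accepted and leaves already stored Pods untouched in this model.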
223+
224+
###### Are there any tests for feature enablement/disablement?
225+
226+
### Rollout, Upgrade and Rollback Planning
227+
228+
###### How can a rollout or rollback fail? Can it impact already running workloads?
229+
230+
The [Version Skew Strategy](#version-skew-strategy) section covers this point.
231+
232+
###### What specific metrics should inform a rollback?
233+
234+
N/A
235+
236+
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
237+
238+
This will be validated via manual testing.
239+
240+
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
241+
242+
No.
243+
244+
### Monitoring Requirements
245+
246+
<!--
247+
This section must be completed when targeting beta to a release.
248+
249+
For GA, this section is required: approvers should be able to confirm the
250+
previous answers based on experience in the field.
251+
-->
252+
253+
###### How can an operator determine if the feature is in use by workloads?
254+
255+
<!--
256+
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
257+
checking if there are objects with field X set) may be a last resort. Avoid
258+
logs or events for this purpose.
259+
-->
260+
261+
###### How can someone using this feature know that it is working for their instance?
262+
263+
<!--
264+
For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
265+
for each individual pod.
266+
Pick one more of these and delete the rest.
267+
Please describe all items visible to end users below with sufficient detail so that they can verify correct enablement
268+
and operation of this feature.
269+
Recall that end users cannot usually observe component logs or access metrics.
270+
-->
271+
272+
- [ ] Events
273+
- Event Reason:
274+
- [ ] API .status
275+
- Condition name:
276+
- Other field:
277+
- [ ] Other (treat as last resort)
278+
- Details:
279+
280+
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
281+
282+
<!--
283+
This is your opportunity to define what "normal" quality of service looks like
284+
for a feature.
285+
286+
It's impossible to provide comprehensive guidance, but at the very
287+
high level (needs more precise definitions) those may be things like:
288+
- per-day percentage of API calls finishing with 5XX errors <= 1%
289+
- 99% percentile over day of absolute value from (job creation time minus expected
290+
job creation time) for cron job <= 10%
291+
- 99.9% of /health requests per day finish with 200 code
292+
293+
These goals will help you determine what you need to measure (SLIs) in the next
294+
question.
295+
-->
296+
297+
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
298+
299+
<!--
300+
Pick one more of these and delete the rest.
301+
-->
302+
303+
- [ ] Metrics
304+
- Metric name:
305+
- [Optional] Aggregation method:
306+
- Components exposing the metric:
307+
- [ ] Other (treat as last resort)
308+
- Details:
309+
310+
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
311+
312+
<!--
313+
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
314+
implementation difficulties, etc.).
315+
-->
316+
317+
### Dependencies
318+
319+
###### Does this feature depend on any specific services running in the cluster?
320+
321+
No
322+
323+
### Scalability
324+
325+
###### Will enabling / using this feature result in any new API calls?
326+
No.
327+
328+
###### Will enabling / using this feature result in introducing new API types?
329+
No.
330+
331+
###### Will enabling / using this feature result in any new calls to the cloud provider?
332+
No.
333+
334+
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
335+
No.
336+
337+
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
338+
No.
339+
340+
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
341+
No.
342+
343+
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
344+
No.
345+
346+
### Troubleshooting
347+
348+
###### How does this feature react if the API server and/or etcd is unavailable?
349+
No impact on running workloads.
350+
351+
###### What are other known failure modes?
352+
If the container runtime or low-level runtime (e.g., containerd/runc) does not support the combination of hostNetwork and user namespaces, the pod will remain stuck in the `ContainerCreating` state and fail to be created.
353+
354+
###### What steps should be taken if SLOs are not being met to determine the problem?
355+
356+
N/A
357+
358+
## Implementation History
359+
360+
* 2025-10-03: Initial proposal
361+
362+
## Drawbacks
363+
364+
There are no known drawbacks at this time.
365+
366+
367+
## Alternatives
368+
369+
Add this feature to the existing `UserNamespacesSupport` feature gate:
370+
371+
* This was ruled out because the `UserNamespacesSupport` feature is approaching GA, and its functionality should be stable.
372+
Adding a new, externally-dependent, and immature behavior to a nearly-GA feature would introduce unnecessary risk and delays. Keeping the two feature gates separate is cleaner and safer.
373+
374+
Do not implement this feature:
375+
* Users can use `hostPort` as an alternative to `hostNetwork`, but this may cause some disruption to the existing user environment, as certain privileged containers require direct interaction with the host network stack.
376+
377+
## Infrastructure Needed (Optional)
378+
379+
No new infrastructure needed.
