Commit d3c8d0b

KEP-2033: KubeletInUserNamespace: promote to beta
Signed-off-by: Akihiro Suda <[email protected]>
1 parent 60c42cd commit d3c8d0b

3 files changed

+138
-53
lines changed

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
11
kep-number: 2033
22
alpha:
33
approver: "@ehashman"
4+
beta:
5+
approver: "@soltysh"

keps/sig-node/2033-kubelet-in-userns-aka-rootless/README.md

Lines changed: 128 additions & 50 deletions
@@ -148,20 +148,20 @@ checklist items _must_ be updated for the enhancement to be released.
148148

149149
Items marked with (R) are required *prior to targeting to a milestone / release*.
150150

151-
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
152-
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
153-
- [ ] (R) Design details are appropriately documented
154-
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
155-
- [ ] e2e Tests for all Beta API Operations (endpoints)
156-
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
157-
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
151+
- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
152+
- [X] (R) KEP approvers have approved the KEP status as `implementable`
153+
- [X] (R) Design details are appropriately documented
154+
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
155+
- [N/A] e2e Tests for all Beta API Operations (endpoints)
156+
- [N/A] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
157+
- [N/A] (R) Minimum Two Week Window for GA e2e tests to prove flake free
158158
- [ ] (R) Graduation criteria is in place
159-
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) within one minor version of promotion to GA
160-
- [ ] (R) Production readiness review completed
161-
- [ ] (R) Production readiness review approved
162-
- [ ] "Implementation History" section is up-to-date for milestone
163-
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
164-
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
159+
- [N/A] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) within one minor version of promotion to GA
160+
- [X] (R) Production readiness review completed
161+
- [X] (R) Production readiness review approved
162+
- [X] "Implementation History" section is up-to-date for milestone
163+
- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
164+
- [X] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
165165

166166
<!--
167167
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
@@ -493,7 +493,11 @@ The patch modifies `kubelet` to ignore errors that happens during setting the fo
493493
#### kube-proxy
494494
Patch: ["kube-proxy: allow running in userns"](https://github.com/rootless-containers/usernetes/blob/v20210303.0/src/patches/kubernetes/0002-kube-proxy-allow-running-in-userns.patch)
495495

496-
The patch modifies `kube-proxy` to ignore an error during setting `RLIMIT_NOFILE`.
496+
The patch modifies `kube-proxy` (`userspace` mode) to ignore an error during setting `RLIMIT_NOFILE`.
497+
No change is needed for non-userspace mode.
498+
499+
> **Note**
500+
> `userspace` proxy was removed in v1.26.
497501

498502
### Test Plan
499503

@@ -508,19 +512,19 @@ when drafting this test plan.
508512
[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
509513
-->
510514

511-
[ ] I/we understand the owners of the involved components may require updates to
515+
[X] I/we understand the owners of the involved components may require updates to
512516
existing tests to make this code solid enough prior to committing the changes necessary
513517
to implement this enhancement.
514518

515-
Tests are present in several subproject repos and third party repos:
516-
- https://github.com/kubernetes-sigs/kind/blob/v0.17.0/.github/workflows/cgroup2.yaml#L24
517-
- https://github.com/kubernetes/minikube/blob/v1.29.0/.github/workflows/pr.yml#L293-L410
518-
- https://github.com/k3s-io/k3s/blob/v1.26.1+k3s1/.github/workflows/cgroup.yaml#L92-L99
519-
- https://github.com/rootless-containers/usernetes/blob/v20221007.0/.cirrus.yml
519+
See [e2e tests](#e2e-tests) below.
520520

521-
Tests will be added to `kubernetes/test-infra` as well when the [`k8s-infra-prow-build`](https://github.com/kubernetes/k8s.io/blob/a071c4ed0823f193ee29e2f14e191be42dc1a1f0/infra/gcp/terraform/k8s-infra-prow-build/main.tf#L78) cluster
522-
is upgraded to use cgroup v2.
523-
This will probably automatically happen when [GKE bumps up their "regular" channel to Kubernetes v1.26 or later](https://cloud.google.com/kubernetes-engine/docs/how-to/node-system-config).
521+
Additional tests are present in several subproject repos and third party repos:
522+
- https://github.com/kubernetes-sigs/kind/blob/v0.29.0/.github/workflows/vm.yaml#L24
523+
- https://github.com/kubernetes/minikube/blob/v1.36.0/.github/workflows/pr.yml#L299-L415
524+
- https://github.com/k3s-io/k3s/blob/v1.33.1%2Bk3s1/.github/workflows/e2e.yaml#L56
525+
- https://github.com/rootless-containers/usernetes/blob/gen2-v20250501.0/.github/workflows/main.yaml
526+
- Covers multi-node clusters with Flannel (VXLAN)
527+
- Covers several host distributions (Ubuntu, CentOS Stream, and Fedora)
524528

525529
##### Prerequisite testing updates
526530

@@ -550,7 +554,14 @@ This can inform certain test coverage improvements that we want to do before
550554
extending the production code to implement this enhancement.
551555
-->
552556

553-
- `<package>`: `<date>` - `<test coverage>`
557+
N/A.
558+
Unit tests do not make sense here, as the relevant code depends on sysctl:
559+
- https://github.com/kubernetes/kubernetes/blob/v1.34.1/pkg/kubelet/cm/container_manager_linux.go#L483-L485
560+
- https://github.com/kubernetes/kubernetes/blob/v1.34.1/pkg/kubelet/kubelet.go#L559-L567
561+
562+
The feature can be tested only by running all of the node components in a UserNS.
563+
564+
See [e2e tests](#e2e-tests) below for how the feature is actually tested.
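To illustrate why this code path depends on the host environment, here is a minimal sketch of one common way to detect a user namespace: inspecting `/proc/self/uid_map`. Per user_namespaces(7), the initial namespace shows the single mapping `0 0 4294967295`. The function name `inUserNS` is hypothetical, and whether the kubelet uses exactly this heuristic is an assumption; the real detection lives in a library the kubelet imports.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// inUserNS reports whether the given uid_map content looks like it
// belongs to a non-initial user namespace. Hypothetical helper.
func inUserNS(uidMap string) bool {
	lines := strings.Split(strings.TrimSpace(uidMap), "\n")
	if len(lines) != 1 {
		// More than one mapping line only occurs inside a user namespace.
		return true
	}
	fields := strings.Fields(lines[0])
	if len(fields) != 3 {
		// Empty or unparsable content: assume a regular namespace.
		return false
	}
	// The initial user namespace maps the full 32-bit ID range onto itself.
	initial := fields[0] == "0" && fields[1] == "0" && fields[2] == "4294967295"
	return !initial
}

func main() {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		fmt.Println("cannot read uid_map:", err)
		return
	}
	fmt.Println("running in user namespace:", inUserNS(string(data)))
}
```

The parsing itself is easy to unit-test, as shown by the pure function above; what cannot be unit-tested is the kubelet's behavior once the kernel actually denies a sysctl write, which is why the KEP relies on e2e runs.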
554565

555566
##### Integration tests
556567

@@ -576,7 +587,9 @@ This can be done with:
576587
- a search in the Kubernetes bug triage tool (https://storage.googleapis.com/k8s-triage/index.html)
577588
-->
578589

579-
- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/integration/...): [integration master](https://testgrid.k8s.io/sig-release-master-blocking#integration-master?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature)
590+
N/A, as integration tests do not make sense here, for the same reason as explained above for the [unit tests](#unit-tests).
591+
592+
See [e2e tests](#e2e-tests) below for how the feature is actually tested.
580593

581594
##### e2e tests
582595

@@ -595,7 +608,31 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
595608
If e2e tests are not necessary or useful, explain why.
596609
-->
597610

598-
- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/e2e/...): [SIG ...](https://testgrid.k8s.io/sig-...?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature)
611+
`NodeConformance` tests are executed using [kubetest2-kindinv](https://github.com/rootless-containers/kubetest2-kindinv).
612+
613+
"kindinv" stands for "Kubernetes in (Rootless) Docker in (GCE) VM".
614+
A GCE VM is used so that systemd is available, which Rootless Docker requires to set up cgroup v2.
615+
616+
```bash
617+
exec kubetest2 kindinv \
618+
--boskos-location=http://boskos.test-pods.svc.cluster.local \
619+
--gcp-zone=us-central1-b \
620+
--instance-image=ubuntu-os-cloud/ubuntu-2404-lts-amd64 \
621+
--instance-type=n2-standard-4 \
622+
--kind-rootless \
623+
--user=rootless \
624+
--build \
625+
--up \
626+
--down \
627+
--test=ginkgo \
628+
-- \
629+
--focus-regex='\[NodeConformance\]' \
630+
--skip-regex='\[Environment:NotInUserNS\]|\[Slow\]' \
631+
--parallel=8
632+
```
633+
634+
- Prow manifest: https://github.com/kubernetes/test-infra/blob/aefb999cad82965bd6fb7e3104525fe8d87e434f/config/jobs/kubernetes/sig-testing/kubernetes-kind-ci.yaml#L250-L314
635+
- Logs: https://prow.k8s.io/job-history/gs/kubernetes-ci-logs/logs/ci-kubernetes-e2e-kind-rootless
599636

600637
### Graduation Criteria
601638

@@ -676,11 +713,22 @@ in back-to-back releases.
676713
- Alpha: Basic support for rootless mode on cgroups v2 hosts.
677714

678715
- Beta: e2e tests coverage.
679-
To move to beta, we need clarity if we intend to define two separate types of conformance suites:
680-
- kubernetes clusters that can run privileged workloads
681-
- kubernetes cluster that are restricted to run unprivileged workloads only
716+
The feature is covered by `NodeConformance` tests (see above).
682717
Requirements:
683718
- [the cgroup v2 KEP](../2254-cgroup-v2/) to reach Beta or GA.
719+
Open Source Usage:
720+
- https://github.com/rootless-containers/usernetes/blob/gen2-v20250828.0/kubeadm-config.yaml#L45
721+
- https://github.com/kubernetes-sigs/kind/blob/v0.30.0/pkg/cluster/internal/kubeadm/config.go#L501
722+
- https://github.com/kubernetes/minikube/blob/v1.36.0/cmd/minikube/cmd/start_flags.go#L654
723+
- https://github.com/k3s-io/k3s/blob/v1.33.4%2Bk3s1/pkg/daemons/agent/agent_linux.go#L26
724+
- https://github.com/k3d-io/k3d/blob/v5.8.3/docs/usage/advanced/podman.md?plain=1#L141
725+
- https://github.com/epinio/epinio/blob/v1.12.0/scripts/acceptance-cluster-setup.sh#L92
726+
- https://github.com/lxc/cluster-api-provider-incus/blob/v0.7.0/docs/book/src/explanation/unprivileged-containers.md?plain=1#L23
727+
- https://github.com/NVIDIA/aistore/blob/v1.3.31/deploy/dev/k8s/utils/ci/generate_kind_config.sh#L18
728+
- https://github.com/GoogleCloudPlatform/anthos-samples/blob/8aff62c3f0bd835bda7479a01a591e1849c48fe9/anthos-attached-clusters/kind/main.tf#L37
729+
- https://github.com/GoogleCloudPlatform/cloud-solutions/blob/pino-logging-gcp-config-v1.1.0/projects/k8s-hybrid-neg-controller/hack/kind-cluster-config.yaml#L24
730+
- https://github.com/GoogleCloudPlatform/solutions-workshops/blob/grpc-xds/v0.5.0/grpc-xds/hack/kind-cluster-config-2.yaml#L24
731+
In beta, [`NodeSystemInfo`](https://pkg.go.dev/k8s.io/api/core/v1#NodeSystemInfo) will be updated to include `RunningInUserNS *bool`.
684732

685733
- GA: Assuming no negative user feedback based on production experience, promote after >= 2 releases in beta.
686734
Requirements:
@@ -717,7 +765,8 @@ enhancement:
717765
CRI or CNI may require updating that component before the kubelet.
718766
-->
719767

720-
N/A
768+
N/A.
769+
This KEP only affects the internals of the kubelet and does not affect any API.
721770

722771
## Production Readiness Review Questionnaire
723772

@@ -763,7 +812,7 @@ well as the [existing list] of feature gates.
763812

764813
- [X] Feature gate (also fill in values in `kep.yaml`)
765814
- Feature gate name: `KubeletInUserNamespace`
766-
- Components depending on the feature gate:
815+
- Components depending on the feature gate: kubelet
767816
- [ ] Other
768817
- Describe the mechanism:
769818
- Will enabling / disabling the feature require downtime of the control
@@ -786,7 +835,8 @@ Any change of default behavior may be surprising to users or break existing
786835
automations, so be extremely careful here.
787836
-->
788837

789-
During Alpha, we will document what workloads will work and what will not work.
838+
The limitations are the same as those of Rootless Docker, Podman, etc.
839+
See <https://rootlesscontaine.rs/caveats/>.
790840

791841
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
792842

@@ -801,11 +851,11 @@ feature.
801851
NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
802852
-->
803853

804-
N/A, as switching back rootless to rootful requires redeploying the kubelet, and vice versa.
854+
Yes, by turning off the feature gate.
805855

806856
###### What happens if we reenable the feature if it was previously rolled back?
807857

808-
N/A.
858+
The rootless functionality becomes available in the kubelet again.
809859

810860
###### Are there any tests for feature enablement/disablement?
811861

@@ -822,17 +872,14 @@ You can take a look at one potential example of such test in:
822872
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
823873
-->
824874

825-
CI will run `kind` (Kubernetes in Docker) tests with Rootless Docker/Podman.
826-
Tests with a real cluster will be added later as well.
875+
Yes. See [Test Plan](#test-plan).
827876

828877
### Rollout, Upgrade and Rollback Planning
829878

830879
<!--
831880
This section must be completed when targeting beta to a release.
832881
-->
833882

834-
This section will be fulfilled when targeting beta graduation to a release.
835-
836883
###### How can a rollout or rollback fail? Can it impact already running workloads?
837884

838885
<!--
@@ -845,13 +892,22 @@ rollout. Similarly, consider large clusters and how enablement/disablement
845892
will rollout across nodes.
846893
-->
847894

895+
Rollout: rolling out requires creating a new node instance inside a UserNS.
896+
Typical failures:
897+
- [subuids are not allocated](https://rootlesscontaine.rs/getting-started/common/subuid/)
898+
- [cgroup v2 delegation is not enabled](https://rootlesscontaine.rs/getting-started/common/cgroup2/)
899+
900+
Rollback: this question is not applicable. Rolling back requires recreating the node instance.
901+
848902
###### What specific metrics should inform a rollback?
849903

850904
<!--
851905
What signals should users be paying attention to when the feature is young
852906
that might indicate a serious problem?
853907
-->
854908

909+
An increase in [`node_collector_unhealthy_nodes_in_zone`](https://kubernetes.io/docs/reference/instrumentation/metrics/).
910+
855911
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
856912

857913
<!--
@@ -860,12 +916,16 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
860916
are missing a bunch of machinery and tooling and can't do that now.
861917
-->
862918

919+
This question is not applicable. Rolling out and rolling back both require recreating the node instance.
920+
863921
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
864922

865923
<!--
866924
Even if applying deprecation policies, they may still surprise some users.
867925
-->
868926

927+
No
928+
869929
### Monitoring Requirements
870930

871931
<!--
@@ -883,7 +943,8 @@ checking if there are objects with field X set) may be a last resort. Avoid
883943
logs or events for this purpose.
884944
-->
885945

886-
N/A
946+
[`NodeSystemInfo`](https://pkg.go.dev/k8s.io/api/core/v1#NodeSystemInfo) will have `RunningInUserNS *bool`
947+
to indicate whether the node is running in UserNS.
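Because the proposed field is a pointer, consumers have to handle three states rather than two. The sketch below shows one way to do that; `nodeSystemInfo` is a hypothetical, trimmed-down stand-in for the real API type (only the proposed field is modelled), and `describeUserNS` is an illustrative helper, not part of any Kubernetes library.

```go
package main

import "fmt"

// nodeSystemInfo is a hypothetical stand-in for core/v1 NodeSystemInfo,
// reduced to the field proposed in this KEP.
type nodeSystemInfo struct {
	RunningInUserNS *bool
}

// describeUserNS maps the tri-state pointer to a human-readable label.
// A nil pointer means the kubelet did not report the field, for example
// because it predates the field.
func describeUserNS(info nodeSystemInfo) string {
	switch {
	case info.RunningInUserNS == nil:
		return "unknown"
	case *info.RunningInUserNS:
		return "in user namespace"
	default:
		return "not in user namespace"
	}
}

func main() {
	yes, no := true, false
	infos := []nodeSystemInfo{
		{},                      // field not reported
		{RunningInUserNS: &yes}, // rootless node
		{RunningInUserNS: &no},  // rootful node
	}
	for _, info := range infos {
		fmt.Println(describeUserNS(info))
	}
}
```

Treating `nil` as "unknown" rather than "false" avoids misclassifying nodes whose kubelet does not yet populate the field.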
887948

888949
###### How can someone using this feature know that it is working for their instance?
889950

@@ -897,9 +958,9 @@ Recall that end users cannot usually observe component logs or access metrics.
897958
-->
898959

899960
- [ ] Events
900-
- Event Reason:
901-
- [ ] API .status
902-
- Condition name:
961+
- Event Reason:
962+
- [X] API .status
963+
- Condition name: [`NodeSystemInfo`](https://pkg.go.dev/k8s.io/api/core/v1#NodeSystemInfo) will have `RunningInUserNS *bool`
903964
- Other field:
904965
- [ ] Other (treat as last resort)
905966
- Details:
@@ -921,20 +982,22 @@ These goals will help you determine what you need to measure (SLIs) in the next
921982
question.
922983
-->
923984

924-
N/A
985+
In a default Kubernetes installation with the feature enabled,
986+
99th percentile per cluster-day of `node_collector_unhealthy_nodes_in_zone` <= X
987+
where X depends on the size of the cluster.
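As a rough illustration of how such an SLO could be evaluated, the sketch below computes a 99th percentile over one day of metric samples with the nearest-rank method and compares it to a threshold. The function names, the sample values, and the choice of nearest-rank are all assumptions made for this example; the KEP does not prescribe an aggregation method or a value for X.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of samples
// using the nearest-rank method. It returns 0 for an empty input.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// sloMet reports whether the 99th percentile of one cluster-day of
// samples stays at or below the threshold X.
func sloMet(daySamples []float64, x float64) bool {
	return percentile(daySamples, 99) <= x
}

func main() {
	// Made-up samples of node_collector_unhealthy_nodes_in_zone.
	day := []float64{0, 0, 0, 0, 0, 0, 0, 0, 1, 3}
	fmt.Println("p99:", percentile(day, 99))
	fmt.Println("SLO met for X=2:", sloMet(day, 2))
}
```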
925988

926989
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
927990

928991
<!--
929992
Pick one more of these and delete the rest.
930993
-->
931994

932-
- [ ] Metrics
933-
- Metric name:
995+
- [X] Metrics
996+
- Metric name: [`node_collector_unhealthy_nodes_in_zone`](https://kubernetes.io/docs/reference/instrumentation/metrics/)
934997
- [Optional] Aggregation method:
935-
- Components exposing the metric:
936-
- [X] Other (treat as last resort)
937-
- Details: Use `systemctl --user is-system-running` to verify whether the processes (RootlessKit, kubelet, kube-proxy, and CRI) are running.
998+
- Components exposing the metric: node-lifecycle-controller
999+
- [ ] Other (treat as last resort)
1000+
- Details:
9381001

9391002
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
9401003

@@ -943,7 +1006,7 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
9431006
implementation difficulties, etc.).
9441007
-->
9451008

946-
N/A
1009+
None
9471010

9481011
### Dependencies
9491012

@@ -1060,6 +1123,8 @@ Think about adding additional work or introducing new steps in between
10601123
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
10611124
-->
10621125

1126+
No.
1127+
10631128
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
10641129

10651130
<!--
@@ -1072,7 +1137,12 @@ This through this both in small and large cases, again with respect to the
10721137
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
10731138
-->
10741139

1075-
RootlessKit and slirp4netns may face high CPU and memory consumption.
1140+
User-mode TCP/IP implementations (RootlessKit, slirp4netns, pasta, etc.) may incur high CPU and memory consumption.
1141+
1142+
Figure 8 ("CPU utilization while running iperf3 client") in <https://arxiv.org/pdf/2402.00365> shows that
1143+
a configuration with RootlessKit for incoming packets and slirp4netns for outgoing packets may incur roughly 20% CPU usage.
1144+
1145+
This issue can be addressed by using `lxc-user-nic` (SETUID helper) or `bypass4netns` (seccomp-based network accelerator).
10761146

10771147
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
10781148

@@ -1086,6 +1156,8 @@ Are there any tests that were run/should be run to understand performance charac
10861156
and validate the declared limits?
10871157
-->
10881158

1159+
No
1160+
10891161
### Troubleshooting
10901162

10911163
<!--
@@ -1122,6 +1194,10 @@ Same as traditional rootful Kubernetes.
11221194

11231195
###### What steps should be taken if SLOs are not being met to determine the problem?
11241196

1197+
- Make sure that supported versions of the components are used
1198+
- [Make sure that more than 65536 subuids are allocated](https://rootlesscontaine.rs/getting-started/common/subuid/)
1199+
- [Make sure that cgroup v2 delegation is enabled](https://rootlesscontaine.rs/getting-started/common/cgroup2/)
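The last two checks in the list above can be scripted. The following is a minimal sketch, assuming the usual `owner:start:count` format of `/etc/subuid` and a systemd user session whose delegated controllers are listed under `/sys/fs/cgroup/user.slice/...`. The helper names, the cgroup path, and the set of controllers checked (`cpu`, `memory`, `pids`) are assumptions for illustration; entries keyed by numeric UID are not handled.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
	"strconv"
	"strings"
)

// subIDCount sums the range sizes allocated to owner in /etc/subuid-style
// content, where each line has the form "owner:start:count".
func subIDCount(content, owner string) int {
	total := 0
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Split(strings.TrimSpace(line), ":")
		if len(fields) != 3 || fields[0] != owner {
			continue
		}
		if n, err := strconv.Atoi(fields[2]); err == nil {
			total += n
		}
	}
	return total
}

// missingControllers returns the entries of want that are absent from
// the space-separated content of a cgroup.controllers file.
func missingControllers(content string, want []string) []string {
	have := map[string]bool{}
	for _, c := range strings.Fields(content) {
		have[c] = true
	}
	var missing []string
	for _, c := range want {
		if !have[c] {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	u, err := user.Current()
	if err != nil {
		fmt.Println("cannot determine current user:", err)
		return
	}
	if data, err := os.ReadFile("/etc/subuid"); err != nil {
		fmt.Println("cannot read /etc/subuid:", err)
	} else {
		fmt.Printf("subuids allocated to %s: %d\n", u.Username, subIDCount(string(data), u.Username))
	}

	// Assumed location of the systemd user manager's cgroup.
	path := fmt.Sprintf("/sys/fs/cgroup/user.slice/user-%s.slice/user@%s.service/cgroup.controllers", u.Uid, u.Uid)
	if data, err := os.ReadFile(path); err != nil {
		fmt.Println("cannot read delegated controllers:", err)
	} else {
		fmt.Println("missing controllers:", missingControllers(string(data), []string{"cpu", "memory", "pids"}))
	}
}
```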
1200+
11251201
## Implementation History
11261202

11271203
<!--
@@ -1142,6 +1218,8 @@ Major milestones might include:
11421218
- 2019-11-19: @giuseppe submitted [cgroup v2 KEP](https://github.com/kubernetes/enhancements/pull/1370)
11431219
- 2019-11-19: present KEP to SIG-node (cgroup v2 version)
11441220
- 2020-07-07: the cgroup v2 support is in `implementable` status
1221+
- 2021-08-04: Kubernetes v1.22 (Alpha)
1222+
- 2025-12-XX: Kubernetes v1.35 (Beta)
11451223

11461224
## Drawbacks
11471225
