 Some volume types don't have support for idmapped mounts, like [raw block devices](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.26/#volumedevice-v1-core).
+If a pod runs with such a volume type and a user namespace, the kubelet will fail to create the pod.
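As a rough illustration of the rule in this hunk, the following Python sketch flags a pod manifest that combines `hostUsers: false` with a raw block device (`volumeDevices`). It is a hypothetical helper working on a plain dict shaped like a pod manifest, not kubelet code. The function name is an assumption, and only the raw block device case mentioned above is covered.

```python
def uses_userns_with_block_device(pod: dict) -> bool:
    """Return True if the pod asks for a user namespace and a raw block device.

    Hypothetical pre-flight check; the kubelet's real validation may differ.
    """
    spec = pod.get("spec", {})
    # An unset hostUsers field means the pod uses the host user namespace.
    if spec.get("hostUsers", True) is not False:
        return False
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    # volumeDevices is the container field that attaches raw block devices.
    return any(c.get("volumeDevices") for c in containers)
```

A check like this could run in CI before manifests are applied, so the failure is caught earlier than at pod creation time.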
@@ -493,6 +499,8 @@ For `baseline` and `restricted` namespaces, if a pod has `hostUsers` set to fals
 For `baseline` namespaces, pods with `hostUsers` set to false can set any value for the `capabilities.add` field,
 whereas normally in a `baseline` namespace a pod is restricted to adding certain capabilities.
 
+Finally, for `restricted` namespaces, `hostUsers` will be required to be set to `false`.
+
 The validation for capabilities can be relaxed in a `baseline` pod because capabilities
 are user namespaced in the linux kernel, and any pod does not have a seccomp profile (as baseline
 pods may not be required to, depending on the kubelet's `seccompDefault` configuration field)
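The two rules in this hunk can be sketched as a small validation function. This is a simplified illustration under stated assumptions, not the Pod Security admission implementation: `check_pod` and its return shape are hypothetical, the allowed set is the `capabilities.add` list from the Pod Security Standards baseline profile, and the `restricted` rule follows the wording proposed in this diff.

```python
from typing import List

# Capabilities the Pod Security Standards "baseline" profile permits in capabilities.add.
BASELINE_ALLOWED_CAPS = frozenset({
    "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL", "MKNOD",
    "NET_BIND_SERVICE", "SETFCAP", "SETGID", "SETPCAP", "SETUID", "SYS_CHROOT",
})


def check_pod(level: str, pod: dict) -> List[str]:
    """Return violations of the user-namespace rules described in this hunk."""
    spec = pod.get("spec", {})
    in_userns = spec.get("hostUsers", True) is False
    violations = []
    # Proposed rule: restricted namespaces require hostUsers to be false.
    if level == "restricted" and not in_userns:
        violations.append("hostUsers must be set to false")
    # Baseline: the capabilities.add allow-list only applies without a user namespace.
    if level == "baseline" and not in_userns:
        for c in spec.get("containers", []):
            added = c.get("securityContext", {}).get("capabilities", {}).get("add", [])
            extra = sorted(set(added) - BASELINE_ALLOWED_CAPS)
            if extra:
                violations.append(f"{c.get('name', '?')}: disallowed capabilities {extra}")
    return violations
```

With `hostUsers: false`, a `baseline` pod adding `SYS_ADMIN` yields no violation in this sketch, which mirrors the relaxation described above.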
@@ -892,14 +900,13 @@ When a pod hits this error returned by the kubelet, the status in `kubectl` is s
 Warning FailedCreatePodSandBox 12s (x23 over 5m6s) kubelet Failed to create pod sandbox: user namespaces is not supported by the runtime
```
 
-The following kubelet metrics are useful to check:
-- `kubelet_running_pods`: Shows the actual number of pods running
-- `kubelet_desired_pods`: The number of pods the kubelet is _trying_ to run
+The following kubelet metrics will be added
+- `started_user_namespaced_pods_total`: Shows the number of pods that have been attempted to be created with a user namespace.
+- `started_user_namespaced_pods_errors_total`: The number of pods that failed to create that had a user namespace.
 
-If these metrics are very different, it means there are desired pods that can't be set to running.
-If that is the case, checking the pod events to see if they are failing for user namespaces reasons
-(like the errors shown in this KEP) is advised, in which case it is recommended to rollback or
-disable the feature gate.
+If the kubelet metric `started_user_namespaced_pods_errors_total` has a value close to `started_user_namespaced_pods_total`
+it means most of pods with userns started are failing. If that is the case, checking the pod events to see if they are failing for user namespaces reasons
+(like the errors shown in this KEP) is advised, in which case it is recommended to rollback or disable the feature gate.
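One way to read "a value close to" is as a failure ratio between the two counters. The sketch below assumes that reading. The function names and the `0.9` threshold are illustrative assumptions, since the KEP text does not define how close is close enough.

```python
def userns_failure_ratio(started_total: float, errors_total: float) -> float:
    """Fraction of user-namespaced pod starts that failed (0.0 if none were attempted)."""
    if started_total <= 0:
        return 0.0
    return errors_total / started_total


def mostly_failing(started_total: float, errors_total: float, threshold: float = 0.9) -> bool:
    """True when the error counter is close to the total, per the assumed threshold."""
    return userns_failure_ratio(started_total, errors_total) >= threshold
```

When `mostly_failing` returns true for a node, the next step described above still applies: inspect pod events to confirm the failures are user-namespace related before rolling back or disabling the feature gate.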
 
 <!--
 What signals should users be paying attention to when the feature is young
@@ -975,9 +982,7 @@ Recall that end users cannot usually observe component logs or access metrics.
 - Condition name:
 - Other field:
 - [x] Other (treat as last resort)
-- Details: check pods with pod.spec.hostUsers field set to false, and see if they are in RUNNING
-state. Exec into a container and run `cat /proc/self/uid_map` to verify that the mappings are different
-than the mappings on the host.
+- Details: `started_user_namespaced_pods_total` metric is greater than `started_user_namespaced_pods_errors_total` for a given node.
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
@@ -1018,16 +1023,8 @@ Pick one more of these and delete the rest.
 
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
-No.
-
-This feature is using yet another namespace when creating a pod. If the pod creation fails (by
-an error on the kubelet or returned by the container runtime), a clear error is returned to the
-user. The feedback on this is very direct to the user actions.
-
-A metric like "errors returned in pods with user namespaces enabled" can be very noisy, as the error
-can be completely unrelated (image pull secret errors, configmap referenced and not defined, any
-other container runtime error, etc.). We can't see any metric that can be helpful, as the user has a
-very direct feedback already.
+Yes, two metrics will be added: `started_user_namespaced_pods_total` and `started_user_namespaced_pods_errors_total`.
+If error == total for a given node, then there is a problem on that node with user namespace creation.
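The `error == total` condition could be checked against a node's metrics endpoint output roughly as follows. This is a sketch under assumptions: it parses the Prometheus text exposition format with a simplified regular expression (label values containing `}` are not handled), the helper names are hypothetical, and the metric names are taken verbatim from this diff, so the names actually exposed by the kubelet may carry a prefix.

```python
import re
from typing import Dict, Iterable

TOTAL = "started_user_namespaced_pods_total"
ERRORS = "started_user_namespaced_pods_errors_total"

# Simplified sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)")


def read_counters(metrics_text: str, names: Iterable[str]) -> Dict[str, float]:
    """Sum the samples of the requested metrics found in one node's metrics text."""
    wanted = set(names)
    values = {name: 0.0 for name in wanted}
    for line in metrics_text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        match = _SAMPLE.match(line)
        if match and match.group(1) in wanted:
            values[match.group(1)] += float(match.group(2))
    return values


def node_has_userns_problem(metrics_text: str) -> bool:
    """True when every attempted user-namespaced pod start on the node failed."""
    counters = read_counters(metrics_text, [TOTAL, ERRORS])
    return counters[TOTAL] > 0 and counters[ERRORS] == counters[TOTAL]
```

The `total > 0` guard is a design choice in this sketch: a node that never attempted a user-namespaced pod has both counters at zero and should not be reported as broken.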
 
 <!--
 Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
@@ -1072,64 +1069,22 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
 
 ### Scalability
 
-<!--
-For alpha, this section is encouraged: reviewers should consider these questions
-and attempt to answer them.
-
-For beta, this section is required: reviewers must answer these questions.
-
-For GA, this section is required: approvers should be able to confirm the
-previous answers based on experience in the field.
--->
-
 ###### Will enabling / using this feature result in any new API calls?