@@ -376,26 +376,11 @@ what we expect, the kubelet needs to chown the file to a UID that is mapped into
the pod's userns (e.g. UID 65536 in this example), as that is what root inside
the container is mapped to in the user namespace.
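As a rough illustration of the ID translation described above, here is a minimal Go sketch. The `idMapping` type and `hostID` helper are hypothetical names used only for this example (they are not kubelet APIs), and the range length of 65536 is an assumption. The only value taken from the text is that root inside the container maps to host UID 65536.

```go
package main

import (
	"errors"
	"fmt"
)

// idMapping is a hypothetical type describing one contiguous ID range,
// in the same spirit as a single line of /proc/<pid>/uid_map.
type idMapping struct {
	ContainerID uint32 // first ID as seen inside the user namespace
	HostID      uint32 // first ID as seen on the host
	Length      uint32 // number of consecutive IDs in the range
}

// hostID returns the host ID that a container ID translates to,
// or an error if the container ID is not covered by the mapping.
func hostID(m idMapping, containerID uint32) (uint32, error) {
	if containerID < m.ContainerID || containerID-m.ContainerID >= m.Length {
		return 0, errors.New("container ID is not mapped")
	}
	return m.HostID + (containerID - m.ContainerID), nil
}

func main() {
	// Assumed mapping for this example: container IDs starting at 0 map to
	// host IDs starting at 65536, for 65536 consecutive IDs.
	m := idMapping{ContainerID: 0, HostID: 65536, Length: 65536}

	uid, err := hostID(m, 0) // root inside the container
	if err != nil {
		panic(err)
	}
	fmt.Println(uid) // prints 65536
}
```

Under these assumptions, a file that should appear as owned by root inside the container would have to be owned by UID 65536 on the host when no idmap mount is involved.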
- We tried this before, but several limitations were hit:
+ We tried this before, but several limitations were hit. See the
+ [alternatives section](#dont-use-idmap-mounts-and-rely-chown-all-the-files-correctly)
+ for more details on the limitations we hit.

- * Changing the owner of files in configmaps/secrets/etc was a concern, those
- volume type today ignore FsUser setting and changing it to honor them was a
- concern for some members of the community due to possibly breaking some
- existing behavior.
-
- * Therefore, we need to rely on `fsGroup`, that always mark files readable for
- the group. This means this solution can't work when we NEED not to have
- permissions for the group (think of ssh keys, for example, that tools enforce no
- permission for the group)
-
- * There are several other files that also need to have the proper permissions,
- like: /dev/termination-log, /etc/hosts, /etc/resolv.conf, /etc/hostname, etc.
- While it is completely possible to do, it adds more complexity to lot of parts
- of the code base and future Kubernetes features will have to take this into
- account too. Furthermore, some of these files are created by container runtimes,
- so complexity creeps out very easily.
-
- ##### Example without idmap mounts
+ ##### Example with idmap mounts

Now let's say the pod is using its volumes that were mounted using idmap mounts
by the container runtime. All the mappings used (the idmap mounts and the pod
@@ -1043,6 +1028,62 @@ Why should this KEP _not_ be implemented?
Here is a list of considerations raised in PRs discussion that were considered.
This list is not exhaustive.

+ ### Don't use idmap mounts and rely chown all the files correctly
+
+ We explored the idea of not using idmap mounts for stateless pods, and instead
+ making sure we chown each file with the host ID a pod is mapped to (see more
+ details in the [example section](#example-without-idmap-mounts) of how file
+ access works with userns and without idmap mounts).
+
+ The problems were mostly:
+
+ * Changing the owner of files in configmaps/secrets/etc was a concern: those
+ volume types today ignore the FsUser setting, and changing them to honor it was
+ a concern for some members of the community due to possibly breaking some
+ existing behavior. See discussions [here][fsgroup-1] and [here][fsgroup-2].
+
+ * Therefore, we need to rely on `fsGroup`, which always marks files readable for
+ the group. This means this solution can't work when we NEED not to have
+ permissions for the group (think of ssh keys, for example, where tools enforce
+ no permission for the group).
+
+ * There are several other files that also need to have the proper permissions,
+ like: /dev/termination-log, /etc/hosts, /etc/resolv.conf, /etc/hostname, etc.
+ While it is completely possible to do, it adds more complexity to a lot of parts
+ of the code base, and future Kubernetes features will have to take this into
+ account too.
+
+ To exemplify the complexity of the last point, let's see some concrete examples.
+ `/etc/hosts` is created by [ensureHostsFile()][ensure-hosts-file], which doesn't
+ know pod attributes like the mapping. The same happens with
+ [SetupDNSinContainerizedMounter][dns-in-container], which doesn't know anything
+ else but the path. The same happens with the other files mentioned: all of them
+ live in different "subsystems" of the kubelet and have very long call chains
+ that will need to change to take the mapping or similar. Also, future patches, like
+ https://github.com/kubernetes/kubernetes/pull/108076 to fix a security bug in
+ `/dev/termination-log`, will also need to be adjusted to take into account the
+ pod mappings.
+
+ If we go this route, while possible, more and more subsystems will have to
+ special-case whether the pod uses userns and chown to a specific ID. Not only
+ existing places, but also future places that create a file that is mounted, will
+ have to know about the mapping of the pod.
+
+ Furthermore, some of these files are
+ [created by container runtimes][containerd-mounts-files], so complexity creeps
+ out very easily.
+
+ Taking into account the 3 points (can't easily create secret/configmap files
+ with a specific owner; fsGroup has a lot of limitations; and the complexity of
+ chowning each of these files in the kubelet), this approach was discarded. Future
+ KEPs can explore this path if they so want to.
+
+ [fsgroup-1]: https://github.com/kubernetes/kubernetes/pull/111090#discussion_r934057520
+ [fsgroup-2]: https://github.com/kubernetes/kubernetes/pull/111090#discussion_r935802376
+ [dns-in-container]: https://github.com/kubernetes/kubernetes/blob/7a55b76f28eddbbb7abf69038d4bd5abab833b4f/pkg/kubelet/network/dns/dns.go#L452-L479
+ [ensure-hosts-file]: https://github.com/kubernetes/kubernetes/blob/7a55b76f28eddbbb7abf69038d4bd5abab833b4f/pkg/kubelet/kubelet_pods.go#L327-L347
+ [containerd-mounts-files]: https://github.com/containerd/containerd/blob/3d32da8f607a2a43d7157499254713f42b3c6701/pkg/cri/server/container_create_linux.go#L61
+
### 64k mappings?

We will start with mappings of 64K. Tim Hockin, however, has expressed