Commit eacc6e4

[zh-cn] sync blog: 2023-09-13-userns-stateful-pods
Signed-off-by: xin.li <[email protected]>
1 parent eb1ead1 commit eacc6e4

File tree

1 file changed

+291
-0
lines changed
  • content/zh-cn/blog/_posts/2023-09-13-userns-stateful-pods

@@ -0,0 +1,291 @@
1+
---
2+
layout: blog
3+
title: "User Namespaces: Now Supports Running Stateful Pods in Alpha!"
4+
date: 2023-09-13
5+
slug: userns-alpha
6+
---
7+
8+
<!--
9+
layout: blog
10+
title: "User Namespaces: Now Supports Running Stateful Pods in Alpha!"
11+
date: 2023-09-13
12+
slug: userns-alpha
13+
-->
14+
15+
<!--
16+
**Authors:** Rodrigo Campos Catelin (Microsoft), Giuseppe Scrivano (Red Hat), Sascha Grunert (Red Hat)
17+
-->
18+
**Authors:** Rodrigo Campos Catelin (Microsoft), Giuseppe Scrivano (Red Hat), Sascha Grunert (Red Hat)
19+
20+
**Translator:** Xin Li (DaoCloud)
21+
22+
<!--
23+
Kubernetes v1.25 introduced support for user namespaces for only stateless
24+
pods. Kubernetes 1.28 lifted that restriction, after some design changes were
25+
done in 1.27.
26+
-->
27+
Kubernetes v1.25 introduced support for user namespaces, but only for stateless Pods.
28+
Kubernetes 1.28 lifted that restriction, building on design changes made in 1.27.
29+
30+
<!--
31+
The beauty of this feature is that:
32+
* it is trivial to adopt (you just need to set a bool in the pod spec)
33+
* doesn't need any changes for **most** applications
34+
* improves security by _drastically_ enhancing the isolation of containers and
35+
mitigating CVEs rated HIGH and CRITICAL.
36+
-->
37+
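The beauty of this feature is that:
38+
39+
* it is trivial to adopt (you only need to set one boolean in the Pod spec)
40+
* **most** applications need no changes
41+
* it improves security by **drastically** strengthening container isolation and mitigating CVEs rated HIGH and CRITICAL.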
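
As a minimal sketch of the "one boolean" mentioned above: the Pod-level field is `hostUsers`, and setting it to `false` requests a user namespace for the Pod. The Pod name, image, and command below are placeholders chosen for illustration, not values from this post.

```python
import json


def pod_with_user_namespace(name: str, image: str) -> dict:
    """Build a minimal Pod manifest that opts in to a user namespace."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # hostUsers: false asks for the Pod to run in its own user
            # namespace instead of sharing the host's.
            "hostUsers": False,
            "containers": [
                {
                    "name": name,
                    "image": image,  # placeholder image
                    "command": ["sleep", "infinity"],
                }
            ],
        },
    }


# "userns-demo" and "debian" are hypothetical example values.
manifest = pod_with_user_namespace("userns-demo", "debian")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML. The Pod is only scheduled with a user namespace if the cluster meets the feature gate and runtime requirements described later in this post.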
42+
43+
<!--
44+
This post explains the basics of user namespaces and also shows:
45+
* the changes that arrived in the recent Kubernetes v1.28 release
46+
* a **demo of a vulnerability rated as HIGH** that is not exploitable with user namespaces
47+
* the runtime requirements to use this feature
48+
* what you can expect in future releases regarding user namespaces.
49+
-->
50+
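This post explains the basics of user namespaces and also shows:
51+
52+
* the changes that arrived in the recent Kubernetes v1.28 release
53+
* a **demo of a vulnerability rated HIGH** that is not exploitable with user namespaces
54+
* the runtime requirements to use this feature
55+
* what you can expect in future releases regarding user namespaces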
56+
57+
<!--
58+
## What is a user namespace?
59+
60+
A user namespace is a Linux feature that isolates the user and group identifiers
61+
(UIDs and GIDs) of the containers from the ones on the host. The identifiers
62+
in the container can be mapped to identifiers on the host in a way where the
63+
host UID/GIDs used for different containers never overlap. Even more, the
64+
identifiers can be mapped to *unprivileged* non-overlapping UIDs and GIDs on the
65+
host. This basically means two things:
66+
-->
67+
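## What is a user namespace?
68+
69+
A user namespace is a Linux feature that isolates the user and group identifiers (UIDs and GIDs) of containers from those on the host.
70+
Identifiers in the container can be mapped to identifiers on the host so that the host UIDs/GIDs used for different containers never overlap.
71+
Even more, the identifiers can be mapped to **unprivileged**, non-overlapping UIDs and GIDs on the host. This basically means two things: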
72+
73+
<!--
74+
* As the UIDs and GIDs for different containers are mapped to different UIDs
75+
and GIDs on the host, containers have a harder time to attack each other even
76+
if they escape the container boundaries. For example, if container A is running
77+
with different UIDs and GIDs on the host than container B, the operations it
78+
can do on container B's files and process are limited: only read/write what a
79+
file allows to others, as it will never have permission for the owner or
80+
group (the UIDs/GIDs on the host are guaranteed to be different for
81+
different containers).
82+
-->
83+
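* Because the UIDs and GIDs of different containers map to different UIDs and GIDs on the host, containers have a harder time attacking each other even if they escape the container boundaries.
84+
  For example, if container A runs with different host UIDs and GIDs than container B, the operations it can perform on container B's
85+
  files and processes are limited: it can only read or write what a file allows to "others",
86+
  because it never has owner or group permissions (the host UIDs/GIDs are guaranteed to differ between containers).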
87+
88+
<!--
89+
* As the UIDs and GIDs are mapped to unprivileged users on the host, if a
90+
container escapes the container boundaries, even if it is running as root
91+
inside the container, it has no privileges on the host. This greatly
92+
protects what host files it can read/write, which process it can send signals
93+
to, etc.
94+
95+
Furthermore, capabilities granted are only valid inside the user namespace and
96+
not on the host.
97+
-->
98+
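* Because the UIDs and GIDs map to unprivileged users on the host, a container that escapes the container boundaries
99+
  has no privileges on the host, even if it runs as root inside the container.
100+
  This greatly restricts which host files it can read or write, which processes it can send signals to, and so on.
101+
102+
Furthermore, any capabilities granted are valid only inside the user namespace, not on the host.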
103+
104+
<!--
105+
Without using a user namespace a container running as root, in the case of a
106+
container breakout, has root privileges on the node. And if some capabilities
107+
were granted to the container, the capabilities are valid on the host too. None
108+
of this is true when using user namespaces (modulo bugs, of course 🙂).
109+
-->
110+
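Without a user namespace, a container running as root has root privileges on the node
111+
if it breaks out. And if some capabilities were granted to the container, those capabilities are valid on the host too.
112+
None of this is true when using user namespaces (modulo bugs, of course 🙂).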
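
The mapping described above can be illustrated with a small sketch. It follows the three-field layout Linux uses in `/proc/<pid>/uid_map` (first ID inside the namespace, first ID outside, range length). The host ranges for the two Pods are made-up values for illustration; the actual ranges are chosen by the kubelet and are not specified in this post.

```python
from typing import NamedTuple, Optional


class IdMapping(NamedTuple):
    """One uid_map-style entry: inside start, outside (host) start, length."""
    inside_start: int
    outside_start: int
    length: int


def to_host_id(mapping: IdMapping, container_id: int) -> Optional[int]:
    """Translate a container UID/GID to the host ID, or None if unmapped."""
    offset = container_id - mapping.inside_start
    if 0 <= offset < mapping.length:
        return mapping.outside_start + offset
    return None


def ranges_overlap(a: IdMapping, b: IdMapping) -> bool:
    """True if the two mappings share at least one host ID."""
    return (a.outside_start < b.outside_start + b.length
            and b.outside_start < a.outside_start + a.length)


# Hypothetical host ranges for two Pods (illustrative values only).
pod_a = IdMapping(inside_start=0, outside_start=65536, length=65536)
pod_b = IdMapping(inside_start=0, outside_start=131072, length=65536)

print(to_host_id(pod_a, 0))          # root in Pod A -> host UID 65536
print(to_host_id(pod_b, 0))          # root in Pod B -> host UID 131072
print(to_host_id(pod_a, 65534))      # UID 65534 in Pod A -> host UID 131070
print(ranges_overlap(pod_a, pod_b))  # False
```

In this sketch, root inside either Pod corresponds to a non-zero, unprivileged host UID, and the two Pods never share a host ID, which is the property the two bullet points above rely on.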
113+
114+
<!--
115+
## Changes in 1.28
116+
117+
As already mentioned, starting from 1.28, Kubernetes supports user namespaces
118+
with stateful pods. This means that pods with user namespaces can use any type
119+
of volume, they are no longer limited to only some volume types as before.
120+
-->
121+
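## Changes in 1.28
122+
123+
As already mentioned, starting from 1.28, Kubernetes supports user namespaces with stateful Pods.
124+
This means that Pods with user namespaces can use any type of volume; they are no longer limited to a subset of volume types as before.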
125+
126+
<!--
127+
The feature gate to activate this feature was renamed, it is no longer
128+
`UserNamespacesStatelessPodsSupport` but from 1.28 onwards you should use
129+
`UserNamespacesSupport`. There were many changes done and the requirements on
130+
the node hosts changed. So with Kubernetes 1.28 the feature flag was renamed to
131+
reflect this.
132+
-->
133+
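The feature gate that activates this feature was renamed in 1.28: it is no longer `UserNamespacesStatelessPodsSupport`;
134+
use `UserNamespacesSupport` instead. The feature went through many changes,
135+
and the requirements on node hosts changed as well, so Kubernetes 1.28 renamed the feature gate to reflect this.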
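
For clusters that span several versions, the rename can be captured in a small helper. This is only a sketch: the function name is hypothetical, and it covers just the versions discussed in this post (1.25 through 1.28); the gate's status in later releases is not covered here.

```python
def userns_feature_gate(minor: int) -> str:
    """Return the user namespaces feature gate name for Kubernetes 1.<minor>.

    Covers only what this post describes: the feature arrived in 1.25 and
    the gate was renamed in 1.28.
    """
    if minor < 25:
        raise ValueError("user namespaces support was introduced in 1.25")
    if minor >= 28:
        return "UserNamespacesSupport"
    return "UserNamespacesStatelessPodsSupport"


def feature_gate_flag(minor: int) -> str:
    """Format the gate as a --feature-gates command-line argument."""
    return f"--feature-gates={userns_feature_gate(minor)}=true"


print(feature_gate_flag(27))
print(feature_gate_flag(28))
```

The `--feature-gates=Name=true` form is the usual way to enable a gate on Kubernetes components; where the flag is set depends on how your cluster is deployed.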
136+
137+
<!--
138+
## Demo
139+
140+
Rodrigo created a demo which exploits [CVE 2022-0492][cve-link] and shows how
141+
the exploit can occur without user namespaces. He also shows how it is not
142+
possible to use this exploit from a Pod where the containers are using this
143+
feature.
144+
-->
145+
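## Demo
146+
147+
Rodrigo created a demo that exploits [CVE 2022-0492][cve-link]
148+
and shows how the exploit can occur without user namespaces.
149+
He also shows that the exploit cannot be used from a Pod whose containers use this feature.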
150+
151+
<!--
152+
This vulnerability is rated **HIGH** and allows **a container with no special
153+
privileges to read/write to any path on the host** and launch processes as root
154+
on the host too.
155+
156+
{{< youtube id="M4a2b4KkXN8" title="Mitigation of CVE-2022-0492 on Kubernetes by enabling User Namespace support">}}
157+
-->
158+
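This vulnerability is rated **HIGH** and allows **a container with no special privileges to read/write any path on the host** and to launch processes as root on the host.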
159+
160+
{{< youtube id="M4a2b4KkXN8" title="Mitigation of CVE-2022-0492 on Kubernetes by enabling User Namespace support">}}
161+
162+
<!--
163+
Most applications in containers run as root today, or as a semi-predictable
164+
non-root user (user ID 65534 is a somewhat popular choice). When you run a Pod
165+
with containers using a userns, Kubernetes runs those containers as unprivileged
166+
users, with no changes needed in your app.
167+
-->
168+
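Most applications in containers run as root today, or as a semi-predictable non-root
169+
user (user ID 65534 is a somewhat popular choice).
170+
When you run a Pod whose containers use a user namespace (userns), Kubernetes
171+
runs those containers as unprivileged users, with no changes needed in your app.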
172+
173+
<!--
174+
This means two containers running as user 65534 will effectively be mapped to
175+
different users on the host, limiting what they can do to each other in case of
176+
an escape, and if they are running as root, the privileges on the host are
177+
reduced to the one of an unprivileged user.
178+
179+
[cve-link]: https://unit42.paloaltonetworks.com/cve-2022-0492-cgroups/
180+
-->
181+
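This means that two containers running as user 65534 are effectively mapped to different users on the host,
182+
which limits what they can do to each other in case of an escape. If they run as root,
183+
their privileges on the host are reduced to those of an unprivileged user.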
184+
185+
[cve-link]: https://unit42.paloaltonetworks.com/cve-2022-0492-cgroups/
186+
187+
<!--
188+
## Node system requirements
189+
190+
There are requirements on the Linux kernel version as well as the container
191+
runtime to use this feature.
192+
-->
193+
## Node system requirements
194+
195+
Using this feature places requirements on both the Linux kernel version and the container runtime.
196+
197+
<!--
198+
On Linux you need Linux 6.3 or greater. This is because the feature relies on a
199+
kernel feature named idmap mounts, and support to use idmap mounts with tmpfs
200+
was merged in Linux 6.3.
201+
202+
If you are using CRI-O with crun, this is [supported in CRI-O
203+
1.28.1][CRIO-release] and crun 1.9 or greater. If you are using CRI-O with runc,
204+
this is still not supported.
205+
-->
206+
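On Linux you need Linux 6.3 or greater. This is because the feature relies on a
207+
kernel feature named idmap mounts, and support for using idmap mounts with tmpfs was merged in Linux 6.3.
208+
209+
If you are using CRI-O with crun, this is supported in [CRI-O 1.28.1][CRIO-release] and crun 1.9 or greater.
210+
If you are using CRI-O with runc, this is still not supported.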
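
A quick way to check the kernel requirement on a node is to compare the release string against 6.3. The helper below is a sketch with a hypothetical name; it only parses the leading `major.minor` part of the string, and the sample release strings are illustrative.

```python
import platform
import re


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Return True if a kernel release string such as '6.3.0-generic'
    is at least major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


# Illustrative release strings.
print(kernel_at_least("6.3.0-generic", 6, 3))      # True
print(kernel_at_least("5.15.0-91-generic", 6, 3))  # False

# On a Linux node, check the running kernel.
if platform.system() == "Linux":
    print(kernel_at_least(platform.release(), 6, 3))
```

Tuple comparison keeps the check correct for versions such as 6.10, which a plain string or float comparison would get wrong. Note that this checks only the kernel; the container runtime versions listed above still have to be verified separately.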
211+
212+
<!--
213+
containerd support is currently targeted for containerd 2.0; it is likely that
214+
it won't matter if you use it with crun or runc.
215+
216+
Please note that containerd 1.7 added _experimental_ support for user
217+
namespaces as implemented in Kubernetes 1.25 and 1.26. The redesign done in 1.27
218+
is not supported by containerd 1.7, therefore it only works, in terms of user
219+
namespaces support, with Kubernetes 1.25 and 1.26.
220+
-->
221+
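containerd support is currently targeted for containerd 2.0; it will likely work regardless of whether you use it with crun or runc.
222+
223+
Please note that containerd 1.7 added _experimental_ support for user namespaces as implemented in Kubernetes 1.25
224+
and 1.26. The redesign done in 1.27 is not supported by containerd 1.7,
225+
so, in terms of user namespaces support, it only works with Kubernetes 1.25 and 1.26.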
226+
227+
<!--
228+
One limitation present in containerd 1.7 is that it needs to change the
229+
ownership of every file and directory inside the container image, during Pod
230+
startup. This means it has a storage overhead and can significantly impact the
231+
container startup latency. Containerd 2.0 will probably include an implementation
232+
that will eliminate the startup latency added and the storage overhead. Take
233+
this into account if you plan to use containerd 1.7 with user namespaces in
234+
production.
235+
236+
None of these containerd limitations apply to [CRI-O 1.28][CRIO-release].
237+
238+
[CRIO-release]: https://github.com/cri-o/cri-o/releases/tag/v1.28.1
239+
-->
240+
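One limitation of containerd 1.7 is that it needs to change the ownership of every file and directory inside the container image during Pod startup.
241+
This adds storage overhead and can significantly increase container startup latency. containerd 2.0
242+
will probably include an implementation that eliminates the added startup latency and the storage overhead. Take this into account
243+
if you plan to use containerd 1.7 with user namespaces in production.
244+
245+
None of these containerd limitations apply to [CRI-O 1.28][CRIO-release].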
246+
247+
[CRIO-release]: https://github.com/cri-o/cri-o/releases/tag/v1.28.1
248+
249+
<!--
250+
## What’s next?
251+
252+
Looking ahead to Kubernetes 1.29, the plan is to work with SIG Auth to integrate user
253+
namespaces to Pod Security Standards (PSS) and the Pod Security Admission. For
254+
the time being, the plan is to relax checks in PSS policies when user namespaces are
255+
in use. This means that the fields `spec[.*].securityContext` `runAsUser`,
256+
`runAsNonRoot`, `allowPrivilegeEscalation` and `capabilities` will not trigger a
257+
violation if user namespaces are in use. The behavior will probably be controlled by
258+
utilizing an API Server feature gate, like `UserNamespacesPodSecurityStandards`
259+
or similar.
260+
-->
261+
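## What's next?
262+
263+
Looking ahead to Kubernetes 1.29, the plan is to work with SIG Auth to integrate user namespaces into Pod Security Standards (PSS) and Pod Security Admission.
264+
For the time being, the plan is to relax checks in PSS policies when user namespaces are in use. This means that the
265+
`runAsUser`, `runAsNonRoot`, `allowPrivilegeEscalation`, and `capabilities` fields under `spec[.*].securityContext`
266+
will not trigger a violation if user namespaces are in use. The behavior will probably be controlled by an API Server feature gate, such as `UserNamespacesPodSecurityStandards` or similar.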
267+
268+
<!--
269+
## How do I get involved?
270+
271+
You can reach SIG Node by several means:
272+
- Slack: [#sig-node](https://kubernetes.slack.com/messages/sig-node)
273+
- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node)
274+
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fnode)
275+
276+
You can also contact us directly:
277+
- GitHub: @rata @giuseppe @saschagrunert
278+
- Slack: @rata @giuseppe @sascha
279+
-->
280+
## How do I get involved?
281+
282+
You can reach SIG Node by several means:
283+
284+
- Slack: [#sig-node](https://kubernetes.slack.com/messages/sig-node)
285+
- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node)
286+
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fnode)
287+
288+
You can also contact us directly:
289+
290+
- GitHub: @rata @giuseppe @saschagrunert
291+
- Slack: @rata @giuseppe @sascha
