Commit 697e683

Merge pull request #26126 from tengqm/zh-trans-pid-limiting
[zh] Localize concepts/policy/pid-limiting.md
2 parents 2f344c8 + 55b68bd commit 697e683

1 file changed: 233 additions and 0 deletions
@@ -0,0 +1,233 @@
---
reviewers:
- derekwaynecarr
title: Process ID Limits And Reservations
content_type: concept
weight: 40
---

<!-- overview -->

{{< feature-state for_k8s_version="v1.20" state="stable" >}}

Kubernetes allows you to limit the number of process IDs (PIDs) that a
{{< glossary_tooltip term_id="Pod" text="Pod" >}} can use.
You can also reserve a number of allocatable PIDs for each
{{< glossary_tooltip term_id="node" text="node" >}}
for use by the operating system and daemons (rather than by Pods).

<!-- body -->

Process IDs (PIDs) are a fundamental resource on nodes. It is trivial to hit the
task limit without hitting any other resource limit, which can in turn
destabilize the host machine.

Cluster administrators require mechanisms to ensure that Pods running in the
cluster cannot induce PID exhaustion that prevents host daemons (such as the
{{< glossary_tooltip text="kubelet" term_id="kubelet" >}} or
{{< glossary_tooltip text="kube-proxy" term_id="kube-proxy" >}},
and potentially also the container runtime) from running.
In addition, limiting the PIDs available to each Pod keeps that Pod's impact on
other workloads on the same node bounded.

{{< note >}}
On certain Linux installations, the operating system sets the PIDs limit to a low default,
such as `32768`. Consider raising the value of `/proc/sys/kernel/pid_max`.
{{< /note >}}

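As a rough illustration of the note above, the following Python sketch reads the
kernel's PID ceiling and checks it against the `32768` default mentioned there.
The helper names `parse_pid_max`, `read_pid_max` and `is_low_default` are
assumptions made for this example; only the path `/proc/sys/kernel/pid_max`
comes from the text, and the file exists on Linux hosts only.

```python
from pathlib import Path

# Location of the kernel-wide PID ceiling on Linux (taken from the note above).
PID_MAX_PATH = "/proc/sys/kernel/pid_max"


def parse_pid_max(text: str) -> int:
    """Parse the contents of the pid_max file into an integer."""
    return int(text.strip())


def read_pid_max(path: str = PID_MAX_PATH) -> int:
    """Read the PID ceiling from `path` (hypothetical helper, Linux only)."""
    return parse_pid_max(Path(path).read_text())


def is_low_default(pid_max: int, low_default: int = 32768) -> bool:
    """Report whether pid_max is at or below the low default named in the note."""
    return pid_max <= low_default
```

On a Linux node, `is_low_default(read_pid_max())` returning `True` suggests the
ceiling is still at the low default. An administrator would typically raise it
with `sysctl -w kernel.pid_max=<value>`.
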
You can configure a kubelet to limit the number of PIDs a given Pod can consume.
For example, if your node's host OS is set to use a maximum of `262144` PIDs and
you expect to host fewer than `250` Pods, you can give each Pod a budget of `1000`
PIDs to prevent using up that node's overall number of available PIDs. If the
admin wants to overcommit PIDs, similar to CPU or memory, they may do so as well,
with some additional risks. Either way, a single Pod will not be able to bring
the whole machine down. This kind of resource limiting helps to prevent simple
fork bombs from affecting operation of an entire cluster.

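The budget arithmetic in the example above can be checked with a few lines of
Python. This is only a sketch: the function name `pid_headroom` is an assumption
made for illustration, and the second call uses a made-up per-Pod budget to show
what an overcommitted node looks like.

```python
def pid_headroom(pid_max: int, max_pods: int, pod_pid_limit: int) -> int:
    """PIDs left for the host if every Pod uses its full budget.

    A negative result means the per-Pod budgets overcommit the node.
    """
    return pid_max - max_pods * pod_pid_limit


# Figures from the paragraph above: 262144 PIDs, at most 250 Pods, 1000 PIDs each.
headroom = pid_headroom(pid_max=262144, max_pods=250, pod_pid_limit=1000)
print(headroom)  # 12144

# Hypothetical overcommitted setup: the same node with 2000 PIDs per Pod.
print(pid_headroom(pid_max=262144, max_pods=250, pod_pid_limit=2000))  # -237856
```

With the figures from the text, 12144 PIDs remain for the operating system and
node daemons even if all 250 Pods exhaust their budgets.
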
Per-Pod PID limiting allows administrators to protect one Pod from another, but
does not ensure that all Pods scheduled onto that host are unable to impact the node overall.
Per-Pod limiting also does not protect the node agents themselves from PID exhaustion.

You can also reserve an amount of PIDs for node overhead, separate from the
allocation to Pods. This is similar to how you can reserve CPU, memory, or other
resources for use by the operating system and other facilities outside of Pods
and their containers.

PID limiting is an important sibling to [compute
resource](/docs/concepts/configuration/manage-resources-containers/) requests
and limits. However, you specify it in a different way: rather than defining a
Pod's resource limit in the `.spec` for a Pod, you configure the limit as a
setting on the kubelet. Pod-defined PID limits are not currently supported.

{{< caution >}}
This means that the limit that applies to a Pod may be different depending on
where the Pod is scheduled. To keep things simple, have all Nodes use
the same PID resource limits and reservations.
{{< /caution >}}

## Node PID limits {#node-pid-limits}

Kubernetes allows you to reserve a number of process IDs for system use. To
configure the reservation, use the parameter `pid=<number>` in the
`--system-reserved` and `--kube-reserved` command line options to the kubelet.
The values you specify declare that the given number of process IDs will
be reserved for the system as a whole and for Kubernetes system daemons
respectively.

{{< note >}}
Before Kubernetes version 1.20, PID resource limiting with Node-level
reservations required enabling the [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/)
`SupportNodePidsLimit` to work.
{{< /note >}}

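To make the `pid=<number>` syntax concrete, here is a small Python sketch that
renders the two reservation flags and estimates how many PIDs remain for Pods.
The function names and the reservation sizes (1000 and 500) are illustrative
assumptions, and the subtraction is a simplified model rather than a statement
of how the kubelet computes allocatable PIDs internally.

```python
from typing import List


def reservation_flags(system_pids: int, kube_pids: int) -> List[str]:
    """Build the two kubelet reservation flags with a `pid=<number>` entry each."""
    if system_pids < 0 or kube_pids < 0:
        raise ValueError("PID reservations must be non-negative")
    return [
        f"--system-reserved=pid={system_pids}",
        f"--kube-reserved=pid={kube_pids}",
    ]


def allocatable_pids(pid_max: int, system_pids: int, kube_pids: int) -> int:
    """PIDs left for Pods after both reservations (simplified model, an assumption)."""
    return pid_max - system_pids - kube_pids


# Illustrative numbers only: 1000 PIDs for the OS, 500 for Kubernetes daemons.
print(" ".join(reservation_flags(1000, 500)))
# --system-reserved=pid=1000 --kube-reserved=pid=500
print(allocatable_pids(262144, 1000, 500))  # 260644
```

Both flags also accept other resources, so in practice the `pid` entry usually
sits alongside CPU and memory reservations in the same comma-separated list.
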
## Pod PID limits {#pod-pid-limits}

Kubernetes allows you to limit the number of processes running in a Pod. You
specify this limit at the node level, rather than configuring it as a resource
limit for a particular Pod. Each Node can have a different PID limit.
To configure the limit, you can specify the command line parameter
`--pod-max-pids` to the kubelet, or set `PodPidsLimit` in the kubelet
[configuration file](/docs/tasks/administer-cluster/kubelet-config-file/).

{{< note >}}
Before Kubernetes version 1.20, PID resource limiting for Pods required enabling
the [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
`SupportPodPidsLimit` to work.
{{< /note >}}

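One way to observe a per-Pod limit in effect is to look at the Linux cgroup
`pids` controller, which exposes a `pids.max` and a `pids.current` file per
cgroup. That mechanism is not described in the text above, so treat the
following Python sketch as an assumption-laden illustration: the helper names
are made up, and the cgroup directory is left as a parameter because its
location depends on the cgroup version and driver in use.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_pids_max(text: str) -> Optional[int]:
    """Parse a `pids.max` value; the literal "max" means no limit (None)."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_pid_usage(cgroup_dir: str) -> Tuple[int, Optional[int]]:
    """Return (current PID count, limit) for the cgroup at `cgroup_dir`.

    `cgroup_dir` is supplied by the caller; this sketch does not guess it.
    """
    base = Path(cgroup_dir)
    current = int((base / "pids.current").read_text().strip())
    limit = parse_pids_max((base / "pids.max").read_text())
    return current, limit


def near_limit(current: int, limit: Optional[int], fraction: float = 0.9) -> bool:
    """True once usage reaches `fraction` of the limit; never for unlimited cgroups."""
    return limit is not None and current >= fraction * limit
```

A monitoring script could call `near_limit(*read_pid_usage(path))` for a Pod's
cgroup to flag workloads that are approaching the configured limit.
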
## PID based eviction {#pid-based-eviction}

You can configure the kubelet to start terminating a Pod when it is misbehaving and
consuming an abnormal amount of resources. This feature is called eviction. You can
[Configure Out of Resource Handling](/docs/tasks/administer-cluster/out-of-resource)
for various eviction signals.
Use the `pid.available` eviction signal to configure the threshold for the number of
PIDs used by Pods.
You can set soft and hard eviction policies. However, even with the hard eviction
policy, if the number of PIDs is growing very fast, the node can still get into an
unstable state by hitting the node PIDs limit.
The eviction signal value is calculated periodically and does NOT enforce the limit.

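The difference between a periodically evaluated signal and an enforced limit can
be sketched as follows. The definition used here, free PIDs as the node's PID
ceiling minus the number of processes in use, is an assumption for illustration,
as are the function names, the threshold of 10000 and the two sample readings.

```python
def pid_available(max_pid: int, current_procs: int) -> int:
    """PIDs still free on the node: the ceiling minus processes in use."""
    return max_pid - current_procs


def hard_threshold_crossed(max_pid: int, current_procs: int, threshold: int) -> bool:
    """True when free PIDs drop below an absolute `pid.available` threshold."""
    return pid_available(max_pid, current_procs) < threshold


# Two hypothetical samples taken one evaluation interval apart.
samples = [(262144, 200000), (262144, 262144)]
for max_pid, procs in samples:
    print(pid_available(max_pid, procs), hard_threshold_crossed(max_pid, procs, 10000))
# 62144 False
# 0 True
```

In this made-up sequence the first sample is well above the threshold and the
second already shows zero free PIDs, so the node ran out between two evaluations
before eviction could react. That gap is why the signal complements the hard
limits instead of replacing them.
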
Per-Pod and per-Node PID limiting sets a hard limit.
Once the limit is hit, the workload will start experiencing failures when trying to
get a new PID. This may or may not lead to rescheduling of a Pod,
depending on how the workload reacts to these failures and how liveness and readiness
probes are configured for the Pod. However, if the limits are set correctly,
you can guarantee that other Pods' workloads and system processes will not run out
of PIDs when one Pod is misbehaving.

## {{% heading "whatsnext" %}}

- Refer to the [PID Limiting enhancement document](https://github.com/kubernetes/enhancements/blob/097b4d8276bc9564e56adf72505d43ce9bc5e9e8/keps/sig-node/20190129-pid-limiting.md) for more information.
- For historical context, read [Process ID Limiting for Stability Improvements in Kubernetes 1.14](/blog/2019/04/15/process-id-limiting-for-stability-improvements-in-kubernetes-1.14/).
- Read [Managing Resources for Containers](/docs/concepts/configuration/manage-resources-containers/).
- Learn how to [Configure Out of Resource Handling](/docs/tasks/administer-cluster/out-of-resource).