@@ -6,12 +6,12 @@ weight: 20
{{% capture overview %}}

- {{< feature-state for_k8s_version="v1.16" state="alpha" >}}
+ {{< feature-state for_k8s_version="v1.18" state="beta" >}}

<!--
When you run a Pod on a Node, the Pod itself takes an amount of system resources. These
resources are additional to the resources needed to run the container(s) inside the Pod.
- _Pod Overhead_ is a feature for accounting for the resources consumed by the pod infrastructure
+ _Pod Overhead_ is a feature for accounting for the resources consumed by the Pod infrastructure
on top of the container requests & limits.
-->
@@ -30,63 +30,279 @@ _Pod Overhead_ is a feature for accounting for the resources consumed by the Pod infrastructure
## Pod Overhead

<!--
- In Kubernetes, the pod's overhead is set at
+ In Kubernetes, the Pod's overhead is set at
[admission](/docs/reference/access-authn-authz/extensible-admission-controllers/#what-are-admission-webhooks)
- time according to the overhead associated with the pod's
+ time according to the overhead associated with the Pod's
[RuntimeClass](/docs/concepts/containers/runtime-class/).
-->
- In Kubernetes, the Pod's overhead is set at [admission](/docs/reference/access-authn-authz/extensible-admission-controllers/#what-are-admission-webhooks) time according to the overhead associated with the Pod's [RuntimeClass](/docs/concepts/containers/runtime-class/).
+ In Kubernetes, the Pod's overhead is set at [admission](/docs/reference/access-authn-authz/extensible-admission-controllers/#what-are-admission-webhooks)
+ time according to the overhead associated with the Pod's [RuntimeClass](/docs/concepts/containers/runtime-class/).

<!--
When Pod Overhead is enabled, the overhead is considered in addition to the sum of container
- resource requests when scheduling a pod. Similarly, Kubelet will include the pod overhead when sizing
- the pod cgroup, and when carrying out pod eviction ranking.
+ resource requests when scheduling a Pod. Similarly, the kubelet will include the Pod overhead when sizing
+ the Pod cgroup, and when carrying out Pod eviction ranking.
-->
- When Pod Overhead is enabled, the overhead is considered in addition to the sum of container resource requests when scheduling a Pod. Similarly, the kubelet includes the pod overhead when sizing the pod cgroup and when carrying out Pod eviction ranking.
+ When Pod Overhead is enabled, the overhead is considered in addition to the sum of container resource requests when scheduling a Pod. Similarly, the kubelet includes the Pod overhead when sizing the Pod cgroup and when carrying out Pod eviction ranking.

<!--
- ### Set Up
+ ## Enabling Pod Overhead {#set-up}
-->
- ### Set Up
+ ## Enabling Pod Overhead {#set-up}

<!--
You need to make sure that the `PodOverhead`
- [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled (it is off by default)
- across your cluster. This means:
+ [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled (it is on by default as of 1.18)
+ across your cluster, and a `RuntimeClass` is utilized which defines the `overhead` field.
-->
- You need to make sure that the `PodOverhead` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled across your cluster (it is off by default). This means:
+ You need to make sure that the `PodOverhead` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled across your cluster (it is on by default as of 1.18), and that a `RuntimeClass` which defines the `overhead` field is used.
+
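+ If you want to confirm whether the gate is set explicitly on your control plane, one quick check is
+ shown below. This is a minimal sketch: it assumes a kubeadm-style layout where the control plane runs
+ as static Pods under `/etc/kubernetes/manifests`; adjust the paths for other installation methods.
+
+ ```bash
+ # Assumption: kubeadm-style static Pod manifests for the control plane.
+ # No match means no explicit setting, so the default (enabled since 1.18) applies.
+ grep -e '--feature-gates' /etc/kubernetes/manifests/kube-apiserver.yaml
+ grep -e '--feature-gates' /etc/kubernetes/manifests/kube-scheduler.yaml
+ # The flag form, if you need to set the gate explicitly, is:
+ #   --feature-gates=...,PodOverhead=true
+ ```
+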
+ <!--
+ ## Usage example
+ -->
+ ## Usage example
+
+ <!--
+ To use the PodOverhead feature, you need a RuntimeClass that defines the `overhead` field. As
+ an example, you could use the following RuntimeClass definition with a virtualizing container runtime
+ that uses around 120MiB per Pod for the virtual machine and the guest OS:
+ -->
+ To use the PodOverhead feature, you need a RuntimeClass that defines the `overhead` field. As an example,
+ you could use the following RuntimeClass definition with a virtualizing container runtime that uses
+ around 120MiB per Pod for the virtual machine and the guest OS:
+
+ ```yaml
+ ---
+ kind: RuntimeClass
+ apiVersion: node.k8s.io/v1beta1
+ metadata:
+   name: kata-fc
+ handler: kata-fc
+ overhead:
+   podFixed:
+     memory: "120Mi"
+     cpu: "250m"
+ ```
+
+ <!--
+ Workloads which are created which specify the `kata-fc` RuntimeClass handler will take the memory and
+ cpu overheads into account for resource quota calculations, node scheduling, as well as Pod cgroup sizing.
+
+ Consider running the given example workload, test-pod:
+ -->
+ Workloads that are created specifying the `kata-fc` RuntimeClass handler will take the memory and
+ CPU overheads into account for resource quota calculations, node scheduling, and Pod cgroup sizing.
+
+ Consider running the given example workload, test-pod:
+
+ ```yaml
+ apiVersion: v1
+ kind: Pod
+ metadata:
+   name: test-pod
+ spec:
+   runtimeClassName: kata-fc
+   containers:
+   - name: busybox-ctr
+     image: busybox
+     stdin: true
+     tty: true
+     resources:
+       limits:
+         cpu: 500m
+         memory: 100Mi
+   - name: nginx-ctr
+     image: nginx
+     resources:
+       limits:
+         cpu: 1500m
+         memory: 100Mi
+ ```
+
+ <!--
+ At admission time the RuntimeClass [admission controller](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/)
+ updates the workload's PodSpec to include the `overhead` as described in the RuntimeClass. If the PodSpec already has this field defined,
+ the Pod will be rejected. In the given example, since only the RuntimeClass name is specified, the admission controller mutates the Pod
+ to include an `overhead`.
+ -->
+ At admission time, the RuntimeClass [admission controller](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/)
+ updates the workload's PodSpec to include the `overhead` described in the RuntimeClass. If the PodSpec already has this field defined,
+ the Pod will be rejected. In the given example, since only the RuntimeClass name is specified, the admission controller mutates the Pod
+ to include an `overhead`.
+
+ <!--
+ After the RuntimeClass admission controller, you can check the updated PodSpec:
+ -->
+ After the RuntimeClass admission controller has run, you can check the updated PodSpec:
+
+ ```bash
+ kubectl get pod test-pod -o jsonpath='{.spec.overhead}'
+ ```
+
+ <!--
+ The output is:
+ -->
+ The output is:
+ ```
+ map[cpu:250m memory:120Mi]
+ ```
+
+ <!--
+ If a ResourceQuota is defined, the sum of container requests as well as the
+ `overhead` field are counted.
+ -->
+ If a ResourceQuota is defined, the sum of container requests as well as the
+ `overhead` field are counted.
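+
+ As a rough sketch of how that interacts with quota, a ResourceQuota like the one below could be applied
+ to the namespace; the quota name, namespace, and limits here are illustrative assumptions, not values
+ taken from this example:
+
+ ```bash
+ # Illustrative only: the quota values and namespace are assumptions.
+ kubectl apply -f - <<EOF
+ apiVersion: v1
+ kind: ResourceQuota
+ metadata:
+   name: pod-overhead-demo
+   namespace: default
+ spec:
+   hard:
+     requests.cpu: "4"
+     requests.memory: 4Gi
+ EOF
+ # With PodOverhead, test-pod charges 2250m CPU and 320Mi memory against this quota,
+ # rather than the 2000m / 200Mi summed from its containers alone.
+ ```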
+
+ <!--
+ When the kube-scheduler is deciding which node should run a new Pod, the scheduler considers that Pod's
+ `overhead` as well as the sum of container requests for that Pod. For this example, the scheduler adds the
+ requests and the overhead, then looks for a node that has 2.25 CPU and 320 MiB of memory available.
+ -->
+ When the kube-scheduler is deciding which node should run a new Pod, the scheduler considers that Pod's
+ `overhead` as well as the sum of container requests for that Pod. For this example, the scheduler adds the
+ requests and the overhead, then looks for a node that has 2.25 CPU and 320 MiB of memory available.
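+
+ The arithmetic for this example, using the values defined above, works out as follows:
+
+ ```bash
+ # Worked check of the totals the scheduler uses for this example:
+ echo "CPU (m):     $((500 + 1500 + 250))"   # 2250m = 2.25 CPU
+ echo "Memory (Mi): $((100 + 100 + 120))"    # 320Mi
+ ```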
+
+ <!--
+ Once a Pod is scheduled to a node, the kubelet on that node creates a new {{< glossary_tooltip text="cgroup" term_id="cgroup" >}}
+ for the Pod. It is within this cgroup that the underlying container runtime will create containers.
+ -->
+ Once a Pod is scheduled to a node, the kubelet on that node creates a new {{< glossary_tooltip text="cgroup" term_id="cgroup" >}}
+ for the Pod. It is within this cgroup that the underlying container runtime will create containers.
163
+
164
+ <!--
165
+ If the resource has a limit defined for each container (Guaranteed QoS or Bustrable QoS with limits defined),
166
+ the kubelet will set an upper limit for the pod cgroup associated with that resource (cpu.cfs_quota_us for CPU
167
+ and memory.limit_in_bytes memory). This upper limit is based on the sum of the container limits plus the `overhead`
168
+ defined in the PodSpec.
169
+ -->
170
+ 如果该资源对每一个容器都定义了一个限制(定义了受限的 Guaranteed QoS 或者 Bustrable QoS),kubelet 会为与该资源(CPU 的 cpu.cfs_quota_us 以及内存的 memory.limit_in_bytes)
171
+ 相关的 pod cgroup 设定一个上限。该上限基于容器限制总量与 PodSpec 中定义的 `overhead` 之和。
+
+ <!--
+ For CPU, if the Pod is Guaranteed or Burstable QoS, the kubelet will set `cpu.shares` based on the sum of container
+ requests plus the `overhead` defined in the PodSpec.
+ -->
+ For CPU, if the Pod is Guaranteed or Burstable QoS, the kubelet will set `cpu.shares` based on the sum of container
+ requests plus the `overhead` defined in the PodSpec.
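+
+ If you want to see those CPU values on the node, a sketch is shown below. It assumes the cgroupfs
+ cgroup driver (with the systemd driver the directory layout differs) and reuses the Pod cgroup path
+ that is determined in the verification steps later on this page:
+
+ ```bash
+ # Run on the node; the path assumes the cgroupfs driver and this example's Pod UID.
+ cat /sys/fs/cgroup/cpu/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/cpu.shares
+ cat /sys/fs/cgroup/cpu/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/cpu.cfs_quota_us
+ # For 2250m of requests plus overhead, expect cpu.shares of roughly 2250 * 1024 / 1000 = 2304,
+ # and, with the default 100000us period, a CFS quota of about 225000.
+ ```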
+
+ <!--
+ Looking at our example, verify the container requests for the workload:
+ -->
+ Looking at our example, verify the container requests for the workload:
+ ```bash
+ kubectl get pod test-pod -o jsonpath='{.spec.containers[*].resources.limits}'
+ ```
+
+ <!--
+ The total container requests are 2000m CPU and 200MiB of memory:
+ -->
+ The total container requests are 2000m CPU and 200MiB of memory:
+ ```
+ map[cpu:500m memory:100Mi] map[cpu:1500m memory:100Mi]
+ ```
+
+ <!--
+ Check this against what is observed by the node:
+ -->
+ Check this against what is observed by the node:
+ ```bash
+ kubectl describe node | grep test-pod -B2
+ ```
+
+ <!--
+ The output shows 2250m CPU and 320MiB of memory are requested, which includes PodOverhead:
+ -->
+ The output shows that 2250m CPU and 320MiB of memory are requested, which includes PodOverhead:
+ ```
+   Namespace    Name        CPU Requests  CPU Limits   Memory Requests  Memory Limits  AGE
+   ---------    ----        ------------  ----------   ---------------  -------------  ---
+   default      test-pod    2250m (56%)   2250m (56%)  320Mi (1%)       320Mi (1%)     36m
+ ```
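+
+ The percentages also give a rough cross-check of node size; this is a hedged aside, since the actual
+ allocatable values depend on your node:
+
+ ```bash
+ # 2250m shown as 56% of allocatable CPU suggests roughly 4 CPUs allocatable (2250m / 0.5625 ≈ 4000m).
+ kubectl get nodes -o jsonpath='{.items[*].status.allocatable.cpu}'
+ ```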
+
+ <!--
+ ## Verify Pod cgroup limits
+ -->
+ ## Verify Pod cgroup limits
+
+ <!--
+ Check the Pod's memory cgroups on the node where the workload is running. In the following example, [`crictl`](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md)
+ is used on the node, which provides a CLI for CRI-compatible container runtimes. This is an
+ advanced example to show PodOverhead behavior, and it is not expected that users should need to check
+ cgroups directly on the node.
+
+ First, on the particular node, determine the Pod identifier:
+ -->
+ Check the Pod's memory cgroups on the node where the workload is running. In the following example,
+ [`crictl`](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md) is used on the node;
+ it provides a CLI for CRI-compatible container runtimes. This is an advanced example to show PodOverhead
+ behavior, and it is not expected that users should need to check cgroups directly on the node.
+
+ First, on the particular node, determine the Pod identifier:

<!--
- - in {{< glossary_tooltip text="kube-scheduler" term_id="kube-scheduler" >}}
- - in {{< glossary_tooltip text="kube-apiserver" term_id="kube-apiserver" >}}
- - in the {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} on each Node
- - in any custom API servers that use feature gates
+ ```bash
+ # Run this on the node where the Pod is scheduled
-->
- - in {{< glossary_tooltip text="kube-scheduler" term_id="kube-scheduler" >}}
- - in {{< glossary_tooltip text="kube-apiserver" term_id="kube-apiserver" >}}
- - in the {{< glossary_tooltip text="kubelet" term_id="kubelet" >}} on each Node
- - in any custom API servers that use feature gates
+ ```bash
+ # Run this on the node where the Pod is scheduled
+ POD_ID="$(sudo crictl pods --name test-pod -q)"
+ ```
+
+ <!--
+ From this, you can determine the cgroup path for the Pod:
+ -->
+ From this, you can determine the cgroup path for the Pod:

- {{< note >}}
<!--
- Users who can write to RuntimeClass resources are able to have cluster-wide impact on
- workload performance. You can limit access to this ability using Kubernetes access controls.
- See [Authorization Overview](/docs/reference/access-authn-authz/authorization/) for more details.
+ ```bash
+ # Run this on the node where the Pod is scheduled
-->
- Users who can write to RuntimeClass resources are able to have cluster-wide impact on
- workload performance. You can limit access to this ability using Kubernetes access controls.
- See [Authorization Overview](/docs/reference/access-authn-authz/authorization/) for more details.
- {{< /note >}}
+ ```bash
+ # Run this on the node where the Pod is scheduled
+ sudo crictl inspectp -o=json $POD_ID | grep cgroupsPath
+ ```
+
+ <!--
+ The resulting cgroup path includes the Pod's `pause` container. The Pod level cgroup is one directory above.
+ -->
+ The resulting cgroup path includes the Pod's `pause` container. The Pod-level cgroup is one directory above.
+ ```
+ "cgroupsPath": "/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/7ccf55aee35dd16aca4189c952d83487297f3cd760f1bbf09620e206e7d0c27a"
+ ```
+
+ <!--
+ In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2`. Verify the Pod level cgroup setting for memory:
+ -->
+ In this specific case, the Pod cgroup path is `kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2`. Verify the Pod-level cgroup setting for memory:
+
+ <!--
+ ```bash
+ # Run this on the node where the Pod is scheduled.
+ # Also, change the name of the cgroup to match the cgroup allocated for your pod.
+ -->
+ ```bash
+ # Run this on the node where the Pod is scheduled.
+ # Also, change the name of the cgroup to match the cgroup allocated for your Pod.
+ cat /sys/fs/cgroup/memory/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/memory.limit_in_bytes
+ ```
+
+ <!--
+ This is 320 MiB, as expected:
+ -->
+ This is 320 MiB, as expected:
+ ```
+ 335544320
+ ```
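+
+ As a quick arithmetic check, the raw value matches 320 MiB exactly:
+
+ ```bash
+ # 320 MiB expressed in bytes: 320 * 1024 * 1024
+ echo $((320 * 1024 * 1024))   # prints 335544320
+ ```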
+
+ <!--
+ ### Observability
+ -->
+ ### Observability
+
+ <!--
+ A `kube_pod_overhead` metric is available in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
+ to help identify when PodOverhead is being utilized and to help observe stability of workloads
+ running with a defined Overhead. This functionality is not available in the 1.9 release of
+ kube-state-metrics, but is expected in a following release. Users will need to build kube-state-metrics
+ from source in the meantime.
+ -->
+ A `kube_pod_overhead` metric is available in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
+ to help identify when PodOverhead is being utilized and to help observe the stability of workloads
+ running with a defined overhead. This functionality is not available in the 1.9 release of
+ kube-state-metrics, but is expected in a following release. Until then, users need to build kube-state-metrics
+ from source.
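+
+ Once a build that exposes the metric is running, one way to look for it is sketched below; the Service
+ name, namespace, and port are common kube-state-metrics defaults and may differ in your deployment:
+
+ ```bash
+ # Assumes a Service named kube-state-metrics in kube-system, listening on port 8080.
+ kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &
+ curl -s http://localhost:8080/metrics | grep kube_pod_overhead
+ ```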

{{% /capture %}}

{{% capture whatsnext %}}

- <!--
* [RuntimeClass](/docs/concepts/containers/runtime-class/)
- * [PodOverhead Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/20190226-pod-overhead.md)
- -->
+ * [PodOverhead Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/20190226-pod-overhead.md)

{{% /capture %}}