Commit 98e663f

[zh-cn]sync system-metrics device-plugins system-traces
Signed-off-by: xin.li <[email protected]>
1 parent 2a631e0 commit 98e663f

3 files changed: +67 -41 lines changed

content/zh-cn/docs/concepts/cluster-administration/system-metrics.md

Lines changed: 17 additions & 11 deletions
@@ -68,7 +68,7 @@ authorization via a user, group or ServiceAccount with a ClusterRole that allows
 [Prometheus 服务器](https://prometheus.io/)或某些其他指标搜集器以定期收集这些指标,
 并使它们在某种时间序列数据库中可用。

-请注意,{{< glossary_tooltip term_id="kubelet" text="kubelet" >}} 还会在 `/metrics/cadvisor`
+请注意,{{< glossary_tooltip term_id="kubelet" text="kubelet" >}} 还会在 `/metrics/cadvisor`
 `/metrics/resource` 和 `/metrics/probes` 端点中公开度量值。这些度量值的生命周期各不相同。

 如果你的集群使用了 {{< glossary_tooltip term_id="rbac" text="RBAC" >}},
@@ -172,7 +172,7 @@ patch release, the reason for that is the metrics deprecation policy runs agains

 `show-hidden-metrics-for-version` 参数接受版本号作为取值,
 版本号给出你希望显示该发行版本中已弃用的指标。
-版本表示为 x.y,其中 x 是主要版本,y 是次要版本。补丁程序版本不是必须的,
+版本表示为 `x.y`,其中 `x` 是主要版本,`y` 是次要版本。补丁程序版本不是必须的,
 即使指标可能会在补丁程序发行版中弃用,原因是指标弃用策略规定仅针对次要版本。

 <!--
@@ -186,7 +186,7 @@ deprecated policy, we can reach the following conclusion:
 此参数的取值只能使用前一个次要版本。如果管理员将前一个版本设置为 `show-hidden-metrics-for-version`,
 则前一个版本中隐藏的度量值会再度生成。不允许使用过旧的版本,因为那样会违反指标弃用策略。

-以指标 `A` 为例,此处假设 `A` 在 1.n 中已弃用。根据指标弃用策略,我们可以得出以下结论:
+以指标 `A` 为例,此处假设 `A` 在 `1.n` 中已弃用。根据指标弃用策略,我们可以得出以下结论:

 <!--
 * In release `1.n`, the metric is deprecated, and it can be emitted by default.
@@ -317,16 +317,16 @@ flag to expose these alpha stability metrics.
 -->
 ### kubelet 压力阻塞信息(PSI)指标

-{{< feature-state for_k8s_version="v1.33" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.34" state="beta" >}}

 <!--
-As an alpha feature, Kubernetes lets you configure kubelet to collect Linux kernel
+As a beta feature, Kubernetes lets you configure kubelet to collect Linux kernel
 [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html)
-(PSI) for CPU, memory and IO usage.
+(PSI) for CPU, memory and I/O usage.
 The information is collected at node, pod and container level.
 The metrics are exposed at the `/metrics/cadvisor` endpoint with the following names:
 -->
-作为一个 Alpha 阶段的特性,Kubernetes 允许你配置 kubelet 以基于 CPU、内存和 IO 的使用情况收集 Linux
+作为一个 Beta 阶段的特性,Kubernetes 允许你配置 kubelet 以基于 CPU、内存和 I/O 的使用情况收集 Linux
 内核的[压力阻塞信息(PSI)](https://docs.kernel.org/accounting/psi.html)。
 此信息是在节点、Pod 和容器级别进行收集的。
 这些指标通过 `/metrics/cadvisor` 端点暴露,指标名称如下:
@@ -341,13 +341,19 @@ container_pressure_io_waiting_seconds_total
 ```

 <!--
-You must enable the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-to use this feature. The information is also exposed in the
+This feature is enabled by default, by setting the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/). The information is also exposed in the
 [Summary API](/docs/reference/instrumentation/node-metrics#psi).
 -->
-要使用此特性,你必须启用 `KubeletPSI` [特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)
+此特性默认启用,通过 `KubeletPSI`
+[特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)管理。
 此信息也会通过 [Summary API](/zh-cn/docs/reference/instrumentation/node-metrics#psi) 暴露。

+<!--
+You can learn how to interpret the PSI metrics in [Understand PSI Metrics](/docs/reference/instrumentation/understand-psi-metrics/).
+-->
+参见[了解 PSI 指标](/zh-cn/docs/reference/instrumentation/understand-psi-metrics/)
+学习如何解读 PSI 指标。
+
 <!--
 #### Requirements

@@ -361,7 +367,7 @@ Pressure Stall Information requires:
 启用压力阻塞信息需满足以下条件:

 - [Linux 内核版本为 4.20 或更高](/zh-cn/docs/reference/node/kernel-version-requirements#requirements-psi)
-- [cgroup v2](/zh-cn/docs/concepts/architecture/cgroups)
+- [CGroup v2](/zh-cn/docs/concepts/architecture/cgroups)

 <!--
 ## Disabling metrics
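
The rule for `show-hidden-metrics-for-version` touched by this file (the value is written as `x.y` and may only name the previous minor release) can be sketched as a small check. This is an illustrative sketch, not Kubernetes code: the function names `parse_minor_version` and `is_allowed_hidden_metrics_version` are hypothetical, and rejecting every value when the current minor version is 0 is an assumption.

```python
import re

# Hypothetical helper: accepts only "x.y"; a patch component such as "1.33.2"
# is rejected because the flag value is expressed as x.y.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)")


def parse_minor_version(text: str) -> tuple[int, int]:
    """Parse "x.y" into (major, minor), raising ValueError on any other shape."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"expected a version of the form x.y, got {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_allowed_hidden_metrics_version(current: str, requested: str) -> bool:
    """Return True only if `requested` is the minor release just before `current`."""
    cur_major, cur_minor = parse_minor_version(current)
    requested_version = parse_minor_version(requested)
    if cur_minor == 0:
        # Assumption: with no earlier minor release in this major line,
        # no value is considered valid.
        return False
    return requested_version == (cur_major, cur_minor - 1)
```

For example, with a current release of `1.34` the check accepts `1.33` and rejects `1.32`, matching the statement that overly old versions are not allowed.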

content/zh-cn/docs/concepts/cluster-administration/system-traces.md

Lines changed: 6 additions & 9 deletions
@@ -157,7 +157,7 @@ with `--tracing-config-file=<path-to-config>`. This is an example config that re
 spans for 1 in 10000 requests, and uses the default OpenTelemetry endpoint:

 ```yaml
-apiVersion: apiserver.config.k8s.io/v1beta1
+apiVersion: apiserver.config.k8s.io/v1
 kind: TracingConfiguration
 # default value
 #endpoint: localhost:4317
@@ -169,7 +169,7 @@ kube-apiserver 提供追踪配置文件。下面是一个示例配置,它为
 span,并使用了默认的 OpenTelemetry 端点。

 ```yaml
-apiVersion: apiserver.config.k8s.io/v1beta1
+apiVersion: apiserver.config.k8s.io/v1
 kind: TracingConfiguration
 # 默认值
 #endpoint: localhost:4317
@@ -178,10 +178,10 @@ samplingRatePerMillion: 100

 <!--
 For more information about the `TracingConfiguration` struct, see
-[API server config API (v1beta1)](/docs/reference/config-api/apiserver-config.v1beta1/#apiserver-k8s-io-v1beta1-TracingConfiguration).
+[API server config API (v1)](/docs/reference/config-api/apiserver-config.v1/#apiserver-k8s-io-v1-TracingConfiguration).
 -->
 有关 TracingConfiguration 结构体的更多信息,请参阅
-[API 服务器配置 API (v1beta1)](/zh-cn/docs/reference/config-api/apiserver-config.v1beta1/#apiserver-k8s-io-v1beta1-TracingConfiguration)。
+[API 服务器配置 API](/zh-cn/docs/reference/config-api/apiserver-config.v1/#apiserver-k8s-io-v1-TracingConfiguration)。

 <!--
 ### kubelet traces
@@ -213,8 +213,6 @@ This is an example snippet of a kubelet config that records spans for 1 in 10000
 ```yaml
 apiVersion: kubelet.config.k8s.io/v1beta1
 kind: KubeletConfiguration
-featureGates:
-  KubeletTracing: true
 tracing:
   # default value
   #endpoint: localhost:4317
@@ -230,8 +228,6 @@ span,并使用默认的 OpenTelemetry 端点:
 ```yaml
 apiVersion: kubelet.config.k8s.io/v1beta1
 kind: KubeletConfiguration
-featureGates:
-  KubeletTracing: true
 tracing:
   # 默认值
   #endpoint: localhost:4317
@@ -242,7 +238,8 @@ tracing:
 If the `samplingRatePerMillion` is set to one million (`1000000`), then every
 span will be sent to the exporter.
 -->
-如果 `samplingRatePerMillion` 被设置为一百万 (`1000000`),则所有 span 都将被发送到导出器。
+如果 `samplingRatePerMillion` 被设置为一百万(`1000000`),
+则所有 span 都将被发送到导出器。

 <!--
 The kubelet in Kubernetes v{{< skew currentVersion >}} collects spans from
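
The `samplingRatePerMillion` values in this file (`100` for 1 in 10000 requests, `1000000` for every span) follow from simple arithmetic, sketched below. The helper names `sampling_probability` and `one_in_n` are hypothetical, and the range check from 0 to one million is an assumption rather than something stated in the diff.

```python
MILLION = 1_000_000


def sampling_probability(rate_per_million: int) -> float:
    """Fraction of requests sampled for a given samplingRatePerMillion value."""
    # Assumption: valid values lie between 0 and one million inclusive.
    if not 0 <= rate_per_million <= MILLION:
        raise ValueError("rate_per_million must be between 0 and 1000000")
    return rate_per_million / MILLION


def one_in_n(rate_per_million: int) -> float:
    """Express the rate as "1 in N requests"; undefined when sampling is off."""
    if rate_per_million <= 0:
        raise ValueError("a rate of 0 samples nothing")
    return MILLION / rate_per_million
```

With `samplingRatePerMillion: 100`, `one_in_n(100)` gives 10000, which is the "1 in 10000" figure quoted in the example configs.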

content/zh-cn/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md

Lines changed: 44 additions & 21 deletions
@@ -62,13 +62,6 @@ During the registration, the device plugin needs to send:
 [extended resource naming scheme](/docs/concepts/configuration/manage-resources-containers/#extended-resources)
 as `vendor-domain/resourcetype`.
 (For example, an NVIDIA GPU is advertised as `nvidia.com/gpu`.)
-
-Following a successful registration, the device plugin sends the kubelet the
-list of devices it manages, and the kubelet is then in charge of advertising those
-resources to the API server as part of the kubelet node status update.
-For example, after a device plugin registers `hardware-vendor.example/foo` with the kubelet
-and reports two healthy devices on a node, the node status is updated
-to advertise that the node has 2 "Foo" devices installed and available.
 -->
 设备插件可以通过此 gRPC 服务在 kubelet 进行注册。在注册期间,设备插件需要发送下面几样内容:

@@ -78,6 +71,14 @@ to advertise that the node has 2 "Foo" devices installed and available.
 需要遵循[扩展资源命名方案](/zh-cn/docs/concepts/configuration/manage-resources-containers/#extended-resources)
 类似于 `vendor-domain/resourcetype`。(比如 NVIDIA GPU 就被公布为 `nvidia.com/gpu`。)

+<!--
+Following a successful registration, the device plugin sends the kubelet the
+list of devices it manages, and the kubelet is then in charge of advertising those
+resources to the API server as part of the kubelet node status update.
+For example, after a device plugin registers `hardware-vendor.example/foo` with the kubelet
+and reports two healthy devices on a node, the node status is updated
+to advertise that the node has 2 "Foo" devices installed and available.
+-->
 成功注册后,设备插件就向 kubelet 发送它所管理的设备列表,然后 kubelet
 负责将这些资源发布到 API 服务器,作为 kubelet 节点状态更新的一部分。

@@ -114,13 +115,27 @@ on certain nodes. Here is an example of a pod requesting this resource to run a
 下面就是一个 Pod 示例,请求此资源以运行一个工作负载的示例:

 <!--
+```yaml
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: demo-pod
+spec:
+  containers:
+    - name: demo-container-1
+      image: registry.k8s.io/pause:3.8
+      resources:
+        limits:
+          hardware-vendor.example/foo: 2
 #
 # This Pod needs 2 of the hardware-vendor.example/foo devices
 # and can only schedule onto a Node that's able to satisfy
 # that need.
 #
 # If the Node has more than 2 of those devices available, the
 # remainder would be available for other Pods to use.
+```
 -->
 ```yaml
 ---
@@ -511,15 +526,17 @@ CPU ID、设备插件所报告的设备 ID 以及这些设备分配所处的 NUM

 <!--
 Starting from Kubernetes v1.27, the `List` endpoint can provide information on resources
-of running pods allocated in `ResourceClaims` by the `DynamicResourceAllocation` API. To enable
-this feature `kubelet` must be started with the following flags:
+of running pods allocated in `ResourceClaims` by the `DynamicResourceAllocation` API.
+Starting from Kubernetes v1.34, this feature is enabled by default.
+To disable, `kubelet` must be started with the following flags:
 -->
 从 Kubernetes v1.27 开始,`List` 端点可以通过 `DynamicResourceAllocation` API 提供在
 `ResourceClaims` 中分配的当前运行 Pod 的资源信息。
-要启用此特性,必须使用以下标志启动 `kubelet`:
+从 Kubernetes v1.34 开始,此特性默认启用。
+要禁用此特性,必须使用以下标志启动 `kubelet`:

 ```
---feature-gates=DynamicResourceAllocation=true,KubeletPodResourcesDynamicResources=true
+--feature-gates=KubeletPodResourcesDynamicResources=false
 ```

 <!--
@@ -785,7 +802,7 @@ will continue working.
 -->
 ### `Get` gRPC 端点 {#grpc-endpoint-get}

-{{< feature-state state="alpha" for_k8s_version="v1.27" >}}
+{{< feature-state state="beta" for_k8s_version="v1.34" >}}

 <!--
 The `Get` endpoint provides information on resources of a running Pod. It exposes information
@@ -813,24 +830,26 @@ message GetPodResourcesRequest {
 ```

 <!--
-To enable this feature, you must start your kubelet services with the following flag:
+To disable this feature, you must start your kubelet services with the following flag:
 -->
-要启用此特性,你必须使用以下标志启动 kubelet 服务:
+要禁用此特性,你必须使用以下标志启动 kubelet 服务:

 ```
---feature-gates=KubeletPodResourcesGet=true
+--feature-gates=KubeletPodResourcesGet=false
 ```

 <!--
 The `Get` endpoint can provide Pod information related to dynamic resources
-allocated by the dynamic resource allocation API. To enable this feature, you must
-ensure your kubelet services are started with the following flags:
+allocated by the dynamic resource allocation API.
+Starting from Kubernetes v1.34, this feature is enabled by default.
+To disable, `kubelet` must be started with the following flags:
 -->
 `Get` 端点可以提供与动态资源分配 API 所分配的动态资源相关的 Pod 信息。
-要启用此特性,你必须确保使用以下标志启动 kubelet 服务:
+从 Kubernetes v1.34 开始,此特性已默认启用。
+要禁用此特性,你必须确保使用以下标志启动 kubelet 服务:

 ```
---feature-gates=KubeletPodResourcesGet=true,DynamicResourceAllocation=true,KubeletPodResourcesDynamicResources=true
+--feature-gates=KubeletPodResourcesDynamicResources=false
 ```

 <!--
@@ -919,11 +938,13 @@ Here are some examples of device plugin implementations:
 * [Akri](https://github.com/project-akri/akri),它可以让你轻松公开异构叶子设备(例如 IP 摄像机和 USB 设备)。
 * [AMD GPU 设备插件](https://github.com/ROCm/k8s-device-plugin)
 * 适用于通用 Linux 设备和 USB 设备的[通用设备插件](https://github.com/squat/generic-device-plugin)
-* 用于异构 AI 计算虚拟化中间件(例如 NVIDIA、Cambricon、Hygon、Iluvatar、MThreads、Ascend、Metax 设备)的 [HAMi](https://github.com/Project-HAMi/HAMi)
+* 用于异构 AI 计算虚拟化中间件(例如 NVIDIA、Cambricon、Hygon、Iluvatar、MThreads、Ascend、Metax 设备)的
+[HAMi](https://github.com/Project-HAMi/HAMi)
 * [Intel 设备插件](https://github.com/intel/intel-device-plugins-for-kubernetes)支持
 Intel GPU、FPGA、QAT、VPU、SGX、DSA、DLB 和 IAA 设备
 * [KubeVirt 设备插件](https://github.com/kubevirt/kubernetes-device-plugins)用于硬件辅助的虚拟化
-* [NVIDIA GPU 设备插件](https://github.com/NVIDIA/k8s-device-plugin)NVIDIA 的官方设备插件,用于公布 NVIDIA GPU 和监控 GPU 健康状态。
+* [NVIDIA GPU 设备插件](https://github.com/NVIDIA/k8s-device-plugin)NVIDIA 的官方设备插件,
+用于公布 NVIDIA GPU 和监控 GPU 健康状态。
 * [为 Container-Optimized OS 所提供的 NVIDIA GPU 设备插件](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
 * [RDMA 设备插件](https://github.com/hustcat/k8s-rdma-device-plugin)
 * [SocketCAN 设备插件](https://github.com/collabora/k8s-socketcan)
@@ -941,8 +962,10 @@ Here are some examples of device plugin implementations:
 * Learn about the [Topology Manager](/docs/tasks/administer-cluster/topology-manager/)
 * Read about using [hardware acceleration for TLS ingress](/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/)
 with Kubernetes
+* Read more about [Extended Resource allocation by DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#extended-resource)
 -->
 * 查看[调度 GPU 资源](/zh-cn/docs/tasks/manage-gpus/scheduling-gpus/)来学习使用设备插件
 * 查看在节点上如何[公布扩展资源](/zh-cn/docs/tasks/administer-cluster/extended-resource-node/)
 * 学习[拓扑管理器](/zh-cn/docs/tasks/administer-cluster/topology-manager/)
 * 阅读如何在 Kubernetes 中使用 [TLS Ingress 的硬件加速](/zh-cn/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/)
+* 阅读更多关于[使用 DRA 分配扩展资源](/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#extended-resource)
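
The `--feature-gates` values that change in this file all share the `Name=true|false` form, joined by commas. A minimal sketch of assembling such a flag follows; the helper `feature_gates_flag` is hypothetical and only formats the string, it does not check that a gate name exists in any Kubernetes release.

```python
def feature_gates_flag(gates: dict[str, bool]) -> str:
    """Format a mapping of gate name -> enabled into a --feature-gates flag."""
    if not gates:
        raise ValueError("at least one feature gate is required")
    # Gates are emitted in insertion order, lowercase booleans, comma-separated.
    pairs = ",".join(
        f"{name}={'true' if enabled else 'false'}" for name, enabled in gates.items()
    )
    return f"--feature-gates={pairs}"


if __name__ == "__main__":
    # Gate names below are the ones that appear in the diff above.
    print(feature_gates_flag({"KubeletPodResourcesGet": False}))
    print(feature_gates_flag({"KubeletPodResourcesDynamicResources": False}))
```

Keeping the gates in a mapping makes it harder to list the same gate twice with conflicting values, which is easy to do when editing the flag string by hand.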
