[zh-cn] sync system-metrics device-plugins system-traces

my-git9 · my-git9 · commit 98e663fb57e2 · 2025-08-30T13:41:13.000+08:00
Signed-off-by: xin.li <xin.li@daocloud.io>
diff --git a/content/zh-cn/docs/concepts/cluster-administration/system-metrics.md b/content/zh-cn/docs/concepts/cluster-administration/system-metrics.md
@@ -68,7 +68,7 @@ authorization via a user, group or ServiceAccount with a ClusterRole that allows
 [Prometheus 服务器](https://prometheus.io/)或某些其他指标搜集器以定期收集这些指标，
 并使它们在某种时间序列数据库中可用。
 
-请注意，{{< glossary_tooltip term_id="kubelet" text="kubelet" >}} 还会在 `/metrics/cadvisor`，
+请注意，{{< glossary_tooltip term_id="kubelet" text="kubelet" >}} 还会在 `/metrics/cadvisor`、
 `/metrics/resource` 和 `/metrics/probes` 端点中公开度量值。这些度量值的生命周期各不相同。
 
 如果你的集群使用了 {{< glossary_tooltip term_id="rbac" text="RBAC" >}}，
@@ -172,7 +172,7 @@ patch release, the reason for that is the metrics deprecation policy runs agains
 
 `show-hidden-metrics-for-version` 参数接受版本号作为取值，
 版本号给出你希望显示该发行版本中已弃用的指标。
-版本表示为 x.y，其中 x 是主要版本，y 是次要版本。补丁程序版本不是必须的，
+版本表示为 `x.y`，其中 `x` 是主要版本，`y` 是次要版本。补丁程序版本不是必须的，
 即使指标可能会在补丁程序发行版中弃用，原因是指标弃用策略规定仅针对次要版本。
 
 <!--
@@ -186,7 +186,7 @@ deprecated policy, we can reach the following conclusion:
 此参数的取值只能使用前一个次要版本。如果管理员将前一个版本设置为 `show-hidden-metrics-for-version`，
 则前一个版本中隐藏的度量值会再度生成。不允许使用过旧的版本，因为那样会违反指标弃用策略。
 
-以指标 `A` 为例，此处假设 `A` 在 1.n 中已弃用。根据指标弃用策略，我们可以得出以下结论：
+以指标 `A` 为例，此处假设 `A` 在 `1.n` 中已弃用。根据指标弃用策略，我们可以得出以下结论：
 
 <!--
 * In release `1.n`, the metric is deprecated, and it can be emitted by default.
@@ -317,16 +317,16 @@ flag to expose these alpha stability metrics.
 -->
 ### kubelet 压力阻塞信息（PSI）指标
 
-{{< feature-state for_k8s_version="v1.33" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.34" state="beta" >}}
 
 <!--
-As an alpha feature, Kubernetes lets you configure kubelet to collect Linux kernel
+As a beta feature, Kubernetes lets you configure kubelet to collect Linux kernel
 [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html)
-(PSI) for CPU, memory and IO usage.
+(PSI) for CPU, memory and I/O usage.
 The information is collected at node, pod and container level.
 The metrics are exposed at the `/metrics/cadvisor` endpoint with the following names:
 -->
-作为一个 Alpha 阶段的特性，Kubernetes 允许你配置 kubelet 以基于 CPU、内存和 IO 的使用情况收集 Linux
+作为一个 Beta 阶段的特性，Kubernetes 允许你配置 kubelet 以基于 CPU、内存和 I/O 的使用情况收集 Linux
 内核的[压力阻塞信息（PSI）](https://docs.kernel.org/accounting/psi.html)。
 此信息是在节点、Pod 和容器级别进行收集的。
 这些指标通过 `/metrics/cadvisor` 端点暴露，指标名称如下：
@@ -341,13 +341,19 @@ container_pressure_io_waiting_seconds_total
 ```
 
 <!--
-You must enable the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-to use this feature. The information is also exposed in the
+This feature is enabled by default, by setting the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/). The information is also exposed in the
 [Summary API](/docs/reference/instrumentation/node-metrics#psi).
 -->
-要使用此特性，你必须启用 `KubeletPSI` [特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)。
+此特性默认启用，通过 `KubeletPSI`
+[特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)管理。
 此信息也会通过 [Summary API](/zh-cn/docs/reference/instrumentation/node-metrics#psi) 暴露。
 
+<!--
+You can learn how to interpret the PSI metrics in [Understand PSI Metrics](/docs/reference/instrumentation/understand-psi-metrics/).
+-->
+参见[了解 PSI 指标](/zh-cn/docs/reference/instrumentation/understand-psi-metrics/)，
+学习如何解读 PSI 指标。
+
 <!--
 #### Requirements
 
@@ -361,7 +367,7 @@ Pressure Stall Information requires:
 启用压力阻塞信息需满足以下条件：
 
 - [Linux 内核版本为 4.20 或更高](/zh-cn/docs/reference/node/kernel-version-requirements#requirements-psi)
-- [cgroup v2](/zh-cn/docs/concepts/architecture/cgroups)
+- [CGroup v2](/zh-cn/docs/concepts/architecture/cgroups)
 
 <!--
 ## Disabling metrics
diff --git a/content/zh-cn/docs/concepts/cluster-administration/system-traces.md b/content/zh-cn/docs/concepts/cluster-administration/system-traces.md
@@ -157,7 +157,7 @@ with `--tracing-config-file=<path-to-config>`. This is an example config that re
 spans for 1 in 10000 requests, and uses the default OpenTelemetry endpoint:
 
 ```yaml
-apiVersion: apiserver.config.k8s.io/v1beta1
+apiVersion: apiserver.config.k8s.io/v1
 kind: TracingConfiguration
 # default value
 #endpoint: localhost:4317
@@ -169,7 +169,7 @@ kube-apiserver 提供追踪配置文件。下面是一个示例配置，它为
 span，并使用了默认的 OpenTelemetry 端点。
 
 ```yaml
-apiVersion: apiserver.config.k8s.io/v1beta1
+apiVersion: apiserver.config.k8s.io/v1
 kind: TracingConfiguration
 # 默认值
 #endpoint: localhost:4317
@@ -178,10 +178,10 @@ samplingRatePerMillion: 100
 
 <!-- 
 For more information about the `TracingConfiguration` struct, see
-[API server config API (v1beta1)](/docs/reference/config-api/apiserver-config.v1beta1/#apiserver-k8s-io-v1beta1-TracingConfiguration).
+[API server config API (v1)](/docs/reference/config-api/apiserver-config.v1/#apiserver-k8s-io-v1-TracingConfiguration).
 -->
 有关 TracingConfiguration 结构体的更多信息，请参阅
-[API 服务器配置 API (v1beta1)](/zh-cn/docs/reference/config-api/apiserver-config.v1beta1/#apiserver-k8s-io-v1beta1-TracingConfiguration)。
+[API 服务器配置 API](/zh-cn/docs/reference/config-api/apiserver-config.v1/#apiserver-k8s-io-v1-TracingConfiguration)。
 
 <!--
 ### kubelet traces
@@ -213,8 +213,6 @@ This is an example snippet of a kubelet config that records spans for 1 in 10000
 ```yaml
 apiVersion: kubelet.config.k8s.io/v1beta1
 kind: KubeletConfiguration
-featureGates:
-  KubeletTracing: true
 tracing:
   # default value
   #endpoint: localhost:4317
@@ -230,8 +228,6 @@ span，并使用默认的 OpenTelemetry 端点：
 ```yaml
 apiVersion: kubelet.config.k8s.io/v1beta1
 kind: KubeletConfiguration
-featureGates:
-  KubeletTracing: true
 tracing:
   # 默认值
   #endpoint: localhost:4317
@@ -242,7 +238,8 @@ tracing:
 If the `samplingRatePerMillion` is set to one million (`1000000`), then every
 span will be sent to the exporter.
 -->
-如果 `samplingRatePerMillion` 被设置为一百万 (`1000000`)，则所有 span 都将被发送到导出器。
+如果 `samplingRatePerMillion` 被设置为一百万（`1000000`），
+则所有 span 都将被发送到导出器。
 
 <!--
 The kubelet in Kubernetes v{{< skew currentVersion >}} collects spans from
diff --git a/content/zh-cn/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md b/content/zh-cn/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
@@ -62,13 +62,6 @@ During the registration, the device plugin needs to send:
   [extended resource naming scheme](/docs/concepts/configuration/manage-resources-containers/#extended-resources)
   as `vendor-domain/resourcetype`.
   (For example, an NVIDIA GPU is advertised as `nvidia.com/gpu`.)
-
-Following a successful registration, the device plugin sends the kubelet the
-list of devices it manages, and the kubelet is then in charge of advertising those
-resources to the API server as part of the kubelet node status update.
-For example, after a device plugin registers `hardware-vendor.example/foo` with the kubelet
-and reports two healthy devices on a node, the node status is updated
-to advertise that the node has 2 "Foo" devices installed and available.
 -->
 设备插件可以通过此 gRPC 服务在 kubelet 进行注册。在注册期间，设备插件需要发送下面几样内容：
 
@@ -78,6 +71,14 @@ to advertise that the node has 2 "Foo" devices installed and available.
   需要遵循[扩展资源命名方案](/zh-cn/docs/concepts/configuration/manage-resources-containers/#extended-resources)，
   类似于 `vendor-domain/resourcetype`。（比如 NVIDIA GPU 就被公布为 `nvidia.com/gpu`。）
 
+<!--
+Following a successful registration, the device plugin sends the kubelet the
+list of devices it manages, and the kubelet is then in charge of advertising those
+resources to the API server as part of the kubelet node status update.
+For example, after a device plugin registers `hardware-vendor.example/foo` with the kubelet
+and reports two healthy devices on a node, the node status is updated
+to advertise that the node has 2 "Foo" devices installed and available.
+-->
 成功注册后，设备插件就向 kubelet 发送它所管理的设备列表，然后 kubelet
 负责将这些资源发布到 API 服务器，作为 kubelet 节点状态更新的一部分。
 
@@ -114,13 +115,27 @@ on certain nodes. Here is an example of a pod requesting this resource to run a
 下面就是一个 Pod 示例，请求此资源以运行一个工作负载的示例：
 
 <!--
+```yaml
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: demo-pod
+spec:
+  containers:
+    - name: demo-container-1
+      image: registry.k8s.io/pause:3.8
+      resources:
+        limits:
+          hardware-vendor.example/foo: 2
 #
 # This Pod needs 2 of the hardware-vendor.example/foo devices
 # and can only schedule onto a Node that's able to satisfy
 # that need.
 #
 # If the Node has more than 2 of those devices available, the
 # remainder would be available for other Pods to use.
+```
 -->
 ```yaml
 ---
@@ -511,15 +526,17 @@ CPU ID、设备插件所报告的设备 ID 以及这些设备分配所处的 NUM
 
 <!--
 Starting from Kubernetes v1.27, the `List` endpoint can provide information on resources
-of running pods allocated in `ResourceClaims` by the `DynamicResourceAllocation` API. To enable
-this feature `kubelet` must be started with the following flags:
+of running pods allocated in `ResourceClaims` by the `DynamicResourceAllocation` API.
+Starting from Kubernetes v1.34, this feature is enabled by default.
+To disable, `kubelet` must be started with the following flags:
 -->
 从 Kubernetes v1.27 开始，`List` 端点可以通过 `DynamicResourceAllocation` API 提供在
 `ResourceClaims` 中分配的当前运行 Pod 的资源信息。
-要启用此特性，必须使用以下标志启动 `kubelet`：
+从 Kubernetes v1.34 开始，此特性默认启用。
+要禁用此特性，必须使用以下标志启动 `kubelet`：
 
 ```
---feature-gates=DynamicResourceAllocation=true,KubeletPodResourcesDynamicResources=true
+--feature-gates=KubeletPodResourcesDynamicResources=false
 ```
 
 <!--
@@ -785,7 +802,7 @@ will continue working.
 -->
 ### `Get` gRPC 端点   {#grpc-endpoint-get}
 
-{{< feature-state state="alpha" for_k8s_version="v1.27" >}}
+{{< feature-state state="beta" for_k8s_version="v1.34" >}}
 
 <!--
 The `Get` endpoint provides information on resources of a running Pod. It exposes information
@@ -813,24 +830,26 @@ message GetPodResourcesRequest {
 ```
 
 <!--
-To enable this feature, you must start your kubelet services with the following flag:
+To disable this feature, you must start your kubelet services with the following flag:
 -->
-要启用此特性，你必须使用以下标志启动 kubelet 服务：
+要禁用此特性，你必须使用以下标志启动 kubelet 服务：
 
 ```
---feature-gates=KubeletPodResourcesGet=true
+--feature-gates=KubeletPodResourcesGet=false
 ```
 
 <!--
 The `Get` endpoint can provide Pod information related to dynamic resources
-allocated by the dynamic resource allocation API. To enable this feature, you must
-ensure your kubelet services are started with the following flags:
+allocated by the dynamic resource allocation API.
+Starting from Kubernetes v1.34, this feature is enabled by default.
+To disable, `kubelet` must be started with the following flags:
 -->
 `Get` 端点可以提供与动态资源分配 API 所分配的动态资源相关的 Pod 信息。
-要启用此特性，你必须确保使用以下标志启动 kubelet 服务：
+从 Kubernetes v1.34 开始，此特性已默认启用。
+要禁用此特性，你必须确保使用以下标志启动 kubelet 服务：
 
 ```
---feature-gates=KubeletPodResourcesGet=true,DynamicResourceAllocation=true,KubeletPodResourcesDynamicResources=true
+--feature-gates=KubeletPodResourcesDynamicResources=false
 ```
 
 <!--
@@ -919,11 +938,13 @@ Here are some examples of device plugin implementations:
 * [Akri](https://github.com/project-akri/akri)，它可以让你轻松公开异构叶子设备（例如 IP 摄像机和 USB 设备）。
 * [AMD GPU 设备插件](https://github.com/ROCm/k8s-device-plugin)
 * 适用于通用 Linux 设备和 USB 设备的[通用设备插件](https://github.com/squat/generic-device-plugin)
-* 用于异构 AI 计算虚拟化中间件（例如 NVIDIA、Cambricon、Hygon、Iluvatar、MThreads、Ascend、Metax 设备）的 [HAMi](https://github.com/Project-HAMi/HAMi)
+* 用于异构 AI 计算虚拟化中间件（例如 NVIDIA、Cambricon、Hygon、Iluvatar、MThreads、Ascend、Metax 设备）的
+  [HAMi](https://github.com/Project-HAMi/HAMi)
 * [Intel 设备插件](https://github.com/intel/intel-device-plugins-for-kubernetes)支持
   Intel GPU、FPGA、QAT、VPU、SGX、DSA、DLB 和 IAA 设备
 * [KubeVirt 设备插件](https://github.com/kubevirt/kubernetes-device-plugins)用于硬件辅助的虚拟化
-* [NVIDIA GPU 设备插件](https://github.com/NVIDIA/k8s-device-plugin)NVIDIA 的官方设备插件，用于公布 NVIDIA GPU 和监控 GPU 健康状态。
+* [NVIDIA GPU 设备插件](https://github.com/NVIDIA/k8s-device-plugin)NVIDIA 的官方设备插件，
+  用于公布 NVIDIA GPU 和监控 GPU 健康状态。
 * [为 Container-Optimized OS 所提供的 NVIDIA GPU 设备插件](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
 * [RDMA 设备插件](https://github.com/hustcat/k8s-rdma-device-plugin)
 * [SocketCAN 设备插件](https://github.com/collabora/k8s-socketcan)
@@ -941,8 +962,10 @@ Here are some examples of device plugin implementations:
 * Learn about the [Topology Manager](/docs/tasks/administer-cluster/topology-manager/)
 * Read about using [hardware acceleration for TLS ingress](/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/)
   with Kubernetes
+* Read more about [Extended Resource allocation by DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#extended-resource)
 -->
 * 查看[调度 GPU 资源](/zh-cn/docs/tasks/manage-gpus/scheduling-gpus/)来学习使用设备插件
 * 查看在节点上如何[公布扩展资源](/zh-cn/docs/tasks/administer-cluster/extended-resource-node/)
 * 学习[拓扑管理器](/zh-cn/docs/tasks/administer-cluster/topology-manager/)
 * 阅读如何在 Kubernetes 中使用 [TLS Ingress 的硬件加速](/zh-cn/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/)
+* 阅读更多关于[使用 DRA 分配扩展资源](/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#extended-resource)