Merge pull request #50635 from windsonsea/temtri

k8s-ci-robot · web-flow · commit 4d50f0d837e8 · 2025-04-26T18:45:24.000-07:00
[zh] Sync cluster-administration/system-metrics.md
diff --git a/content/zh-cn/docs/concepts/cluster-administration/system-metrics.md b/content/zh-cn/docs/concepts/cluster-administration/system-metrics.md
@@ -40,8 +40,8 @@ Examples of those components:
 -->
 ## Kubernetes 中组件的指标  {#metrics-in-kubernetes}
 
-在大多数情况下，可以通过 HTTP 访问组件的 `/metrics` 端点来获取组件的度量值。
-对于那些默认情况下不暴露端点的组件，可以使用 `--bind-address` 标志启用。
+在大多数情况下，可以通过 HTTP 服务器的 `/metrics` 端点来获取组件的度量值。
+对于那些默认情况下不暴露端点的组件，可以使用 `--bind-address` 参数来启用。
 
 这些组件的示例：
 
@@ -64,16 +64,16 @@ If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, readin
 authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing
 `/metrics`. For example:
 -->
-在生产环境中，你可能需要配置 [Prometheus 服务器](https://prometheus.io/)或
-某些其他指标搜集器以定期收集这些指标，并使它们在某种时间序列数据库中可用。
+在生产环境中，你可能需要配置
+[Prometheus 服务器](https://prometheus.io/)或某些其他指标搜集器以定期收集这些指标，
+并使它们在某种时间序列数据库中可用。
 
 请注意，{{< glossary_tooltip term_id="kubelet" text="kubelet" >}} 还会在 `/metrics/cadvisor`，
 `/metrics/resource` 和 `/metrics/probes` 端点中公开度量值。这些度量值的生命周期各不相同。
 
 如果你的集群使用了 {{< glossary_tooltip term_id="rbac" text="RBAC" >}}，
 则读取指标需要通过基于用户、组或 ServiceAccount 的鉴权，要求具有允许访问
-`/metrics` 的 ClusterRole。
-例如：
+`/metrics` 的 ClusterRole。例如：
 
 ```yaml
 apiVersion: rbac.authorization.k8s.io/v1
@@ -164,26 +164,27 @@ The flag `show-hidden-metrics-for-version` takes a version for which you want to
 deprecated in that release. The version is expressed as x.y, where x is the major version, y is
 the minor version. The patch version is not needed even though a metrics can be deprecated in a
 patch release, the reason for that is the metrics deprecation policy runs against the minor release.
-
-The flag can only take the previous minor version as it's value. All metrics hidden in previous
-will be emitted if admins set the previous version to `show-hidden-metrics-for-version`. The too
-old version is not allowed because this violates the metrics deprecated policy.
-
-Take metric `A` as an example, here assumed that `A` is deprecated in 1.n. According to metrics
-deprecated policy, we can reach the following conclusion:
 -->
 ## 显示隐藏指标   {#show-hidden-metrics}
 
 如上所述，管理员可以通过设置可执行文件的命令行参数来启用隐藏指标，
 如果管理员错过了上一版本中已经弃用的指标的迁移，则可以把这个用作管理员的逃生门。
 
-`show-hidden-metrics-for-version` 标志接受版本号作为取值，版本号给出
-你希望显示该发行版本中已弃用的指标。
+`show-hidden-metrics-for-version` 参数接受版本号作为取值，
+版本号给出你希望显示该发行版本中已弃用的指标。
 版本表示为 x.y，其中 x 是主要版本，y 是次要版本。补丁程序版本不是必须的，
 即使指标可能会在补丁程序发行版中弃用，原因是指标弃用策略规定仅针对次要版本。
 
-该参数只能使用前一个次要版本。如果管理员将先前版本设置为 `show-hidden-metrics-for-version`，
-则先前版本中隐藏的度量值会再度生成。不允许使用过旧的版本，因为那样会违反指标弃用策略。
+<!--
+The flag can only take the previous minor version as its value. All metrics hidden in the previous
+release will be emitted if admins set `show-hidden-metrics-for-version` to the previous version.
+Older versions are not allowed because this would violate the metrics deprecation policy.
+
+Take metric `A` as an example, and assume that `A` is deprecated in 1.n. According to the metrics
+deprecation policy, we can reach the following conclusion:
+-->
+此参数的取值只能使用前一个次要版本。如果管理员将前一个版本设置为 `show-hidden-metrics-for-version`，
+则前一个版本中隐藏的度量值会再度生成。不允许使用过旧的版本，因为那样会违反指标弃用策略。
 
 以指标 `A` 为例，此处假设 `A` 在 1.n 中已弃用。根据指标弃用策略，我们可以得出以下结论：
 
@@ -202,7 +203,7 @@ to remove this metric dependency before upgrading to `1.14`
 * 在版本 `1.n+2` 中，这个指标就将被从代码中移除，不会再有任何逃生窗口。
 
 如果你要从版本 `1.12` 升级到 `1.13`，但仍依赖于 `1.12` 中弃用的指标 `A`，则应通过命令行设置隐藏指标：
-`--show-hidden-metrics=1.12`，并记住在升级到 `1.14` 版本之前删除此指标依赖项。
+`--show-hidden-metrics-for-version=1.12`，并记住在升级到 `1.14` 版本之前移除此指标依赖项。
 
 <!--
 ## Component metrics
@@ -213,12 +214,6 @@ Controller manager metrics provide important insight into the performance and he
 controller manager. These metrics include common Go language runtime metrics such as go_routine
 count and controller specific metrics such as etcd request latencies or Cloudprovider (AWS, GCE,
 OpenStack) API latencies that can be used to gauge the health of a cluster.
-
-Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations
-for GCE, AWS, Vsphere and OpenStack.
-These metrics can be used to monitor health of persistent volume operations.
-
-For example, for GCE these metrics are called:
 -->
 ## 组件指标  {#component-metrics}
 
@@ -228,6 +223,13 @@ For example, for GCE these metrics are called:
 这些指标包括通用的 Go 语言运行时指标（例如 go_routine 数量）和控制器特定的度量指标，
 例如可用于评估集群运行状况的 etcd 请求延迟或云提供商（AWS、GCE、OpenStack）的 API 延迟等。
 
+<!--
+Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations
+for GCE, AWS, Vsphere and OpenStack.
+These metrics can be used to monitor health of persistent volume operations.
+
+For example, for GCE these metrics are called:
+-->
 从 Kubernetes 1.7 版本开始，详细的云提供商指标可用于 GCE、AWS、Vsphere 和 OpenStack 的存储操作。
 这些指标可用于监控持久卷操作的运行状况。
 
@@ -298,17 +300,68 @@ ClusterRole with the `get` verb for the `/metrics/resources` non-resource URL.
 On Kubernetes 1.21 you must use the `--show-hidden-metrics-for-version=1.20`
 flag to expose these alpha stability metrics.
 -->
-一旦 Pod 进入完成状态（其 `restartPolicy` 为 `Never` 或 `OnFailure`，且
-其处于 `Succeeded` 或 `Failed` Pod 阶段，或者已经被删除且所有容器都具有
-终止状态），该时间序列停止报告，因为调度器现在可以调度其它 Pod 来执行。
+一旦 Pod 进入完成状态（其 `restartPolicy` 为 `Never` 或 `OnFailure`，且其处于
+`Succeeded` 或 `Failed` Pod 阶段，或者已经被删除且所有容器都具有终止状态），
+该时间序列停止报告，因为调度器现在可以调度其它 Pod 来执行。
 这两个指标称作 `kube_pod_resource_request` 和 `kube_pod_resource_limit`。
 
 这些指标通过 HTTP 端点 `/metrics/resources` 公开出来。
 访问 `/metrics/resources` 端点需要鉴权，通常通过对
 `/metrics/resources` 非资源 URL 的 `get` 访问授予访问权限。  
 
 在 Kubernetes 1.21 中，你必须使用 `--show-hidden-metrics-for-version=1.20`
-标志来公开 Alpha 级稳定性的指标。
+参数来公开 Alpha 级稳定性的指标。
+
+<!--
+### kubelet Pressure Stall Information (PSI) metrics
+-->
+### kubelet 压力阻塞信息（PSI）指标
+
+{{< feature-state for_k8s_version="v1.33" state="alpha" >}}
+
+<!--
+As an alpha feature, Kubernetes lets you configure kubelet to collect Linux kernel
+[Pressure Stall Information](https://docs.kernel.org/accounting/psi.html)
+(PSI) for CPU, memory and IO usage.
+The information is collected at node, pod and container level.
+The metrics are exposed at the `/metrics/cadvisor` endpoint with the following names:
+-->
+作为一个 Alpha 阶段的特性，Kubernetes 允许你配置 kubelet 来收集 Linux
+内核针对 CPU、内存和 IO 使用情况的[压力阻塞信息（PSI）](https://docs.kernel.org/accounting/psi.html)。
+此信息是在节点、Pod 和容器级别进行收集的。
+这些指标通过 `/metrics/cadvisor` 端点暴露，指标名称如下：
+
+```
+container_pressure_cpu_stalled_seconds_total
+container_pressure_cpu_waiting_seconds_total
+container_pressure_memory_stalled_seconds_total
+container_pressure_memory_waiting_seconds_total
+container_pressure_io_stalled_seconds_total
+container_pressure_io_waiting_seconds_total
+```
+
+<!--
+You must enable the `KubeletPSI` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+to use this feature. The information is also exposed in the
+[Summary API](/docs/reference/instrumentation/node-metrics#psi).
+-->
+要使用此特性，你必须启用 `KubeletPSI` [特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)。
+此信息也会通过 [Summary API](/zh-cn/docs/reference/instrumentation/node-metrics#psi) 暴露。
+
+<!--
+#### Requirements
+
+Pressure Stall Information requires:
+
+- [Linux kernel versions 4.20 or later](/docs/reference/node/kernel-version-requirements#requirements-psi).
+- [cgroup v2](/docs/concepts/architecture/cgroups)
+-->
+#### 要求
+
+启用压力阻塞信息需满足以下条件：
+
+- [Linux 内核版本为 4.20 或更高](/zh-cn/docs/reference/node/kernel-version-requirements#requirements-psi)
+- [cgroup v2](/zh-cn/docs/concepts/architecture/cgroups)
 
 <!--
 ## Disabling metrics
@@ -319,9 +372,9 @@ disabled metrics (i.e. `--disabled-metrics=metric1,metric2`).
 -->
 ## 禁用指标 {#disabling-metrics}
 
-你可以通过命令行标志 `--disabled-metrics` 来关闭某指标。
+你可以通过命令行参数 `--disabled-metrics` 来关闭某指标。
 在例如某指标会带来性能问题的情况下，这一操作可能是有用的。
-标志的参数值是一组被禁止的指标（例如：`--disabled-metrics=metric1,metric2`）。
+参数值是一组被禁用的指标（例如：`--disabled-metrics=metric1,metric2`）。
 
 <!--
 ## Metric cardinality enforcement