Commit 2b3babb

Merge pull request #26048 from tengqm/zh-sync-system-metrics
[zh] Resync concepts/cluster-administration/system-metrics.md
2 parents 697e683 + 0283963 commit 2b3babb

1 file changed: +125 -51 lines changed

content/zh/docs/concepts/cluster-administration/system-metrics.md

Lines changed: 125 additions & 51 deletions
Original file line number | Diff line number | Diff line change
@@ -4,6 +4,16 @@ content_type: concept
44
weight: 60
55
---
66

7+
<!--
8+
title: Metrics For Kubernetes System Components
9+
reviewers:
10+
- brancz
11+
- logicalhan
12+
- RainbowMango
13+
content_type: concept
14+
weight: 60
15+
-->
16+
717
<!-- overview -->
818

919
<!--
@@ -14,7 +24,8 @@ This format is structured plain text, designed so that people and machines can b
1424
-->
1525
系统组件指标可以更好地了解系统内部发生的情况。指标对于构建仪表板和告警特别有用。
1626

17-
Kubernetes 组件以 [Prometheus 格式](https://prometheus.io/docs/instrumenting/exposition_formats/)生成度量值。
27+
Kubernetes 组件以 [Prometheus 格式](https://prometheus.io/docs/instrumenting/exposition_formats/)
28+
生成度量值。
1829
这种格式是结构化的纯文本,旨在使人和机器都可以阅读。
1930

2031
<!-- body -->
@@ -25,13 +36,21 @@ Kubernetes 组件以 [Prometheus 格式](https://prometheus.io/docs/instrumentin
2536
In most cases metrics are available on the `/metrics` endpoint of the HTTP server. For components that don't expose the endpoint by default, it can be enabled using the `--bind-address` flag.
2637
2738
Examples of those components:
39+
-->
40+
## Kubernetes 中的指标
41+
42+
在大多数情况下,可以在 HTTP 服务器的 `/metrics` 端点上访问度量值。
43+
对于默认情况下不公开端点的组件,可以使用 `--bind-address` 标志启用。
44+
45+
这些组件的示例:
2846

2947
* {{< glossary_tooltip term_id="kube-controller-manager" text="kube-controller-manager" >}}
3048
* {{< glossary_tooltip term_id="kube-proxy" text="kube-proxy" >}}
3149
* {{< glossary_tooltip term_id="kube-apiserver" text="kube-apiserver" >}}
3250
* {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
3351
* {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}
3452

53+
<!--
3554
In a production environment you may want to configure [Prometheus Server](https://prometheus.io/) or some other metrics scraper
3655
to periodically gather these metrics and make them available in some kind of time series database.
3756
@@ -40,27 +59,15 @@ Note that {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes
4059
If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, reading metrics requires authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing `/metrics`.
4160
For example:
4261
-->
43-
## Kubernetes 中的指标
44-
45-
在大多数情况下,可以在 HTTP 服务器的 `/metrics` 端点上访问度量值。
46-
对于默认情况下不公开端点的组件,可以使用 `--bind-address` 标志启用。
47-
48-
这些组件的示例:
49-
50-
* {{< glossary_tooltip term_id="kube-controller-manager" text="kube-controller-manager" >}}
51-
* {{< glossary_tooltip term_id="kube-proxy" text="kube-proxy" >}}
52-
* {{< glossary_tooltip term_id="kube-apiserver" text="kube-apiserver" >}}
53-
* {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
54-
* {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}
55-
56-
在生产环境中,你可能需要配置 [Prometheus 服务](https://prometheus.io/)
62+
在生产环境中,你可能需要配置 [Prometheus 服务器](https://prometheus.io/)
5763
或某些其他指标搜集器以定期收集这些指标,并使它们在某种时间序列数据库中可用。
5864

5965
请注意,{{< glossary_tooltip term_id="kubelet" text="kubelet" >}} 还会在 `/metrics/cadvisor`、
6066
`/metrics/resource` 和 `/metrics/probes` 端点中公开度量值。这些度量值的生命周期各不相同。
6167

6268
如果你的集群使用了 {{< glossary_tooltip term_id="rbac" text="RBAC" >}},
63-
则读取指标需要通过基于用户、组或 ServiceAccount 的鉴权,要求具有允许访问 `/metrics` 的 ClusterRole。
69+
则读取指标需要通过基于用户、组或 ServiceAccount 的鉴权,要求具有允许访问
70+
`/metrics` 的 ClusterRole。
6471
例如:
6572

6673
```yaml
@@ -78,58 +85,69 @@ rules:
7885
<!--
7986
## Metric lifecycle
8087
81-
Alpha metric → Stable metric → Deprecated metric → Hidden metric → Deletion
82-
83-
Alpha metrics have no stability guarantees; as such they can be modified or deleted at any time.
88+
Alpha metric → Stable metric → Deprecated metric → Hidden metric → Deleted metric
8489
85-
Stable metrics can be guaranteed to not change; Specifically, stability means:
90+
Alpha metrics have no stability guarantees. These metrics can be modified or deleted at any time.
8691
87-
* the metric itself will not be deleted (or renamed)
88-
* the type of metric will not be modified
92+
Stable metrics are guaranteed to not change. This means:
93+
* A stable metric without a deprecated signature will not be deleted or renamed
94+
* A stable metric's type will not be modified
8995
90-
Deprecated metric signal that the metric will eventually be deleted; to find which version, you need to check annotation, which includes from which kubernetes version that metric will be considered deprecated.
96+
Deprecated metrics are slated for deletion, but are still available for use.
97+
These metrics include an annotation about the version in which they became deprecated.
9198
-->
9299
## 指标生命周期
93100
94-
Alpha 指标 → 稳定指标 → 弃用指标 → 隐藏指标 → 删除
101+
Alpha 指标 → 稳定的指标 → 弃用的指标 → 隐藏的指标 → 删除的指标
95102
96-
Alpha 指标没有稳定性保证,因此可以随时对其进行修改或者删除。
103+
Alpha 指标没有稳定性保证。这些指标可以随时被修改或者删除。
97104
98-
稳定指标可以保证不会改变;具体而言,稳定意味着:
105+
稳定的指标可以保证不会改变。这意味着:
99106
100-
* 指标本身不会被删除(或重命名)
101-
* 指标的类型不会被更改
107+
* 稳定的、不包含已弃用(deprecated)签名的指标不会被删除(或重命名)
108+
* 稳定的指标的类型不会被更改
102109
103-
已弃用的指标表明该指标最终将被删除;要搞清楚对应版本,你需要检查其注解,
104-
其中包括从哪个 kubernetes 版本开始,将不再考虑该指标。
110+
已弃用的指标最终将被删除,不过仍然可用。
111+
这类指标包含注解,标明其被废弃的版本。
105112
106-
过期前:
113+
<!--
114+
For example:
107115
108-
```
109-
# HELP some_counter this counts things
110-
# TYPE some_counter counter
111-
some_counter 0
112-
```
116+
* Before deprecation
117+
-->
118+
例如:
113119
114-
过期后:
120+
* 被弃用之前:
115121
116-
```
117-
# HELP some_counter (Deprecated since 1.15.0) this counts things
118-
# TYPE some_counter counter
119-
some_counter 0
120-
```
122+
```
123+
# HELP some_counter this counts things
124+
# TYPE some_counter counter
125+
some_counter 0
126+
```
121127

122128
<!--
123-
Once a metric is hidden then by default the metrics is not published for scraping. To use a hidden metric, you need to override the configuration for the relevant cluster component.
129+
* After deprecation
130+
-->
131+
* 被弃用之后:
124132

125-
Once a metric is deleted, the metric is not published. You cannot change this using an override.
133+
```
134+
# HELP some_counter (Deprecated since 1.15.0) this counts things
135+
# TYPE some_counter counter
136+
some_counter 0
137+
```
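
The two HELP lines above differ only in the `(Deprecated since 1.15.0)` marker. As a minimal sketch, assuming the marker always takes that exact form (the function name `deprecated_since` and the regular expression are illustrative, not part of Kubernetes), a scraper-side check could look like this:

```python
import re
from typing import Optional

# Matches the "(Deprecated since X.Y.Z)" marker as it appears in the
# example HELP line; the exact pattern is an assumption of this sketch.
_DEPRECATED = re.compile(r"^# HELP (\S+) \(Deprecated since (\d+\.\d+\.\d+)\)")


def deprecated_since(help_line: str) -> Optional[str]:
    """Return the version in which the metric was deprecated, or None."""
    match = _DEPRECATED.match(help_line)
    return match.group(2) if match else None


before = "# HELP some_counter this counts things"
after = "# HELP some_counter (Deprecated since 1.15.0) this counts things"

print(deprecated_since(before))
print(deprecated_since(after))
```

A check like this could flag dashboards or alerts that still depend on a metric before it becomes hidden.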
138+
139+
<!--
140+
Hidden metrics are no longer published for scraping, but are still available for use. To use a hidden metric, please refer to the [Show hidden metrics](#show-hidden-metrics) section.
141+
142+
Deleted metrics are no longer published and cannot be used.
126143
-->
127-
隐藏指标后,默认情况下,该指标不会发布以供抓取。要使用隐藏指标,你需要覆盖相关集群组件的配置。
144+
隐藏的指标不会再被发布以供抓取,但仍然可用。
145+
要使用隐藏指标,请参阅[显示隐藏指标](#show-hidden-metrics)节。
128146

129-
指标一旦删除,就不会发布。你无法通过重载配置来改变这一点。
147+
删除的指标不再被发布,亦无法使用。
130148

131149
<!--
132-
## Show Hidden Metrics
150+
## Show hidden metrics
133151
134152
As described above, admins can enable hidden metrics through a command-line flag on a specific binary. This is intended as an escape hatch for admins who missed the migration of the metrics deprecated in the last release.
135153
@@ -139,12 +157,13 @@ The flag can only take the previous minor version as it's value. All metrics hid
139157
140158
Take metric `A` as an example, and assume that `A` is deprecated in 1.n. According to the metrics deprecation policy, we can reach the following conclusion:
141159
-->
142-
## 显示隐藏指标
160+
## 显示隐藏指标 {#show-hidden-metrics}
143161

144-
综上所述,管理员可以通过设置可执行文件的命令行参数来启用隐藏指标,
162+
如上所述,管理员可以通过设置可执行文件的命令行参数来启用隐藏指标,
145163
如果管理员错过了上一版本中已经弃用的指标的迁移,则可以把这个用作管理员的逃生门。
146164

147-
`show-hidden-metrics-for-version` 标志接受版本号作为取值,版本号给出你希望显示该发行版本中已弃用的指标。
165+
`show-hidden-metrics-for-version` 标志接受版本号作为取值,版本号给出
166+
你希望显示该发行版本中已弃用的指标。
148167
版本表示为 x.y,其中 x 是主要版本,y 是次要版本。补丁程序版本不是必须的,
149168
即使指标可能会在补丁程序发行版中弃用,原因是指标弃用策略规定仅针对次要版本。
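
The flag value is the previous minor version in `x.y` form, with no patch component. The following is a minimal sketch of deriving it from a component's version string. The helper name `hidden_metrics_flag` is hypothetical, and the handling of a leading `v` and of minor version 0 is an assumption of this example rather than documented behavior.

```python
def hidden_metrics_flag(current_version: str) -> str:
    """Build the flag that shows metrics hidden in `current_version`.

    Accepts "x.y" or "x.y.z", optionally prefixed with "v"; the patch
    component is ignored because the flag only takes x.y.
    """
    parts = current_version.lstrip("v").split(".")
    if len(parts) < 2:
        raise ValueError(f"expected x.y or x.y.z, got {current_version!r}")
    major, minor = int(parts[0]), int(parts[1])
    if minor == 0:
        # Assumption: this sketch does not cross major versions.
        raise ValueError("no previous minor version to show hidden metrics for")
    return f"--show-hidden-metrics-for-version={major}.{minor - 1}"


print(hidden_metrics_flag("v1.21.3"))
```

For a component running 1.21.3, the sketch yields `--show-hidden-metrics-for-version=1.20`.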
150169

@@ -226,6 +245,60 @@ cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
226245
cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
227246
```
228247

248+
<!--
249+
### kube-scheduler metrics
250+
251+
{{< feature-state for_k8s_version="v1.20" state="alpha" >}}
252+
253+
The scheduler exposes optional metrics that report the requested resources and the desired limits of all running pods. These metrics can be used to build capacity planning dashboards, assess current or historical scheduling limits, quickly identify workloads that cannot schedule due to lack of resources, and compare actual usage to the pod's request.
254+
-->
255+
### kube-scheduler 指标 {#kube-scheduler-metrics}
256+
257+
{{< feature-state for_k8s_version="v1.20" state="alpha" >}}
258+
259+
调度器会暴露一些可选的指标,报告所有运行中 Pods 所请求的资源和期望的约束值。
260+
这些指标可用来构造容量规划监控面板、评估调度约束的当前或历史数据、
261+
快速发现因为缺少资源而无法被调度的负载,或者将 Pod 的实际资源用量
262+
与其请求值进行比较。
263+
264+
<!--
265+
The kube-scheduler identifies the resource [requests and limits](/docs/concepts/configuration/manage-resources-containers/) configured for each Pod; when either a request or limit is non-zero, the kube-scheduler reports a metrics timeseries. The time series is labelled by:
266+
- namespace
267+
- pod name
268+
- the node where the pod is scheduled or an empty string if not yet scheduled
269+
- priority
270+
- the assigned scheduler for that pod
271+
- the name of the resource (for example, `cpu`)
272+
- the unit of the resource if known (for example, `cores`)
273+
-->
274+
kube-scheduler 组件能够辨识各个 Pod 所配置的资源
275+
[请求和约束](/zh/docs/concepts/configuration/manage-resources-containers/)。
276+
在 Pod 的资源请求值或者约束值非零时,kube-scheduler 会以度量值时间序列的形式
277+
生成报告。该时间序列值包含以下标签:
278+
- 名字空间
279+
- Pod 名称
280+
- Pod 调度所处节点,或者当 Pod 未被调度时用空字符串表示
281+
- 优先级
282+
- 为 Pod 所指派的调度器
283+
- 资源的名称(例如,`cpu`)
284+
- 资源的单位,如果知道的话(例如,`cores`)
285+
286+
<!--
287+
Once a pod reaches completion (has a `restartPolicy` of `Never` or `OnFailure` and is in the `Succeeded` or `Failed` pod phase, or has been deleted and all containers have a terminated state) the series is no longer reported since the scheduler is now free to schedule other pods to run. The two metrics are called `kube_pod_resource_request` and `kube_pod_resource_limit`.
288+
289+
The metrics are exposed at the HTTP endpoint `/metrics/resources` and require the same authorization as the `/metrics`
290+
endpoint on the scheduler. You must use the `--show-hidden-metrics-for-version=1.20` flag to expose these alpha stability metrics.
291+
-->
292+
一旦 Pod 进入完成状态(其 `restartPolicy` 为 `Never` 或 `OnFailure`,且
293+
其处于 `Succeeded` 或 `Failed` Pod 阶段,或者已经被删除且所有容器都具有
294+
终止状态),该时间序列停止报告,因为调度器现在可以调度其它 Pod 来执行。
295+
这两个指标称作 `kube_pod_resource_request` 和 `kube_pod_resource_limit`。
296+
297+
指标暴露在 HTTP 端点 `/metrics/resources`,与调度器上的 `/metrics` 端点
298+
一样要求相同的访问授权。你必须使用
299+
`--show-hidden-metrics-for-version=1.20` 标志才能暴露那些稳定性为 Alpha
300+
的指标。
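
To illustrate the capacity-planning use case, here is a minimal sketch that sums CPU requests per namespace from a scrape of `/metrics/resources`. The sample series are invented for this example, and the label names (`namespace`, `pod`, `node`, `scheduler`, `priority`, `resource`, `unit`) are assumptions based on the label list above rather than confirmed output. The parser is deliberately simple and does not handle escaped quotes in label values.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; label names and values are assumptions.
SAMPLE = """\
kube_pod_resource_request{namespace="default",pod="web-0",node="node-a",scheduler="default-scheduler",priority="0",resource="cpu",unit="cores"} 0.5
kube_pod_resource_request{namespace="default",pod="web-1",node="",scheduler="default-scheduler",priority="0",resource="cpu",unit="cores"} 0.25
kube_pod_resource_request{namespace="batch",pod="job-1",node="node-b",scheduler="default-scheduler",priority="0",resource="cpu",unit="cores"} 2
kube_pod_resource_limit{namespace="batch",pod="job-1",node="node-b",scheduler="default-scheduler",priority="0",resource="cpu",unit="cores"} 4
"""

SERIES = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')  # name{labels} value
LABEL = re.compile(r'(\w+)="([^"]*)"')           # key="value" pairs


def cpu_requests_by_namespace(text: str) -> dict:
    """Sum kube_pod_resource_request values for the cpu resource per namespace."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = SERIES.match(line.strip())
        if match is None:
            continue  # skip comments, blank lines, and unlabelled series
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels))
        if name == "kube_pod_resource_request" and labels.get("resource") == "cpu":
            totals[labels.get("namespace", "")] += float(value)
    return dict(totals)


print(cpu_requests_by_namespace(SAMPLE))
```

The pod with an empty `node` label stands for one that has not been scheduled yet; it still counts toward the namespace total, which is what makes unschedulable workloads visible.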
301+
229302
## {{% heading "whatsnext" %}}
230303

231304
<!--
@@ -234,5 +307,6 @@ cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
234307
* Read about the [Kubernetes deprecation policy](/docs/reference/using-api/deprecation-policy/#deprecating-a-feature-or-behavior)
235308
-->
236309
* 阅读有关指标的 [Prometheus 文本格式](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format)
237-
* 查看 [Kubernetes 稳定指标](https://github.com/kubernetes/kubernetes/blob/master/test/instrumentation/testdata/stable-metrics-list.yaml)的列表
310+
* 查看 [Kubernetes 稳定指标](https://github.com/kubernetes/kubernetes/blob/master/test/instrumentation/testdata/stable-metrics-list.yaml)
311+
的列表
238312
* 阅读有关 [Kubernetes 弃用策略](/zh/docs/reference/using-api/deprecation-policy/#deprecating-a-feature-or-behavior)
