content/zh-cn/docs/concepts/cluster-administration/system-metrics.md
Lines changed: 87 additions & 39 deletions
@@ -17,7 +17,8 @@ weight: 70
 <!-- overview -->

 <!--
-System component metrics can give a better look into what is happening inside them. Metrics are particularly useful for building dashboards and alerts.
+System component metrics can give a better look into what is happening inside them. Metrics are
+particularly useful for building dashboards and alerts.

 Kubernetes components emit metrics in [Prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/).
 This format is structured plain text, designed so that people and machines can both read it.
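The Prometheus text format referred to above is line-oriented: `# HELP` and `# TYPE` comment lines, then one sample per line in the form `name{label="value",...} value [timestamp]`. The following is a minimal sketch of a parser for such sample lines, assuming label values contain no escaped quotes, commas or braces; the metric `example_requests_total` is hypothetical. For real scraping you would use Prometheus itself or an official client library rather than this simplified regex.

```python
import re
from typing import Dict, Optional, Tuple

# One sample line: metric name, optional {labels}, value, optional integer timestamp.
# Simplification: label values must not contain escaped quotes, commas or braces.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str) -> Optional[Tuple[str, Dict[str, str], float]]:
    """Parse one exposition-format sample line; return None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # HELP/TYPE metadata and blank lines carry no sample
    m = SAMPLE_RE.match(line)
    if m is None:
        raise ValueError(f"not a valid sample line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    # float() also accepts the special values "NaN", "+Inf" and "-Inf"
    return m.group("name"), labels, float(m.group("value"))


if __name__ == "__main__":
    text = """\
# HELP example_requests_total Hypothetical counter used only for illustration.
# TYPE example_requests_total counter
example_requests_total{code="200",method="get"} 1027
example_requests_total{code="500",method="get"} 3
"""
    for raw in text.splitlines():
        sample = parse_sample(raw)
        if sample is not None:
            print(sample)
```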
-In most cases metrics are available on `/metrics` endpoint of the HTTP server. For components that doesn't expose endpoint by default it can be enabled using `--bind-address` flag.
+In most cases metrics are available on the `/metrics` endpoint of the HTTP server. For components that
+don't expose the endpoint by default, it can be enabled using the `--bind-address` flag.
-In a production environment you may want to configure [Prometheus Server](https://prometheus.io/) or some other metrics scraper
-to periodically gather these metrics and make them available in some kind of time series database.
+In a production environment you may want to configure [Prometheus Server](https://prometheus.io/)
+or some other metrics scraper to periodically gather these metrics and make them available in some
+kind of time series database.

-Note that {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes metrics in `/metrics/cadvisor`, `/metrics/resource` and `/metrics/probes` endpoints. Those metrics do not have same lifecycle.
+Note that {{< glossary_tooltip term_id="kubelet" text="kubelet" >}} also exposes metrics on the
+`/metrics/cadvisor`, `/metrics/resource` and `/metrics/probes` endpoints. Those metrics do not
+have the same lifecycle.

-If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, reading metrics requires authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing `/metrics`.
-For example:
+If your cluster uses {{< glossary_tooltip term_id="rbac" text="RBAC" >}}, reading metrics requires
+authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing
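A ClusterRole of the kind described above grants the `get` verb on the non-resource URL `/metrics`. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. The role name `metrics-reader` is a hypothetical choice, and you would still need a ClusterRoleBinding to attach the role to a user, group or ServiceAccount.

```python
import json


def metrics_reader_cluster_role(name: str = "metrics-reader") -> dict:
    """Build a ClusterRole manifest that allows GET on the /metrics non-resource URL."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},  # hypothetical role name
        "rules": [
            {
                # /metrics is not an API resource, so it is matched via nonResourceURLs
                "nonResourceURLs": ["/metrics"],
                "verbs": ["get"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(metrics_reader_cluster_role(), indent=2))
```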
@@ -96,7 +102,7 @@ Stable metrics are guaranteed to not change. This means:
 Deprecated metrics are slated for deletion, but are still available for use.
 These metrics include an annotation about the version in which they became deprecated.
 -->
-## Metric lifecycle
+## Metric lifecycle {#metric-lifecycle}

 Alpha metric → Stable metric → Deprecated metric → Hidden metric → Deleted metric

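The lifecycle line above reads as a linear state machine. The sketch below encodes only the transitions shown in that line, plus the publication rule stated elsewhere in this file (hidden and deleted metrics are no longer published for scraping). The enum and function names are illustrative and are not part of any Kubernetes API.

```python
from enum import Enum


class MetricStage(Enum):
    ALPHA = "alpha"
    STABLE = "stable"
    DEPRECATED = "deprecated"
    HIDDEN = "hidden"
    DELETED = "deleted"


# Order taken from the lifecycle line: Alpha → Stable → Deprecated → Hidden → Deleted
LIFECYCLE = [
    MetricStage.ALPHA,
    MetricStage.STABLE,
    MetricStage.DEPRECATED,
    MetricStage.HIDDEN,
    MetricStage.DELETED,
]


def next_stage(stage: MetricStage) -> MetricStage:
    """Return the stage that follows `stage`; DELETED is terminal."""
    i = LIFECYCLE.index(stage)
    if i == len(LIFECYCLE) - 1:
        raise ValueError("a deleted metric has no further stage")
    return LIFECYCLE[i + 1]


def is_scraped_by_default(stage: MetricStage) -> bool:
    """Alpha, stable and deprecated metrics are published; hidden and deleted ones are not."""
    return stage in (MetricStage.ALPHA, MetricStage.STABLE, MetricStage.DEPRECATED)
```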
@@ -137,7 +143,8 @@ For example:
 ```

 <!--
-Hidden metrics are no longer published for scraping, but are still available for use. To use a hidden metric, please refer to the [Show hidden metrics](#show-hidden-metrics) section.
+Hidden metrics are no longer published for scraping, but are still available for use. To use a
+hidden metric, please refer to the [Show hidden metrics](#show-hidden-metrics) section.

 Deleted metrics are no longer published and cannot be used.
 -->
@@ -149,13 +156,21 @@ Deleted metrics are no longer published and cannot be used.
 <!--
 ## Show hidden metrics

-As described above, admins can enable hidden metrics through a command-line flag on a specific binary. This intends to be used as an escape hatch for admins if they missed the migration of the metrics deprecated in the last release.
+As described above, admins can enable hidden metrics through a command-line flag on a specific
+binary. This is intended as an escape hatch for admins who missed the migration of the metrics
+deprecated in the last release.

-The flag `show-hidden-metrics-for-version` takes a version for which you want to show metrics deprecated in that release. The version is expressed as x.y, where x is the major version, y is the minor version. The patch version is not needed even though a metrics can be deprecated in a patch release, the reason for that is the metrics deprecation policy runs against the minor release.
+The flag `show-hidden-metrics-for-version` takes a version for which you want to show metrics
+deprecated in that release. The version is expressed as x.y, where x is the major version and y is
+the minor version. The patch version is not needed even though a metric can be deprecated in a
+patch release, because the metrics deprecation policy runs against the minor release.

-The flag can only take the previous minor version as it's value. All metrics hidden in previous will be emitted if admins set the previous version to `show-hidden-metrics-for-version`. The too old version is not allowed because this violates the metrics deprecated policy.
+The flag can only take the previous minor version as its value. All metrics hidden in the previous
+release will be emitted if admins set `show-hidden-metrics-for-version` to the previous version.
+Older versions are not allowed because that would violate the metrics deprecation policy.

-Take metric `A` as an example, here assumed that `A` is deprecated in 1.n. According to metrics deprecated policy, we can reach the following conclusion:
+Take metric `A` as an example, and assume that `A` is deprecated in 1.n. According to the metrics
+deprecation policy, we can reach the following conclusions:
 -->
 ## Show hidden metrics {#show-hidden-metrics}

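The rule stated above (the flag takes an `x.y` version, and only the previous minor version is accepted) can be sketched as a small validator. This illustrates the documented policy only; it is not the flag-parsing code of any Kubernetes component, the function names are hypothetical, and it assumes the major version stays the same between the two releases.

```python
import re
from typing import Tuple

VERSION_RE = re.compile(r"^(\d+)\.(\d+)$")  # x.y only; a patch component is rejected


def parse_minor_version(value: str) -> Tuple[int, int]:
    """Split an x.y version string into (major, minor)."""
    m = VERSION_RE.match(value)
    if m is None:
        raise ValueError(f"expected a version of the form x.y, got {value!r}")
    return int(m.group(1)), int(m.group(2))


def validate_show_hidden_metrics_for_version(value: str, current: str) -> None:
    """Raise ValueError unless `value` is exactly the minor release before `current`."""
    major, minor = parse_minor_version(value)
    cur_major, cur_minor = parse_minor_version(current)
    # Assumption: only same-major releases are considered (for example 1.12 before 1.13).
    expected = (cur_major, cur_minor - 1)
    if (major, minor) != expected:
        raise ValueError(
            f"--show-hidden-metrics-for-version must be {expected[0]}.{expected[1]}, got {value}"
        )
```

For a component at `1.13`, `validate_show_hidden_metrics_for_version("1.12", "1.13")` returns silently, while `"1.11"` (too old), `"1.13"` (the current release) and `"1.12.3"` (has a patch part) are rejected.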
@@ -174,10 +189,13 @@ Take metric `A` as an example, here assumed that `A` is deprecated in 1.n. Accor

 <!--
 * In release `1.n`, the metric is deprecated, and it can be emitted by default.
-* In release `1.n+1`, the metric is hidden by default and it can be emitted by command line `show-hidden-metrics-for-version=1.n`.
+* In release `1.n+1`, the metric is hidden by default and it can be emitted with the command line flag
+`--show-hidden-metrics-for-version=1.n`.
 * In release `1.n+2`, the metric should be removed from the codebase. No escape hatch anymore.

-If you're upgrading from release `1.12` to `1.13`, but still depend on a metric `A` deprecated in `1.12`, you should set hidden metrics via command line: `--show-hidden-metrics=1.12` and remember to remove this metric dependency before upgrading to `1.14`
+If you're upgrading from release `1.12` to `1.13`, but still depend on a metric `A` deprecated in
+`1.12`, you should set hidden metrics via the command line `--show-hidden-metrics-for-version=1.12`
+and remember to remove this metric dependency before upgrading to `1.14`.
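The three bullets above amount to a lookup from (release in which metric `A` was deprecated, running release, flag value) to whether `A` is emitted. The sketch below encodes just those cases plus the "not yet deprecated" case; representing releases by their minor number alone (`12` for `1.12`) is a simplifying assumption, and the function name is hypothetical.

```python
from typing import Optional


def metric_status(deprecated_minor: int, running_minor: int,
                  show_hidden_for_minor: Optional[int] = None) -> str:
    """Status of a metric deprecated in 1.<deprecated_minor> on a 1.<running_minor> component.

    `show_hidden_for_minor` models --show-hidden-metrics-for-version=1.<minor>.
    """
    if running_minor < deprecated_minor:
        return "emitted"  # not deprecated yet
    if running_minor == deprecated_minor:
        return "emitted (deprecated)"  # 1.n: deprecated, still emitted by default
    if running_minor == deprecated_minor + 1:
        # 1.n+1: hidden by default; the escape hatch is the flag set to 1.n
        if show_hidden_for_minor == deprecated_minor:
            return "emitted (shown hidden)"
        return "hidden"
    return "removed"  # 1.n+2 and later: no escape hatch
```

Applied to the upgrade example: `metric_status(12, 13)` is `"hidden"`, `metric_status(12, 13, 12)` is `"emitted (shown hidden)"`, and `metric_status(12, 14, 12)` is `"removed"`, which is why the dependency has to go before moving to `1.14`.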
@@ -189,13 +207,20 @@ If you're upgrading from release `1.12` to `1.13`, but still depend on a metric
 <!--
 ## Disable accelerator metrics

-The kubelet collects accelerator metrics through cAdvisor. To collect these metrics, for accelerators like NVIDIA GPUs, kubelet held an open handle on the driver. This meant that in order to perform infrastructure changes (for example, updating the driver), a cluster administrator needed to stop the kubelet agent.
+The kubelet collects accelerator metrics through cAdvisor. To collect these metrics, for
+accelerators like NVIDIA GPUs, the kubelet held an open handle on the driver. This meant that in order
+to perform infrastructure changes (for example, updating the driver), a cluster administrator
+needed to stop the kubelet agent.

-The responsibility for collecting accelerator metrics now belongs to the vendor rather than the kubelet. Vendors must provide a container that collects metrics and exposes them to the metrics service (for example, Prometheus).
+The responsibility for collecting accelerator metrics now belongs to the vendor rather than the
+kubelet. Vendors must provide a container that collects metrics and exposes them to the metrics
+service (for example, Prometheus).

-The [`DisableAcceleratorUsageMetrics` feature gate](/docs/reference/command-line-tools-reference/feature-gates/) disables metrics collected by the kubelet, with a [timeline for enabling this feature by default](https://github.com/kubernetes/enhancements/tree/411e51027db842355bd489691af897afc1a41a5e/keps/sig-node/1867-disable-accelerator-usage-metrics#graduation-criteria).
+The [`DisableAcceleratorUsageMetrics` feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+disables metrics collected by the kubelet, with a
+[timeline for enabling this feature by default](https://github.com/kubernetes/enhancements/tree/411e51027db842355bd489691af897afc1a41a5e/keps/sig-node/1867-disable-accelerator-usage-metrics#graduation-criteria).
-The scheduler exposes optional metrics that reports the requested resources and the desired limits of all running pods. These metrics can be used to build capacity planning dashboards, assess current or historical scheduling limits, quickly identify workloads that cannot schedule due to lack of resources, and compare actual usage to the pod's request.
+The scheduler exposes optional metrics that report the requested resources and the desired limits
+of all running pods. These metrics can be used to build capacity planning dashboards, assess
+current or historical scheduling limits, quickly identify workloads that cannot be scheduled due to
+a lack of resources, and compare actual usage to the pod's request.
 -->
 The scheduler exposes some optional metrics that report the requested resources and the desired limits of all running Pods.
 These metrics can be used to build capacity planning dashboards, assess current or historical scheduling limits,
 quickly identify workloads that cannot be scheduled due to a lack of resources, or compare a Pod's actual resource usage
 against its request.

 <!--
-The kube-scheduler identifies the resource [requests and limits](/docs/concepts/configuration/manage-resources-containers/) configured for each Pod; when either a request or limit is non-zero, the kube-scheduler reports a metrics timeseries. The time series is labelled by:
+The kube-scheduler identifies the resource [requests and limits](/docs/concepts/configuration/manage-resources-containers/)
+configured for each Pod; when either a request or limit is non-zero, the kube-scheduler reports a
+metric time series. The time series is labelled by:
 - namespace
 - pod name
 - the node where the pod is scheduled or an empty string if not yet scheduled
@@ -283,10 +314,14 @@ The kube-scheduler component can identify the resources configured for each Pod
 - the unit of the resource, if known (for example, `cores`)

 <!--
-Once a pod reaches completion (has a `restartPolicy` of `Never` or `OnFailure` and is in the `Succeeded` or `Failed` pod phase, or has been deleted and all containers have a terminated state) the series is no longer reported since the scheduler is now free to schedule other pods to run. The two metrics are called `kube_pod_resource_request` and `kube_pod_resource_limit`.
-
-The metrics are exposed at the HTTP endpoint `/metrics/resources` and require the same authorization as the `/metrics`
-endpoint on the scheduler. You must use the `-show-hidden-metrics-for-version=1.20` flag to expose these alpha stability metrics.
+Once a pod reaches completion (has a `restartPolicy` of `Never` or `OnFailure` and is in the
+`Succeeded` or `Failed` pod phase, or has been deleted and all containers have a terminated state),
+the series is no longer reported since the scheduler is now free to schedule other pods to run.
+The two metrics are called `kube_pod_resource_request` and `kube_pod_resource_limit`.
+
+The metrics are exposed at the HTTP endpoint `/metrics/resources` and require the same
+authorization as the `/metrics` endpoint on the scheduler. You must use the
+`--show-hidden-metrics-for-version=1.20` flag to expose these alpha stability metrics.
 -->
 Once a Pod reaches completion (its `restartPolicy` is `Never` or `OnFailure`, and
 it is in the `Succeeded` or `Failed` Pod phase, or it has been deleted and all containers have
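The reporting rule described above (a series exists while a request or limit is non-zero, and stops once the Pod reaches completion) can be modelled as a predicate over a few Pod fields. This is a sketch of the documented behaviour, not kube-scheduler code; `PodView` is a hypothetical stand-in for the real Pod object, and container states are simplified to plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PodView:
    """Hypothetical, minimal view of the Pod fields the rule depends on."""
    restart_policy: str  # "Always", "OnFailure" or "Never"
    phase: str  # for example "Pending", "Running", "Succeeded", "Failed"
    deleted: bool = False
    container_states: List[str] = field(default_factory=list)  # "running", "terminated", ...
    requests: Dict[str, float] = field(default_factory=dict)  # resource name -> quantity
    limits: Dict[str, float] = field(default_factory=dict)


def is_complete(pod: PodView) -> bool:
    """Completion as described: finished under Never/OnFailure, or deleted with all containers terminated."""
    finished = (pod.restart_policy in ("Never", "OnFailure")
                and pod.phase in ("Succeeded", "Failed"))
    deleted_and_stopped = pod.deleted and all(s == "terminated" for s in pod.container_states)
    return finished or deleted_and_stopped


def reported_resources(pod: PodView) -> List[str]:
    """Resources for which kube_pod_resource_request/limit series would still be reported."""
    if is_complete(pod):
        return []
    names = set(pod.requests) | set(pod.limits)
    # A series is reported when either the request or the limit is non-zero.
    return sorted(n for n in names
                  if pod.requests.get(n, 0) != 0 or pod.limits.get(n, 0) != 0)
```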
@@ -301,7 +336,9 @@ endpoint on the scheduler. You must use the `-show-hidden-metrics-for-version=1.
 <!--
 ## Disabling metrics

-You can explicitly turn off metrics via command line flag `--disabled-metrics`. This may be desired if, for example, a metric is causing a performance problem. The input is a list of disabled metrics (i.e. `--disabled-metrics=metric1,metric2`).
+You can explicitly turn off metrics via the command line flag `--disabled-metrics`. This may be
+desired if, for example, a metric is causing a performance problem. The input is a list of
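The `--disabled-metrics` input is a comma-separated list such as `metric1,metric2`. The sketch below only illustrates that input format and the effect of filtering out the named metrics; how each component applies the list internally is not shown, and tolerating spaces around names is an assumption of this sketch, not documented flag behaviour.

```python
from typing import Iterable, List, Set


def parse_disabled_metrics(flag_value: str) -> Set[str]:
    """Split a --disabled-metrics value such as "metric1,metric2" into metric names."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return {name.strip() for name in flag_value.split(",") if name.strip()}


def filter_metrics(names: Iterable[str], disabled: Set[str]) -> List[str]:
    """Keep only metrics that were not explicitly disabled, preserving their order."""
    return [n for n in names if n not in disabled]
```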
@@ -312,7 +349,9 @@ You can explicitly turn off metrics via command line flag `--disabled-metrics`.
 <!--
 ## Metric cardinality enforcement

-Metrics with unbounded dimensions could cause memory issues in the components they instrument. To limit resource use, you can use the `--allow-label-value` command line option to dynamically configure an allow-list of label values for a metric.
+Metrics with unbounded dimensions could cause memory issues in the components they instrument. To
+limit resource use, you can use the `--allow-label-value` command line option to dynamically
+configure an allow-list of label values for a metric.
 -->
 ## Metric cardinality enforcement {#metric-cardinality-enforcement}

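To see why an allow-list bounds memory use, note that every distinct label value creates a separate time series. The sketch below counts observations per label value and folds any value outside the allow-list into a single placeholder bucket, so the number of series stays at most the allow-list size plus one. The placeholder name `unexpected` and the set-based allow-list are assumptions for illustration; the actual syntax of `--allow-label-value` is not covered here.

```python
from collections import Counter
from typing import Dict, Iterable, Set

PLACEHOLDER = "unexpected"  # assumption: bucket name for values outside the allow-list


def bounded_counts(values: Iterable[str], allowed: Set[str]) -> Dict[str, int]:
    """Count observations per label value, folding non-allowed values into one bucket.

    The number of distinct keys (that is, time series) is at most len(allowed) + 1,
    no matter how many distinct values appear in the input.
    """
    counts: Counter = Counter()
    for v in values:
        counts[v if v in allowed else PLACEHOLDER] += 1
    return dict(counts)
```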
@@ -322,22 +361,31 @@ Metrics with unbounded dimensions could cause memory issues in the components th
-* Read about the [Prometheus text format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format) for metrics
+* Read about the [Prometheus text format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format)
+for metrics
 * Read about the [Kubernetes deprecation policy](/docs/reference/using-api/deprecation-policy/#deprecating-a-feature-or-behavior)