Commit 4bcce78

Merge pull request kubernetes#1935 from ehashman/metrics-overhaul
Mark SIG Instrumentation metrics overhaul KEP as implemented
2 parents 9d7a75d + 7d11d53 commit 4bcce78

1 file changed: +68 -13 lines changed

keps/sig-instrumentation/20181106-kubernetes-metrics-overhaul.md

Lines changed: 68 additions & 13 deletions
@@ -13,7 +13,7 @@ approvers:
 editor: "@DirectXMan12"
 creation-date: 2018-11-06
 last-updated: 2019-03-21
-status: implementable
+status: implemented
 ---
 
 # Kubernetes Metrics Overhaul
@@ -31,8 +31,6 @@ status: implementable
 - [Changing API latency histogram buckets](#changing-api-latency-histogram-buckets)
 - [Kubelet metric changes](#kubelet-metric-changes)
 - [Make metrics aggregatable](#make-metrics-aggregatable)
-- [Export less metrics](#export-less-metrics)
-- [Prevent apiserver's metrics from accidental registration](#prevent-apiservers-metrics-from-accidental-registration)
 - [Prober metrics](#prober-metrics)
 - [Kube-scheduler metric changes](#kube-scheduler-metric-changes)
 - [Kube-proxy metric changes](#kube-proxy-metric-changes)
@@ -46,9 +44,15 @@ status: implementable
 - [Workqueue metrics](#workqueue-metrics)
 - [Convert latency/latencies in metrics name to duration](#convert-latencylatencies-in-metrics-name-to-duration)
 - [Risks and Mitigations](#risks-and-mitigations)
+- [Test Plan](#test-plan)
 - [Deprecation Plan](#deprecation-plan)
 - [Graduation Criteria](#graduation-criteria)
 - [Implementation History](#implementation-history)
+- [1.14](#114)
+- [1.15](#115)
+- [1.16](#116)
+- [1.17](#117)
+- [Not attached to a release milestone](#not-attached-to-a-release-milestone)
 <!-- /toc -->
 
 ## Summary
@@ -91,7 +95,7 @@ As Kubernetes currently rewrites meta labels of containers to “well-known” `
 
 API server histogram latency buckets run from 125ms to 8s. This range does not accurately model most API server request latencies, which could run as low as 1ms for GETs or as high as 60s before hitting the API server global timeout.
 
-https://github.com/kubernetes/kubernetes/pull/67476
+https://github.com/kubernetes/kubernetes/pull/73638
 
 ### Kubelet metric changes
 
@@ -107,18 +111,12 @@ https://github.com/kubernetes/kubernetes/pull/72470
 
 https://github.com/kubernetes/kubernetes/pull/73820
 
-#### Export less metrics
-
-https://github.com/kubernetes/kubernetes/issues/68522
-
-#### Prevent apiserver's metrics from accidental registration
-
-https://github.com/kubernetes/kubernetes/pull/63924
-
 #### Prober metrics
 
 Make prober metrics introduced in https://github.com/kubernetes/kubernetes/pull/61369 conform to the [Kubernetes instrumentation guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/instrumentation.md).
 
+https://github.com/kubernetes/kubernetes/pull/76074
+
 ### Kube-scheduler metric changes
 
 https://github.com/kubernetes/kubernetes/pull/72332
@@ -169,6 +167,10 @@ Risks include users upgrading Kubernetes, but not updating their usage of Kubern
 
 To prevent this, we will implement recording rules for Prometheus that allow best effort backward compatibility as well as update uses of breaking metric usages in the [Kubernetes monitoring mixin](https://github.com/kubernetes-monitoring/kubernetes-mixin), a widely used collection of Prometheus alerts and Grafana dashboards for Kubernetes.
 
+## Test Plan
+
+Each individual change for this KEP must be accompanied with appropriate unit tests. As the scope of changes are provided are on the level of individual metrics, integration testing is not required.
+
 ## Deprecation Plan
 
 In our efforts to change existing old metrics, we flag them `(Deprecated)` in the front of metrics help text.
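
The `(Deprecated)` flag described in the Deprecation Plan can be illustrated with a minimal sketch. It assumes the flag is a plain text prefix on the help string and renders it as a `# HELP` line in the Prometheus text exposition format. The helper functions and both metric names are hypothetical and exist only for illustration; they are not taken from the Kubernetes code base.

```python
# Hypothetical helpers: a sketch of prefixing help text, not Kubernetes code.
DEPRECATED_PREFIX = "(Deprecated) "


def mark_deprecated(help_text):
    """Put the deprecation flag in front of the help text, at most once."""
    if help_text.startswith(DEPRECATED_PREFIX):
        return help_text
    return DEPRECATED_PREFIX + help_text


def exposition_help_lines(metrics):
    """Render '# HELP <name> <text>' lines from {name: (help_text, deprecated)}."""
    lines = []
    for name, (help_text, deprecated) in sorted(metrics.items()):
        text = mark_deprecated(help_text) if deprecated else help_text
        lines.append(f"# HELP {name} {text}")
    return lines


# An old metric and its replacement coexist; only the old one carries the flag.
# Both metric names are made up for this example.
EXAMPLE_METRICS = {
    "example_request_latency_microseconds": ("Request latency in microseconds.", True),
    "example_request_duration_seconds": ("Request duration in seconds.", False),
}

for line in exposition_help_lines(EXAMPLE_METRICS):
    print(line)
```

Keeping the flag in the help text leaves the metric name and labels untouched, so existing queries keep working while the old and new metrics coexist.
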
@@ -177,10 +179,63 @@ These old metrics will be deprecated in v1.14 and coexist with the new replaceme
 
 The release target of removing the deprecated metrics is v1.15.
 
+Prior to removing deprecated metrics, we will attend appropriate community meetings (i.e. SIG Node) to provide sufficient notice.
+
 ## Graduation Criteria
 
 All metrics exposed by components from kubernetes/kubernetes follow Prometheus best practices and (nice to have) tooling is built and enabled in CI to prevent simple violations of said best practices.
 
 ## Implementation History
 
-Multiple pull requests have already been opened, but not merged as of writing of this document.
+As of release 1.17, this KEP is considered fully implemented.
+
+### 1.14
+
+- Use prometheus conventions for workqueue metrics [#71300](https://github.com/kubernetes/kubernetes/pull/71300)
+[@danielqsj](https://github.com/danielqsj) 2018-12-31
+- Change scheduler metrics to conform metrics guidelines [#72332](https://github.com/kubernetes/kubernetes/pull/72332)
+[@danielqsj](https://github.com/danielqsj) 2019-01-14
+- Change apiserver metrics to conform metrics guidelines [#72336](https://github.com/kubernetes/kubernetes/pull/72336)
+[@danielqsj](https://github.com/danielqsj) 2019-01-17
+- Change proxy metrics to conform metrics guidelines [#72334](https://github.com/kubernetes/kubernetes/pull/72334)
+[@danielqsj](https://github.com/danielqsj) 2019-01-25
+- Fix admission metrics in true units [#72343](https://github.com/kubernetes/kubernetes/pull/72343)
+[@danielqsj](https://github.com/danielqsj) 2019-01-28
+- Adjust buckets in apiserver request latency metrics [#73638](https://github.com/kubernetes/kubernetes/pull/73638)
+[@wojtek-t](https://github.com/wojtek-t) 2019-02-04
+- Change docker metrics to conform metrics guidelines [#72323](https://github.com/kubernetes/kubernetes/pull/72323)
+[@danielqsj](https://github.com/danielqsj) 2019-02-06
+- Change kubelet metrics to conform metrics guidelines [#72470](https://github.com/kubernetes/kubernetes/pull/72470)
+[@danielqsj](https://github.com/danielqsj) 2019-02-18
+- Rename cadvisor metric labels to match instrumentation guidelines [#69099](https://github.com/kubernetes/kubernetes/pull/69099)
+[@ehashman](https://github.com/ehashman) 2019-02-22
+- Fit RuntimeClass metrics to prometheus conventions [#73820](https://github.com/kubernetes/kubernetes/pull/73820)
+[@haiyanmeng](https://github.com/haiyanmeng) 2019-02-22
+- Convert latency/latencies in metrics name to duration [#74418](https://github.com/kubernetes/kubernetes/pull/74418)
+[@danielqsj](https://github.com/danielqsj) 2019-03-01
+- Clean the deprecated metrics which introduced recently [#75023](https://github.com/kubernetes/kubernetes/pull/75023)
+[@danielqsj](https://github.com/danielqsj) 2019-03-07
+
+### 1.15
+
+- Remove the deprecated admission metrics [#75279](https://github.com/kubernetes/kubernetes/pull/75279)
+[@danielqsj](https://github.com/danielqsj) 2019-03-20
+- Change kubelet probe metrics to counter [#76074](https://github.com/kubernetes/kubernetes/pull/76074)
+[@danielqsj](https://github.com/danielqsj) 2019-04-12
+
+### 1.16
+
+- Drop deprecated cadvisor metric labels [#80376](https://github.com/kubernetes/kubernetes/pull/80376)
+[@ehashman](https://github.com/ehashman) 2019-08-14
+
+### 1.17
+
+- Turn off apiserver deprecated metrics [#83837](https://github.com/kubernetes/kubernetes/pull/83837)
+[@RainbowMango](https://github.com/RainbowMango) 2019-11-16
+- Turn off kubelet deprecated metrics [#83841](https://github.com/kubernetes/kubernetes/pull/83841)
+[@RainbowMango](https://github.com/RainbowMango) 2019-12-09
+
+### Not attached to a release milestone
+
+- Introduce promlint to guarantee metrics follow Prometheus best practices [#86477](https://github.com/kubernetes/kubernetes/pull/86477)
+[@RainbowMango](https://github.com/RainbowMango) 2020-05-25
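
The bucket-range problem stated in the "Changing API latency histogram buckets" hunk (buckets from 125ms to 8s against latencies from 1ms to 60s) can be shown with a small sketch. The KEP text gives only the two endpoints, so the old layout here is assumed to be seven buckets doubling from 125ms to 8s. The wider layout is hypothetical and is not the set of buckets chosen in the linked pull request.

```python
from bisect import bisect_left

# Assumption: seven upper bounds in seconds, doubling from 125ms to 8s.
OLD_BOUNDS = [0.125 * 2 ** i for i in range(7)]

# Hypothetical wider layout covering 1ms to 60s, for illustration only.
WIDE_BOUNDS = [0.001, 0.005, 0.025, 0.1, 0.5, 1.0, 5.0, 15.0, 30.0, 60.0]


def bucket_label(bounds, seconds):
    """Return the 'le' upper bound of the first bucket containing the observation."""
    i = bisect_left(bounds, seconds)  # first index with bounds[i] >= seconds
    return "+Inf" if i == len(bounds) else str(bounds[i])


def distinct_buckets(bounds, observations):
    """Count how many different buckets the observations fall into."""
    return len({bucket_label(bounds, s) for s in observations})


fast_requests = [0.001, 0.01, 0.1]  # 1ms, 10ms, 100ms
slow_requests = [20.0, 60.0]        # close to the global timeout

for name, bounds in (("old", OLD_BOUNDS), ("wide", WIDE_BOUNDS)):
    print(name,
          [bucket_label(bounds, s) for s in fast_requests],
          [bucket_label(bounds, s) for s in slow_requests])
```

With the assumed old layout, every request faster than 125ms lands in the first bucket and every request slower than 8s lands in `+Inf`, so quantile estimates at either end carry no resolution.
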
