| 1 | +--- |
| 2 | +title: Metrics Stability Framework to Beta |
| 3 | +authors: |
| 4 | + - "@logicalhan" |
| 5 | + - "@RainbowMango" |
| 6 | +owning-sig: sig-instrumentation |
| 7 | +participating-sigs: |
| 8 | + - sig-instrumentation |
| 9 | +reviewers: |
| 10 | + - "@brancz" |
| 11 | +approvers: |
| 12 | + - "@brancz" |
| 13 | +editor: "@brancz" |
| 14 | +creation-date: 2019-10-28 |
| 15 | +last-updated: 2019-10-28 |
| 16 | +status: implementable |
| 17 | +see-also: |
| 18 | + - 20181106-kubernetes-metrics-overhaul |
| 19 | + - 20190404-kubernetes-control-plane-metrics-stability |
| 20 | + - 20190605-metrics-stability-migration |
| 21 | + - 20190605-metrics-validation-and-verification |
| 22 | +--- |
| 23 | + |
| 24 | +# Metrics Stability Framework to Beta |
| 25 | + |
| 26 | +## Table of Contents |
| 27 | + |
| 28 | +<!-- toc --> |
| 29 | +- [Summary](#summary) |
| 30 | +- [Motivation](#motivation) |
| 31 | + - [Goals](#goals) |
| 32 | + - [Non-Goals](#non-goals) |
| 33 | +- [Proposal](#proposal) |
| 34 | + - [Remove Prometheus Registry](#remove-prometheus-registry) |
| 35 | + - [Validated Import Restriction](#validated-import-restriction) |
| 36 | + - [Deprecate Metrics](#deprecate-metrics) |
| 37 | + - [Escape Hatch](#escape-hatch) |
| 38 | +- [Graduation Criteria](#graduation-criteria) |
| 39 | +- [Post-Beta tasks](#post-beta-tasks) |
| 40 | +- [Implementation History](#implementation-history) |
| 41 | + - [Metrics Stability Framework](#metrics-stability-framework) |
| 42 | + - [Metrics Stability Migration](#metrics-stability-migration) |
| 43 | + - [Metrics Validation And Restriction](#metrics-validation-and-restriction) |
| 44 | + - [Deprecate Metrics](#deprecate-metrics-1) |
| 45 | + - [Escape Flag](#escape-flag) |
| 46 | +<!-- /toc --> |
| 47 | + |
| 48 | +## Summary |
| 49 | + |
| 50 | +The metrics stability framework has been added to the Kubernetes project as a way to annotate metrics with a supported stability level. Depending on the stability level of a metric, there are some guarantees one can expect as a consumer (i.e. ingester) of a given metric. This document outlines the steps required to graduate the framework to Beta. |
| 51 | + |
| 52 | +## Motivation |
| 53 | + |
| 54 | +The metrics stability framework is currently used to define stability levels for metrics in OSS Kubernetes. The motivation |
| 55 | +of this KEP is to address the outstanding feature requests and bug reports so that this feature can move to the Beta level. |
| 56 | + |
| 57 | +### Goals |
| 58 | + |
| 59 | +These are the planned changes for Beta feature graduation: |
| 60 | + |
| 61 | +* No Kubernetes binary registers metrics to Prometheus registries directly. |
| 62 | +* There is a validated import restriction on all Kubernetes binaries (except `component-base/metrics`), such that a direct import of Prometheus in Kubernetes fails in a precommit phase. This forces all metrics-related code to go through the metrics stability framework. |
| 63 | +* All currently deprecated metrics are deprecated using the `DeprecatedVersion` field of the metrics options struct. |
| 64 | +* All Kubernetes binaries have a command-line flag `--show-hidden-metrics` with which cluster admins can show metrics deprecated in the last minor release. |
| 65 | + |
| 66 | +### Non-Goals |
| 67 | + |
| 68 | +These are the issues considered and rejected for Beta: |
| 69 | + |
| 70 | +* Being able to individually turn off a metric (this will be a GA feature). |
| 71 | + |
| 72 | +## Proposal |
| 73 | + |
| 74 | +### Remove Prometheus Registry |
| 75 | +To achieve the first goal (no binary registers metrics to Prometheus registries directly), we need a plan for migrating metrics that are defined through the `prometheus.Collector` interface. These metrics currently have no way to express a stability level. @RainbowMango has a [PR with an implementation of how we may accomplish this](https://github.com/kubernetes/kubernetes/pull/83062/). Alternatively, we can default all metrics defined through a custom `prometheus.Collector` to stability level ALPHA, i.e. they offer no stability guarantees. This buys us runway in bridging over to a solution like the one @RainbowMango proposes. |
| 76 | + |
| 77 | +### Validated Import Restriction |
| 78 | +We also want to validate that direct Prometheus imports are no longer possible in Kubernetes outside of `component-base/metrics`. This forces metric definitions to go through the stability framework and allows us to provide the guarantees that we intend. @serathius has some ideas in a [PR here](https://github.com/kubernetes/kubernetes/pull/84302). |
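
The restriction described above could be checked roughly as follows. This is a minimal sketch using only the Go standard library, not the approach taken in the linked PR. The function name `forbiddenImports`, the import-path prefix `github.com/prometheus/`, and the way the allowed directory is matched are assumptions made for illustration.

```go
package main

import (
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// prometheusPrefix is the import-path prefix treated as a direct
// Prometheus dependency (assumption for this sketch).
const prometheusPrefix = "github.com/prometheus/"

// allowedDir is the only location permitted to import Prometheus directly.
const allowedDir = "component-base/metrics"

// forbiddenImports (hypothetical helper) returns the Prometheus import
// paths found in src. Files under allowedDir are exempt and yield nothing.
func forbiddenImports(filename, src string) ([]string, error) {
	if strings.Contains(filename, allowedDir) {
		return nil, nil
	}
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, imp := range f.Imports {
		// Import paths are stored as quoted string literals.
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if strings.HasPrefix(path, prometheusPrefix) {
			bad = append(bad, path)
		}
	}
	return bad, nil
}
```

A precommit script could call such a helper on every Go file and fail when the returned slice is non-empty.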
| 79 | + |
| 80 | +### Deprecate Metrics |
| 81 | +This goal only requires migrating over the deprecated metrics from [PR](tdb). |
| 82 | + |
| 83 | +### Escape Hatch |
| 84 | +We should add a command-line flag, such as `--show-hidden-metrics`, to each Kubernetes binary. |
| 85 | +This is to provide cluster admins an escape hatch to properly migrate off of a deprecated metric, if they were not able to react to the earlier deprecation warnings. |
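
As an illustration of the escape hatch, the sketch below models the flag as a boolean parsed with the standard `flag` package. The `metric` type, the helper names, and the boolean form of the flag are assumptions for illustration. It is not the framework's implementation, and the real flag may take a different shape.

```go
package main

import "flag"

// metric is a hypothetical, simplified stand-in for a registered metric.
type metric struct {
	name              string
	deprecatedVersion string // empty if the metric is not deprecated
}

// parseShowHidden parses the escape-hatch flag from args.
// Modeling the flag as a boolean is an assumption of this sketch.
func parseShowHidden(args []string) (bool, error) {
	fs := flag.NewFlagSet("component", flag.ContinueOnError)
	show := fs.Bool("show-hidden-metrics", false,
		"show metrics deprecated in the last minor release")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *show, nil
}

// visibleMetrics returns the names of metrics that would be exposed.
// Metrics deprecated in lastMinor are skipped unless showHidden is set.
func visibleMetrics(all []metric, lastMinor string, showHidden bool) []string {
	var names []string
	for _, m := range all {
		if m.deprecatedVersion == lastMinor && !showHidden {
			continue
		}
		names = append(names, m.name)
	}
	return names
}
```

With the flag unset, a metric deprecated in the last minor release is omitted. Passing `--show-hidden-metrics` brings it back so that admins have time to migrate.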
| 86 | + |
| 87 | + |
| 88 | +## Graduation Criteria |
| 89 | + |
| 90 | +To mark these as complete, all of the above features need to be implemented. |
| 91 | +An [umbrella issue](https://github.com/kubernetes/kubernetes/issues/tdb) is tracking all of these changes. |
| 92 | +In addition, there need to be sufficient tests for these new features and for all existing features, and documentation should be complete for all features. |
| 93 | + |
| 94 | +There are still open questions that need to be addressed and updated in this KEP before graduation: |
| 95 | + |
| 96 | +## Post-Beta tasks |
| 97 | + |
| 98 | +These are related Post-Beta tasks: |
| 99 | + |
| 100 | +* |
| 101 | + |
| 102 | +## Implementation History |
| 103 | + |
| 104 | +### Metrics Stability Framework |
| 105 | +- Setup framework |
| 106 | + - [x] https://github.com/kubernetes/kubernetes/pull/77037 (by @logicalhan) |
| 107 | + - [x] https://github.com/kubernetes/kubernetes/pull/77618 (by @logicalhan) |
| 108 | + - [x] https://github.com/kubernetes/kubernetes/pull/78773 (by @logicalhan) |
| 109 | + - [x] https://github.com/kubernetes/kubernetes/pull/78867 (by @logicalhan) |
| 110 | + - [x] https://github.com/kubernetes/kubernetes/pull/78877 (by @logicalhan) |
| 111 | + - [x] https://github.com/kubernetes/kubernetes/pull/79237 (by @logicalhan) |
| 112 | + - [x] https://github.com/kubernetes/kubernetes/pull/81190 (by @logicalhan) |
| 113 | + - [x] https://github.com/kubernetes/kubernetes/pull/81395 (by @logicalhan) |
| 114 | + - [x] https://github.com/kubernetes/kubernetes/pull/81579 (by @logicalhan) |
| 115 | + - [x] https://github.com/kubernetes/kubernetes/pull/81608 (by @logicalhan) |
| 116 | +- Introduce bucket functionality |
| 117 | + - [x] https://github.com/kubernetes/kubernetes/pull/82583 (by @RainbowMango) |
| 118 | +- Deal with stability default level |
| 119 | + - [x] https://github.com/kubernetes/kubernetes/pull/82957 (by @RainbowMango) |
| 120 | +- Introduce label functionality |
| 121 | + - [x] https://github.com/kubernetes/kubernetes/pull/83019 (by @RainbowMango) |
| 122 | +- Introduce test util: |
| 123 | + - [x] https://github.com/kubernetes/kubernetes/pull/83299 (by @RainbowMango) |
| 124 | + - [x] https://github.com/kubernetes/kubernetes/pull/83699 (by @RainbowMango) |
| 125 | +- Introduce http handler functionality |
| 126 | + - [x] https://github.com/kubernetes/kubernetes/pull/83722 (by @RainbowMango) |
| 127 | +- Introduce GaugeFunc |
| 128 | + - [x] https://github.com/kubernetes/kubernetes/pull/83830 (by @RainbowMango) |
| 129 | +- Introduce custom collector |
| 130 | + - [ ] https://github.com/kubernetes/kubernetes/pull/83062 (by @RainbowMango) |
| 131 | +- Cleanup |
| 132 | + - [ ] https://github.com/kubernetes/kubernetes/pull/84135 (by @RainbowMango) |
| 133 | + - [x] https://github.com/kubernetes/kubernetes/pull/81432 (by @logicalhan) |
| 134 | +- Bug fix |
| 135 | + - [x] https://github.com/kubernetes/kubernetes/pull/84395 (by @RainbowMango) |
| 136 | + |
| 137 | +### Metrics Stability Migration |
| 138 | +- General Migration |
| 139 | + - [x] for shared metrics: https://github.com/kubernetes/kubernetes/pull/81173 (by @logicalhan) |
| 140 | + - [x] for apiserver: https://github.com/kubernetes/kubernetes/pull/81531 (by @logicalhan) |
| 141 | + - [x] for kubelet: https://github.com/kubernetes/kubernetes/pull/81534 (by @logicalhan) |
| 142 | + - [x] for scheduler: https://github.com/kubernetes/kubernetes/pull/81576 (by @logicalhan) |
| 143 | + - [x] for controller manager: https://github.com/kubernetes/kubernetes/pull/81624 (by @logicalhan) |
| 144 | + - [x] for kube-proxy: https://github.com/kubernetes/kubernetes/pull/81626 (by @logicalhan) |
| 145 | + - [x] for etcd version monitor: https://github.com/kubernetes/kubernetes/pull/83283 (by @RainbowMango) |
| 146 | + - [ ] for metrics validation framework: https://github.com/kubernetes/kubernetes/pull/84500 (by @RainbowMango) |
| 147 | +- Migrate bucket functionality |
| 148 | + - [x] https://github.com/kubernetes/kubernetes/pull/82626 (by @RainbowMango) |
| 149 | + - [x] https://github.com/kubernetes/kubernetes/pull/82630 (by @RainbowMango) |
| 150 | + - [x] https://github.com/kubernetes/kubernetes/pull/82736 (by @RainbowMango) |
| 151 | + - [x] https://github.com/kubernetes/kubernetes/pull/82737 (by @RainbowMango) |
| 152 | + - [x] https://github.com/kubernetes/kubernetes/pull/82741 (by @RainbowMango) |
| 153 | + - [x] https://github.com/kubernetes/kubernetes/pull/82745 (by @RainbowMango) |
| 154 | +- Migrate bucket functionality |
| 155 | + - [x] https://github.com/kubernetes/kubernetes/pull/83159 (by @RainbowMango) |
| 156 | + - [x] https://github.com/kubernetes/kubernetes/pull/83220 (by @RainbowMango) |
| 157 | + - [x] https://github.com/kubernetes/kubernetes/pull/83223 (by @RainbowMango) |
| 158 | + - [x] https://github.com/kubernetes/kubernetes/pull/83269 (by @RainbowMango) |
| 159 | + - [x] https://github.com/kubernetes/kubernetes/pull/83278 (by @RainbowMango) |
| 160 | + - [x] https://github.com/kubernetes/kubernetes/pull/83279 (by @RainbowMango) |
| 161 | +- Migrate or refactor test case |
| 162 | + - [x] https://github.com/kubernetes/kubernetes/pull/83611 (by @RainbowMango) |
| 163 | + - [x] https://github.com/kubernetes/kubernetes/pull/83678 (by @RainbowMango) |
| 164 | + - [x] https://github.com/kubernetes/kubernetes/pull/83713 (by @RainbowMango) |
| 165 | + - [ ] https://github.com/kubernetes/kubernetes/pull/83664 (by @RainbowMango) |
| 166 | + - [x] https://github.com/kubernetes/kubernetes/pull/84283 (by @serathius) |
| 167 | +- Migrate promhttp |
| 168 | + - [ ] https://github.com/kubernetes/kubernetes/pull/84393 (by @wuyafang) |
| 169 | + - [x] https://github.com/kubernetes/kubernetes/pull/84221 (by @wuyafang) |
| 170 | + |
| 171 | +### Metrics Validation And Restriction |
| 172 | +- [x] https://github.com/kubernetes/kubernetes/pull/80803 (by @serathius) |
| 173 | +- [x] https://github.com/kubernetes/kubernetes/pull/80906 (by @serathius) |
| 174 | +- [x] https://github.com/kubernetes/kubernetes/pull/81510 (by @serathius) |
| 175 | +- [ ] https://github.com/kubernetes/kubernetes/pull/84302 (by @serathius) |
| 176 | +- [ ] https://github.com/kubernetes/kubernetes/pull/84373 (by @serathius) |
| 177 | +- [ ] https://github.com/kubernetes/kubernetes/pull/84378 (by @serathius) |
| 178 | + |
| 179 | +### Deprecate Metrics |
| 180 | +- [ ] https://github.com/kubernetes/kubernetes/pull/83836 (by @RainbowMango) |
| 181 | +- [ ] https://github.com/kubernetes/kubernetes/pull/83837 (by @RainbowMango) |
| 182 | +- [ ] https://github.com/kubernetes/kubernetes/pull/83838 (by @RainbowMango) |
| 183 | +- [ ] https://github.com/kubernetes/kubernetes/pull/83839 (by @RainbowMango) |
| 184 | +- [ ] https://github.com/kubernetes/kubernetes/pull/83841 (by @RainbowMango) |
| 185 | + |
| 186 | +### Escape Flag |
| 187 | +- [ ] https://github.com/kubernetes/kubernetes/pull/84292 (by @RainbowMango) |