Commit 0465a51

Merge pull request kubernetes#2431 from logicalhan/metric-keps
expand on GA requirements for metrics stability framework
2 parents 77a84d2 + 8f23f87 commit 0465a51

3 files changed

+109
-44
lines changed

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
11
kep-number: 1209
2-
beta:
2+
stable:
33
approver: "@johnbelamaric"

keps/sig-instrumentation/1209-metrics-stability/README.md

Lines changed: 105 additions & 41 deletions
@@ -13,6 +13,19 @@
1313
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
1414
- [Version Skew Strategy](#version-skew-strategy)
1515
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
16+
- [How can this feature be enabled / disabled in a live cluster?](#how-can-this-feature-be-enabled--disabled-in-a-live-cluster)
17+
- [What specific metrics should inform a rollback?](#what-specific-metrics-should-inform-a-rollback)
18+
- [Monitoring Requirements](#monitoring-requirements)
19+
- [How can an operator determine if the feature is in use by workloads?](#how-can-an-operator-determine-if-the-feature-is-in-use-by-workloads)
20+
- [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service)
21+
- [Metrics](#metrics)
22+
- [Dependencies](#dependencies)
23+
- [Does this feature depend on any specific services running in the cluster?](#does-this-feature-depend-on-any-specific-services-running-in-the-cluster)
24+
- [For GA, this section is required: approvers should be able to confirm the previous answers based on experience in the field.](#for-ga-this-section-is-required-approvers-should-be-able-to-confirm-the-previous-answers-based-on-experience-in-the-field)
25+
- [Will enabling / using this feature result in any new API calls? Describe them, providing:](#will-enabling--using-this-feature-result-in-any-new-api-calls-describe-them-providing)
26+
- [Troubleshooting](#troubleshooting)
27+
- [How does this feature react if the API server and/or etcd is unavailable?](#how-does-this-feature-react-if-the-api-server-andor-etcd-is-unavailable)
28+
- [What are other known failure modes?](#what-are-other-known-failure-modes)
1629
- [Implementation History](#implementation-history)
1730
<!-- /toc -->
1831

@@ -39,8 +52,8 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
3952
- [X] (R) Design details are appropriately documented
4053
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
4154
- [X] (R) Graduation criteria is in place
42-
- \[Predates\] (R) Production readiness review completed
43-
- \[Predates\] (R) Production readiness review approved
55+
- [ ] (R) Production readiness review completed
56+
- [ ] (R) Production readiness review approved
4457
- [X] "Implementation History" section is up-to-date for milestone
4558
- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
4659
- [X] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
@@ -56,17 +69,17 @@ This proposal covers the implementation of metrics stability in the kubernetes/k
5669

5770
Historically, the implementation was split into four documents:
5871

59-
1. [Metrics Stability Framework]
60-
1. [Metrics Stability Migration]
61-
1. [Metrics Validation and Verification]
62-
1. [Metrics Stability to Beta]
72+
1. [Metrics Stability Framework](https://github.com/kubernetes/enhancements/blob/77a84d2d55b5802a615f3fe98e7e7c9bd26c9efc/keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md)
73+
1. [Metrics Stability Migration](https://github.com/kubernetes/enhancements/blob/77a84d2d55b5802a615f3fe98e7e7c9bd26c9efc/keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-stability-migration.md)
74+
1. [Metrics Validation and Verification](https://github.com/kubernetes/enhancements/blob/77a84d2d55b5802a615f3fe98e7e7c9bd26c9efc/keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-validation-and-verification.md)
75+
1. [Metrics Stability to Beta](https://github.com/kubernetes/enhancements/blob/77a84d2d55b5802a615f3fe98e7e7c9bd26c9efc/keps/sig-instrumentation/1209-metrics-stability/20191028-metrics-stability-to-beta.md)
6376

6477
This document is not net new and ties the four together in order to document the lifecycle of this feature.
6578

66-
[Metrics Stability Framework]: keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md
67-
[Metrics Stability Migration]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-stability-migration.md
68-
[Metrics Validation and Verification]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-validation-and-verification.md
69-
[Metrics Stability to Beta]: keps/sig-instrumentation/1209-metrics-stability/20191028-metrics-stability-to-beta.md
79+
[Metrics Stability Framework]: 20190404-kubernetes-control-plane-metrics-stability.md
80+
[Metrics Stability Migration]: 20190605-metrics-stability-migration.md
81+
[Metrics Validation and Verification]: 20190605-metrics-validation-and-verification.md
82+
[Metrics Stability to Beta]: 20191028-metrics-stability-to-beta.md
7083

7184
## Motivation
7285

@@ -77,10 +90,10 @@ See:
7790
1. [Metrics Validation and Verification#Motivation]
7891
1. [Metrics Stability to Beta#Motivation]
7992

80-
[Metrics Stability Framework#Motivation]: keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md#motivation
81-
[Metrics Stability Migration#Motivation]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-stability-migration.md#motivation
82-
[Metrics Validation and Verification#Motivation]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-validation-and-verification.md#motivation
83-
[Metrics Stability to Beta#Motivation]: keps/sig-instrumentation/1209-metrics-stability/20191028-metrics-stability-to-beta.md#motivation
93+
[Metrics Stability Framework#Motivation]: 20190404-kubernetes-control-plane-metrics-stability.md#motivation
94+
[Metrics Stability Migration#Motivation]: 20190605-metrics-stability-migration.md#motivation
95+
[Metrics Validation and Verification#Motivation]: 20190605-metrics-validation-and-verification.md#motivation
96+
[Metrics Stability to Beta#Motivation]: 20191028-metrics-stability-to-beta.md#motivation
8497

8598
## Proposal
8699

@@ -91,10 +104,12 @@ See:
91104
1. [Metrics Validation and Verification#Proposal]
92105
1. [Metrics Stability to Beta#Proposal]
93106

94-
[Metrics Stability Framework#Proposal]: keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md#proposal
95-
[Metrics Stability Migration#General Migration Strategy]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-stability-migration.md#general-migration-strategy
96-
[Metrics Validation and Verification#Proposal]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-validation-and-verification.md#proposal
97-
[Metrics Stability to Beta#Proposal]: keps/sig-instrumentation/1209-metrics-stability/20191028-metrics-stability-to-beta.md#proposal
107+
https://github.com/kubernetes/enhancements/blob/77a84d2d55b5802a615f3fe98e7e7c9bd26c9efc/keps/sig-instrumentation/1209-metrics-stability/keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md#implementation-history
108+
109+
[Metrics Stability Framework#Proposal]: 20190404-kubernetes-control-plane-metrics-stability.md#proposal
110+
[Metrics Stability Migration#General Migration Strategy]: 20190605-metrics-stability-migration.md#general-migration-strategy
111+
[Metrics Validation and Verification#Proposal]: 20190605-metrics-validation-and-verification.md#proposal
112+
[Metrics Stability to Beta#Proposal]: 20191028-metrics-stability-to-beta.md#proposal
98113

99114
## Design Details
100115

@@ -103,8 +118,8 @@ See:
103118
1. [Metrics Stability Framework#Design Details]
104119
1. [Metrics Validation and Verification#Design Details]
105120

106-
[Metrics Stability Framework#Design Details]: keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md#design-details
107-
[Metrics Validation and Verification#Design Details]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-validation-and-verification.md#design-details
121+
[Metrics Stability Framework#Design Details]: 20190404-kubernetes-control-plane-metrics-stability.md#design-details
122+
[Metrics Validation and Verification#Design Details]: 20190605-metrics-validation-and-verification.md#design-details
108123

109124
### Graduation Criteria
110125

@@ -115,8 +130,8 @@ See:
115130
1. [Metrics Stability Framework#Graduation Criteria]
116131
1. [Metrics Stability Migration#Graduation Criteria]
117132

118-
[Metrics Stability Framework#Graduation Criteria]: keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md#graduation-criteria
119-
[Metrics Stability Migration#Graduation Criteria]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-stability-migration.md#graduation-criteria
133+
[Metrics Stability Framework#Graduation Criteria]: 20190404-kubernetes-control-plane-metrics-stability.md#graduation-criteria
134+
[Metrics Stability Migration#Graduation Criteria]: 20190605-metrics-stability-migration.md#graduation-criteria
120135

121136
#### Alpha -> Beta Graduation
122137

@@ -125,36 +140,85 @@ See:
125140
1. [Metrics Validation and Verification#Graduation Criteria]
126141
1. [Metrics Stability to Beta#Graduation Criteria]
127142

128-
[Metrics Validation and Verification#Graduation Criteria]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-validation-and-verification.md#graduation-criteria
129-
[Metrics Stability to Beta#Graduation Criteria]: keps/sig-instrumentation/1209-metrics-stability/20191028-metrics-stability-to-beta.md#graduation-criteria
143+
[Metrics Validation and Verification#Graduation Criteria]: 20190605-metrics-validation-and-verification.md#graduation-criteria
144+
[Metrics Stability to Beta#Graduation Criteria]: 20191028-metrics-stability-to-beta.md#graduation-criteria
130145

131146
#### Beta -> GA Graduation
132147

133-
- Select stable metrics from control plane components
134-
- Implement the ability to turn off individual metrics (see [here](keps/sig-instrumentation/1209-metrics-stability/20191028-metrics-stability-to-beta.md#non-goals))
135-
136-
**For non-optional features moving to GA, the graduation criteria must include
137-
[conformance tests].**
138-
139-
TODO(@logicalhan): ^
140-
141-
[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
148+
- Metrics are now eligible to be promoted to STABLE status (we have some candidates in kube-apiserver).
149+
- [apiserver_storage_object_counts](https://github.com/kubernetes/kubernetes/issues/98270)
150+
- `apiserver_request_total` will also be promoted (as discussed in biweekly SIG apimachinery meeting)
151+
- Implement the ability to turn off individual metrics (see [here](20191028-metrics-stability-to-beta.md#non-goals))
152+
- We need this because of stuff like this: [Unbounded valuesets for metric labels](https://github.com/kubernetes/kubernetes/issues/76302)
142153

143154
### Upgrade / Downgrade Strategy
144155

145156
See:
146157

147-
- [Deprecation Lifecycle](keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md#deprecation-lifecycle)
148-
- [Deprecation of modified metrics from metrics overhaul KEP](keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-stability-migration.md#deprecation-of-modified-metrics-from-metrics-overhaul-kep)
149-
- [Escape Hatch](keps/sig-instrumentation/1209-metrics-stability/20191028-metrics-stability-to-beta.md#escape-hatch)
158+
- [Deprecation Lifecycle](20190404-kubernetes-control-plane-metrics-stability.md#deprecation-lifecycle)
159+
- [Deprecation of modified metrics from metrics overhaul KEP](20190605-metrics-stability-migration.md#deprecation-of-modified-metrics-from-metrics-overhaul-kep)
160+
- [Escape Hatch](20191028-metrics-stability-to-beta.md#escape-hatch)
161+
162+
https://github.com/kubernetes/enhancements/blob/0f5bb1138a6dfd7f3d52fa901c2fba7abb7fb731/keps/sig-instrumentation/1209-metrics-stability/keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md#implementation-history
150163

151164
### Version Skew Strategy
152165

153166
N/A
154167

155168
## Production Readiness Review Questionnaire
156169

157-
N/A - this KEP predates PRR. @logicalhan to fill this in later if desired.
170+
#### How can this feature be enabled / disabled in a live cluster?
171+
172+
The metrics stability framework adds developer tooling around commit pipelines and is not a user-facing feature per se. The part that is user-facing is the annotation on metrics with a stability level.
173+
174+
This framework intends to increase reliability in control-plane management and so features in the metrics stability framework tend to 'fix' aspects of dev processes which lead to downstream breakages.
175+
176+
Rollout, Upgrade and Rollback Planning
177+
This section must be completed when targeting beta graduation to a release.
178+
179+
N/A, this isn't a feature per se.
180+
181+
#### What specific metrics should inform a rollback?
182+
183+
N/A
184+
185+
### Monitoring Requirements
186+
187+
#### How can an operator determine if the feature is in use by workloads?
188+
189+
N/A
190+
191+
#### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
192+
193+
N/A
194+
195+
#### Metrics
196+
197+
The stability framework applies to all metrics which originate directly from the control-plane.
198+
199+
### Dependencies
200+
201+
This section must be completed when targeting beta graduation to a release.
202+
203+
#### Does this feature depend on any specific services running in the cluster?
204+
205+
N/A
206+
207+
#### For GA, this section is required: approvers should be able to confirm the previous answers based on experience in the field.
208+
209+
#### Will enabling / using this feature result in any new API calls? Describe them, providing:
210+
211+
No.
212+
213+
### Troubleshooting
214+
215+
#### How does this feature react if the API server and/or etcd is unavailable?
216+
217+
N/A (but if the component isn't available, no metrics are being scraped).
218+
219+
#### What are other known failure modes?
220+
221+
At worst, this thing can clog the commit pipeline (since it is effectively a conformance test for ensuring metric stability guarantees). In that case, we can simply turn off the verification and validation mechanism (i.e. the `hack/verify_generated_stable_metrics.sh` script) which effectively puts us back to where we were before the framework. Note that this basically allows developers to commit breaking changes to metrics and violate guarantees though.
158222

159223
## Implementation History
160224

@@ -165,7 +229,7 @@ See:
165229
1. [Metrics Validation and Verification#Implementation History]
166230
1. [Metrics Stability to Beta#Implementation History]
167231

168-
[Metrics Stability Framework#Implementation History]: keps/sig-instrumentation/1209-metrics-stability/20190404-kubernetes-control-plane-metrics-stability.md#implementation-history
169-
[Metrics Stability Migration#Implementation History]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-stability-migration.md#implementation-history
170-
[Metrics Validation and Verification#Implementation History]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-validation-and-verification.md#implementation-history
171-
[Metrics Stability to Beta#Implementation History]: keps/sig-instrumentation/1209-metrics-stability/20191028-metrics-stability-to-beta.md#implementation-history
232+
[Metrics Stability Framework#Implementation History]: 20190404-kubernetes-control-plane-metrics-stability.md#implementation-history
233+
[Metrics Stability Migration#Implementation History]: 20190605-metrics-stability-migration.md#implementation-history
234+
[Metrics Validation and Verification#Implementation History]: 20190605-metrics-validation-and-verification.md#implementation-history
235+
[Metrics Stability to Beta#Implementation History]: 20191028-metrics-stability-to-beta.md#implementation-history

keps/sig-instrumentation/1209-metrics-stability/kep.yaml

Lines changed: 3 additions & 2 deletions
@@ -21,18 +21,19 @@ reviewers:
2121
- "@lavalamp" # api-machinery
2222
- "@dashpole" # node
2323
- "@ehashman" # instrumentation
24-
- "@mml"
24+
- "@mml"
2525
- "@sttts" # api-machinery
2626
- "@bsalamat" # scheduling
2727
- "@andrewsykim" # cloud-provider
2828
approvers:
2929
- "@brancz"
3030
prr-approvers:
31+
- "@johnbelamaric" # PRR-reviewer
3132
see-also:
3233
- "/keps/sig-instrumentation/1206-metrics-overhaul"
3334

3435
# The target maturity stage in the current dev cycle for this KEP.
35-
stage: beta
36+
stage: stable
3637

3738
# The most recent milestone for which work toward delivery of this KEP has been
3839
# done. This can be the current (upcoming) milestone, if it is being actively
