- [How can an operator determine if the feature is in use by workloads?](#how-can-an-operator-determine-if-the-feature-is-in-use-by-workloads)
- [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service)
- [Metrics](#metrics)
- [Dependencies](#dependencies)
- [Does this feature depend on any specific services running in the cluster?](#does-this-feature-depend-on-any-specific-services-running-in-the-cluster)
- [For GA, this section is required: approvers should be able to confirm the previous answers based on experience in the field.](#for-ga-this-section-is-required-approvers-should-be-able-to-confirm-the-previous-answers-based-on-experience-in-the-field)
- [Will enabling / using this feature result in any new API calls? Describe them, providing:](#will-enabling--using-this-feature-result-in-any-new-api-calls-describe-them-providing)
- [Troubleshooting](#troubleshooting)
- [How does this feature react if the API server and/or etcd is unavailable?](#how-does-this-feature-react-if-the-api-server-andor-etcd-is-unavailable)
- [What are other known failure modes?](#what-are-other-known-failure-modes)
1. [Metrics Validation and Verification](https://github.com/kubernetes/enhancements/blob/77a84d2d55b5802a615f3fe98e7e7c9bd26c9efc/keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-validation-and-verification.md)
1. [Metrics Stability to Beta](https://github.com/kubernetes/enhancements/blob/77a84d2d55b5802a615f3fe98e7e7c9bd26c9efc/keps/sig-instrumentation/1209-metrics-stability/20191028-metrics-stability-to-beta.md)

This document introduces nothing new; it ties the four together to document the lifecycle of this feature.
[Metrics Validation and Verification#Motivation]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-validation-and-verification.md#motivation
[Metrics Stability to Beta#Motivation]: keps/sig-instrumentation/1209-metrics-stability/20191028-metrics-stability-to-beta.md#motivation
[Metrics Validation and Verification#Proposal]: keps/sig-instrumentation/1209-metrics-stability/20190605-metrics-validation-and-verification.md#proposal
[Metrics Stability to Beta#Proposal]: keps/sig-instrumentation/1209-metrics-stability/20191028-metrics-stability-to-beta.md#proposal
- `apiserver_request_total` will also be promoted (as discussed in the biweekly SIG API Machinery meeting)
- Implement the ability to turn off individual metrics (see [here](20191028-metrics-stability-to-beta.md#non-goals))
  - This is needed because of issues such as [Unbounded valuesets for metric labels](https://github.com/kubernetes/kubernetes/issues/76302)
- [Deprecation of modified metrics from metrics overhaul KEP](20190605-metrics-stability-migration.md#deprecation-of-modified-metrics-from-metrics-overhaul-kep)
N/A - this KEP predates PRR. @logicalhan to fill this in later if desired.

#### How can this feature be enabled / disabled in a live cluster?

The metrics stability framework adds developer tooling around commit pipelines and is not a user-facing feature per se. The only user-facing part is the stability-level annotation on each metric.

The framework is intended to make control-plane management more reliable, so its features tend to fix aspects of the development process that lead to downstream breakages.

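As a rough illustration of what a stability-level annotation on a metric can look like, here is a minimal Python sketch. It is not the framework's actual code (which lives in the Kubernetes Go codebase); the names `StabilityLevel`, `MetricOpts`, and `annotated_help` are hypothetical, and the idea of surfacing the level as a bracketed prefix on the help text is an assumption made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class StabilityLevel(Enum):
    """Hypothetical stability levels a metric can be annotated with."""
    ALPHA = "ALPHA"
    STABLE = "STABLE"


@dataclass(frozen=True)
class MetricOpts:
    """Hypothetical metric definition carrying a stability annotation."""
    name: str
    help: str
    # Assumption for this sketch: metrics default to the weakest guarantee.
    stability_level: StabilityLevel = StabilityLevel.ALPHA


def annotated_help(opts: MetricOpts) -> str:
    """Return the help text with the stability level as a visible prefix."""
    return f"[{opts.stability_level.value}] {opts.help}"


if __name__ == "__main__":
    opts = MetricOpts(
        name="apiserver_request_total",
        help="Counter of apiserver requests.",  # illustrative help text
        stability_level=StabilityLevel.STABLE,
    )
    print(annotated_help(opts))
```

The point of the sketch is only that the stability level travels with the metric definition, so anyone reading the exposed metric can see which guarantee applies.
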
### Rollout, Upgrade and Rollback Planning

This section must be completed when targeting beta graduation to a release.

N/A, this isn't a feature per se.

#### What specific metrics should inform a rollback?

N/A

### Monitoring Requirements

#### How can an operator determine if the feature is in use by workloads?

N/A

#### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

N/A

#### Metrics

The stability framework applies to all metrics that originate directly from the control plane.

### Dependencies

This section must be completed when targeting beta graduation to a release.

#### Does this feature depend on any specific services running in the cluster?

N/A

#### For GA, this section is required: approvers should be able to confirm the previous answers based on experience in the field.

#### Will enabling / using this feature result in any new API calls? Describe them, providing:

No.

### Troubleshooting

#### How does this feature react if the API server and/or etcd is unavailable?

N/A (but if the component isn't available, no metrics are being scraped).

#### What are other known failure modes?

At worst, the framework can clog the commit pipeline, since it is effectively a conformance test that enforces metric stability guarantees. In that case, we can turn off the verification and validation mechanism (i.e. the `hack/verify_generated_stable_metrics.sh` script), which effectively puts us back where we were before the framework. Note, however, that this allows developers to commit breaking changes to metrics and violate the stability guarantees.

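To make the "conformance test" idea concrete, the following Python sketch compares a checked-in golden list of stable metrics against the list generated from the current code and reports any mismatch. It is an assumption-laden illustration, not the logic of `hack/verify_generated_stable_metrics.sh`: the `StableMetric` type, the `find_violations` function, the specific rules checked, and the label names in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class StableMetric:
    """Hypothetical record describing one stable metric."""
    name: str
    type: str
    labels: Tuple[str, ...]


def find_violations(golden: Iterable[StableMetric],
                    current: Iterable[StableMetric]) -> List[str]:
    """Report differences between the golden list and the current list."""
    golden_by_name = {m.name: m for m in golden}
    current_by_name = {m.name: m for m in current}
    violations = []
    for name, old in sorted(golden_by_name.items()):
        new = current_by_name.get(name)
        if new is None:
            violations.append(f"{name}: stable metric removed")
        elif new.type != old.type:
            violations.append(f"{name}: type changed from {old.type} to {new.type}")
        elif set(new.labels) != set(old.labels):
            violations.append(f"{name}: labels changed")
    # A stable metric missing from the golden list also fails the check,
    # so that promotions are reviewed explicitly (assumption of this sketch).
    for name in sorted(current_by_name.keys() - golden_by_name.keys()):
        violations.append(f"{name}: not in golden list")
    return violations


if __name__ == "__main__":
    golden = [StableMetric("apiserver_request_total", "counter", ("verb", "code"))]
    current = [StableMetric("apiserver_request_total", "counter", ("verb",))]
    for violation in find_violations(golden, current):
        print(violation)
```

A check of this shape blocks a commit whenever the list is non-empty, which is why it can clog the pipeline, and why turning it off removes the protection against breaking changes.
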
## Implementation History

See:

1. [Metrics Validation and Verification#Implementation History]
1. [Metrics Stability to Beta#Implementation History]