Commit 88917f8

Merge pull request kubernetes#3752 from logicalhan/stabilityv2
KEP-3498: adding beta graduation criteria for extending metric stability
2 parents 728fd24 + 37bdf1a commit 88917f8

2 files changed, +59 -25 lines changed
  • keps
    • prod-readiness/sig-instrumentation
    • sig-instrumentation/3498-extending-stability

keps/prod-readiness/sig-instrumentation/3498.yaml

Lines changed: 2 additions & 0 deletions
@@ -4,3 +4,5 @@
 kep-number: 3498
 alpha:
   approver: "@wojtek-t"
+beta:
+  approver: "@wojtek-t"

keps/sig-instrumentation/3498-extending-stability/README.md

Lines changed: 57 additions & 25 deletions
@@ -85,6 +85,11 @@ tags, and then generate with `hack/update-toc.sh`.
 - [Proposal](#proposal)
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
+- [Semantic of Stability Levels](#semantic-of-stability-levels)
+- [Internal Metrics](#internal-metrics)
+- [Alpha Metrics](#alpha-metrics)
+- [Beta Metrics](#beta-metrics)
+- [Stable Metrics](#stable-metrics)
 - [Test Plan](#test-plan)
 - [Prerequisite testing updates](#prerequisite-testing-updates)
 - [Unit tests](#unit-tests)
@@ -128,20 +133,20 @@ checklist items _must_ be updated for the enhancement to be released.

 Items marked with (R) are required *prior to targeting to a milestone / release*.

-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-- [ ] e2e Tests for all Beta API Operations (endpoints)
+- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [X] (R) KEP approvers have approved the KEP status as `implementable`
+- [X] (R) Design details are appropriately documented
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [X] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
 - [ ] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Production readiness review completed
-- [ ] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [X] (R) Production readiness review completed
+- [X] (R) Production readiness review approved
+- [X] "Implementation History" section is up-to-date for milestone
+- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [X] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

 <!--
 **Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
@@ -185,6 +190,7 @@ Additionally we propose forced upgrades of metrics stability classes in the simi
 ### Risks and Mitigations

 The primary risk is that these changes break our existing (and working) metrics infrastructure. The mitigation should be straightforward, i.e. roll back the changes to the metrics framework.
+
 ## Design Details

 Our plan is to add functionality to our static analysis framework which is hosted in the main `k8s/k8s` repo, under `test/instrumentation`. Specifically, we will need to support:
@@ -203,6 +209,35 @@ We will not attempt to parse metrics which:

 As an aside, much of this work has already been done, but is stashed in a local repo.

+### Semantic of Stability Levels
+
+#### Internal Metrics
+
+`Internal` metrics have no stability guarantees and are **not** parseable by the static analysis framework. As such, `Internal` metrics will NOT be included in metric auto-documentation.
+
+#### Alpha Metrics
+
+`Alpha` metrics have no stability guarantees but are parseable by the static analysis framework. As such, `Alpha` metrics will be included in metric auto-documentation.
+
+#### Beta Metrics
+
+`Beta` metrics have *some* stability guarantees. Specifically, we guarantee that:
+
+- `Beta` metrics will not be removed without first being explicitly deprecated.
+  + you can deprecate `Beta` metrics at any point:
+    * if, because of changes in the underlying code or feature, the metric can no longer be computed, it can be removed after one release
+    * if the metric can still technically be exposed (we just think it is not the right one, e.g. we want to remove a label), we leave it deprecated for 3 releases
+- Furthermore, `Beta` metrics are guaranteed to be **forward compatible** with respect to alerts and queries which may be written against them. By "forward compatible", we mean that queries and alerts which are written against the metric and its labels will continue to work in the future. We ensure forward compatibility by ensuring that **labels can only be added** to `Beta` metrics, *and not removed* from them.
+- `Beta` metrics will be included in metric auto-documentation
+
+#### Stable Metrics
+
+`Stable` metrics have stability guarantees. Specifically, we guarantee that:
+
+- `Stable` metrics will not be removed without first being explicitly deprecated. After deprecation, the metric will be removed in 12 months or 3 releases.
+- Furthermore, `Stable` metrics are guaranteed to **not change** with respect to labels. This means labels can neither be added to nor removed from a `Stable` metric.
+- `Stable` metrics will be included in metric auto-documentation
+
 ### Test Plan

 We have static analysis testing for stable metrics, we will extend our test coverage
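The stability-level semantics added in this hunk reduce to a few mechanical checks. The Go sketch below encodes one reading of them: which levels appear in auto-documentation, which label-set changes each level permits, and the earliest release in which a deprecated `Beta` or `Stable` metric may be removed. `StabilityLevel`, `InAutoDocs`, `LabelChangeAllowed`, and `EarliestRemoval` are hypothetical names defined only for this example, not the Kubernetes metrics API. Releases are modeled as minor version numbers, and the 12-month alternative for `Stable` metrics is not modeled.

```go
package main

import "fmt"

// StabilityLevel is a local stand-in for the four classes described in the
// "Semantic of Stability Levels" section; it is not the Kubernetes type.
type StabilityLevel int

const (
	Internal StabilityLevel = iota
	Alpha
	Beta
	Stable
)

// InAutoDocs reports whether metrics at this level are parseable by the
// static analysis and therefore listed in auto-generated documentation.
func InAutoDocs(l StabilityLevel) bool { return l != Internal }

func toSet(xs []string) map[string]bool {
	s := make(map[string]bool, len(xs))
	for _, x := range xs {
		s[x] = true
	}
	return s
}

// LabelChangeAllowed checks a label-set change: Beta metrics may gain labels
// but never lose them, Stable metrics may do neither, and Internal or Alpha
// metrics carry no label guarantees.
func LabelChangeAllowed(l StabilityLevel, oldLabels, newLabels []string) bool {
	oldSet, newSet := toSet(oldLabels), toSet(newLabels)
	removed, added := false, false
	for k := range oldSet {
		if !newSet[k] {
			removed = true
		}
	}
	for k := range newSet {
		if !oldSet[k] {
			added = true
		}
	}
	switch l {
	case Beta:
		return !removed
	case Stable:
		return !removed && !added
	default:
		return true
	}
}

// EarliestRemoval returns the first minor release in which a metric
// deprecated in deprecatedIn may be removed, under one reading of the rules:
// a Beta metric that can no longer be computed may go one release later, any
// other deprecated Beta metric and any Stable metric stays for three releases.
// ok is false for Internal and Alpha, which need no deprecation period.
func EarliestRemoval(l StabilityLevel, deprecatedIn int, stillComputable bool) (release int, ok bool) {
	switch l {
	case Beta:
		if !stillComputable {
			return deprecatedIn + 1, true
		}
		return deprecatedIn + 3, true
	case Stable:
		return deprecatedIn + 3, true
	default:
		return 0, false
	}
}

func main() {
	fmt.Println(InAutoDocs(Internal), InAutoDocs(Alpha))
	fmt.Println(LabelChangeAllowed(Beta, []string{"verb"}, []string{"verb", "code"}))
	fmt.Println(LabelChangeAllowed(Stable, []string{"verb"}, []string{"verb", "code"}))
	r, _ := EarliestRemoval(Beta, 27, true)
	fmt.Println(r)
}
```

Keeping each rule a pure function of the level and the before/after label sets makes it straightforward to run as a precommit check, which is where this KEP places its tooling.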
@@ -218,12 +253,12 @@ We already have thorough testing for the stability framework which has been GA f

 ##### Unit tests

-[ ] parsing variables
-[ ] multi-line strings
-[ ] evaluating buckets
-[ ] buckets which are defined via variables and consts
-[ ] evaluation of simple consts
-[ ] evaluation of simple variables
+[X] parsing variables
+[X] multi-line strings
+[X] evaluating buckets
+[X] buckets which are defined via variables and consts
+[X] evaluation of simple consts
+[X] evaluation of simple variables

 - `test/instrumentation`: `09/20/2022` - `full coverage of existing stability framework`

@@ -245,11 +280,9 @@ The static analysis tooling runs in a precommit pipeline and is therefore exempt

 #### Beta

-- All instances of `Alpha` metrics will be converted to `Internal`
-- Kubernetes metrics framework will be enhanced to support marking `Alpha` and `Beta` metrics with a date. The semantics of this are yet to be determined. This date will be used to statically determine whether or not that metric should be decrepated automatically or promoted.
-- Kubernetes metrics framework will be enhanced with a script to auto-deprecate metrics which have passed their window of existence as an `Alpha` or `Beta` metric
-- We will determine the semantics for `Alpha` and `Beta` metrics
-- The `beta` stage for this framework will be a few releases. During this time, we will evaluate the utility and the ergonomics of the framework, making adjustments as necessary
+- Kubernetes metrics framework will be enhanced to support marking `Alpha` and `Beta` metrics with a release version. The semantics of this are yet to be determined. This version will be used to statically determine whether or not that metric should be deprecated automatically or promoted.
+
+For the beta version of this KEP, we begin permitting metrics to be promoted to the `Beta` stability class.

 #### GA

@@ -317,13 +350,12 @@ This should not affect upgrade/rollback paths.

 ###### How can an operator determine if the feature is in use by workloads?

-You can determine this by seeing if workloads depend on any Kubernetes control-plane metrics. If they do, they are using this feature.
+We've introduced a metric, `registered_metrics_total`, which should serve to indicate that this feature is enabled.

 ###### How can someone using this feature know that it is working for their instance?

 They will be able to see metrics.

-
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

 This tooling runs in precommit. It does not affect runtime SLOs.
@@ -334,15 +366,15 @@ N/A

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

-`registered_metrics_total` will be used to calculate the number of registered stable metrics.
+No.

 ### Dependencies

 Prometheus and the Kubernetes metric framework.

 ###### Does this feature depend on any specific services running in the cluster?

-In order to ingest these metrics, one needs a prometheus scraping agent and some backend to persist the metric data.
+No.

 ### Scalability
