keps/sig-instrumentation/3498-extending-stability/README.md
Lines changed: 48 additions & 24 deletions
@@ -128,20 +128,20 @@ checklist items _must_ be updated for the enhancement to be released.
128
128
129
129
Items marked with (R) are required *prior to targeting to a milestone / release*.
130
130
131
-
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
132
-
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
133
-
- [ ] (R) Design details are appropriately documented
134
-
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
135
-
- [ ] e2e Tests for all Beta API Operations (endpoints)
131
+
- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
132
+
- [X] (R) KEP approvers have approved the KEP status as `implementable`
133
+
- [X] (R) Design details are appropriately documented
134
+
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
135
+
- [X] e2e Tests for all Beta API Operations (endpoints)
136
136
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
137
137
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
138
138
- [ ] (R) Graduation criteria is in place
139
139
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
140
-
- [ ] (R) Production readiness review completed
141
-
- [ ] (R) Production readiness review approved
142
-
- [ ] "Implementation History" section is up-to-date for milestone
143
-
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
144
-
- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
140
+
- [X] (R) Production readiness review completed
141
+
- [X] (R) Production readiness review approved
142
+
- [X] "Implementation History" section is up-to-date for milestone
143
+
- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
144
+
- [X] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
145
145
146
146
<!--
147
147
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
@@ -185,6 +185,7 @@ Additionally we propose forced upgrades of metrics stability classes in the simi
185
185
### Risks and Mitigations
186
186
187
187
The primary risk is that these changes break our existing (and working) metrics infrastructure. The mitigation should be straightforward, i.e. roll back the changes to the metrics framework.
188
+
188
189
## Design Details
189
190
190
191
Our plan is to add functionality to our static analysis framework which is hosted in the main `k8s/k8s` repo, under `test/instrumentation`. Specifically, we will need to support:
@@ -203,6 +204,32 @@ We will not attempt to parse metrics which:
203
204
204
205
As an aside, much of this work has already been done, but is stashed in a local repo.
205
206
207
+
### Semantics of Stability Levels
208
+
209
+
#### Internal Metrics
210
+
211
+
`Internal` metrics have no stability guarantees and are **not** parseable by the static analysis framework. As such, `Internal` metrics will NOT be included in metric auto-documentation.
212
+
213
+
#### Alpha Metrics
214
+
215
+
`Alpha` metrics have no stability guarantees but are parseable by the static analysis framework. As such, `Alpha` metrics will be included in metric auto-documentation.
216
+
217
+
#### Beta Metrics
218
+
219
+
`Beta` metrics have *some* stability guarantees. Specifically, we guarantee that:
220
+
221
+
- `Beta` metrics will not be removed without first being explicitly deprecated. After deprecation, the metric will be removed in 4 months or 1 release.
222
+
- Furthermore, `Beta` metrics are guaranteed to be **forward compatible** with respect to alerts and queries written against them. By "forward compatible", we mean that queries and alerts written against the metric and its labels will continue to work in the future. We ensure forward compatibility by ensuring that **labels can only be added** to, *and not removed* from, `Beta` metrics.
223
+
- `Beta` metrics will be included in metric auto-documentation.
224
+
225
+
#### Stable Metrics
226
+
227
+
`Stable` metrics have stability guarantees. Specifically, we guarantee that:
228
+
229
+
- `Stable` metrics will not be removed without first being explicitly deprecated. After deprecation, the metric will be removed in 12 months or 3 releases.
230
+
- Furthermore, `Stable` metrics are guaranteed to **not change** with respect to labels. This means labels can neither be added to nor removed from a `Stable` metric.
231
+
- `Stable` metrics will be included in metric auto-documentation.
232
+
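The label and documentation rules for the four stability levels can be illustrated with a small Go sketch. This is not the static analysis framework's implementation; the type `StabilityLevel`, its constants, and the helpers `labelChangeAllowed` and `includedInAutoDocs` are hypothetical names chosen for illustration, and the label names `verb` and `code` are made-up examples.

```go
package main

import "fmt"

// StabilityLevel is a hypothetical stand-in for the four classes
// described above; it is not the framework's actual type.
type StabilityLevel string

const (
	Internal StabilityLevel = "INTERNAL"
	Alpha    StabilityLevel = "ALPHA"
	Beta     StabilityLevel = "BETA"
	Stable   StabilityLevel = "STABLE"
)

// toSet converts a label slice into a set for membership checks.
func toSet(labels []string) map[string]bool {
	set := make(map[string]bool, len(labels))
	for _, l := range labels {
		set[l] = true
	}
	return set
}

// labelChangeAllowed reports whether changing a metric's labels from
// oldLabels to newLabels respects the guarantees of the given level:
// Stable permits no change, Beta permits additions only, and
// Alpha/Internal carry no guarantees.
func labelChangeAllowed(level StabilityLevel, oldLabels, newLabels []string) bool {
	oldSet, newSet := toSet(oldLabels), toSet(newLabels)
	removed, added := false, false
	for l := range oldSet {
		if !newSet[l] {
			removed = true
		}
	}
	for l := range newSet {
		if !oldSet[l] {
			added = true
		}
	}
	switch level {
	case Stable:
		return !added && !removed
	case Beta:
		return !removed
	default:
		return true
	}
}

// includedInAutoDocs reports whether metrics of this level appear in
// metric auto-documentation: every level except Internal.
func includedInAutoDocs(level StabilityLevel) bool {
	return level != Internal
}

func main() {
	oldLabels := []string{"verb"}
	newLabels := []string{"verb", "code"}
	fmt.Println(labelChangeAllowed(Beta, oldLabels, newLabels))   // true: label added
	fmt.Println(labelChangeAllowed(Stable, oldLabels, newLabels)) // false: no changes allowed
	fmt.Println(labelChangeAllowed(Beta, newLabels, oldLabels))   // false: label removed
	fmt.Println(includedInAutoDocs(Internal))                     // false
}
```

Treating labels as sets keeps the check independent of label ordering, which matches the guarantee as stated: only additions and removals matter.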
206
233
### Test Plan
207
234
208
235
We have static analysis testing for stable metrics; we will extend our test coverage
@@ -218,12 +245,12 @@ We already have thorough testing for the stability framework which has been GA f
218
245
219
246
##### Unit tests
220
247
221
-
[] parsing variables
222
-
[] multi-line strings
223
-
[] evaluating buckets
224
-
[] buckets which are defined via variables and consts
225
-
[] evaluation of simple consts
226
-
[] evaluation of simple variables
248
+
[X] parsing variables
249
+
[X] multi-line strings
250
+
[X] evaluating buckets
251
+
[X] buckets which are defined via variables and consts
252
+
[X] evaluation of simple consts
253
+
[X] evaluation of simple variables
227
254
228
255
- `test/instrumentation`: `09/20/2022` - `full coverage of existing stability framework`
229
256
@@ -245,11 +272,9 @@ The static analysis tooling runs in a precommit pipeline and is therefore exempt
245
272
246
273
#### Beta
247
274
248
-
- All instances of `Alpha` metrics will be converted to `Internal`
249
-
- Kubernetes metrics framework will be enhanced to support marking `Alpha` and `Beta` metrics with a date. The semantics of this are yet to be determined. This date will be used to statically determine whether or not that metric should be decrepated automatically or promoted.
250
-
- Kubernetes metrics framework will be enhanced with a script to auto-deprecate metrics which have passed their window of existence as an `Alpha` or `Beta` metric
251
-
- We will determine the semantics for `Alpha` and `Beta` metrics
252
-
- The `beta` stage for this framework will be a few releases. During this time, we will evaluate the utility and the ergonomics of the framework, making adjustments as necessary
275
+
- The Kubernetes metrics framework will be enhanced to support marking `Alpha` and `Beta` metrics with a release version. The semantics of this are yet to be determined. This version will be used to statically determine whether that metric should be automatically deprecated or promoted.
276
+
277
+
For the beta version of this KEP, we begin permitting metrics to be promoted to the `Beta` stability class.
253
278
254
279
#### GA
255
280
@@ -311,19 +336,18 @@ This should not affect upgrade/rollback paths.
311
336
312
337
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
313
338
314
-
`Alpha` metrics will be recategorized as `Internal`.
339
+
No.
315
340
316
341
### Monitoring Requirements
317
342
318
343
###### How can an operator determine if the feature is in use by workloads?
319
344
320
-
You can determine this by seeing if workloads depend on any Kubernetes control-plane metrics. If they do, they are using this feature.
345
+
If workloads depend on any Kubernetes control-plane metrics, they are using this feature.
321
346
322
347
###### How can someone using this feature know that it is working for their instance?
323
348
324
349
They will be able to see metrics.
325
350
326
-
327
351
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
328
352
329
353
This tooling runs in precommit. It does not affect runtime SLOs.