@@ -128,20 +133,20 @@ checklist items _must_ be updated for the enhancement to be released.
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [] (R) KEP approvers have approved the KEP status as `implementable`
-- [] (R) Design details are appropriately documented
-- [] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-- [] e2e Tests for all Beta API Operations (endpoints)
+- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [X] (R) KEP approvers have approved the KEP status as `implementable`
+- [X] (R) Design details are appropriately documented
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [X] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
 - [ ] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [] (R) Production readiness review completed
-- [] (R) Production readiness review approved
-- [] "Implementation History" section is up-to-date for milestone
-- [] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [X] (R) Production readiness review completed
+- [X] (R) Production readiness review approved
+- [X] "Implementation History" section is up-to-date for milestone
+- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [X] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 <!--
 **Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
@@ -185,6 +190,7 @@ Additionally we propose forced upgrades of metrics stability classes in the simi
 ### Risks and Mitigations
 
 The primary risk is that these changes break our existing (and working) metrics infrastructure. The mitigation should be straightforward, i.e. roll back the changes to the metrics framework.
+
 ## Design Details
 
 Our plan is to add functionality to our static analysis framework which is hosted in the main `k8s/k8s` repo, under `test/instrumentation`. Specifically, we will need to support:
@@ -203,6 +209,35 @@ We will not attempt to parse metrics which:
 
 As an aside, much of this work has already been done, but is stashed in a local repo.
 
+### Semantics of Stability Levels
+
+#### Internal Metrics
+
+`Internal` metrics have no stability guarantees and are **not** parseable by the static analysis framework. As such, `Internal` metrics will NOT be included in metric auto-documentation.
+
+#### Alpha Metrics
+
+`Alpha` metrics have no stability guarantees but are parseable by the static analysis framework. As such, `Alpha` metrics will be included in metric auto-documentation.
+
+#### Beta Metrics
+
+`Beta` metrics have *some* stability guarantees. Specifically, we guarantee that:
+
+- `Beta` metrics will not be removed without first being explicitly deprecated.
+  + `Beta` metrics can be deprecated at any point:
+    * if changes in the underlying code/feature make the metric impossible to compute, the metric can be removed after one release
+    * if the metric can technically still be exposed (we just think it's not the right one, e.g. we want to remove some label), we leave it deprecated for 3 releases
+- Furthermore, `Beta` metrics are guaranteed to be **forward compatible** with respect to alerts and queries which may be written against them. By "forward compatible", we mean that queries and alerts which are written against the metric and its labels will continue to work in the future. We ensure forward compatibility by ensuring that **labels can only be added**, *and not removed*, from `Beta` metrics.
+- `Beta` metrics will be included in metric auto-documentation
+
+#### Stable Metrics
+
+`Stable` metrics have stability guarantees. Specifically, we guarantee that:
+
+- `Stable` metrics will not be removed without first being explicitly deprecated. After deprecation, the metric will be removed in 12 months or 3 releases.
+- Furthermore, `Stable` metrics are guaranteed to **not change** with respect to labels. This means labels can neither be added nor removed from a `Stable` metric.
+- `Stable` metrics will be included in metric auto-documentation
+
 ### Test Plan
 
 We have static analysis testing for stable metrics, we will extend our test coverage
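
Review note on the stability-level hunk above: the label guarantees it adds can be written down as a small check, which may help when reading the `Beta` and `Stable` rules side by side. The following is a minimal sketch under stated assumptions, not the framework's actual API. `StabilityLevel`, `LabelViolations`, and `IncludedInAutoDocs` are hypothetical names introduced here for illustration only.

```go
package main

import "sort"

// StabilityLevel names the four classes described in the KEP text.
// The type and constants are hypothetical, not taken from the metrics framework.
type StabilityLevel string

const (
	Internal StabilityLevel = "INTERNAL"
	Alpha    StabilityLevel = "ALPHA"
	Beta     StabilityLevel = "BETA"
	Stable   StabilityLevel = "STABLE"
)

// toSet converts a slice of label names into a set for membership checks.
func toSet(labels []string) map[string]bool {
	set := make(map[string]bool, len(labels))
	for _, l := range labels {
		set[l] = true
	}
	return set
}

// LabelViolations returns, in sorted order, the label names whose addition or
// removal would break the guarantee of the given level:
//   - Internal and Alpha: no guarantees, so nothing is a violation.
//   - Beta: labels may be added but not removed.
//   - Stable: labels may be neither added nor removed.
func LabelViolations(level StabilityLevel, oldLabels, newLabels []string) []string {
	oldSet, newSet := toSet(oldLabels), toSet(newLabels)
	var violations []string
	if level == Beta || level == Stable {
		for l := range oldSet {
			if !newSet[l] {
				violations = append(violations, l) // removed label
			}
		}
	}
	if level == Stable {
		for l := range newSet {
			if !oldSet[l] {
				violations = append(violations, l) // added label
			}
		}
	}
	sort.Strings(violations)
	return violations
}

// IncludedInAutoDocs reports whether metrics of this level appear in the
// auto-generated documentation: every level except Internal.
func IncludedInAutoDocs(level StabilityLevel) bool {
	return level != Internal
}
```

Under these assumptions, `LabelViolations(Beta, old, new)` is empty whenever `new` is a superset of `old`, while the same call with `Stable` flags any difference between the two label sets.
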
@@ -218,12 +253,12 @@ We already have thorough testing for the stability framework which has been GA f
 
 ##### Unit tests
 
-[] parsing variables
-[] multi-line strings
-[] evaluating buckets
-[] buckets which are defined via variables and consts
-[] evaluation of simple consts
-[] evaluation of simple variables
+[X] parsing variables
+[X] multi-line strings
+[X] evaluating buckets
+[X] buckets which are defined via variables and consts
+[X] evaluation of simple consts
+[X] evaluation of simple variables
 
 - `test/instrumentation`: `09/20/2022` - `full coverage of existing stability framework`
 
@@ -245,11 +280,9 @@ The statis analysis tooling runs in a precommit pipeline and is therefore exempt
 
 #### Beta
 
-- All instances of `Alpha` metrics will be converted to `Internal`
-- Kubernetes metrics framework will be enhanced to support marking `Alpha` and `Beta` metrics with a date. The semantics of this are yet to be determined. This date will be used to statically determine whether or not that metric should be decrepated automatically or promoted.
-- Kubernetes metrics framework will be enhanced with a script to auto-deprecate metrics which have passed their window of existence as an `Alpha` or `Beta` metric
-- We will determine the semantics for `Alpha` and `Beta` metrics
-- The `beta` stage for this framework will be a few releases. During this time, we will evaluate the utility and the ergonomics of the framework, making adjustments as necessary
+- Kubernetes metrics framework will be enhanced to support marking `Alpha` and `Beta` metrics with a release version. The semantics of this are yet to be determined. This version will be used to statically determine whether or not that metric should be deprecated automatically or promoted.
+
+For the beta version of this KEP, we begin permitting metrics to be promoted to the `Beta` stability class.
 
 #### GA
 
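
Review note on the Beta graduation hunk above: the KEP leaves the semantics of release-version marking undetermined, so the following is only one possible reading. It is a sketch that combines a release marker with the deprecation windows stated in the stability-level section (one release or 3 releases for `Beta`, 3 releases for `Stable`). All names (`Metric`, `DeprecationReason`, `EarliestRemovalMinor`) are hypothetical. Two further assumptions: "after N releases" is read as "N minor versions after the deprecating release", and the "12 months" alternative for `Stable` metrics is not modelled.

```go
package main

// DeprecationReason records why a Beta metric was deprecated.
// Hypothetical type, introduced for illustration only.
type DeprecationReason int

const (
	// CannotBeComputed: the underlying code or feature changed so the
	// metric can no longer be computed.
	CannotBeComputed DeprecationReason = iota
	// StillExposable: the metric could technically still be exposed.
	StillExposable
)

// Metric is a hypothetical record of what a static check might know.
type Metric struct {
	Name              string
	Stability         string // "INTERNAL", "ALPHA", "BETA" or "STABLE"
	DeprecatedInMinor int    // e.g. 26 for v1.26 (assumed encoding)
	Reason            DeprecationReason
}

// releasesBeforeRemoval returns how many releases a deprecated metric is kept,
// following the windows in the KEP text. The boolean is false for levels that
// carry no removal guarantee (Internal, Alpha) or are unknown.
func releasesBeforeRemoval(m Metric) (int, bool) {
	switch m.Stability {
	case "BETA":
		if m.Reason == CannotBeComputed {
			return 1, true
		}
		return 3, true
	case "STABLE":
		return 3, true
	}
	return 0, false
}

// EarliestRemovalMinor returns the first minor version in which the metric
// could be removed under the assumptions stated above.
func EarliestRemovalMinor(m Metric) (int, bool) {
	n, ok := releasesBeforeRemoval(m)
	if !ok {
		return 0, false
	}
	return m.DeprecatedInMinor + n, true
}
```

Encoding the window as a pure function of the recorded release keeps the check static: it needs no wall-clock time, which is consistent with this hunk's move from dates to release versions.
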
@@ -317,13 +350,12 @@ This should not affect upgrade/rollback paths.
 
 ###### How can an operator determine if the feature is in use by workloads?
 
-You can determine this by seeing if workloads depend on any Kubernetes control-plane metrics. If they do, they are using this feature.
+We've introduced a metric, `registered_metrics_total`, which should indicate that this feature is enabled.
 
 ###### How can someone using this feature know that it is working for their instance?
 
 They will be able to see metrics.
 
-
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
 This tooling runs in precommit. It does not affect runtime SLOs.
@@ -334,15 +366,15 @@ N/A
 
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
-`registered_metrics_total` will be used to calculate the number of registered stable metrics.
+No.
 
 ### Dependencies
 
 Prometheus and the Kubernetes metric framework.
 
 ###### Does this feature depend on any specific services running in the cluster?
 
-In order to ingest these metrics, one needs a prometheus scraping agent and some backend to persist the metric data.