
Commit 467b387

Merge pull request #4039 from rexagod/kep-2035
KEP-2305: Metric cardinality enforcement
2 parents 6115971 + d5fa1a0 commit 467b387


3 files changed, +67 -12 lines changed

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 2305
 alpha:
   approver: "@johnbelamaric"
+beta:
+  approver: "@johnbelamaric"

keps/sig-instrumentation/2305-metrics-cardinality-enforcement/README.md

Lines changed: 59 additions & 8 deletions
@@ -11,7 +11,12 @@
 - [Proposal](#proposal)
 - [Design Details](#design-details)
   - [Test Plan](#test-plan)
+    - [Prerequisite testing updates](#prerequisite-testing-updates)
+    - [Unit tests](#unit-tests)
   - [Graduation Criteria](#graduation-criteria)
+    - [Alpha](#alpha)
+    - [Beta](#beta)
+    - [GA](#ga)
   - [Upgrade / Downgrade strategy](#upgrade--downgrade-strategy)
   - [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -266,14 +271,47 @@ This would then be interpreted by our machinery as this:
 ```

 ## Design Details
+
 ### Test Plan
-For `Alpha`, unit test to verify that the metric label will be set to "unexpected" if the metric encounters label values outside our explicit allowlist of values.
+
+[x] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+N/A
+
+##### Unit tests
+
+For `Alpha`, unit test to .
+
+- `staging/src/k8s.io/component-base/metrics/counter_test.go`: `3/3/2021` - `verify that the metric label will be set to "unexpected" for counters if the metric encounters label values outside our explicit allowlist of values`
+- `staging/src/k8s.io/component-base/metrics/gauge_test.go`: `4/3/21` - `verify that the metric label will be set to "unexpected" for gauges if the metric encounters label values outside our explicit allowlist of values`
+- `staging/src/k8s.io/component-base/metrics/histogram_test.go`: `4/3/21` - `verify that the metric label will be set to "unexpected" for histograms if the metric encounters label values outside our explicit allowlist of values`
+- `staging/src/k8s.io/component-base/metrics/summary_test.go`: `4/3/21` - `verify that the metric label will be set to "unexpected" for summaries if the metric encounters label values outside our explicit allowlist of values`
+
 ### Graduation Criteria
-For `Alpha`, the allowlist of metrics can be configured via the exposed flag and the unit test is passed.
-For `Beta`, the allowlist can be configured from a input file(e.g. yaml file).
+
+#### Alpha
+
+- Feature implemented behind a feature flag
+- The allowlist of metrics can be configured via the exposed flag and the unit test is passed.
+
+#### Beta
+
+- The allowlist can be configured from a manifest.
+
+#### GA
+
+- Allow pattern-matching for labels in the allowlist.
+
 ### Upgrade / Downgrade strategy
+
 N/A
+
 ### Version Skew Strategy
+
 N/A

 ## Production Readiness Review Questionnaire
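
Reviewer note: the Test Plan entries in the hunk above all describe the same behavior: a label value outside the explicit allowlist is recorded as "unexpected". The sketch below illustrates that rule using only the standard library. The `labelAllowlist` type and its `constrain` method are hypothetical names invented for this note, not the `component-base/metrics` API, and leaving labels without an allowlist entry untouched is an assumption.

```go
package main

import "fmt"

// unexpectedValue is the placeholder the KEP text says replaces label values
// that fall outside the allowlist.
const unexpectedValue = "unexpected"

// labelAllowlist is a hypothetical type for this sketch: it maps a label name
// to the set of values permitted for that label.
type labelAllowlist map[string]map[string]struct{}

// constrain returns a copy of labels in which every value outside the
// allowlist for its label name is replaced by unexpectedValue. Labels with no
// allowlist entry are passed through unchanged (an assumption of this sketch).
func (a labelAllowlist) constrain(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels))
	for name, value := range labels {
		allowed, bounded := a[name]
		if !bounded {
			out[name] = value
			continue
		}
		if _, ok := allowed[value]; ok {
			out[name] = value
		} else {
			out[name] = unexpectedValue
		}
	}
	return out
}

func main() {
	allow := labelAllowlist{"code": {"200": {}, "404": {}}}
	got := allow.constrain(map[string]string{"code": "418", "verb": "GET"})
	fmt.Println(got["code"], got["verb"])
}
```

Here `code="418"` is not in the allowlist, so it is replaced, while `verb` has no allowlist entry and keeps its value.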
@@ -284,7 +322,16 @@ _This section must be completed when targeting alpha to a release._
 * **How can this feature be enabled / disabled in a live cluster?**
   - [x] Feature gate (also fill in values in `kep.yaml`)
     - Feature gate name: MetricCardinalityEnforcement
-    - Components depending on the feature gate: All components that emit metrics
+    - Components depending on the feature gate: All components that emit metrics, i.e. (at the time of writing),
+      - cmd/kube-apiserver
+      - cmd/kube-controller-manager
+      - cmd/kubelet
+      - pkg/kubelet/metrics
+      - pkg/kubelet/prober
+      - pkg/kubelet/server
+      - pkg/proxy/metrics
+      - cmd/kube-scheduler
+      - pkg/volume/util

 * **Does enabling the feature change any default behavior?**
   Any change of default behavior may be surprising to users or break existing
@@ -298,8 +345,8 @@ _This section must be completed when targeting alpha to a release._
   feature, can it break the existing applications?).
   Yes, disabling the feature gate can revert it back to existing behavior

-* **What happens if we reenable the feature if it was previously rolled back?**
-  The enable-disable-enable process will not cause problem. But it may be problematic during the rolled back period with the unbounded metrics value.
+* **What happens if we re-enable the feature if it was previously rolled back?**
+  The enable-disable-enable process will not cause problem. But it may be problematic during the rolled back period with the unbounded metrics value. Note that metrics are a memory-only construct and do not persist, but re-generated across restarts.

 * **Are there any tests for feature enablement/disablement?**
   Using unit tests to cover the combination cases w/wo feature and w/wo allowlist.
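
Reviewer note: the "w/wo feature and w/wo allowlist" combinations mentioned above form a four-row table. The sketch below enumerates them in a table-driven style. `resolveLabel` is a hypothetical helper written for this note; the assumption that a disabled gate or a missing allowlist leaves values untouched follows the questionnaire's statement that disabling the gate reverts to existing behavior, but it is not taken from the implementation.

```go
package main

import "fmt"

// resolveLabel is a hypothetical helper for this sketch. It returns the label
// value a metric would record, given whether the feature gate is on and which
// values (if any) are allowlisted. A nil allowlist means none was configured.
func resolveLabel(gateEnabled bool, allowlist map[string]struct{}, value string) string {
	if !gateEnabled || allowlist == nil {
		return value // enforcement is inactive, so the value passes through
	}
	if _, ok := allowlist[value]; ok {
		return value
	}
	return "unexpected"
}

func main() {
	allow := map[string]struct{}{"200": {}}
	cases := []struct {
		name        string
		gateEnabled bool
		allowlist   map[string]struct{}
		want        string
	}{
		{"gate off, no allowlist", false, nil, "418"},
		{"gate off, allowlist set", false, allow, "418"},
		{"gate on, no allowlist", true, nil, "418"},
		{"gate on, allowlist set", true, allow, "unexpected"},
	}
	// Every case feeds the same out-of-allowlist value, "418".
	for _, c := range cases {
		got := resolveLabel(c.gateEnabled, c.allowlist, "418")
		fmt.Printf("%-24s got=%q ok=%t\n", c.name, got, got == c.want)
	}
}
```

Only the last row changes the recorded value, which is the case the enablement tests need to distinguish from the other three.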
@@ -322,6 +369,7 @@ _This section must be completed when targeting beta graduation to a release._
 * **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
 fields of API types, flags, etc.?**
   A component metric flag for ingesting allowlist to be added.
+
 ### Monitoring Requirements

 _This section must be completed when targeting beta graduation to a release._
@@ -337,7 +385,7 @@ the health of the service?**

 * **Are there any missing metrics that would be useful to have to improve observability
 of this feature?**
-  None.
+  - `cardinality_enforcement_unexpected_categorizations_total`: Increments whenever any metric falls into the "unexpected" case (i.e., goes out of the defined bounds).

 ### Dependencies

@@ -346,7 +394,6 @@ _This section must be completed when targeting beta graduation to a release._
 * **Does this feature depend on any specific services running in the cluster?**
   No.

-
 ### Scalability

 _For alpha, this section is encouraged: reviewers should consider these questions
@@ -379,6 +426,10 @@ operations covered by [existing SLIs/SLOs]?**
 resource usage (CPU, RAM, disk, IO, ...) in any components?**
   No.

+* **Can enabling / using this feature result in resource exhaustion of some
+node resources (PIDs, sockets, inodes, etc.)?**
+  No.
+
 ### Troubleshooting

 The Troubleshooting section currently serves the `Playbook` role. We may consider

keps/sig-instrumentation/2305-metrics-cardinality-enforcement/kep.yaml

Lines changed: 6 additions & 4 deletions
@@ -4,6 +4,7 @@ authors:
   - "@logicalhan"
   - "@lilic"
   - "@yoyinzyc"
+  - "@rexagod"
 owning-sig: sig-instrumentation
 participating-sigs:
   - sig-instrumentation
@@ -15,9 +16,10 @@ reviewers:
 approvers:
   - "@ehashman"
 creation-date: 2020-04-15
-last-updated: 2021-02-08
-stage: alpha
+last-updated: 2023-05-28
+stage: beta
 status: implementable
-latest-milestone: "v1.21"
+latest-milestone: "v1.28"
 milestone:
-  alpha: "v1.21"
+  beta: "v1.28"
+disable-supported: true
