Skip to content

Commit ce17c8b

Browse files
authored
Supplement for KEP-4346 (#4504)
* use feature gate instead of env * add Integration tests * add reviewers and approvers * use HistogramMetric instead of summary
1 parent cecf3c4 commit ce17c8b

File tree

2 files changed

+16
-11
lines changed

2 files changed

+16
-11
lines changed

keps/sig-api-machinery/4346-informer-metrics/README.md

Lines changed: 11 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -237,8 +237,8 @@ nitty-gritty.
237237

238238
- Introduce the informer metrics struct `informerMetrics` contains queue/eventHandler metrics
239239
- Introduce the informer metrics provider interface `informerMetricsProvider`, implement in `k8s.io/component-base/metrics`
240-
- Add an environment to enable informer metrics
241240
- Revert the deleted `reflectorMetrics`
241+
- Add a feature gate `InformerMetrics` to enable informer/reflector metrics
242242

243243
### User Stories (Optional)
244244

@@ -361,20 +361,19 @@ Each reflector metrics contains 3 counter, 4 summary and 1 gauge.
361361
```
362362
type reflectorMetrics struct {
363363
numberOfLists CounterMetric
364-
listDuration SummaryMetric
365-
numberOfItemsInList SummaryMetric
364+
listDuration HistogramMetric
365+
numberOfItemsInList HistogramMetric
366366
367367
numberOfWatches CounterMetric
368368
numberOfShortWatches CounterMetric
369-
watchDuration SummaryMetric
370-
numberOfItemsInWatch SummaryMetric
369+
watchDuration HistogramMetric
370+
numberOfItemsInWatch HistogramMetric
371371
372372
lastResourceVersion GaugeMetric
373373
}
374374
```
375-
According to kubernetes/kubernetes#73587, the memory leak is caused by summary. Every summary contains `AgeBuckets`of buckets which contains an array with 500 quantile.Sample. Reduce summary `ageBuckets` to mitigate memory usage.
375+
According to kubernetes/kubernetes#73587, the memory leak is caused by summary. It'd be better to use histograms instead. HistogramMetrics are aggregatable and it will reduce memory usage.
376376

377-
The listDuration/numberOfItemsInList/watchDuration/numberOfItemsInWatch will not `Observe` very frequently. According to [prometheus client](https://github.com/prometheus/client_golang/commit/15c9ded5a3b7ae08886e51ba26a97270ca23fecd#diff-4c7dc681b10f8e28a3d6a8cfd115c37cc473e6fc9b0949bd6b41eaedbdead438R132), it is enough to set `ageBuckets` to 1.
378377

379378
### Remove Metrics
380379
When the informers and reflectors stopped, the reference metrics will be removed.
@@ -446,8 +445,11 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
446445
https://storage.googleapis.com/k8s-triage/index.html
447446
-->
448447

449-
- <test>: <link to test coverage>
450-
When the informers and reflectors are stopped, ensure the reference metrics will be removed.
448+
We will have extensive integration testing of the union code in the
449+
`test/integration/metrics` package.
450+
451+
- When enabling `InformerMetrics` feature gate, ensure the metrics will be exposed. Ensure the metrics subsystem/label/granularity is correct.
452+
- When the informers and reflectors are stopped, ensure the reference metrics will be removed.
451453

452454
##### e2e tests
453455

keps/sig-api-machinery/4346-informer-metrics/kep.yaml

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,9 +7,12 @@ participating-sigs: []
77
status: provisional
88
creation-date: 2023-11-27
99
reviewers:
10-
- TBD
10+
- "@deads2k"
11+
- "@logicalhan"
12+
- "@aojea"
13+
- "@dgrisonnet"
1114
approvers:
12-
- TBD
15+
- "@deads2k"
1316

1417
see-also: []
1518
replaces: []

0 commit comments

Comments
 (0)