Commit 3e06a2a

Merge pull request #4518 from deads2k/paperwork-will-be-the-death-of-me
KEP 4346: Correct sig in kep.yaml
2 parents b075098 + 59f06ff commit 3e06a2a

2 files changed: +49 -19 lines changed

keps/sig-api-machinery/4346-informer-metrics/README.md

Lines changed: 47 additions & 17 deletions
@@ -99,6 +99,9 @@ tags, and then generate with `hack/update-toc.sh`.
 - [Integration tests](#integration-tests)
 - [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
+- [Alpha](#alpha)
+- [Beta](#beta)
+- [GA](#ga)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
 - [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -428,6 +431,9 @@ extending the production code to implement this enhancement.

 - `<package>`: `<date>` - `<test coverage>`

+- Unit tests to ensure that the metrics output meets expectations.
+- Unit tests to ensure that the metrics deletion is functioning properly.
+
 ##### Integration tests

 <!--
@@ -529,6 +535,21 @@ in back-to-back releases.
 - Deprecate the flag
 -->

+#### Alpha
+
+- Feature implemented behind a feature gate flag
+- Add related integration and unit tests to ensure functionality and make sure there is no memory leak in
+existing behavior
+
+#### Beta
+
+- Gather feedback from developers and surveys
+- Work on feedback and add additional tests as needed
+
+#### GA
+
+- Decision on GA will be made based on beta feedback
+
 ### Upgrade / Downgrade Strategy

 <!--
@@ -543,6 +564,8 @@ enhancement:
 cluster required to make on upgrade, in order to make use of the enhancement?
 -->

+N/A
+
 ### Version Skew Strategy

 <!--
@@ -602,16 +625,10 @@ well as the [existing list] of feature gates.
 [existing list]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
 -->

-- [ ] Feature gate (also fill in values in `kep.yaml`)
+- [X] Feature gate (also fill in values in `kep.yaml`)
 - Feature gate name: InformerMetrics
 - Components depending on the feature gate:
 - components via client-go library
-- [ ] Other
-- Describe the mechanism:
-- Will enabling / disabling the feature require downtime of the control
-plane?
-- Will enabling / disabling the feature require downtime or reprovisioning
-of a node?

 ###### Does enabling the feature change any default behavior?

@@ -655,7 +672,7 @@ You can take a look at one potential example of such test in:
 https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
 -->

-For now, there is no tests for feature enablement/disablement. The unit tests will be added.
+For now, there is no tests for feature enablement/disablement. The unit / integration tests will be added.

 ### Rollout, Upgrade and Rollback Planning

@@ -675,13 +692,17 @@ rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->

+Feature has no impact on rollout/rollback, and no impact on running workloads.
+
 ###### What specific metrics should inform a rollback?

 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->

+The memory used by this metrics continues to grow, consuming a significant amount
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

 <!--
@@ -690,12 +711,16 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->

+Not yet. In the alpha releases, we could test this.
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->

+This feature does not deprecate or remove any features/APIs/fields/flags/etc.
+
 ### Monitoring Requirements

 <!--
@@ -713,6 +738,8 @@ checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->

+- [x] Informer / Reflector (e.g., `lists_total`, `watches_total`) metrics returned by the operator are populated
+
 ###### How can someone using this feature know that it is working for their instance?

 <!--
@@ -724,13 +751,13 @@ and operation of this feature.
 Recall that end users cannot usually observe component logs or access metrics.
 -->

-- [ ] Events
-- Event Reason:
-- [ ] API .status
-- Condition name:
-- Other field:
-- [ ] Other (treat as last resort)
+- [X] Other (treat as last resort)
 - Details:
+- The following metrics are available when `InformerMetrics` is enabled:
+- lists_total
+- watches_total
+- last_resource_version
+- etc.

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
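
The hunk above lists metrics that should be available when `InformerMetrics` is enabled. As a rough illustration only, the following sketch checks whether such metrics appear in a scrape, assuming the component exposes them in the Prometheus text exposition format. The sample payload, the `name` label, and the bare metric names without any subsystem prefix are assumptions made for the example; the KEP text in this diff only names `lists_total`, `watches_total`, and `last_resource_version`.

```python
import re

# Matches one sample line of the Prometheus text format, for example:
#   metric_name{label="value"} 12
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def populated_metrics(exposition):
    """Return the summed sample value per metric name, skipping comment lines."""
    totals = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a "# HELP" / "# TYPE" comment
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, value = match.group(1), float(match.group(3))
        totals[name] = totals.get(name, 0.0) + value
    return totals


def missing_metrics(exposition, expected):
    """Return the expected metric names that have no sample in the scrape."""
    found = populated_metrics(exposition)
    return [name for name in expected if name not in found]


# Hypothetical scrape output; the label name and the values are made up.
SAMPLE = """\
# TYPE lists_total counter
lists_total{name="pods"} 3
watches_total{name="pods"} 7
"""

# Names taken from the list in the diff; real names may carry a prefix.
EXPECTED = ["lists_total", "watches_total", "last_resource_version"]

print(missing_metrics(SAMPLE, EXPECTED))
```

With this made-up payload, `last_resource_version` is reported as missing because the sample scrape does not contain it.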

@@ -749,18 +776,19 @@ These goals will help you determine what you need to measure (SLIs) in the next
 question.
 -->

+The feature gate will increase memory usage. The memory usage should not continuously grow.
+The informerMetrics / eventHandlerMetrics / reflectorMetrics memory consumption is in a stable state.
+
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

 <!--
 Pick one more of these and delete the rest.
 -->

-- [ ] Metrics
+- [X] Metrics
 - Metric name: Memory usage
 - [Optional] Aggregation method:
 - Components exposing the metric: Operating System/golang pprof
-- [ ] Other (treat as last resort)
-- Details:

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
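
The SLO in the hunk above says memory usage should rise once and then stay stable rather than grow continuously. One possible way to turn that into a check is sketched below: fit a least-squares line through periodic memory samples and flag the series when the fitted growth per sample exceeds a fraction of the mean. How the samples are collected (operating system counters or Go pprof heap profiles) is out of scope here, and the function names and the 1% threshold are assumptions for illustration, not values from the KEP.

```python
def growth_per_sample(samples):
    """Least-squares slope of the samples plotted against their index."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def keeps_growing(samples, max_relative_growth=0.01):
    """True if the fitted growth per sample exceeds the given fraction of the mean.

    The default threshold of 1% per sample is an arbitrary illustrative value.
    """
    mean_y = sum(samples) / len(samples)
    return growth_per_sample(samples) > max_relative_growth * mean_y


# Hypothetical memory readings in MiB, taken at a fixed interval.
stable = [100, 102, 99, 101, 100, 101]
leaking = [100, 110, 120, 130, 140, 150]

print(keeps_growing(stable))
print(keeps_growing(leaking))
```

A slope-based check tolerates the one-time increase expected when the feature gate is turned on, provided sampling starts after the component has warmed up.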

@@ -769,6 +797,8 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
 implementation difficulties, etc.).
 -->

+Not at the moment.
+
 ### Dependencies

 <!--

keps/sig-api-machinery/4346-informer-metrics/kep.yaml

Lines changed: 2 additions & 2 deletions
@@ -2,9 +2,9 @@ title: Add Informer Metrics
 kep-number: 4346
 authors:
 - "@chenk008"
-owning-sig: api-machinery
+owning-sig: sig-api-machinery
 participating-sigs: []
-status: provisional
+status: implementable
 creation-date: 2023-11-27
 reviewers:
 - "@deads2k"
