@@ -106,16 +107,16 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
106
107
- [x] (R) KEP approvers have approved the KEP status as `implementable`
107
108
- [x] (R) Design details are appropriately documented
108
109
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
109
-
- [ ] e2e Tests for all Beta API Operations (endpoints)
110
+
- [x] e2e Tests for all Beta API Operations (endpoints)
110
111
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
111
112
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
112
113
- [ ] (R) Graduation criteria is in place
113
114
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
114
-
- [ ] (R) Production readiness review completed
115
-
- [ ] (R) Production readiness review approved
115
+
- [x] (R) Production readiness review completed
116
+
- [x] (R) Production readiness review approved
116
117
- [ ] "Implementation History" section is up-to-date for milestone
117
-
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
118
-
- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
118
+
- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
119
+
- [x] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
We will add a unit test verifying that HPAs with and without the new fields are
554
+
[Unit tests have been added](https://github.com/kubernetes/kubernetes/pull/130797/commits/a41284d9fa3a3d5a5e8760db6e9fd4f7e5e6fca6#diff-98f8520444a477d01c5cc2e56f92939d5fb07893a234b8fee5b67c7c147a20e0) to verify that HPAs with and without the new fields are
547
555
properly validated, whether the feature gate is enabled or disabled.
548
556
549
557
### Rollout, Upgrade and Rollback Planning
@@ -564,13 +572,20 @@ rollout. Similarly, consider large clusters and how enablement/disablement
564
572
will rollout across nodes.
565
573
-->
566
574
575
+
This feature does not introduce new failure modes: during rollout/rollback, some
576
+
API servers will allow or disallow setting the new `tolerance` field. The new
577
+
field may be ignored until the controller manager is fully updated.
578
+
567
579
###### What specific metrics should inform a rollback?
568
580
569
581
<!--
570
582
What signals should users be paying attention to when the feature is young
571
583
that might indicate a serious problem?
572
584
-->
573
585
586
+
A high `horizontal_pod_autoscaler_controller_metric_computation_duration_seconds`
587
+
metric can indicate a problem related to this feature.
588
+
574
589
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
575
590
576
591
<!--
@@ -579,12 +594,18 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
579
594
are missing a bunch of machinery and tooling and can't do that now.
580
595
-->
581
596
597
+
We have manually tested a cluster upgrade, and this feature has been in alpha without
598
+
(to the best of our knowledge) any user reporting an issue. GKE has automated
599
+
upgrade/downgrade tests that did not report any issue.
600
+
582
601
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
583
602
584
603
<!--
585
604
Even if applying deprecation policies, they may still surprise some users.
586
605
-->
587
606
607
+
No.
608
+
588
609
### Monitoring Requirements
589
610
590
611
<!--
@@ -625,9 +646,9 @@ values. Users can get both values using
625
646
and use them to verify that scaling events are triggered when their ratio is out
626
647
of tolerance.
627
648
628
-
We will update the controller-manager logs to help users understand the behavior
629
-
of the autoscaler. The data added to the logs will include the tolerance used
630
-
for each scaling decision.
649
+
The [controller-manager logs have been updated](https://github.com/kubernetes/kubernetes/pull/130797/commits/2dd9eda47ffd5556ff90446e91d22ddbecc05d2c#diff-f1c5a31aa8fb8e3fd64b6aa13d3358b504e6e25030f249f1652e244c105eafc7R846)
650
+
to help users understand the behavior of the autoscaler. The data added to the
651
+
logs includes the tolerance used for each scaling decision.
631
652
632
653
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
633
654
@@ -698,6 +719,8 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
698
719
- Impact of its degraded performance or high-error rates on the feature:
699
720
-->
700
721
722
+
No, this feature does not depend on any specific service.
723
+
701
724
### Scalability
702
725
703
726
<!--
@@ -817,6 +840,8 @@ details). For now, we leave it here.
817
840
818
841
###### How does this feature react if the API server and/or etcd is unavailable?
819
842
843
+
API server or etcd issues do not impact this feature.
844
+
820
845
###### What are other known failure modes?
821
846
822
847
<!--
@@ -832,8 +857,14 @@ For each of them, fill in the following information by copying the below templat
832
857
- Testing: Are there any tests for failure mode? If not, describe why.
833
858
-->
834
859
860
+
We do not expect any new failure mode. (While setting inappropriate `tolerance`
861
+
values may cause HPAs to react too slowly or too quickly, the feature is working as
862
+
intended.)
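
To make the effect of the `tolerance` value concrete, here is a minimal Go sketch that applies the documented HPA formula `ceil(currentReplicas * ratio)` and skips scaling when the ratio is within tolerance. The function name `desiredReplicas` and the single symmetric tolerance are simplifying assumptions made for this example, not the controller's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas returns ceil(currentReplicas * ratio), except that it keeps
// currentReplicas unchanged when ratio deviates from 1.0 by no more than
// tolerance. A single symmetric tolerance is an assumption of this sketch.
func desiredReplicas(currentReplicas int, ratio, tolerance float64) int {
	if math.Abs(ratio-1.0) <= tolerance {
		return currentReplicas
	}
	return int(math.Ceil(float64(currentReplicas) * ratio))
}

func main() {
	const current = 10
	const ratio = 1.15 // metric is 15% above its target
	for _, tol := range []float64{0.01, 0.10, 0.20} {
		fmt.Printf("tolerance=%.2f -> %d replicas\n",
			tol, desiredReplicas(current, ratio, tol))
	}
}
```

With the same load, a small tolerance leads to a scale-up while a large one leaves the replica count unchanged, which is the trade-off behind reacting too quickly or too slowly.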
863
+
835
864
###### What steps should be taken if SLOs are not being met to determine the problem?
836
865
866
+
N/A.
867
+
837
868
## Implementation History
838
869
839
870
<!--
@@ -848,13 +879,17 @@ Major milestones might include: