Commit b7f9d25

Set beta graduation to v1.34.
1 parent 3a4c03f commit b7f9d25

3 files changed: 55 additions, 13 deletions
Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 4951
 alpha:
   approver: "@soltysh"
+beta:
+  approver: "@soltysh"

keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

Lines changed: 50 additions & 10 deletions
@@ -67,6 +67,7 @@ tags, and then generate with `hack/update-toc.sh`.
 - [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha](#alpha)
+- [Beta](#beta)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
 - [Upgrade](#upgrade)
 - [Downgrade](#downgrade)
@@ -106,16 +107,16 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
 - [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-- [ ] e2e Tests for all Beta API Operations (endpoints)
+- [x] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
 - [ ] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Production readiness review completed
-- [ ] (R) Production readiness review approved
+- [x] (R) Production readiness review completed
+- [x] (R) Production readiness review approved
 - [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

 [kubernetes.io]: https://kubernetes.io/
 [kubernetes/enhancements]: https://git.k8s.io/enhancements
@@ -355,7 +356,8 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 We will add the follow [e2e autoscaling tests]:

 - For both scale up and scale down:
-- Workload does not scale because the metric ratio is in tolerance.
+- Workload does not scale because the metric ratio is in tolerance
+([PR](https://github.com/kubernetes/kubernetes/pull/130797/commits/4db8e8cc1dc2e5683c878b3ef29cb2e0fbe70f80#diff-832ab9989fa2683f7848ae1607c9a9aaa2bd245e5374efa0c5a87ba8edab464a)).
 - Workload scales successfully because the metric ratio is out of tolerance.
 - Autoscaling uses the default when no tolerances are set.
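
The three e2e scenarios in the hunk above (no scaling inside tolerance, scaling outside it, and fallback to the default when no tolerance is set) can be illustrated with a minimal Go sketch. This is not the controller's code: the names `decide`, `effectiveTolerance`, and `defaultTolerance` are hypothetical, and the sketch assumes the default equals the documented 0.1 default of the cluster-wide `--horizontal-pod-autoscaler-tolerance` flag.

```go
package main

import "fmt"

// defaultTolerance is an assumption for this sketch: it mirrors the documented
// default (0.1) of the cluster-wide --horizontal-pod-autoscaler-tolerance flag.
const defaultTolerance = 0.1

// decision is a hypothetical type naming the outcome of one evaluation.
type decision string

const (
	noScale   decision = "none"
	scaleUp   decision = "up"
	scaleDown decision = "down"
)

// effectiveTolerance returns the per-HPA tolerance when one is set,
// and the cluster-wide default otherwise.
func effectiveTolerance(perHPA *float64) float64 {
	if perHPA != nil {
		return *perHPA
	}
	return defaultTolerance
}

// decide compares the current/target metric ratio with the scale-up and
// scale-down tolerances and reports whether the workload should scale.
func decide(current, target float64, upTol, downTol *float64) decision {
	ratio := current / target
	switch {
	case ratio > 1+effectiveTolerance(upTol):
		return scaleUp
	case ratio < 1-effectiveTolerance(downTol):
		return scaleDown
	default:
		return noScale
	}
}

func main() {
	tight := 0.01
	fmt.Println(decide(105, 100, nil, nil))    // ratio 1.05 is within the default tolerance
	fmt.Println(decide(105, 100, &tight, nil)) // ratio 1.05 exceeds a 0.01 scale-up tolerance
	fmt.Println(decide(80, 100, nil, nil))     // ratio 0.80 is below 1 - 0.1
}
```

Passing the tolerances as pointers keeps "unset" distinct from an explicit value of zero, which is what lets the third scenario (default when nothing is set) be expressed at all.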

@@ -430,6 +432,12 @@ in back-to-back releases.
 - Feature implemented behind a `HPAConfigurableTolerance` feature flag
 - Initial e2e tests completed and enabled

+#### Beta
+
+- All tests described in the [`e2e tests` section](#e2e-tests) are implemented
+and linked in this KEP.
+- We have monitored negative user feedback and addressed relevant concerns.
+
 ### Upgrade / Downgrade Strategy

 #### Upgrade
@@ -543,7 +551,7 @@ You can take a look at one potential example of such test in:
 https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
 -->

-We will add a unit test verifying that HPAs with and without the new fields are
+[Unit tests have been added](https://github.com/kubernetes/kubernetes/pull/130797/commits/a41284d9fa3a3d5a5e8760db6e9fd4f7e5e6fca6#diff-98f8520444a477d01c5cc2e56f92939d5fb07893a234b8fee5b67c7c147a20e0) to verify that HPAs with and without the new fields are
 properly validated, both when the feature gate is enabled or not.

 ### Rollout, Upgrade and Rollback Planning
@@ -564,13 +572,20 @@ rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->

+This feature does not introduce new failure modes: during rollout/rollback, some
+API servers will allow or disallow setting the new 'tolerance' field. The new
+field is possibly ignored until the controller manager is fully updated.
+
 ###### What specific metrics should inform a rollback?

 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->

+A high `horizontal_pod_autoscaler_controller_metric_computation_duration_seconds`
+metric can indicate a problem related to this feature.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

 <!--
@@ -579,12 +594,18 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->

+I have manually tested a cluster upgrade, and this feature is in alpha without
+(to the best of our knowledge) any user reporting an issue. GKE has automated
+upgrade/downgrade tests that did not report any issue.
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->

+No.
+
 ### Monitoring Requirements

 <!--
@@ -625,9 +646,9 @@ values. Users can get both values using
 and use them to verify that scaling events are triggered when their ratio is out
 of tolerance.

-We will update the controller-manager logs to help users understand the behavior
-of the autoscaler. The data added to the logs will include the tolerance used
-for each scaling decision.
+The [controller-manager logs have been updated](https://github.com/kubernetes/kubernetes/pull/130797/commits/2dd9eda47ffd5556ff90446e91d22ddbecc05d2c#diff-f1c5a31aa8fb8e3fd64b6aa13d3358b504e6e25030f249f1652e244c105eafc7R846)
+to help users understand the behavior of the autoscaler. The data added to the
+logs includes the tolerance used for each scaling decision.

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -698,6 +719,8 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
 - Impact of its degraded performance or high-error rates on the feature:
 -->

+No, this feature does not depend on any specific service.
+
 ### Scalability

 <!--
@@ -817,6 +840,8 @@ details). For now, we leave it here.

 ###### How does this feature react if the API server and/or etcd is unavailable?

+API server or etcd issues do not impact this feature.
+
 ###### What are other known failure modes?

 <!--
@@ -832,8 +857,14 @@ For each of them, fill in the following information by copying the below templat
 - Testing: Are there any tests for failure mode? If not, describe why.
 -->

+We do not expect any new failure mode. (While setting inappropriate `tolerance`
+values may cause HPAs to react too slowly or too fast, the feature is working as
+intended.)
+
 ###### What steps should be taken if SLOs are not being met to determine the problem?

+N/A.
+
 ## Implementation History

 <!--
@@ -848,13 +879,17 @@ Major milestones might include:
 -->

 2025-01-21: KEP PR merged.
+2025-03-24: [Implementation PR](https://github.com/kubernetes/kubernetes/pull/130797) merged.
+2025-05-15: Kubernetes v1.33 released (includes this feature).

 ## Drawbacks

 <!--
 Why should this KEP _not_ be implemented?
 -->

+No major drawbacks have been identified.
+
 ## Alternatives

 <!--
@@ -863,10 +898,15 @@ not need to be as detailed as the proposal, but should include enough
 information to express the idea and why it was not acceptable.
 -->

+On non-managed Kubernetes instances, users can update the cluster-wide
+`--horizontal-pod-autoscaler-tolerance` tolerance parameter,
+
 ## Infrastructure Needed (Optional)

 <!--
 Use this section if you need things from the project/SIG. Examples include a
 new subproject, repos requested, or GitHub details. Listing these here allows a
 SIG to get the process for these resources started right away.
 -->
+
+N/A.

keps/sig-autoscaling/4951-configurable-hpa-tolerance/kep.yaml

Lines changed: 3 additions & 3 deletions
@@ -17,17 +17,17 @@ see-also:
 replaces:

 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.33"
+latest-milestone: "v1.34"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.33"
-  beta: TBD
+  beta: "v1.34"
   stable: TBD

 # The following PRR answers are required at alpha release
