OCPSTRAT-2915: CPU-based control plane autoscaling #1946

Draft
csrwng wants to merge 1 commit into openshift:master from csrwng:cpu-autoscaling

Conversation

@csrwng csrwng commented Feb 23, 2026

Summary

  • Adds enhancement proposal for extending HyperShift resource-based
    control plane autoscaling to consider CPU usage in addition to memory
  • Introduces per-size resource fraction overrides in
    ClusterSizingConfiguration, allowing different fractions for
    different cluster sizes
  • Sizing decision uses the maximum of CPU and memory VPA
    recommendations to prevent under-provisioning on either dimension
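
One way to read the last bullet, as a minimal sketch rather than the proposal's actual controller logic: find the smallest size that satisfies the CPU recommendation, find the smallest size that satisfies the memory recommendation, and take the larger of the two. The `Size` type, the size names and capacities, and the function names below are all hypothetical; the 0.65 default fraction is the value quoted later in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    name: str
    cpu_cores: float   # hypothetical CPU capacity for this size
    memory_gib: float  # hypothetical memory capacity for this size


def smallest_fitting_index(sizes, recommended, capacity_of, fraction):
    """Index of the first size whose capacity * fraction covers the recommendation.

    Assumes `sizes` is ordered from smallest to largest.
    """
    for i, size in enumerate(sizes):
        if recommended <= capacity_of(size) * fraction:
            return i
    return len(sizes) - 1  # nothing fits: fall back to the largest size


def pick_size(sizes, cpu_rec, mem_rec_gib, cpu_fraction=0.65, mem_fraction=0.65):
    """Pick the larger of the sizes required by CPU and by memory."""
    by_cpu = smallest_fitting_index(sizes, cpu_rec, lambda s: s.cpu_cores, cpu_fraction)
    by_mem = smallest_fitting_index(sizes, mem_rec_gib, lambda s: s.memory_gib, mem_fraction)
    return sizes[max(by_cpu, by_mem)]


# Illustrative size table only; not taken from the proposal.
SIZES = [Size("small", 4, 16), Size("medium", 8, 32), Size("large", 16, 64)]

if __name__ == "__main__":
    # CPU-bound workload: memory alone would fit "small", CPU pushes it to "medium".
    print(pick_size(SIZES, cpu_rec=4.0, mem_rec_gib=8.0).name)
```

Taking the maximum of the two indices means a workload that is light on memory but heavy on CPU is still moved to a larger size, which is the under-provisioning case the bullet describes.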

Test plan

  • Review enhancement content for completeness
  • Verify API extension design is backward compatible
  • Confirm open questions are addressed before implementation

🤖 Generated with Claude Code

openshift-ci-robot commented Feb 23, 2026

@csrwng: This pull request references OCPSTRAT-2915 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the feature to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 23, 2026
@openshift-ci openshift-ci bot requested review from enxebre and sjenning February 23, 2026 17:24

openshift-ci bot commented Feb 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign derekwaynecarr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Extends the existing resource-based control plane autoscaling in
HyperShift to consider CPU usage in addition to memory, and allows
per-size resource fraction overrides in ClusterSizingConfiguration.

Tracking: https://issues.redhat.com/browse/OCPSTRAT-2915

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@csrwng csrwng changed the title Enhancement OCPSTRAT-2915: CPU-based control plane autoscaling OCPSTRAT-2915: CPU-based control plane autoscaling Feb 23, 2026

openshift-ci bot commented Feb 23, 2026

@csrwng: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | 987afae | link | true | /test markdownlint |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@csrwng csrwng marked this pull request as draft February 23, 2026 19:33
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 23, 2026
@joshbranham joshbranham left a comment

Excited to see this make progress, left some comments, let me know if I can help 👍

Where `effectiveMemoryFraction` returns the per-size memory
fraction if set, otherwise the global memory fraction, otherwise
the default (0.65). The same precedence applies to
`effectiveCPUFraction`.

So the default will be 65% for CPU as well?

Contributor Author

So far yes, because I had to put something, but it'd be great if we could base the default on real data.
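
The precedence described in the excerpt (per-size value, then global value, then the 0.65 default) can be sketched as follows. This is an illustration, not the proposal's code: `effective_fraction` and its parameter names are hypothetical stand-ins for `effectiveMemoryFraction` and `effectiveCPUFraction`, and using 0.65 for CPU reflects only the tentative answer above.

```python
from typing import Optional

DEFAULT_FRACTION = 0.65  # default quoted in the excerpt; tentative for CPU


def effective_fraction(
    per_size: Optional[float],
    global_value: Optional[float],
    default: float = DEFAULT_FRACTION,
) -> float:
    """Return the per-size fraction if set, else the global one, else the default."""
    if per_size is not None:
        return per_size
    if global_value is not None:
        return global_value
    return default


if __name__ == "__main__":
    print(effective_fraction(0.5, 0.8))    # per-size override wins
    print(effective_fraction(None, 0.8))   # falls back to the global value
    print(effective_fraction(None, None))  # falls back to the default
```

The sketch tests for `None` rather than truthiness so that an explicitly configured value is never mistaken for an unset field.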

Comment on lines +139 to +140
be consistently ordered across sizes (i.e., a size with more
memory also has more CPU).

Unfortunately, as deployed today in ROSA, this is not the case. We have a step from m5.4xlarge to r5.4xlarge, which is only an increase in memory. I don't think it will matter much, as both have 16 vCPU and our issues with CPU are mostly isolated to the smaller 4/8 vCPU types.
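
To make the ordering concern concrete, here is a small sketch that checks whether CPU grows along with memory across a list of sizes. The function name and the list itself are hypothetical: only m5.4xlarge and r5.4xlarge are named in this thread, the two smaller types are added for illustration, and the vCPU and memory figures are the published AWS specifications for those instance types.

```python
def cpu_ordering_violations(sizes, strict=True):
    """Return adjacent pairs (ordered by memory) where CPU fails to grow.

    `sizes` is a list of (name, vcpu, memory_gib) tuples. With strict=True a
    larger-memory size must have strictly more CPU; with strict=False equal
    CPU is accepted.
    """
    ordered = sorted(sizes, key=lambda s: s[2])
    violations = []
    for smaller, larger in zip(ordered, ordered[1:]):
        ok = larger[1] > smaller[1] if strict else larger[1] >= smaller[1]
        if not ok:
            violations.append((smaller[0], larger[0]))
    return violations


# Illustrative list; not the actual ROSA size configuration.
INSTANCE_TYPES = [
    ("m5.xlarge", 4, 16),
    ("m5.2xlarge", 8, 32),
    ("m5.4xlarge", 16, 64),
    ("r5.4xlarge", 16, 128),
]

if __name__ == "__main__":
    print(cpu_ordering_violations(INSTANCE_TYPES, strict=True))
    print(cpu_ordering_violations(INSTANCE_TYPES, strict=False))
```

Under the strict reading of the quoted text ("more memory also has more CPU") the m5.4xlarge to r5.4xlarge step is flagged; if the requirement is relaxed to non-decreasing CPU, the same list passes.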

Comment on lines +610 to +614
- **Disabling CPU-based sizing**: To revert to memory-only
sizing, remove the `kubeAPIServerCPUFraction` field from the
`ClusterSizingConfiguration` spec and remove any per-size CPU
fraction overrides. The controller will fall back to memory-only
sizing on its next reconciliation.

Is this true? Today we are not setting kubeAPIServerMemoryFraction and are letting the default take effect.

Contributor Author

Hmm, let me update it. I think you're right in that we would have a default even if we don't put anything in there.

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

3 participants