[AKS HOBO] Add support for hosted-on-behalf-of systempool autoscaling #8596

wenxuan0923 · 2025-09-30T21:11:02Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Added Azure configuration environment variables to support autoscaling for managed (hosted-on-behalf-of) systempools. These systempools consist of mixed-SKU VM sizes and are hosted within an internal AKS tenant rather than the cluster’s subscription. Each SKU is registered with Cluster Autoscaler (CAS) as a distinct NodeGroup.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

azure: Add support for hosted-on-behalf-of systempool autoscaling

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2025-09-30T21:11:12Z

Hi @wenxuan0923. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

jackfrancis · 2025-09-30T21:48:29Z

/ok-to-test

jackfrancis · 2025-09-30T21:53:33Z

cluster-autoscaler/cloudprovider/azure/azure_config.go

 	// It can override the default public ARM endpoint for VMs pool scale operations.
 	ARMBaseURLForAPClient string `json:"armBaseURLForAPClient" yaml:"armBaseURLForAPClient"`

+	// Managed system pool configuration for automatic cluster.


Could we rename these to HoboSubscriptionID, HoboResourceGroup, and HoboResourceProxyURL to distinguish from the existing understanding of "managed" in the vanilla AKS sense?

(same comment for all the env vars, etc)

Thanks @jackfrancis, I initially named them HoboXXX, but @tallaxes felt that was too specific to AKS internals and so might not be ideal for open-source usage. That’s why I went with the ManagedXXX naming. I'm open to suggestions @tallaxes

Agree "Managed" could be confusing here. I was proposing something like "AlternativeResourceGroup" or "ResourceGroupOverride" to indicate the effect without constraining it to a single purpose (like Hobo). This still feels better to me than Hobo, but if we feel Hobo is clear, and we can't imagine other possible uses for these - I am ok with Hobo

How about just Hosted* for short?

I like Hosted! I have updated the PR, let me know how you feel about it
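For readers following the naming thread, here is a minimal sketch of what the Hosted* settings and their environment-variable overrides might look like. Only `HostedResourceGroup` appears verbatim in this PR's diff; the struct name `HostedConfig`, the other two field names (inferred from the Hobo* to Hosted* rename), the helper `applyHostedEnv`, and all environment variable names are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"fmt"
	"os"
)

// HostedConfig is a hypothetical, trimmed-down stand-in for the provider Config.
// HostedResourceGroup is taken from the diff; the other Hosted* names are assumed.
type HostedConfig struct {
	ResourceGroup          string
	HostedSubscriptionID   string
	HostedResourceGroup    string
	HostedResourceProxyURL string
}

// applyHostedEnv copies non-empty environment values into the Hosted* fields.
// The variable names below are illustrative assumptions.
func applyHostedEnv(cfg *HostedConfig, getenv func(string) string) {
	if v := getenv("AZURE_HOSTED_SUBSCRIPTION_ID"); v != "" {
		cfg.HostedSubscriptionID = v
	}
	if v := getenv("AZURE_HOSTED_RESOURCE_GROUP"); v != "" {
		cfg.HostedResourceGroup = v
	}
	if v := getenv("AZURE_HOSTED_RESOURCE_PROXY_URL"); v != "" {
		cfg.HostedResourceProxyURL = v
	}
}

func main() {
	cfg := HostedConfig{ResourceGroup: "cluster-rg"}
	applyHostedEnv(&cfg, os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in as a parameter keeps the override logic testable without touching the process environment.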

jackfrancis · 2025-10-01T15:09:45Z

cluster-autoscaler/cloudprovider/azure/azure_config.go

 		}
 	}

+	// A proxy service is required to access resources for the managed system pool within automatic clusters.


micro nit: let's move this up one block, before we check for the ExtendedLocation config, so that all of our "override from default azClientConfig property settings" code is organized together

jackfrancis · 2025-10-01T15:15:38Z

cluster-autoscaler/cloudprovider/azure/azure_vms_pool.go

-		header.Set("Target-Count", fmt.Sprintf("%d", count))
-		updateCtx = policy.WithHTTPHeader(updateCtx, header)
 	}
+	header := make(http.Header)


With this change we appear to be setting HTTP headers that include the Target-Count key/value pair, and hoisting that into the context object that we send to the Azure API, for both self-hosted and AKS-managed, whereas today we only do this in the AKS-managed case.

Why are we making this change to include this behavior for both cases going forward?

This is because the Hobo systempool will only have a manual scale profile, so a simple if-else check is no longer sufficient to distinguish between self-hosted and managed CAS.

I just realized I could add an explicit check for HOBO to make the logic clearer, so I’ve updated the code accordingly. Let me know if that makes sense. Thanks!

+1, thanks!

jackfrancis · 2025-10-01T21:32:20Z

/lgtm
/approve
/hold

for @tallaxes to have a look

jackfrancis · 2025-10-01T21:33:04Z

/release-note-edit

azure: Add support for hosted-on-behalf-of systempool autoscaling

wenxuan0923 · 2025-10-03T21:48:57Z

@tallaxes please take a look when you get a chance! thank you!

comtalyst · 2025-10-06T17:21:11Z

cluster-autoscaler/cloudprovider/azure/azure_vms_pool.go

+	// hosted CAS will be using Autoscale scale profile
+	// HostedSystem will be using manual scale profile
+	// Both of them need to set the Target-Count and SKU headers
+	if len(versionedAP.Properties.VirtualMachinesProfile.Scale.Autoscale) > 0 ||


What are the new states that make us stop using else?

This is because the Hobo systempool will only have a manual scale profile, but its scaling request will be processed by NPS, so a simple if-else check on the scale profile type is no longer sufficient to distinguish between self-hosted and managed CAS.

Is my understanding below correct?

Before:

Self-hosted: has a manual scale profile: goes to the if only

Managed: doesn't have a manual scale profile: goes to the else only

Now:

Self-hosted: has a manual scale profile: goes to the first if only

Managed: doesn't have a manual scale profile AND has an autoscale profile: goes to the second if only

HOBO: has a manual scale profile and HostedSystem mode: goes to both

Without the new if, HOBO would go to the first if only, whereas we also want it to take the former else path because it is managed
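The three cases above can be sketched as two independent checks rather than one if-else. This is a simplified illustration, not the code in `azure_vms_pool.go`: the `poolState` type and both function names are hypothetical, and the `SKU` header key is an assumption (the thread names the `Target-Count` header and mentions "SKU headers" without giving the exact key).

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// poolState is a hypothetical summary of the fields the real code inspects.
type poolState struct {
	HasManualProfile    bool // a manual scale profile is present
	HasAutoscaleProfile bool // an autoscale scale profile is present
	HostedSystem        bool // pool runs in HostedSystem (HOBO) mode
}

// updatesManualProfile mirrors the first "if": the manual profile count is edited.
func updatesManualProfile(p poolState) bool {
	return p.HasManualProfile
}

// scaleHeaders mirrors the second "if": headers are set for managed CAS
// (autoscale profile) and for HostedSystem pools; otherwise it returns nil.
func scaleHeaders(p poolState, sku string, count int) http.Header {
	if !p.HasAutoscaleProfile && !p.HostedSystem {
		return nil
	}
	h := make(http.Header)
	h.Set("Target-Count", strconv.Itoa(count))
	h.Set("SKU", sku) // header key assumed for illustration
	return h
}

func main() {
	cases := []struct {
		name  string
		state poolState
	}{
		{"self-hosted", poolState{HasManualProfile: true}},
		{"managed", poolState{HasAutoscaleProfile: true}},
		{"hobo", poolState{HasManualProfile: true, HostedSystem: true}},
	}
	for _, c := range cases {
		h := scaleHeaders(c.state, "Standard_D4s_v5", 3)
		fmt.Printf("%s: manual=%v headers=%v\n", c.name, updatesManualProfile(c.state), h != nil)
	}
}
```

Because the two checks are independent, the HOBO case naturally takes both paths, which a single if-else cannot express.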

comtalyst · 2025-10-06T17:24:21Z

cluster-autoscaler/cloudprovider/azure/azure_cache.go


 func newAzureCache(client *azClient, cacheTTL time.Duration, config Config) (*azureCache, error) {
+	nodeResourceGroup := config.ResourceGroup
+	if config.HostedResourceGroup != "" {


Similar to in azure_config.go, do you mind adding a comment on the purpose of this?

Comments added
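As a rough illustration of the override in `newAzureCache`, the snippet below shows the fallback pattern the diff hints at. The diff only shows the initial assignment and the non-empty check, so the body of the branch is assumed here, and the helper name `effectiveResourceGroup` is hypothetical.

```go
package main

import "fmt"

// effectiveResourceGroup returns the hosted resource group when one is
// configured and falls back to the cluster's own resource group otherwise.
// The branch body is an assumption; the PR diff shows only the condition.
func effectiveResourceGroup(resourceGroup, hostedResourceGroup string) string {
	nodeResourceGroup := resourceGroup
	if hostedResourceGroup != "" {
		nodeResourceGroup = hostedResourceGroup
	}
	return nodeResourceGroup
}

func main() {
	fmt.Println(effectiveResourceGroup("cluster-rg", ""))
	fmt.Println(effectiveResourceGroup("cluster-rg", "hosted-rg"))
}
```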

comtalyst · 2025-10-06T23:07:29Z

/label tide/merge-method-squash

comtalyst · 2025-10-06T23:08:43Z

/lgtm
/approve

k8s-ci-robot · 2025-10-06T23:08:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: comtalyst, jackfrancis, wenxuan0923

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cluster-autoscaler/cloudprovider/azure/OWNERS~~ [comtalyst,jackfrancis]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jackfrancis · 2025-10-06T23:14:19Z

/hold cancel

k8s-ci-robot removed the do-not-merge/needs-area label Sep 30, 2025

k8s-ci-robot requested review from Bryce-Soghigian and feiskyer September 30, 2025 21:11

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 30, 2025

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 30, 2025

jackfrancis reviewed Sep 30, 2025


wenxuan0923 force-pushed the wenx/hobo-cas branch from abd5fd3 to 20fcc96 Compare September 30, 2025 23:13

jackfrancis reviewed Oct 1, 2025


wenxuan0923 force-pushed the wenx/hobo-cas branch from 20fcc96 to 7fcb519 Compare October 1, 2025 21:18

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 1, 2025

k8s-ci-robot assigned jackfrancis Oct 1, 2025

k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 1, 2025

comtalyst approved these changes Oct 6, 2025


k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2025

squash

e8ba5dc

wenxuan0923 force-pushed the wenx/hobo-cas branch from 20ef60f to e8ba5dc Compare October 6, 2025 17:35

k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 6, 2025

k8s-ci-robot assigned comtalyst Oct 6, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2025

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 6, 2025

k8s-ci-robot merged commit 2944dcb into kubernetes:master Oct 6, 2025
8 checks passed
