-
Notifications
You must be signed in to change notification settings - Fork 4.2k
[AKS HOBO] Add support for hosted-on-behalf-of systempool autoscaling #8596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @wenxuan0923. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
// It can override the default public ARM endpoint for VMs pool scale operations. | ||
ARMBaseURLForAPClient string `json:"armBaseURLForAPClient" yaml:"armBaseURLForAPClient"` | ||
|
||
// Managed system pool configuration for automatic cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we rename these to HoboSubscriptionID
, HoboResourceGroup
, and HoboResourceProxyURL
to distinguish from the existing understanding of "managed" in the vanilla AKS sense?
(same comment for all the env vars, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jackfrancis, I initially named them HoboXXX, but @tallaxes felt that it’s too specific to AKS internals so may not be ideal for open-source usage. That’s why I went with the ManagedXXX naming. I'm open to suggestions @tallaxes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree "Managed" could be confusing here. I was proposing something like "AlternativeResourceGroup" or "ResourceGroupOverride" to indicate the effect without constraining it to a single purpose (like Hobo). This still feels better to me than Hobo, but if we feel Hobo is clear, and we can't imagine other possible uses for these - I am ok with Hobo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just Hosted* for short?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like Hosted! I have updated the PR, let me know how you feel about it
abd5fd3
to
20fcc96
Compare
} | ||
} | ||
|
||
// A proxy service is required to access resources for the managed system pool within automatic clusters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
micro nit: let's move this up one block, before we check for the ExtendedLocation config, so that all of our "override from default azClientConfig property settings" code is organized together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated!
header.Set("Target-Count", fmt.Sprintf("%d", count)) | ||
updateCtx = policy.WithHTTPHeader(updateCtx, header) | ||
} | ||
header := make(http.Header) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change we appear to be setting HTTP headers that include the Target-Count key/value pair, and hoisting that into the context object that we send to the Azure API, for both self-hosted and AKS-managed. Whereas now we're only doing this in the AKS-managed case.
Why are we making this change to include this behavior for both cases going forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because Hobo systempool will only have manual scale profile, a simple if-else check is no longer sufficient to distinguish between self-hosted and managed CAS.
I just realized I could add an explicit check for HOBO to make the logic clearer, so I’ve updated the code accordingly. Let me know if that makes sense. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thanks!
20fcc96
to
7fcb519
Compare
/lgtm for @tallaxes to have a look |
/release-note-edit
|
@tallaxes please take a look when you get a chance! thank you! |
// hosted CAS will be using Autoscale scale profile | ||
// HostedSystem will be using manual scale profile | ||
// Both of them need to set the Target-Count and SKU headers | ||
if len(versionedAP.Properties.VirtualMachinesProfile.Scale.Autoscale) > 0 || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the new states that make us stop using else
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because Hobo systempool will only have manual scale profile, but its scaling request will be processed by NPS, so a simple if-else check on the scale profile type is no longer sufficient to distinguish between self-hosted and managed CAS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is my understanding below correct?
Before:
- Self-hosted: have manual scale profile: go to the
if
only - Managed: don't have manual scale profile: go the
else
only
Now:
- Self-hosted: have manual scale profile: go to the
if
only - Managed: don't have manual scale profile AND have auto scale profile: go the
second if
only - HOBO: have manual scale profile and
HostedSystem
mode: go to both- If not introducing the new
if
, it would go to theif
only, while we want it to go to theelse
as well due to it being managed
- If not introducing the new
|
||
func newAzureCache(client *azClient, cacheTTL time.Duration, config Config) (*azureCache, error) { | ||
nodeResourceGroup := config.ResourceGroup | ||
if config.HostedResourceGroup != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to in azure_config.go, do you mind adding a comment on the purpose of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments added
20ef60f
to
e8ba5dc
Compare
/label tide/merge-method-squash |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: comtalyst, jackfrancis, wenxuan0923 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Added Azure configuration environment variables to support autoscaling for managed (hosted-on-behalf-of) systempools. These systempools consist of mixed-SKU VM sizes and are hosted within an internal AKS tenant rather than the cluster’s subscription. Each SKU is registered with Cluster Autoscaler (CAS) as a distinct NodeGroup.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: