Refactor cloud provider creation options #8583
Conversation
this is an alternate solution to #8531
// Check if any of the old MachineSets still have replicas
replicas, found, err := unstructured.NestedInt64(ms.UnstructuredContent(), "spec", "replicas")
I think checking spec first, then status, is the least expensive order, but wanna think out loud and double-check.
When a "new" MachineSet is replacing an "old" MachineSet in the service of a change to the corresponding MachineDeployment, it will first reduce the spec count, and then rely upon the infra provider to validate deletion, and then the MachineSet will apply the final reduction to its status.
If the above is ~true, then I think this is the right way to process through a MachineSet resource representation to definitively determine whether or not the "parent" MachineDeployment resource is in any active state of rolling out.
i am less familiar with the upgrade logic inside of cluster-api, so i would again defer to @sbueringer here for an answer.
> If the above is ~true

It is
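A minimal sketch of the spec-then-status check being discussed, assuming `ms` is an `*unstructured.Unstructured` MachineSet; the helper name and exact fallback behavior are illustrative, not the PR's actual code:

```go
// Sketch only, not the code in this PR.
package example

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// machineSetHasReplicas reports whether the MachineSet still has replicas,
// looking at spec.replicas first and only falling back to status.replicas
// when the spec value is zero or missing.
func machineSetHasReplicas(ms *unstructured.Unstructured) (bool, error) {
	replicas, found, err := unstructured.NestedInt64(ms.UnstructuredContent(), "spec", "replicas")
	if err != nil {
		return false, fmt.Errorf("error reading spec.replicas: %v", err)
	}
	if found && replicas > 0 {
		return true, nil
	}
	// spec is already zero (or unset); status may still lag behind while the
	// infra provider finishes validating deletion of the Machines.
	replicas, found, err = unstructured.NestedInt64(ms.UnstructuredContent(), "status", "replicas")
	if err != nil {
		return false, fmt.Errorf("error reading status.replicas: %v", err)
	}
	return found && replicas > 0, nil
}
```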
if len(machineSets) == 0 {
    // No MachineSets => MD is not rolling out.
What are the specific scenarios that would produce a MachineDeployment with zero corresponding MachineSet resources?
good question, this code is taken from a patch that @sbueringer developed. i'll let him answer.
The MachineDeployment was created but the MD controller did not create a MS yet.
This should usually only occur for a very short period of time, but I would prefer handling it explicitly vs going through the code below without any MS.
ACK, makes sense. Do we really want to tag a MachineDeployment in such a state as a candidate for scale down? I would think we would return `true, nil` here.
Hm, in general there will at most be Machines of the new MS, if there are any Machines at all. And it would be okay to downscale these Machines.
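To make the trade-off concrete, a small sketch of the guard under discussion; the helper name and signature are assumptions, and which value to return for the zero-MachineSet case is exactly the judgment call above:

```go
// Sketch only, not the code in this PR: a hypothetical helper that makes the
// zero-MachineSet case explicit.
package example

import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

func isRollingOut(machineSets []*unstructured.Unstructured) (bool, error) {
	if len(machineSets) == 0 {
		// The MachineDeployment exists but its controller has not created a
		// MachineSet yet. Returning false treats it as "not rolling out" and
		// hence a scale down candidate; returning true instead would skip it
		// until a MachineSet appears, which is the alternative raised above.
		return false, nil
	}
	// Otherwise, inspect the old MachineSets' spec/status replicas as in the
	// snippet discussed earlier in this thread.
	return false, nil
}
```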
Force-pushed from f60b7f0 to 1afa484
/test pull-cluster-autoscaler-e2e-azure-master
the circular import in the unit tests is giving me some problems, not quite sure where it's starting.
i am not able to reproduce the unit test failure with the circular import locally. not sure the best course of action. /retest
/retest-required
/retest
not sure why the tests aren't triggering.
This change helps to prevent circular dependencies between the core and builder packages as we start to pass the AutoscalerOptions to the cloud provider builder functions.
this changes the options input to the cloud provider builder function so that the full autoscaler options are passed. This is being proposed so that cloud providers will have new options for injecting behavior into the core parts of the autoscaler.
this change adds a method to the AutoscalerOptions struct to allow registering a scale down node processor.
This change adds a custom scale down node processor for cluster api to reject nodes that are undergoing upgrade.
Force-pushed from 1afa484 to a5f07fe
rebased on master
i'm not sure how i've broken this, but these error lines are not making sense to me:
@towca does this error look familiar to you at all?
this change removes the import from the gce module in favor of using the string value directly.
ok, i think my last commit on this chain fixed the circular import.
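For anyone who hits a similar cycle: the idea is simply to drop the cross-provider import and use the label string directly in the test. The constant and label below are illustrative assumptions, not necessarily the exact ones touched by this PR:

```go
// Sketch of the idea only: importing another provider package just for a
// label constant can create an import cycle in tests, so inline the string.
package example

// previously something like: gpuLabel = gce.GPULabel (pulls in the gce package)
// The GKE accelerator label is used here purely as an example value.
const gpuLabel = "cloud.google.com/gke-accelerator"
```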
We have a conversation going on here that will potentially improve the gpu node label concern that this PR encountered:
/lgtm for @towca to give his thumbs up
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko, jackfrancis

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
i'm fine to change the label in the test for this PR, i was just trying to stop the circular import error.
What type of PR is this?
/kind cleanup
/kind api-change
What this PR does / why we need it:
This patch series changes an argument to the `NewCloudProvider` function to use an `AutoscalerOptions` struct instead of `AutoscalingOptions`. This change allows cloud providers to have more control over the core functionality of the cluster autoscaler.

Specifically, this patch series also adds a method named `RegisterScaleDownNodeProcessor` to the `AutoscalerOptions` so that cloud providers can inject a custom scale down processor.

Lastly, this change adds a custom scale down processor to the clusterapi provider to help it avoid removing the wrong instance during scale down operations that occur during a cluster upgrade.
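To make the shape of the change easier to picture, here is a rough, self-contained sketch of the pattern described above; every type and method below is a simplified stand-in inferred from this description, not the autoscaler's actual definitions:

```go
// Simplified stand-ins for illustration only; the real cluster-autoscaler
// types differ. The point is the flow: the builder receives the full options
// struct and can register extra behavior, such as a scale down node processor
// that rejects nodes whose MachineDeployment is rolling out.
package example

// ScaleDownNodeProcessor is a minimal stand-in for the processor interface:
// it filters the set of nodes considered for scale down.
type ScaleDownNodeProcessor interface {
	FilterScaleDownCandidates(nodes []string) []string
}

// AutoscalerOptions is a stand-in for the full options struct now passed to
// cloud provider builders (instead of only the flat AutoscalingOptions).
type AutoscalerOptions struct {
	scaleDownProcessors []ScaleDownNodeProcessor
}

// RegisterScaleDownNodeProcessor lets a cloud provider inject a custom
// processor into the core scale down logic.
func (o *AutoscalerOptions) RegisterScaleDownNodeProcessor(p ScaleDownNodeProcessor) {
	o.scaleDownProcessors = append(o.scaleDownProcessors, p)
}

// clusterAPIUpgradeFilter sketches the clusterapi-side processor: it drops
// nodes that belong to a MachineDeployment that is currently rolling out.
type clusterAPIUpgradeFilter struct {
	isRollingOut func(node string) bool
}

func (f *clusterAPIUpgradeFilter) FilterScaleDownCandidates(nodes []string) []string {
	var out []string
	for _, n := range nodes {
		if f.isRollingOut(n) {
			continue // skip nodes that are part of an in-progress upgrade
		}
		out = append(out, n)
	}
	return out
}

// BuildClusterAPI sketches the new builder shape: it receives the full
// AutoscalerOptions and registers its processor before returning.
func BuildClusterAPI(opts *AutoscalerOptions, isRollingOut func(node string) bool) {
	opts.RegisterScaleDownNodeProcessor(&clusterAPIUpgradeFilter{isRollingOut: isRollingOut})
}
```

The design intent, as described above, is that the core scale down path only consults whatever processors were registered, while provider-specific logic (such as the clusterapi upgrade check) stays inside the provider.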
Which issue(s) this PR fixes:
Fixes #8494
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: