Open
Labels
- area/autoscaling
- help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
- kind/feature: Categorizes issue or PR as related to a new feature.
- priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
- triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Description
While investigating an autoscaler issue, we identified a few areas for improvement in the autoscaler/CAPI integration.
We should look into the following areas:
- A first temporary/stopgap solution for the CAPI controller and the autoscaler fighting over replicas during MD rollouts (see: "CA ClusterAPI provider can delete wrong node when scale-down occurs during MachineDeployment upgrade", kubernetes/autoscaler#8494)
- Improve behavior during Machine deletion (including Node drain etc.)
  - Today autoscaler cordons/taints/drains Nodes before triggering Machine deletion
  - This means that the CAPI Machine deletion logic is not respected (pre-drain hooks, MachineDrainRules, drain observability, ...)
  - An idea: Maybe we want to disable cordon/taint/drain in autoscaler, options:
    - Via a global flag to allow disabling drain
    - Extend the `CloudProvider` interface with a new method to allow disabling drain per node group
    - Extend the `GetOptions` method of the `NodeGroup` interface to allow disabling drain per node group
- Double-check that autoscaler does not scale up too many Machines based on pending Pods
  - Not entirely sure, but it looks like we observed autoscaler scaling up twice within 12 seconds because of just 1 pending Pod
- Find a final solution for autoscaling during MD rollouts
  - Improve how autoscaler triggers Machine deletion (`delete-machine` annotation on MS-level + MD scale down is a weak/no API)
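
To make the "weak/no API" point concrete, here is a minimal sketch of the two-step mechanism, assuming it works as described above: mark a Machine with the `cluster.x-k8s.io/delete-machine` annotation, then decrement replicas on the owner. `Machine`, `MachineDeployment`, `markForDeletion`, and `scaleDownByOne` are hypothetical in-memory stand-ins, not the real CAPI or autoscaler types. The thing to notice is that the two writes are independent, so nothing ties the replica decrement to the annotated Machine.

```go
package main

import "fmt"

// deleteMachineAnnotation is the CAPI annotation key that marks a Machine
// as preferred for deletion on the next scale down.
const deleteMachineAnnotation = "cluster.x-k8s.io/delete-machine"

// Machine is a hypothetical stand-in for a CAPI Machine object.
type Machine struct {
	Name        string
	Annotations map[string]string
}

// MachineDeployment is a hypothetical stand-in for a CAPI MachineDeployment.
type MachineDeployment struct {
	Name     string
	Replicas int32
}

// markForDeletion is step 1: annotate the Machine the autoscaler wants gone.
func markForDeletion(m *Machine) {
	if m.Annotations == nil {
		m.Annotations = map[string]string{}
	}
	m.Annotations[deleteMachineAnnotation] = "true"
}

// scaleDownByOne is step 2: reduce the desired replica count by one.
// It does not know which Machine was annotated in step 1.
func scaleDownByOne(md *MachineDeployment) error {
	if md.Replicas <= 0 {
		return fmt.Errorf("machine deployment %q has no replicas to remove", md.Name)
	}
	md.Replicas--
	return nil
}

func main() {
	m := &Machine{Name: "worker-abc"}
	md := &MachineDeployment{Name: "workers", Replicas: 3}

	markForDeletion(m)
	if err := scaleDownByOne(md); err != nil {
		fmt.Println("scale down failed:", err)
		return
	}
	fmt.Printf("machine %s annotated=%s, %s replicas=%d\n",
		m.Name, m.Annotations[deleteMachineAnnotation], md.Name, md.Replicas)
}
```

If another controller changes replicas or replaces Machines between the two steps (for example during an MD rollout), the decrement can end up removing a different Machine than the one that was annotated.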
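
For the drain options listed under Machine deletion, the per-node-group variant via `GetOptions` might look roughly like the sketch below. It is self-contained and does not import the autoscaler. `NodeGroupAutoscalingOptions` here is a simplified local type, and its `SkipDrain` field, the `capiNodeGroup` and `defaultNodeGroup` types, and the `shouldDrain` helper are all hypothetical names introduced for illustration. The only idea borrowed from the existing design is that a node group can return its own options, or nil to fall back to the global defaults.

```go
package main

import "fmt"

// NodeGroupAutoscalingOptions is a simplified, local stand-in.
// SkipDrain is a hypothetical field; it does not exist in the autoscaler today.
type NodeGroupAutoscalingOptions struct {
	SkipDrain bool
}

// NodeGroup is a reduced, hypothetical version of the node group interface.
type NodeGroup interface {
	Id() string
	// GetOptions returns per-group overrides, or nil to use the defaults.
	GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error)
}

// capiNodeGroup opts out of autoscaler-side drain so that the CAPI Machine
// deletion logic (pre-drain hooks, MachineDrainRules, ...) handles it.
type capiNodeGroup struct {
	id        string
	skipDrain bool
}

func (g capiNodeGroup) Id() string { return g.id }

func (g capiNodeGroup) GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error) {
	opts := defaults
	opts.SkipDrain = g.skipDrain
	return &opts, nil
}

// defaultNodeGroup has no overrides and relies on the global defaults.
type defaultNodeGroup struct{ id string }

func (g defaultNodeGroup) Id() string { return g.id }

func (g defaultNodeGroup) GetOptions(NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error) {
	return nil, nil
}

// shouldDrain decides whether the autoscaler cordons/taints/drains a Node
// itself before triggering Machine deletion for the given node group.
func shouldDrain(ng NodeGroup, defaults NodeGroupAutoscalingOptions) (bool, error) {
	opts, err := ng.GetOptions(defaults)
	if err != nil {
		return false, err
	}
	if opts == nil {
		return !defaults.SkipDrain, nil
	}
	return !opts.SkipDrain, nil
}

func main() {
	defaults := NodeGroupAutoscalingOptions{SkipDrain: false}
	groups := []NodeGroup{
		capiNodeGroup{id: "md-0", skipDrain: true},
		defaultNodeGroup{id: "ng-1"},
	}
	for _, ng := range groups {
		drain, err := shouldDrain(ng, defaults)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: autoscaler drains node = %v\n", ng.Id(), drain)
	}
}
```

Compared with a global flag, this shape lets a cluster mix node groups that rely on CAPI for drain with groups that keep the current autoscaler behavior.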