
Improve cluster-autoscaler integration #12762

@sbueringer


While investigating an autoscaler issue, we identified a few areas for improvement in the autoscaler/CAPI integration.

We should look into the following areas:

  • A first temporary/stopgap solution for the CAPI controller and autoscaler fighting over replicas during MD rollouts (see kubernetes/autoscaler#8494, "CA ClusterAPI provider can delete wrong node when scale-down occurs during MachineDeployment upgrade")
  • Improve behavior during Machine deletion (including Node drain etc.)
    • Today autoscaler cordons/taints/drains Nodes before triggering Machine deletion
    • This means that the CAPI Machine deletion logic is not respected (pre-drain hooks, MachineDrainRules, drain observability, ...)
    • Idea: allow disabling cordon/taint/drain in autoscaler so that CAPI handles it during Machine deletion. Options:
      • A global flag to disable drain
      • Extend the CloudProvider interface with a new method to allow disabling drain per node group
      • Extend the GetOptions method of the NodeGroup interface to allow disabling drain per node group
  • Double-check that autoscaler does not scale up too many Machines based on pending Pods
    • Not entirely sure, but it looks like we observed autoscaler scaling up twice within 12 seconds because of just 1 pending Pod
  • Find a final solution for autoscaling during MD rollouts
    • Improve how autoscaler triggers Machine deletion (delete-machine annotation on MS-level + MD scale down is a weak/no API)
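
To make the per-node-group drain option above more concrete, here is a minimal sketch of how a GetOptions-style override could work: a global default (for example, from a flag) is passed in, and a node group may override it. Everything here is an assumption for illustration only: `NodeGroupOptions`, its `SkipDrain` field, and the `autoscaling.example/skip-drain` annotation key are hypothetical and do not exist in autoscaler or CAPI today.

```go
package main

import (
	"fmt"
	"strconv"
)

// NodeGroupOptions is a hypothetical, trimmed-down stand-in for the
// autoscaler's per-node-group options. SkipDrain does not exist upstream.
type NodeGroupOptions struct {
	// SkipDrain (hypothetical) asks the autoscaler to leave cordon/taint/drain
	// to the provider, e.g. to the CAPI Machine deletion flow.
	SkipDrain bool
}

// skipDrainAnnotation is an illustrative annotation key, not a real one.
const skipDrainAnnotation = "autoscaling.example/skip-drain"

// NodeGroup is a hypothetical node group backed by an annotated MachineDeployment.
type NodeGroup struct {
	Name        string
	Annotations map[string]string
}

// GetOptions returns the passed-in defaults unless the node group overrides
// them via the (hypothetical) annotation. Invalid values are reported as an
// error and the defaults are returned unchanged.
func (ng NodeGroup) GetOptions(defaults NodeGroupOptions) (NodeGroupOptions, error) {
	raw, ok := ng.Annotations[skipDrainAnnotation]
	if !ok {
		return defaults, nil
	}
	skip, err := strconv.ParseBool(raw)
	if err != nil {
		return defaults, fmt.Errorf("node group %q: invalid %s value %q: %w",
			ng.Name, skipDrainAnnotation, raw, err)
	}
	opts := defaults
	opts.SkipDrain = skip
	return opts, nil
}

func main() {
	// Global default, e.g. what a global flag could provide.
	defaults := NodeGroupOptions{SkipDrain: false}

	groups := []NodeGroup{
		{Name: "md-0", Annotations: map[string]string{skipDrainAnnotation: "true"}},
		{Name: "md-1", Annotations: map[string]string{}},
	}
	for _, ng := range groups {
		opts, err := ng.GetOptions(defaults)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: skipDrain=%t\n", ng.Name, opts.SkipDrain)
	}
}
```

With a shape like this, the global flag and the per-node-group override are not mutually exclusive: the flag would set the default and individual node groups could opt in or out.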

Metadata


Assignees

No one assigned

    Labels

    area/autoscaling
    help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
    kind/feature: Categorizes issue or PR as related to a new feature.
    priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
    triage/accepted: Indicates an issue or PR is ready to be actively worked on.
