
CAPA Cluster Priority Expander Fails to Scale to Shared Tenancy When Dedicated Hosts Have Insufficient Capacity #5853

@raghs-aws

Description

/kind bug

What steps did you take and what happened:
Issue Summary: A CAPA cluster with a dual nodegroup setup (dedicated hosts + shared tenancy) fails to scale to shared tenancy instances when the dedicated hosts reach their capacity limit. The cluster autoscaler gets stuck and does not create new nodes on the lower-priority shared tenancy nodegroup as expected.

Environment:

Platform: CAPA-based Kubernetes clusters with MachinePool templates for dedicated hosts and shared tenancy
ClusterAutoscaler configuration: expander=priority,least-waste (deployment flags sketched below the ConfigMap)
Priority setup: Dedicated Hosts (priority 100) → Shared Tenancy (priority 10)
Node-group auto-discovery enabled on the Cluster Autoscaler

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    100:
      - .HostTenancy
    10:
      - .Shared
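
For reference, the Cluster Autoscaler deployment is started with flags roughly like the following. This is a minimal sketch assuming the clusterapi cloud provider; the namespace value is a placeholder, not taken from the actual cluster:

      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.31.0
        command:
        - ./cluster-autoscaler
        - --cloud-provider=clusterapi                              # CAPA-managed node groups
        - --expander=priority,least-waste                          # priority expander reads the ConfigMap above
        - --node-group-auto-discovery=clusterapi:namespace=default # namespace value is an assumption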

Steps to Reproduce

Set up a CAPA cluster with a dual MachinePool configuration (a trimmed MachinePool sketch follows these steps):

MachinePool 1: dedicated host tenancy (nodegroup name contains "HostTenancy")
MachinePool 2: shared tenancy (nodegroup name contains "Shared")

Configure the Cluster Autoscaler with the priority expander and the ConfigMap shown above

Trigger a scaling event while the dedicated hosts are at or near their capacity limit

Observe the behavior: scaling attempts fail to create nodes on the shared tenancy MachinePool
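
As a reference for step 1: each MachinePool opts into autoscaler management via the standard Cluster API min/max annotations, and its name must be matched by one of the regexes in the priority ConfigMap. A trimmed sketch follows; the names and size values are placeholders, not the exact manifests from the cluster:

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: pool-hosttenancy   # reported setup: nodegroup name contains "HostTenancy" (priority 100 regex)
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: pool-shared        # reported setup: nodegroup name contains "Shared" (priority 10 regex)
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"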

What did you expect to happen:

Expected Behavior:

ClusterAutoscaler attempts to scale dedicated hosts nodegroup (priority 100)
When dedicated hosts have insufficient capacity → automatic fallback to shared tenancy nodegroup (priority 10)
New nodes successfully created on shared tenancy instances
Workload scheduling continues seamlessly

Actual Behavior:

ClusterAutoscaler attempts to scale dedicated hosts nodegroup ✅
Dedicated hosts capacity insufficient ❌
FAILURE: Scaling gets stuck; no fallback to the shared tenancy nodegroup
RESULT: No new nodes are created, workload scheduling fails

Anything else you would like to add:

This setup works when used with Auto Scaling groups (non-CAPA setup):
Same priority expander configuration
Same ConfigMap setup
Successfully falls back to the shared tenancy ASG when the dedicated hosts are at capacity
Key difference: uses native ASGs instead of CAPA MachinePools
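
For comparison, the working ASG-based setup uses the same expander configuration but the aws cloud provider. A sketch of the relevant flags; the discovery tags are the conventional ones and are an assumption here:

        - --cloud-provider=aws
        - --expander=priority,least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster  # tag values assumed
        # the priority expander then matches the ASG names directly against .HostTenancy / .Shared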

Environment:

  • Cluster-api-provider-aws version: 1.24.16
  • Kubernetes version (kubectl version): 1.24.16
  • Cluster Autoscaler version: v1.31.0
