Description
/kind bug
What steps did you take and what happened:
Issue Summary: A CAPA cluster with a dual nodegroup setup (dedicated hosts + shared tenancy) fails to scale onto shared tenancy instances when the dedicated hosts reach their capacity limit. The cluster autoscaler gets stuck and never creates new nodes on the lower-priority shared tenancy nodegroup as expected.
Environment:
- Platform: CAPA-based Kubernetes clusters with MachinePool templates for dedicated hosts and shared tenancy
- Cluster Autoscaler configuration: expander=priority,least-waste
- Priority setup: Dedicated Hosts (priority 100) → Shared Tenancy (priority 10)
- Auto-discovery enabled on the Cluster Autoscaler
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    100:
      - .HostTenancy
    10:
      - .Shared
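For context, a minimal sketch of the Cluster Autoscaler flags in use, assuming the clusterapi cloud provider handles the node group auto-discovery (namespace and image tag below are placeholders, not the real values; if the autoscaler instead discovers the CAPA-created ASGs through the aws provider, the flags differ):

```yaml
# Sketch of the relevant Cluster Autoscaler container args (placeholders, not the real manifest).
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.31.0
    command:
      - ./cluster-autoscaler
      - --cloud-provider=clusterapi         # assumption: Cluster API provider
      - --expander=priority,least-waste     # priority first, least-waste as tie-breaker
      # Auto-discover node groups from MachinePools/MachineDeployments in this namespace
      - --node-group-auto-discovery=clusterapi:namespace=default
```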
Steps to Reproduce:
1. Set up a CAPA cluster with a dual MachinePool configuration (see the MachinePool sketch after this list):
   - MachinePool 1: dedicated hosts tenancy (nodegroup name contains "HostTenancy")
   - MachinePool 2: shared tenancy (nodegroup name contains "Shared")
2. Configure the Cluster Autoscaler with the priority expander and the ConfigMap shown above.
3. Trigger a scaling event while the dedicated hosts are at or near their capacity limit.
4. Observe the behavior: scaling attempts fail to create nodes on the shared tenancy MachinePool.
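A minimal sketch of how one of the MachinePools might look, assuming the clusterapi provider is used; resource names, namespace, sizes, and API versions are illustrative only, and the point is that the node group name derived from this resource is what the priority-expander regexes (.HostTenancy / .Shared) have to match:

```yaml
# Illustrative only: names, namespace, sizes, and apiVersions are placeholders.
# The min/max-size annotations are what let the clusterapi provider treat this
# MachinePool as an autoscalable node group; the resulting node group name is
# what the priority-expander regexes need to match.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: workers-hosttenancy
  namespace: default
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"
spec:
  clusterName: my-cluster
  replicas: 1
  template:
    spec:
      clusterName: my-cluster
      bootstrap:
        dataSecretName: ""
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachinePool
        name: workers-hosttenancy
```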
What did you expect to happen:
Expected Behavior:
1. ClusterAutoscaler attempts to scale the dedicated hosts nodegroup (priority 100).
2. When the dedicated hosts have insufficient capacity, it automatically falls back to the shared tenancy nodegroup (priority 10).
3. New nodes are successfully created on shared tenancy instances.
4. Workload scheduling continues seamlessly.
Actual Behavior:
1. ClusterAutoscaler attempts to scale the dedicated hosts nodegroup ✅
2. Dedicated hosts capacity is insufficient ❌
3. FAILURE: scaling gets stuck, with no fallback to the shared tenancy nodegroup.
4. RESULT: no new nodes are created and workload scheduling fails.
Anything else you would like to add:
This setup works when used with native AWS Auto Scaling Groups (non-CAPA setup); see the sketch after this list:
- Same priority expander configuration
- Same ConfigMap setup
- Successfully falls back to the shared tenancy ASG when the dedicated hosts are at capacity
- Key difference: uses native ASGs instead of CAPA MachinePools
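For comparison, a sketch of what the working non-CAPA variant looks like, with the autoscaler discovering native ASGs by tag (cluster name and tag values are placeholders):

```yaml
# Sketch of the working ASG-based setup (cluster name and tags are placeholders):
# same priority expander, but the node groups are native Auto Scaling Groups.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --expander=priority,least-waste
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```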
Environment:
- Cluster-api-provider-aws version: 1.24.16
- Kubernetes version (use kubectl version): 1.24.16
- Cluster Autoscaler version: v1.31.0