Labels: area/koord-descheduler, kind/proposal
What is your proposal:
Introduce a new descheduler plugin named CustomPriority that continuously rebalances workloads from high‑priority (expensive) node pools to lower‑priority (cheaper) pools, based on a user‑defined priority order and resource availability.
Why is this needed:
We have user stories like the following:
- When scaling down at off-peak, I want pods of the same workload on a burstable node to be drained together so that the elastic node can be released, rather than scaling down per workload by priority; only removing the whole elastic node actually saves cost.
- Given an elastic node hosting several pods from the same workload
- And static nodes have spare capacity
- When a manual scale down or an autoscaler triggers downscale
- Then the system should drain those pods together from the elastic node and reschedule them onto static nodes, enabling node release for elastic nodes
- Mixed node billing: a committed (monthly) pool plus an on‑demand elastic pool
- Given a cluster with a committed static pool and an on‑demand elastic pool used only for peak bursts
- When the cluster is in off‑peak and the static pool has room
- Then pods should be proactively evicted from elastic to static according to the configured EvictionOrder, minimizing on‑demand spend
This Plugin can realize:
- Cost efficiency: Proactively vacate premium nodes when cheaper capacity is available, aligning placement with business priorities.
- Better binpacking: Evicts smaller, easily‑reschedulable pods first to minimize churn and improve fit rate on destination nodes.
- Safer operations: NodeFit prevents pathological evictions; DrainNode mode performs atomic, capacity‑aware draining with virtual reservations; AutoCordon prevents immediate re‑scheduling back.
Why not HighNodeUtilization?
- Not cost-aware: The HighNodeUtilization plugin does something similar: it can bin-pack pods between nodes using generic utilization thresholds, but it does not understand business tiers or cost. It cannot express "drain elastic/expensive nodes first"; it moves pods off underutilized nodes broadly, not from cost-prioritized pools.
- Cannot guarantee node drain: It opportunistically evicts pods to raise utilization, but does not virtually plan full placement of all pods from a source node. Some pods on a node may be evicted while others are not, which prevents autoscaler-like components from releasing the node. We need to reserve target capacity virtually and evict only when a whole node can be emptied, enabling real scale-in.
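The virtual-reservation idea above can be sketched as an all-or-nothing planning step: before any eviction, tentatively place every pod of the source node onto destination nodes, deducting capacity as we go, and abort if any pod fails to fit. This is an illustrative sketch only; the types and function names below are hypothetical and not the actual koord-descheduler API.

```go
// All-or-nothing drain planning sketch: evict pods from a source node only
// if ALL of them fit on destination nodes, reserving capacity virtually so
// the whole node can be released afterwards.
package main

import "fmt"

// pod and node are simplified stand-ins for the real Kubernetes objects.
type pod struct {
	name     string
	cpuMilli int64 // requested CPU in millicores
}

type node struct {
	name      string
	freeMilli int64 // allocatable CPU not yet requested
}

// planDrain returns a pod->node assignment if every pod fits somewhere,
// or nil if the source node cannot be fully emptied (in which case the
// caller evicts nothing, avoiding a half-drained node).
func planDrain(pods []pod, dests []node) map[string]string {
	free := make([]int64, len(dests))
	for i, n := range dests {
		free[i] = n.freeMilli
	}
	plan := make(map[string]string)
	for _, p := range pods {
		placed := false
		for i := range dests {
			if free[i] >= p.cpuMilli {
				free[i] -= p.cpuMilli // virtual reservation
				plan[p.name] = dests[i].name
				placed = true
				break
			}
		}
		if !placed {
			return nil // one pod does not fit: abort the whole drain
		}
	}
	return plan
}

func main() {
	pods := []pod{{"a", 500}, {"b", 700}}
	dests := []node{{"static-1", 600}, {"static-2", 800}}
	fmt.Println(planDrain(pods, dests)) // map[a:static-1 b:static-2]
}
```

A real implementation would additionally run the NodeFit checks (affinity, taints/tolerations, multi-resource fit) for each candidate placement, but the all-or-nothing shape is the same.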
Is there a suggested solution, if so, please add it:
The plugin runs at the Balance extension point and supports:
- Configurable EvictionOrder of resource tiers via per‑tier NodeSelectors
- Optional global NodeSelector for narrowing the working set of nodes
- Pod selection via label‑based CustomPriorityPodSelectors and EvictableNamespaces include/exclude lists
- Safety checks via NodeFit (NodeAffinity, taints/tolerations, resource fit)
- Two modes: BestEffort and DrainNode (with optional AutoCordon)
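To make the EvictionOrder idea concrete, the sketch below groups candidate source nodes by tier, with tiers matched via per‑tier label selectors: nodes in earlier (more expensive) tiers are drained first. All type and field names here are illustrative assumptions, not the proposed plugin's actual configuration schema.

```go
// EvictionOrder sketch: group nodes by configured tiers so the plugin can
// drain expensive tiers before cheaper ones.
package main

import (
	"fmt"
	"sort"
)

// tier pairs a name with a label selector identifying its node pool.
type tier struct {
	name         string
	nodeSelector map[string]string
}

type nodeInfo struct {
	name   string
	labels map[string]string
}

// matches reports whether every selector key/value is present on the labels.
func matches(sel, labels map[string]string) bool {
	for k, v := range sel {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// sourceNodesByTier returns node names grouped in EvictionOrder:
// index 0 holds nodes in the first-to-drain (most expensive) tier.
func sourceNodesByTier(order []tier, nodes []nodeInfo) [][]string {
	out := make([][]string, len(order))
	for i, t := range order {
		for _, n := range nodes {
			if matches(t.nodeSelector, n.labels) {
				out[i] = append(out[i], n.name)
			}
		}
		sort.Strings(out[i]) // deterministic order within a tier
	}
	return out
}

func main() {
	order := []tier{
		{"elastic", map[string]string{"pool": "elastic"}},
		{"static", map[string]string{"pool": "static"}},
	}
	nodes := []nodeInfo{
		{"n1", map[string]string{"pool": "static"}},
		{"n2", map[string]string{"pool": "elastic"}},
	}
	fmt.Println(sourceNodesByTier(order, nodes)) // [[n2] [n1]]
}
```

In BestEffort mode the plugin would evict from these tiers opportunistically; in DrainNode mode it would combine the tier ordering with the all-or-nothing drain planning described earlier.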