Description
Enhancement Description
Many production Kubernetes clusters blend on-demand (higher-SLA) and spot/preemptible (lower-SLA) nodes to optimize costs while maintaining reliability for critical workloads. Platform teams need a safe default that keeps most workloads away from risky capacity, while allowing specific workloads to opt-in with explicit thresholds like "SLA ≥ 95%".
Currently, NodeAffinity supports numeric comparisons (Gt, Lt) but lacks the operational benefits that taints/tolerations provide:
- Policy orientation: NodeAffinity is per-pod; keeping most pods away from low-SLA nodes requires editing every workload. Taints invert control: nodes declare risk, and only pods with matching tolerations may land.
- Eviction semantics: Affinity has no eviction capability. Taints support NoExecute with tolerationSeconds, enabling operators to drain/evict pods when a node's SLA degrades or spot instances are reclaimed.
- Operational ergonomics: Centralized, node-side policy is consistent with other safety taints (e.g., disk-pressure, memory-pressure).
This enhancement extends the core/v1 Toleration to support numeric comparison operators (Lt, Le, Ge, Gt) when matching Node Taints. This preserves the well-understood safety model of taints/tolerations while enabling threshold-based placement for SLA-aware scheduling.
Benefits for DRA and AI Workloads
- Cost-reliability optimization: Bind resource claims to reliability tiers via taints with opt-in tolerations
- Stage-aware placement: Different pipeline stages can tolerate different risk levels explicitly
- Resilience after preemption: Use NoExecute/tolerationSeconds for graceful drain and controlled failover
- Multi-tenant fairness: Prevent monopolization of high-SLA resources by requiring explicit tolerations
- Smooth burst handling: Allow bursts to land on low-SLA pools with clear safety boundaries
The scheduler impact is limited to the existing TaintToleration Filter; no new scheduling stages or algorithms are required.
/sig/scheduling
/sig/node
/stage/alpha
/cc @ahg-g @alculquicondor @johnbelamaric @sanposhiho @kubernetes/sig-scheduling-misc
- One-line enhancement description (can be used as a release note): Add numeric comparison operators (Lt, Le, Ge, Gt) to Tolerations for SLA-based scheduling with threshold-based taint matching.
- Kubernetes Enhancement Proposal:
- Discussion Link: Allow nodes to declare failure probability/SLA kubernetes#118669
- PRs by stage and milestone:
  - Alpha - v1.35
    - KEP (k/enhancements) update PR(s): KEP-5471 Extended Toleration Operators for Threshold-Based Placement #5473
    - Code (k/k) update PR(s):
    - Docs (k/website) update PR(s):