Open
Labels
enhancement (New feature or request)
Description
What you would like to be added?
We would like Grove to add first-class support for the new Kubernetes Workload API (kubernetes/enhancements#4671) as a gang scheduling backend.
Concretely:
- Convert PodCliqueSet into a native Workload
- Attach workloadRef to generated Pods
- Use upstream kube-scheduler’s native gang admission logic
- Provide a configuration option to select the backend (e.g., schedulerMode: workload or podgang)
- Allow Grove workflows to operate fully without depending on KAI Scheduler
This would allow Grove to orchestrate multi-role workloads using kube-scheduler + Workload as the underlying gang scheduler.
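To make the proposed switch concrete, here is a minimal sketch of how a schedulerMode option could gate the workloadRef behavior. Everything in it is an assumption for illustration: the SchedulerMode enum, parse_scheduler_mode, and decorate_pod are hypothetical names, not existing Grove code, and the workloadRef field layout (name, podGroup) is a guess at the shape of the alpha upstream API that should be checked against the KEP before use.

```python
import copy
from enum import Enum


class SchedulerMode(Enum):
    """Hypothetical values for the proposed schedulerMode option."""
    WORKLOAD = "workload"  # rely on kube-scheduler + Workload API
    PODGANG = "podgang"    # keep the existing PodGang-based path


def parse_scheduler_mode(raw: str) -> SchedulerMode:
    """Parse a config string, rejecting anything outside the two modes."""
    try:
        return SchedulerMode(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(m.value for m in SchedulerMode)
        raise ValueError(
            f"unknown schedulerMode {raw!r}; expected one of: {allowed}"
        ) from None


def decorate_pod(pod: dict, mode: SchedulerMode,
                 workload_name: str, pod_group: str) -> dict:
    """Return a copy of the Pod manifest, adding workloadRef in workload mode.

    The field names under workloadRef are assumptions, not a verified schema.
    In podgang mode the manifest is returned unchanged.
    """
    out = copy.deepcopy(pod)
    if mode is SchedulerMode.WORKLOAD:
        out.setdefault("spec", {})["workloadRef"] = {
            "name": workload_name,
            "podGroup": pod_group,
        }
    return out


if __name__ == "__main__":
    pod = {"metadata": {"name": "prefill-0"}, "spec": {"containers": []}}
    mode = parse_scheduler_mode("workload")
    print(decorate_pod(pod, mode, "my-podcliqueset", "prefill"))
```

Keeping the mode check in one place like this would let the PodGang path stay untouched while the Workload path is still experimental.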
Why is this needed?
- Upstream Kubernetes is standardizing gang scheduling through the Workload API. Grove should evolve alongside the Kubernetes ecosystem.
- Running additional schedulers like KAI is operationally expensive for large production clusters. Relying on the default kube-scheduler greatly simplifies adoption.
- Workload already provides the gang semantics Grove needs, including atomic admission and minMember guarantees.
- Training workloads (VCJob, PyTorchJob, TFJob, etc.) increasingly rely on Workload, especially when combined with Kueue. Native Workload support makes Grove far more practical for training-focused environments.
- Reducing scheduling stack fragmentation improves maintainability and encourages broader adoption of Grove in enterprise clusters.
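For readers less familiar with the gang semantics mentioned above, the following toy function illustrates what atomic admission with a minimum member count means. It is a conceptual sketch only, not kube-scheduler's actual algorithm: admit_gang, free_slots, and the single-resource "slot" model are simplifications invented for this example.

```python
from typing import List


def admit_gang(pending_pods: List[str], free_slots: int,
               min_member: int) -> List[str]:
    """All-or-nothing admission for one gang.

    Returns the pods that would be bound, or an empty list when fewer
    than min_member pods fit, so no partial gang is ever started.
    """
    placeable = pending_pods[:max(free_slots, 0)]
    if len(placeable) < min_member:
        return []  # not enough room: admit nothing rather than a partial gang
    return placeable


if __name__ == "__main__":
    gang = ["worker-0", "worker-1", "worker-2"]
    print(admit_gang(gang, free_slots=2, min_member=3))  # nothing admitted
    print(admit_gang(gang, free_slots=3, min_member=3))  # whole gang admitted
```

The point of the guarantee is the first case: with room for only two of three required pods, none are scheduled, which avoids deadlocks where partially placed gangs hold resources they cannot use.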