Open
Labels
enhancement: New feature or request
Description
Summary
Research and implement a PoC for Dynamic Resource Allocation (DRA): allocate device ResourceSlices in a non-discrete way and allow users to write complex requirement expressions.
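To make the goal concrete, here is a minimal sketch of what "non-discrete" allocation with a requirement expression could look like. It is a toy model, not the DRA API: `Device`, `Request`, and `allocate` are hypothetical names, the capacity keys (`tflops`, `vram_gib`) are assumptions, and a Python predicate stands in for the CEL selector expressions that DRA structured parameters use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Device:
    """Hypothetical stand-in for one device advertised in a ResourceSlice."""
    name: str
    attributes: Dict[str, str]
    # Remaining shareable capacity (assumed keys, for illustration only).
    free: Dict[str, float] = field(default_factory=dict)


@dataclass
class Request:
    """Hypothetical request: a predicate plus fractional capacity amounts."""
    selector: Callable[[Device], bool]
    amounts: Dict[str, float]


def allocate(devices: List[Device], req: Request) -> Optional[str]:
    """First-fit: pick the first matching device with enough free capacity."""
    for dev in devices:
        if not req.selector(dev):
            continue
        if all(dev.free.get(k, 0.0) >= v for k, v in req.amounts.items()):
            for k, v in req.amounts.items():
                dev.free[k] -= v  # consume a slice, not the whole device
            return dev.name
    return None


devices = [
    Device("gpu-0", {"model": "A"}, {"tflops": 50.0, "vram_gib": 8.0}),
    Device("gpu-1", {"model": "B"}, {"tflops": 100.0, "vram_gib": 24.0}),
]
req = Request(
    # The "complex requirement expression": model B with at least 16 GiB free.
    selector=lambda d: d.attributes["model"] == "B" and d.free["vram_gib"] >= 16,
    amounts={"tflops": 30.0, "vram_gib": 16.0},
)
first = allocate(devices, req)   # fits on gpu-1, leaving 70 TFLOPS / 8 GiB
second = allocate(devices, req)  # no device has 16 GiB free any more
```

The point of the sketch is that a request consumes part of a device's capacity rather than the whole device, so the same GPU can serve several Pods until its remaining capacity no longer satisfies the expression.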
Motivation
Kubernetes CDI and the Device Plugin API share a limitation: the secondary, node-level device allocation that follows the primary, centralized kube-scheduler decision cannot retrieve Pod information. As a result, TensorFusion has to implement its GPU allocation logic in a very hacky way, and the same is true of other GPU schedulers.
Reference
- https://github.com/kubernetes/enhancements/blob/master/keps%2Fsig-node%2F4381-dra-structured-parameters%2FREADME.md
- Kubelet device plugin to pass additional information about Pods to Allocate call (kubernetes/kubernetes#59109)
- Karpenter - Dynamic Resource Allocation (DRA) (kubernetes-sigs/karpenter#1231)
Another dependency is Karpenter: its provisioning and disruption modules don't support DRA yet, which may break the Karpenter integration.