Support Kubernetes Dynamic Resource Allocation API #320

@Code2Life

Summary

Research and implement a PoC for DRA: allocate Device ResourceSlices in a non-discrete way and allow users to write complex requirement expressions.
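As a rough illustration of what a "complex requirement expression" could look like under DRA, the sketch below builds a ResourceClaim manifest whose device request carries a CEL selector. It assumes the `resource.k8s.io/v1beta1` shape of the API (field layout differs between Kubernetes versions), and the driver domain `gpu.example.com`, the DeviceClass name, and the `memory` capacity key are hypothetical placeholders, not names TensorFusion defines.

```python
import json

# Hypothetical placeholder: a real DRA driver publishes its own domain name,
# DeviceClass, and capacity keys in its ResourceSlices.
DRIVER = "gpu.example.com"


def build_resource_claim(name: str, min_memory: str) -> dict:
    """Build a ResourceClaim manifest as a plain dict.

    Assumes the resource.k8s.io/v1beta1 layout; check the API version
    served by the target cluster before using this shape.
    """
    # CEL expression: match devices from this driver whose advertised
    # memory capacity is at least `min_memory`.
    expression = (
        f'device.driver == "{DRIVER}" && '
        f'device.capacity["{DRIVER}"].memory'
        f'.compareTo(quantity("{min_memory}")) >= 0'
    )
    return {
        "apiVersion": "resource.k8s.io/v1beta1",
        "kind": "ResourceClaim",
        "metadata": {"name": name},
        "spec": {
            "devices": {
                "requests": [
                    {
                        "name": "gpu",
                        "deviceClassName": DRIVER,  # hypothetical class name
                        "selectors": [{"cel": {"expression": expression}}],
                    }
                ]
            }
        },
    }


claim = build_resource_claim("fractional-gpu", "8Gi")
print(json.dumps(claim, indent=2))
```

The printed JSON could be applied with `kubectl apply -f -` on a cluster that serves this API version and has a matching driver installed; how fractional (non-discrete) capacity would actually be modeled is exactly what the PoC needs to determine.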

Motivation

Kubernetes CDI and the Device Plugin API share a limitation: after the centralized kube-scheduler performs the primary allocation, the secondary device allocation step cannot retrieve Pod information. As a result, TensorFusion has to implement its GPU allocation logic in a very hacky way, and the same is true of other GPU schedulers.

Reference

Another dependency is Karpenter: its provisioning and disruption modules don't support DRA yet, which may break the Karpenter integration.

Labels

enhancement (New feature or request)
