Kubelet UnexpectedAdmissionError after successful scheduling by hami-scheduler on non-MIG GPU

**Summary** 
When using HAMi for fractional GPU sharing on a non-MIG NVIDIA GPU, a second pod requesting a fractional GPU fails to start with an UnexpectedAdmissionError. The hami-scheduler successfully binds the pod to the node, but the kubelet fails to allocate resources.

**Steps to reproduce** 
1. Start a Kubernetes cluster with a single node containing an NVIDIA RTX A4000 GPU (or a similar non-MIG GPU).
2. Deploy the hami-scheduler and hami-device-plugin. 
3. Deploy a pod that requests a fraction of the GPU, for example 30% of SM cores (gpu-workload-long).
4. Immediately afterwards, deploy a second pod that requests the remaining share, for example 70% of SM cores (gpu-workload-short).
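
For readers who cannot see the screenshots, the two pods from steps 3 and 4 can be sketched as below. This is a minimal illustration, not the exact manifests used here: the resource keys `nvidia.com/gpu` and `nvidia.com/gpucores`, the `schedulerName` value, and the container image are assumptions and should be replaced with whatever the cluster's HAMi configuration actually uses.

```python
import json

# Assumed resource names; verify them against the HAMi configuration in use.
GPU_COUNT_KEY = "nvidia.com/gpu"       # assumed: number of virtual GPUs requested
GPU_CORES_KEY = "nvidia.com/gpucores"  # assumed: percentage of SM cores requested


def fractional_gpu_pod(name, cores_percent):
    """Build a Pod manifest (as a dict) that requests a share of one GPU."""
    if not 0 < cores_percent <= 100:
        raise ValueError("cores_percent must be in the range (0, 100]")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Assumed scheduler name, taken from the component name in this report.
            "schedulerName": "hami-scheduler",
            "containers": [
                {
                    "name": "main",
                    # Hypothetical placeholder image.
                    "image": "example.com/gpu-workload:latest",
                    "resources": {
                        "limits": {
                            GPU_COUNT_KEY: 1,
                            GPU_CORES_KEY: cores_percent,
                        }
                    },
                }
            ],
        },
    }


long_pod = fractional_gpu_pod("gpu-workload-long", 30)
short_pod = fractional_gpu_pod("gpu-workload-short", 70)

# Wrap both pods in a v1 List so they can be submitted together.
manifest_list = {"apiVersion": "v1", "kind": "List", "items": [long_pod, short_pod]}

if __name__ == "__main__":
    print(json.dumps(manifest_list, indent=2))
```

Since JSON is accepted wherever YAML is, the printed output can be piped to `kubectl apply -f -` to create both pods back to back.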

**Expected Behavior**
The second pod (gpu-workload-short) should be successfully scheduled and started on the same node, running concurrently with the first pod.
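
The reasoning behind this expectation is simple arithmetic: 30% + 70% does not exceed the capacity of one GPU. The sketch below makes that explicit. It assumes that core shares are additive and that a single GPU offers 100% in total; the function name `fits` is hypothetical and does not reflect HAMi's actual scheduling code.

```python
def fits(allocated_percent, requested_percent, capacity_percent=100):
    """Return True if a new request fits next to the existing allocations.

    Assumes core shares are simply additive up to capacity_percent.
    """
    return sum(allocated_percent) + requested_percent <= capacity_percent


# gpu-workload-long already holds 30%; gpu-workload-short asks for 70%.
second_pod_fits = fits([30], 70)
print(second_pod_fits)  # True
```

Under these assumptions the second pod fits exactly, which matches the scheduler's decision to bind it.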

**Actual Behavior**
1. The hami-scheduler successfully evaluates the second pod and binds it to the node.
2. The kubelet on the node attempts to start the second pod but fails.
3. The pod enters an UnexpectedAdmissionError state.
4. The kubectl describe output for the failing pod shows Requested: 1, Available: 0, indicating that the hami-device-plugin reported zero available devices at the time of allocation.
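
To check for the same symptom across several pods, the `Requested` and `Available` counts can be pulled out of the event text. The sketch below relies only on the `Requested: 1, Available: 0` fragment quoted above; the rest of the kubelet message is not reproduced here, and the helper name `parse_admission_counts` is hypothetical.

```python
import re

# Matches the "Requested: N, Available: M" fragment of the admission error.
_COUNTS = re.compile(r"Requested:\s*(\d+),\s*Available:\s*(\d+)")


def parse_admission_counts(message):
    """Return (requested, available) as ints, or None if the fragment is absent."""
    match = _COUNTS.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Fragment as quoted in this report.
counts = parse_admission_counts("Requested: 1, Available: 0")
if counts is not None:
    requested, available = counts
    # True when the kubelet saw fewer free devices than the pod asked for.
    shortfall = requested > available
```

A shortfall here, combined with a successful bind by the scheduler, suggests that the scheduler and the kubelet disagreed about how many devices were free at that moment.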

**Relevant logs**
<img width="2430" height="1426" alt="Image" src="https://github.com/user-attachments/assets/efbcc418-cb4a-4545-b89b-457add3ecbd3" />

**POD YAML**
<img width="970" height="414" alt="Image" src="https://github.com/user-attachments/assets/e82c1077-0b66-4f4e-9944-897e961d7845" />

<img width="2204" height="944" alt="Image" src="https://github.com/user-attachments/assets/85164ef8-11c2-4746-a2d3-fad6919e481e" />

**Environment**
Kubernetes Version:

 * Client Version: v1.33.2
 * Server Version: v1.21.14

GPU Model: NVIDIA RTX A4000

Driver Version: 575.57.08
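
One detail in the versions above may be worth ruling out: the kubectl client and the API server are many minor versions apart, while kubectl is only supported within one minor version of the server. The sketch below just computes that gap. It is not a claim that the skew causes the admission error, and the helper names are hypothetical.

```python
def parse_version(version):
    """Split a version string such as 'v1.33.2' into (major, minor) integers."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def minor_skew(client, server):
    """Return the absolute minor-version gap; requires matching major versions."""
    client_major, client_minor = parse_version(client)
    server_major, server_minor = parse_version(server)
    if client_major != server_major:
        raise ValueError("major versions differ")
    return abs(client_minor - server_minor)


skew = minor_skew("v1.33.2", "v1.21.14")
print(skew)  # 12
```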
