Description
What happened:
In pkg/scheduler/policy/gpu_policy.go, the ComputeScore function computes a score for each device to determine scheduling order. However, there are two issues in how the usedScore component is calculated:
Issue 1: container.Nums is incorrectly used as slot consumption
```go
func (ds *DeviceListsScore) ComputeScore(requests device.ContainerDeviceRequests) {
	request, core, mem := int32(0), int32(0), int32(0)
	for _, container := range requests {
		request += container.Nums // ← BUG: Nums is the number of GPUs requested, not slot usage per card
		// ...
	}
	usedScore := float32(request+ds.Device.Used) / float32(ds.Device.Count)
	// ...
}
```

`container.Nums` represents how many GPU cards the container requests (e.g., `hami.io/gpu: 4` → `Nums = 4`). However, when a container is allocated to a specific card, it occupies only 1 time-slicing slot on that card (as seen in `AddResourceUsage`, where `n.Used++`).

The current code adds `Nums` (e.g., 4) to the used count, implying that this single container would consume 4 slots on one card, which is incorrect.
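To make the inflation concrete, here is a minimal, self-contained sketch of the `usedScore` formula using the values from this report. The `usedScore` helper is hypothetical (the real code computes this inline in `ComputeScore`); the numbers mirror the example below.

```go
package main

import "fmt"

// usedScore mirrors the formula from ComputeScore:
// (slots predicted to be used) / (total time-slicing slots on the card).
func usedScore(request, used, count int32) float32 {
	return float32(request+used) / float32(count)
}

func main() {
	// A container requesting hami.io/gpu: 4 (Nums = 4), scored against a
	// card with Used = 2 and Count = 10.
	buggy := usedScore(4, 2, 10) // current code adds all of Nums to one card
	fixed := usedScore(1, 2, 10) // a container occupies one slot per card

	fmt.Printf("buggy usedScore = %.1f, expected usedScore = %.1f\n", buggy, fixed)
}
```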
Issue 2: No device type filtering when iterating over requests
`requests` is of type `ContainerDeviceRequests` (`map[string]ContainerDeviceRequest`), where the key is the device type (e.g., `"NVIDIA"`, `"DCU"`). The function iterates over all device types without filtering:

```go
for _, container := range requests { // iterates over ALL device types
	request += container.Nums
	core += container.Coresreq
	mem += container.Memreq
}
```

When a container requests multiple device types (e.g., 2 NVIDIA GPUs + 1 Hygon DCU), the score for a single NVIDIA GPU card would incorrectly include the DCU request's `Nums`, `Coresreq`, and `Memreq`.
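The cross-type leakage can be demonstrated with a small standalone sketch. The `containerDeviceRequest` struct is a trimmed, hypothetical stand-in for HAMi's `device.ContainerDeviceRequest` (only the three fields summed by `ComputeScore`), and `accumulate` is an illustrative helper, not the actual function:

```go
package main

import "fmt"

// containerDeviceRequest is a trimmed stand-in for HAMi's
// device.ContainerDeviceRequest: just the fields ComputeScore sums.
type containerDeviceRequest struct {
	Nums     int32
	Coresreq int32
	Memreq   int32
}

// accumulate sums requests the way ComputeScore does. With filterType == "",
// it mimics the current unfiltered loop; otherwise only matching types count.
func accumulate(requests map[string]containerDeviceRequest, filterType string) (request, core, mem int32) {
	for devType, c := range requests {
		if filterType != "" && devType != filterType {
			continue
		}
		request += c.Nums
		core += c.Coresreq
		mem += c.Memreq
	}
	return
}

func main() {
	// A container asking for 2 NVIDIA GPUs and 1 Hygon DCU, keyed by type.
	requests := map[string]containerDeviceRequest{
		"NVIDIA": {Nums: 2, Coresreq: 50, Memreq: 8000},
		"DCU":    {Nums: 1, Coresreq: 30, Memreq: 4000},
	}

	r, c, m := accumulate(requests, "") // current behavior: sums both types
	fmt.Printf("unfiltered: request=%d core=%d mem=%d\n", r, c, m)

	r, c, m = accumulate(requests, "NVIDIA") // proposed: same-type only
	fmt.Printf("filtered:   request=%d core=%d mem=%d\n", r, c, m)
}
```

With the unfiltered loop, the DCU request's core and memory demands are charged against the NVIDIA card's score.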
What you expected to happen:
- Each container should contribute at most 1 to the slot-usage prediction per card (not `Nums`).
- `ComputeScore` should only accumulate requests that match `ds.Device.Type`, ignoring requests for other device types.
A corrected version might look like:
```go
func (ds *DeviceListsScore) ComputeScore(requests device.ContainerDeviceRequests) {
	request, core, mem := int32(0), int32(0), int32(0)
	for devType, container := range requests {
		if devType != ds.Device.Type {
			continue // only consider same-type requests
		}
		request += 1 // one container occupies one slot, regardless of Nums
		core += container.Coresreq
		if container.MemPercentagereq != 0 && container.MemPercentagereq != 101 {
			// multiply before dividing to avoid int32 truncation to 0
			mem += ds.Device.Totalmem * container.MemPercentagereq / 100
			continue
		}
		mem += container.Memreq
	}
	// ...
}
```

How to reproduce it (as minimally and precisely as possible):
- Deploy a pod that requests multiple device types, or requests more than 1 GPU (e.g., `hami.io/gpu: 4`).
- Observe the computed `usedScore` in the scheduler logs (log level V(2)): `device GPU-xxxx computer score is <value>`.
- The `usedScore` component will be inflated because `request` equals `Nums` (e.g., 4) instead of 1.
Anything else we need to know?:
Practical impact is limited for single-type requests
Since `request` is a constant added to every card's score, the relative ordering between cards of the same type is still determined by differences in `ds.Device.Used`, so the sorting result remains correct in most cases.

However, the inflated `usedScore` can cause the score to exceed 1.0, which breaks the implicit normalization assumption across the three scoring dimensions (slot usage, core usage, memory usage). This may cause `usedScore` to disproportionately outweigh `coreScore` and `memScore` in the final weighted sum:

```go
ds.Score = float32(util.Weight) * (usedScore + coreScore + memScore)
```

For example, with `Nums=4`, `Used=2`, `Count=10`:

- Current (incorrect): `usedScore = (4 + 2) / 10 = 0.6`
- Expected: `usedScore = (1 + 2) / 10 = 0.3`
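The dominance effect on the weighted sum can be sketched with hypothetical values. Here `weight` stands in for `util.Weight`, and `coreScore`/`memScore` are made-up normalized values for the other two dimensions; only the shape of the formula comes from the source.

```go
package main

import "fmt"

// score mirrors the final weighted sum from gpu_policy.go;
// weight is a stand-in for util.Weight.
func score(weight, usedScore, coreScore, memScore float32) float32 {
	return weight * (usedScore + coreScore + memScore)
}

func main() {
	used, count := float32(2), float32(10)
	coreScore, memScore, weight := float32(0.4), float32(0.3), float32(10)

	// Buggy: a hami.io/gpu: 9 request pushes usedScore to 1.1, past 1.0.
	buggy := score(weight, (9+used)/count, coreScore, memScore)

	// Fixed: one slot per container keeps usedScore within [0, 1].
	fixed := score(weight, (1+used)/count, coreScore, memScore)

	fmt.Printf("buggy total=%.1f, fixed total=%.1f\n", buggy, fixed)
}
```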
The comment `// Here we are required to use the same type device` also acknowledges the type-filtering assumption but does not enforce it.
Environment:
- HAMi version: master branch (commit 2ca2ae1)
- Affected file: `pkg/scheduler/policy/gpu_policy.go`, function `ComputeScore` (lines 59-78)