Description
What happened:
In pkg/scheduler/policy/gpu_policy.go, the ComputeScore function computes a score for each device to determine scheduling order. However, there are two issues in how the usedScore component is calculated:
Issue 1: container.Nums is incorrectly used as slot consumption
```go
func (ds *DeviceListsScore) ComputeScore(requests device.ContainerDeviceRequests) {
	request, core, mem := int32(0), int32(0), int32(0)
	for _, container := range requests {
		request += container.Nums // ← BUG: Nums is the number of GPUs requested, not slot usage per card
		// ...
	}
	usedScore := float32(request+ds.Device.Used) / float32(ds.Device.Count)
	// ...
}
```

`container.Nums` represents how many GPU cards the container requests (e.g., `hami.io/gpu: 4` → `Nums = 4`). However, when a container is allocated to a specific card, it occupies only 1 time-slicing slot on that card (as seen in `AddResourceUsage`, where `n.Used++`).

The current code adds `Nums` (e.g., 4) to the used count, implying that this single container would consume 4 slots on one card, which is incorrect.
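To make the inflation concrete, here is a minimal, self-contained sketch of the `usedScore` formula using the values from this report. The `usedScore` helper is hypothetical (the real code computes this inline in `ComputeScore`); the numbers mirror the example below.

```go
package main

import "fmt"

// usedScore mirrors the formula from ComputeScore:
// (slots predicted to be used) / (total time-slicing slots on the card).
func usedScore(request, used, count int32) float32 {
	return float32(request+used) / float32(count)
}

func main() {
	// A container requesting hami.io/gpu: 4 (Nums = 4), scored against a
	// card with Used = 2 and Count = 10.
	buggy := usedScore(4, 2, 10) // current code adds all of Nums to one card
	fixed := usedScore(1, 2, 10) // a container occupies one slot per card

	fmt.Printf("buggy usedScore = %.1f, expected usedScore = %.1f\n", buggy, fixed)
}
```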
Issue 2: No device type filtering when iterating over requests
`requests` is of type `ContainerDeviceRequests` (`map[string]ContainerDeviceRequest`), where the key is the device type (e.g., `"NVIDIA"`, `"DCU"`). The function iterates over all device types without filtering:

```go
for _, container := range requests { // iterates over ALL device types
	request += container.Nums
	core += container.Coresreq
	mem += container.Memreq
}
```

When a container requests multiple device types (e.g., 2 NVIDIA GPUs + 1 Hygon DCU), the score for a single NVIDIA GPU card would incorrectly include the DCU request's `Nums`, `Coresreq`, and `Memreq`.
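The cross-type leakage can be demonstrated with a small standalone sketch. The `containerDeviceRequest` struct is a trimmed, hypothetical stand-in for HAMi's `device.ContainerDeviceRequest` (only the three fields summed by `ComputeScore`), and `accumulate` is an illustrative helper, not the actual function:

```go
package main

import "fmt"

// containerDeviceRequest is a trimmed stand-in for HAMi's
// device.ContainerDeviceRequest: just the fields ComputeScore sums.
type containerDeviceRequest struct {
	Nums     int32
	Coresreq int32
	Memreq   int32
}

// accumulate sums requests the way ComputeScore does. With filterType == "",
// it mimics the current unfiltered loop; otherwise only matching types count.
func accumulate(requests map[string]containerDeviceRequest, filterType string) (request, core, mem int32) {
	for devType, c := range requests {
		if filterType != "" && devType != filterType {
			continue
		}
		request += c.Nums
		core += c.Coresreq
		mem += c.Memreq
	}
	return
}

func main() {
	// A container asking for 2 NVIDIA GPUs and 1 Hygon DCU, keyed by type.
	requests := map[string]containerDeviceRequest{
		"NVIDIA": {Nums: 2, Coresreq: 50, Memreq: 8000},
		"DCU":    {Nums: 1, Coresreq: 30, Memreq: 4000},
	}

	r, c, m := accumulate(requests, "") // current behavior: sums both types
	fmt.Printf("unfiltered: request=%d core=%d mem=%d\n", r, c, m)

	r, c, m = accumulate(requests, "NVIDIA") // proposed: same-type only
	fmt.Printf("filtered:   request=%d core=%d mem=%d\n", r, c, m)
}
```

With the unfiltered loop, the DCU request's core and memory demands are charged against the NVIDIA card's score.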
What you expected to happen:
- Each container should contribute at most 1 to the slot-usage prediction per card (not `Nums`).
- `ComputeScore` should only accumulate requests that match `ds.Device.Type`, ignoring requests for other device types.
A corrected version might look like:
```go
func (ds *DeviceListsScore) ComputeScore(requests device.ContainerDeviceRequests) {
	request, core, mem := int32(0), int32(0), int32(0)
	for devType, container := range requests {
		if devType != ds.Device.Type {
			continue // only consider same-type requests
		}
		request += 1 // one container occupies one slot, regardless of Nums
		core += container.Coresreq
		if container.MemPercentagereq != 0 && container.MemPercentagereq != 101 {
			// multiply before dividing to avoid int32 truncation to 0
			mem += ds.Device.Totalmem * container.MemPercentagereq / 100
			continue
		}
		mem += container.Memreq
	}
	// ...
}
```

How to reproduce it (as minimally and precisely as possible):
- Deploy a pod that requests multiple device types, or requests more than 1 GPU (e.g., `hami.io/gpu: 4`).
- Observe the computed `usedScore` in the scheduler logs (log level V(2)): `device GPU-xxxx computer score is <value>`.
- The `usedScore` component will be inflated because `request` equals `Nums` (e.g., 4) instead of 1.
Anything else we need to know?:
Practical impact is limited for single-type requests
Since `request` is a constant added to every card's score, the relative ordering between cards of the same type is still determined by differences in `ds.Device.Used`, so the sorting result remains correct in most cases.

However, the inflated `usedScore` can cause the score to exceed 1.0, which breaks the implicit normalization assumption across the three scoring dimensions (slot usage, core usage, memory usage). This may cause `usedScore` to disproportionately outweigh `coreScore` and `memScore` in the final weighted sum:

```go
ds.Score = float32(util.Weight) * (usedScore + coreScore + memScore)
```

For example, with `Nums=4`, `Used=2`, `Count=10`:

- Current (incorrect): `usedScore = (4 + 2) / 10 = 0.6`
- Expected: `usedScore = (1 + 2) / 10 = 0.3`
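The dominance effect on the weighted sum can be sketched with hypothetical values. Here `weight` stands in for `util.Weight`, and `coreScore`/`memScore` are made-up normalized values for the other two dimensions; only the shape of the formula comes from the source.

```go
package main

import "fmt"

// score mirrors the final weighted sum from gpu_policy.go;
// weight is a stand-in for util.Weight.
func score(weight, usedScore, coreScore, memScore float32) float32 {
	return weight * (usedScore + coreScore + memScore)
}

func main() {
	used, count := float32(2), float32(10)
	coreScore, memScore, weight := float32(0.4), float32(0.3), float32(10)

	// Buggy: a hami.io/gpu: 9 request pushes usedScore to 1.1, past 1.0.
	buggy := score(weight, (9+used)/count, coreScore, memScore)

	// Fixed: one slot per container keeps usedScore within [0, 1].
	fixed := score(weight, (1+used)/count, coreScore, memScore)

	fmt.Printf("buggy total=%.1f, fixed total=%.1f\n", buggy, fixed)
}
```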
The comment `// Here we are required to use the same type device` also acknowledges the type-filtering assumption but does not enforce it.
Environment:
- HAMi version: master branch (commit 2ca2ae1)
- Affected file: `pkg/scheduler/policy/gpu_policy.go`, function `ComputeScore` (lines 59-78)