GPU load balancing #66

sjmiller609 · 2026-01-22T18:30:15Z

Note

Introduces VRAM-aware vGPU allocation with TTL-cached profile metadata and a configurable cache TTL.

Adds TTL-based caching for GPU profile metadata in devices/mdev.go (SetGPUProfileCacheTTL, getCachedProfiles) and parses framebuffer sizes for profiles
Implements VRAM usage calculation and least-loaded GPU selection (calculateGPUVRAMUsage, selectLeastLoadedVF); CreateMdev now picks a VF from the least-loaded GPU
Speeds up and refines profile availability counting by grouping VFs per parent and summing per-VF available_instances
Adds GPU_PROFILE_CACHE_TTL to config and wires it in main.go via devices.SetGPUProfileCacheTTL
Minor import/order/test formatting tweaks
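The least-loaded selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in devices/mdev.go: the `vf` and `mdev` types, their fields, and the names `vramUsageByGPU` and `pickLeastLoadedVF` are assumptions made for the example, and it assumes every VF has a known parent GPU.

```go
package main

import "sort"

// vf is a hypothetical stand-in for a virtual function entry.
type vf struct {
	Address   string // PCI address of the VF
	ParentGPU string // PCI address of the physical GPU (assumed non-empty here)
	HasMdev   bool   // true if an mdev already occupies this VF
}

// mdev is a hypothetical record of an active mediated device.
type mdev struct {
	VFAddress     string
	FramebufferMB int // framebuffer size parsed from the profile
}

// vramUsageByGPU sums the framebuffer of active mdevs per parent GPU.
func vramUsageByGPU(vfs []vf, mdevs []mdev) map[string]int {
	parentOf := make(map[string]string, len(vfs))
	for _, v := range vfs {
		parentOf[v.Address] = v.ParentGPU
	}
	usage := make(map[string]int)
	for _, m := range mdevs {
		usage[parentOf[m.VFAddress]] += m.FramebufferMB
	}
	return usage
}

// pickLeastLoadedVF returns a free VF on the GPU with the lowest VRAM usage.
// Ties are broken by GPU address so the result is deterministic.
func pickLeastLoadedVF(vfs []vf, mdevs []mdev) (string, bool) {
	usage := vramUsageByGPU(vfs, mdevs)
	free := make(map[string][]string)
	for _, v := range vfs {
		if !v.HasMdev {
			free[v.ParentGPU] = append(free[v.ParentGPU], v.Address)
		}
	}
	if len(free) == 0 {
		return "", false
	}
	gpus := make([]string, 0, len(free))
	for g := range free {
		gpus = append(gpus, g)
	}
	sort.Slice(gpus, func(i, j int) bool {
		if usage[gpus[i]] != usage[gpus[j]] {
			return usage[gpus[i]] < usage[gpus[j]]
		}
		return gpus[i] < gpus[j]
	})
	return free[gpus[0]][0], true
}
```

Ranking by allocated VRAM rather than by mdev count means a GPU hosting a few large profiles is treated as busier than one hosting several small ones.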

Written by Cursor Bugbot for commit d7e7aaa. This will update automatically on new commits.

sjmiller609 · 2026-01-22T18:50:46Z

looks good on staging:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.06             Driver Version: 580.105.06     CUDA Version: N/A      |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                    On  |   00000000:82:00.0 Off |                    0 |
| N/A   18C    P8             36W /  350W |    3649MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40S                    On  |   00000000:E3:00.0 Off |                    0 |
| N/A   18C    P8             35W /  350W |    3649MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           13921    C+G   vgpu                                   1824MiB |
|    0   N/A  N/A           14018    C+G   vgpu                                   1824MiB |
|    1   N/A  N/A           13966    C+G   vgpu                                   1824MiB |
|    1   N/A  N/A           14068    C+G   vgpu                                   1824MiB |
+-----------------------------------------------------------------------------------------+
root@dev-yul-hypeman-1:~# hypeman resources
RESOURCE   CAPACITY       EFFECTIVE      ALLOCATED      AVAILABLE      OVERSUB
---------------------------------------------------------------------------
cpu        128            512            32             480            4.0x
memory     377.6 GB       377.6 GB       44.0 GB        333.6 GB       1.0x
disk       1.7 TB         1.7 TB         45.2 GB        1.7 TB         1.0x
network    1.2 Gbps       2.5 Gbps       15 Mbps        2.5 Gbps       2.0x

GPU: vgpu mode (4/64 slots used)
PROFILE        VRAM       AVAILABLE
----------------------------------------
NVIDIA L40S-1B 1.0 GB     0
NVIDIA L40S-2B 2.0 GB     60
NVIDIA L40S-1Q 1.0 GB     0
NVIDIA L40S-2Q 2.0 GB     60
NVIDIA L40S-3Q 3.0 GB     0
NVIDIA L40S-4Q 4.0 GB     0
NVIDIA L40S-6Q 6.0 GB     0
NVIDIA L40S-8Q 8.0 GB     0
NVIDIA L40S-12Q 12.0 GB    0
NVIDIA L40S-16Q 16.0 GB    0
NVIDIA L40S-24Q 24.0 GB    0
NVIDIA L40S-48Q 48.0 GB    0
NVIDIA L40S-1A 1.0 GB     0
NVIDIA L40S-2A 2.0 GB     60

Above, nvidia-smi shows 3649MiB / 46068MiB on both GPUs, so vGPUs are being scheduled evenly across the two.
Also, only the 2 GB VRAM profiles show as available after we deploy one, which is correct for "non-heterogeneous" GPU mode, since that mode doesn't allow mixing profile sizes.
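To picture the profile-size rule noted above, here is a small sketch that models it in isolation. It is only an illustration under assumptions: the `profile` type and `availableInHomogeneousMode` are hypothetical names, and in the real system the per-profile counts presumably come from the driver's available_instances values rather than from logic like this.

```go
package main

// profile is a hypothetical view of a vGPU profile and its reported capacity.
type profile struct {
	Name          string
	FramebufferMB int
	Available     int // instances the profile could still provide
}

// availableInHomogeneousMode models the rule that, once a framebuffer size is
// in use (activeMB > 0), only profiles of that same size remain available.
// With activeMB == 0 nothing is deployed yet, so counts pass through unchanged.
func availableInHomogeneousMode(profiles []profile, activeMB int) map[string]int {
	out := make(map[string]int, len(profiles))
	for _, p := range profiles {
		if activeMB > 0 && p.FramebufferMB != activeMB {
			out[p.Name] = 0
			continue
		}
		out[p.Name] = p.Available
	}
	return out
}
```

This matches the shape of the staging output, where the 2B, 2Q, and 2A profiles share a 2 GB framebuffer and stay available while every other size drops to 0.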

hiroTamada

Solid implementation of GPU-aware load balancing. The TTL-based caching with double-checked locking is well done, and the VRAM-based selection heuristic is a reasonable approach. One minor nit about the config wiring, but nothing blocking.

lib/devices/mdev.go

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-01-22T19:36:37Z

lib/devices/mdev.go

+		parentGPU := vfToParent[mdev.VFAddress]
+		if parentGPU == "" {
+			continue
+		}


VFs without parent GPU have VRAM usage ignored

Medium Severity

In calculateGPUVRAMUsage, mdevs on VFs with empty ParentGPU are skipped (if parentGPU == "" { continue }), so their VRAM is never counted. However, in selectLeastLoadedVF, these same VFs ARE included in allGPUs and freeVFsByGPU for selection. This means VFs without a physfn symlink are grouped under an empty-string "GPU" that always appears to have 0 VRAM usage, making them preferentially selected even when they already have active mdevs. This could cause load imbalance.

Additional Locations (1)

lib/devices/mdev.go#L448-L455
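One possible way to make the two code paths agree is to apply the same parent filter in both, so VFs with no known parent are excluded from selection just as their mdevs are excluded from usage accounting. The sketch below illustrates that option only; `vfInfo`, `splitByParent`, and `selectableByGPU` are hypothetical names, and whether orphan VFs should instead be counted and remain schedulable is a design choice the report leaves open.

```go
package main

// vfInfo is a hypothetical stand-in for a discovered virtual function.
type vfInfo struct {
	Address   string
	ParentGPU string // empty when the physfn symlink could not be resolved
	HasMdev   bool
}

// splitByParent separates VFs with a known parent GPU from orphans.
func splitByParent(vfs []vfInfo) (known, orphans []vfInfo) {
	for _, v := range vfs {
		if v.ParentGPU == "" {
			orphans = append(orphans, v)
			continue
		}
		known = append(known, v)
	}
	return known, orphans
}

// selectableByGPU groups free VFs by parent GPU, considering only VFs with a
// known parent. This mirrors a usage calculation that skips orphan VFs, so no
// empty-string "GPU" with zero recorded usage can win the selection.
func selectableByGPU(vfs []vfInfo) map[string][]string {
	known, _ := splitByParent(vfs)
	groups := make(map[string][]string)
	for _, v := range known {
		if !v.HasMdev {
			groups[v.ParentGPU] = append(groups[v.ParentGPU], v.Address)
		}
	}
	return groups
}
```

The alternative of dropping the `continue` and counting orphan usage under the empty key would keep those VFs schedulable, at the cost of treating all orphans as one pseudo-GPU.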

GPU load balancing

8e19d45

sjmiller609 requested a review from hiroTamada January 22, 2026 18:51

hiroTamada approved these changes Jan 22, 2026

Use config

d7e7aaa

sjmiller609 merged commit 1616beb into main Jan 22, 2026
4 checks passed

sjmiller609 deleted the load-balance-gpu branch January 22, 2026 19:35

cursor bot reviewed Jan 22, 2026
