Add vGPU support #52

sjmiller609 · 2026-01-05T16:50:04Z

No description provided.

github-actions · 2026-01-05T16:50:27Z

✱ Stainless preview builds

This PR will update the hypeman SDKs with the following commit message.

feat: Add vGPU support

Edit this comment to update it. It will appear in the SDK's changelogs.

✅ hypeman-typescript studio · code · diff

Your SDK built successfully.
generate ⚠️ → build ✅ → lint ✅ → test ✅
npm install https://pkg.stainless.com/s/hypeman-typescript/322e702cedb32a7b5119490d8b124481f18fbd6d/dist.tar.gz

✅ hypeman-go studio · code · diff

Your SDK built successfully.
generate ⚠️ → lint ✅ → test ✅
go get github.com/stainless-sdks/hypeman-go@c068f9b7358502cd722e6cc30a654c07044c0d53

⏳ hypeman-cli studio · conflict

⏳ These are partial results; builds are still running.

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Last updated: 2026-01-05 22:28:29 UTC

tembo · 2026-01-05T16:51:44Z

lib/devices/mdev.go

+		if err != nil {
+			continue
+		}
+		instances, err := strconv.Atoi(strings.TrimSpace(string(data)))


The error from DestroyMdev is being silently ignored during cleanup. Consider logging this error to aid debugging when orphaned mdevs fail to clean up.

Suggested change

-		instances, err := strconv.Atoi(strings.TrimSpace(string(data)))
+		if err := DestroyMdev(mdev.UUID); err != nil {
+			// Log but continue - best effort cleanup
+			fmt.Fprintf(os.Stderr, "failed to destroy orphaned mdev %s: %v\n", mdev.UUID, err)
+			continue
+		}

tembo · 2026-01-05T16:51:52Z

lib/devices/mdev.go

+		for i, p := range parts {
+			if strings.HasPrefix(p, "0000:") && i+1 < len(parts) && parts[i+1] == uuid {
+				vfAddress = p
+				break


Consider wrapping this error with context about which VF was being targeted; this will make mdev creation failures easier to debug in production.

Suggested change

-				break
+	if err := os.WriteFile(createPath, []byte(mdevUUID), 0200); err != nil {
+		return nil, fmt.Errorf("create mdev on VF %s: %w", targetVF, err)
+	}

tembo · 2026-01-05T16:51:59Z

lib/devices/mdev.go

+}
+
+// getProfileNameFromType resolves internal type (nvidia-556) to profile name (L40S-1Q)
+func getProfileNameFromType(profileType, vfAddress string) string {


The error from mdevctl undefine is silently discarded. While this is "best effort", if it fails unexpectedly (e.g., mdevctl not installed), subsequent sysfs removal might also fail. Consider logging the error at debug level if mdevctl is available but fails.

tembo · 2026-01-05T16:52:08Z

lib/instances/create.go

+			log.ErrorContext(ctx, "failed to create mdev", "profile", req.GPU.Profile, "error", err)
+			return nil, fmt.Errorf("create vGPU mdev for profile %s: %w", req.GPU.Profile, err)
+		}
+		gpuProfile = req.GPU.Profile


There's a potential race condition here: if multiple instances request the same profile concurrently, both could call CreateMdev and target the same available VF before either one completes. Consider adding a mutex or file-based locking around mdev creation to prevent this.
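The fix amounts to making "find a free VF" and "claim it" a single critical section. The sketch below models that with an in-memory allocator; `vfAllocator` and `claim` are hypothetical, and in the real code the sysfs `create` write would also happen while the lock is held.

```go
package main

import (
	"fmt"
	"sync"
)

// vfAllocator is a hypothetical model of "pick a free VF, then create an mdev
// on it". Holding mu across the check and the claim means two concurrent
// requests can never both see the same VF as free.
type vfAllocator struct {
	mu   sync.Mutex
	used map[string]bool
	vfs  []string
}

func newVFAllocator(vfs []string) *vfAllocator {
	return &vfAllocator{used: make(map[string]bool), vfs: vfs}
}

// claim returns the first free VF and marks it used, or an error if none remain.
func (a *vfAllocator) claim() (string, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for _, vf := range a.vfs {
		if !a.used[vf] {
			a.used[vf] = true
			return vf, nil
		}
	}
	return "", fmt.Errorf("no free VF")
}

func main() {
	a := newVFAllocator([]string{"vf0", "vf1", "vf2", "vf3"})
	var wg sync.WaitGroup
	results := make(chan string, 8)
	// Eight concurrent requests compete for four VFs.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if vf, err := a.claim(); err == nil {
				results <- vf
			}
		}()
	}
	wg.Wait()
	close(results)
	seen := map[string]bool{}
	for vf := range results {
		if seen[vf] {
			panic("VF handed out twice: " + vf)
		}
		seen[vf] = true
	}
	fmt.Println("distinct VFs claimed:", len(seen))
}
```

A `sync.Mutex` only serializes callers inside one process; if more than one process can create mdevs on the same host, a file lock (for example `syscall.Flock` on Linux) would be needed instead.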

tembo · 2026-01-05T16:52:18Z

lib/devices/mdev.go

+		})
+	}
+
+	return vfs, nil


This calls ListMdevDevices() for every VF during discovery, which results in O(n*m) work, where n is the number of VFs and m is the number of mdevs. Consider listing mdevs once and building a lookup map to improve performance on hosts with many VFs.

There is a performance issue when I call the resources endpoint, so this is probably the cause. Needs investigating.
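The suggested fix is to list mdevs once, index them by parent VF, and then do an O(1) lookup per VF, making discovery O(n+m). A sketch with hypothetical `mdevInfo`/`vfInfo` shapes; the PR's actual types and field names may differ.

```go
package main

import "fmt"

// Hypothetical record shapes; the PR's real types may differ.
type mdevInfo struct {
	UUID     string
	ParentVF string // PCI address of the VF the mdev lives on
}

type vfInfo struct {
	Address string
	Mdevs   []string // UUIDs of mdevs on this VF
}

// attachMdevs takes the mdev list produced by a single listing call and
// indexes it by parent VF, so each VF lookup is a map access instead of
// another full listing.
func attachMdevs(vfAddrs []string, mdevs []mdevInfo) []vfInfo {
	byParent := make(map[string][]string, len(mdevs))
	for _, m := range mdevs {
		byParent[m.ParentVF] = append(byParent[m.ParentVF], m.UUID)
	}
	vfs := make([]vfInfo, 0, len(vfAddrs))
	for _, addr := range vfAddrs {
		vfs = append(vfs, vfInfo{Address: addr, Mdevs: byParent[addr]})
	}
	return vfs
}

func main() {
	vfs := attachMdevs(
		[]string{"0000:41:00.4", "0000:41:00.5"},
		[]mdevInfo{{UUID: "u1", ParentVF: "0000:41:00.4"}},
	)
	for _, vf := range vfs {
		fmt.Println(vf.Address, len(vf.Mdevs))
	}
}
```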

tembo · 2026-01-05T16:52:28Z

lib/resources/gpu.go

+	}
+
+	switch mode {
+	case devices.GPUModeVGPU:


DetectHostGPUMode() internally calls DiscoverAvailableDevices() which does filesystem I/O. Then for vGPU mode, getVGPUStatus() calls DiscoverVFs() doing more I/O. Consider caching the mode detection result or combining detection with status retrieval to reduce redundant syscalls on every /resources API call.

lib/resources/gpu.go

tembo bot approved these changes Jan 5, 2026


Base automatically changed from resources to main January 5, 2026 22:05

sjmiller609 added 3 commits January 5, 2026 22:10

Add vGPU support

cd178a5

Add logging, safer orphan cleanup

a613122

Add test

df18ef1

sjmiller609 force-pushed the vgpu branch from 0a7321e to df18ef1 January 5, 2026 22:10

Fix performance issue with looking up profile types

6f74c81

sjmiller609 commented Jan 5, 2026


lib/resources/gpu.go (resolved)
