Commit 4b0c8f3
gpu passthrough (#17)
* feat(devices): add lib/devices package types, errors, and paths
Add foundational types for GPU/PCI device passthrough:
- Device, AvailableDevice, CreateDeviceRequest structs
- Error types (ErrNotFound, ErrInUse, ErrAlreadyExists, etc.)
- Device path helpers in lib/paths
* feat(devices): add PCI device discovery and VFIO binding
Add low-level device operations:
- discovery.go: Scan PCI bus, detect IOMMU groups, identify GPU devices
- vfio.go: Bind/unbind devices to vfio-pci driver for VM passthrough
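The bind step can be pictured as a short sequence of sysfs writes. The following is a minimal sketch, not the contents of vfio.go: the names `sysfsWrite`, `vfioBindPlan`, and `isDisplayClass` are hypothetical, and it assumes the standard Linux `driver_override` mechanism and that GPUs are recognized by PCI base class 0x03 (display controller).

```go
package main

import (
	"path"
	"strconv"
	"strings"
)

// sysfsWrite is a hypothetical record of one write to a sysfs attribute.
type sysfsWrite struct {
	Path  string
	Value string
}

// vfioBindPlan returns the ordered sysfs writes that hand a PCI device to
// vfio-pci via driver_override. It only builds the plan; a caller running as
// root would perform each write (for example with os.WriteFile).
func vfioBindPlan(sysfsRoot, pciAddr string, hasDriver bool) []sysfsWrite {
	dev := path.Join(sysfsRoot, "bus/pci/devices", pciAddr)
	plan := []sysfsWrite{
		// Tell the kernel that only vfio-pci may claim this device.
		{Path: path.Join(dev, "driver_override"), Value: "vfio-pci"},
	}
	if hasDriver {
		// Release the device from whatever driver currently owns it.
		plan = append(plan, sysfsWrite{Path: path.Join(dev, "driver/unbind"), Value: pciAddr})
	}
	// Ask the PCI core to re-probe, which now selects vfio-pci.
	plan = append(plan, sysfsWrite{Path: path.Join(sysfsRoot, "bus/pci/drivers_probe"), Value: pciAddr})
	return plan
}

// isDisplayClass reports whether a sysfs "class" value such as "0x030200"
// has PCI base class 0x03 (display controller).
func isDisplayClass(class string) bool {
	hex := strings.TrimPrefix(strings.TrimSpace(class), "0x")
	v, err := strconv.ParseUint(hex, 16, 32)
	if err != nil {
		return false
	}
	return v>>16 == 0x03
}
```

Separating the plan from its execution keeps the ordering easy to review and test without root privileges. All devices in the same IOMMU group generally need the same treatment before the group can be passed through.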
* feat(devices): add device manager core
Add the main device management logic:
- Manager interface with CRUD operations for devices
- CreateDevice, GetDevice, DeleteDevice, ListDevices
- MarkAttached/MarkDetached for instance lifecycle
- BindToVFIO/UnbindFromVFIO for driver management
- Persistence via JSON metadata files
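As a rough illustration of the attach bookkeeping and JSON persistence listed above, here is a sketch under stated assumptions. The `Device` fields, the JSON keys, and the helper names are hypothetical, and `ErrInUse` is redefined locally as a stand-in for the package's own error.

```go
package main

import (
	"encoding/json"
	"errors"
)

// ErrInUse is a stand-in for the error of the same name in lib/devices.
var ErrInUse = errors.New("device is in use")

// Device is a hypothetical metadata record; field names are assumptions.
type Device struct {
	ID         string `json:"id"`
	PCIAddress string `json:"pci_address"`
	AttachedTo string `json:"attached_to,omitempty"`
}

// markAttached records that an instance owns the device and refuses if a
// different instance already does.
func markAttached(d *Device, instanceID string) error {
	if d.AttachedTo != "" && d.AttachedTo != instanceID {
		return ErrInUse
	}
	d.AttachedTo = instanceID
	return nil
}

// markDetached clears the ownership record.
func markDetached(d *Device) {
	d.AttachedTo = ""
}

// encodeDevice renders the record as the JSON that would be written to a
// metadata file.
func encodeDevice(d Device) ([]byte, error) {
	return json.MarshalIndent(d, "", "  ")
}

// decodeDevice parses a metadata file's contents back into a record.
func decodeDevice(data []byte) (Device, error) {
	var d Device
	err := json.Unmarshal(data, &d)
	return d, err
}
```

Because the attachment state lives in the persisted record, it survives a restart of the API process, which is what makes startup reconciliation of stale attachments possible.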
* feat(system): add kernel/initrd NVIDIA GPU support
Add support for NVIDIA GPU passthrough in the VM boot chain:
- versions.go: Add Kernel_20251213 with NVIDIA module/driver lib URLs
- initrd.go: Download and extract NVIDIA kernel modules and driver libs
- init_script.go: Load NVIDIA modules at boot, inject driver libs into containers
This enables containers to use CUDA without bundling driver versions.
* feat(instances): add instance liveness checker for device reconciliation
Add InstanceLivenessChecker adapter to allow the devices package to query
instance state without circular imports. Used during startup to detect
orphaned device attachments from crashed VMs.
- liveness.go: Adapter implementing devices.InstanceLivenessChecker
- liveness_test.go: Unit tests
- reconcile_test.go: Device reconciliation tests
- types.go: Add Devices field to StoredMetadata and CreateInstanceRequest
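The adapter pattern described above can be sketched as follows. This is an illustration only: the `IsInstanceRunning` signature, the `attachment` type, and `orphanedAttachments` are assumptions, and the real interface may take a context or return an error.

```go
package main

// InstanceLivenessChecker is the narrow interface the devices package would
// declare, so it never has to import the instances package directly.
// The method signature here is an assumption.
type InstanceLivenessChecker interface {
	IsInstanceRunning(instanceID string) bool
}

// attachment is a hypothetical record linking a device to an instance.
type attachment struct {
	DeviceID   string
	InstanceID string
}

// orphanedAttachments returns the attachments whose owning instance is no
// longer running, for example after a VM crash.
func orphanedAttachments(atts []attachment, c InstanceLivenessChecker) []attachment {
	var orphans []attachment
	for _, a := range atts {
		if a.InstanceID != "" && !c.IsInstanceRunning(a.InstanceID) {
			orphans = append(orphans, a)
		}
	}
	return orphans
}

// runningSet is a toy adapter standing in for the one in liveness.go.
type runningSet map[string]bool

// IsInstanceRunning reports whether the ID is in the set.
func (r runningSet) IsInstanceRunning(id string) bool {
	return r[id]
}
```

The instances package supplies the adapter and the devices package consumes only the interface, so the dependency points one way and the import cycle is avoided.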
* feat(instances): integrate devices with instance lifecycle
Wire up device management throughout the instance lifecycle:
- create.go: Validate devices, auto-bind to VFIO, pass to VM config
- delete.go: Detach devices, auto-unbind from VFIO
- configdisk.go: Add HAS_GPU config flag for GPU instances
- manager.go: Add deviceManager dependency
- providers.go: Add ProvideDeviceManager
- wire.go/wire_gen.go: Wire up DeviceManager in DI
- api.go: Add DeviceManager to ApiService struct
* feat(api): add devices API endpoints and documentation
Add REST API for device management and supporting documentation:
API endpoints:
- GET/POST /devices - List and register devices
- GET/DELETE /devices/{id} - Get and delete devices
- GET /devices/available - Discover passthrough-capable devices
- instances.go: Accept devices param in CreateInstance
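A client could build requests for the endpoints above roughly like this. It is a sketch only: the base URL is an assumption, authentication is ignored, the helper names are hypothetical, and no request or response body shapes are implied.

```go
package main

import (
	"net/http"
	"net/url"
	"strings"
)

// availableDevicesRequest builds GET /devices/available against baseURL.
func availableDevicesRequest(baseURL string) (*http.Request, error) {
	u := strings.TrimRight(baseURL, "/") + "/devices/available"
	return http.NewRequest(http.MethodGet, u, nil)
}

// deleteDeviceRequest builds DELETE /devices/{id}, escaping the ID so it
// stays a single path segment.
func deleteDeviceRequest(baseURL, id string) (*http.Request, error) {
	u := strings.TrimRight(baseURL, "/") + "/devices/" + url.PathEscape(id)
	return http.NewRequest(http.MethodDelete, u, nil)
}
```

Either request can then be sent with `http.DefaultClient.Do(req)`.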
Documentation:
- GPU.md: GPU passthrough architecture and driver injection
- README.md: Device management usage guide
- scripts/gpu-reset.sh: GPU reset utility
Tests and fixtures:
- gpu_e2e_test.go, gpu_inference_test.go, gpu_module_test.go
- testdata/ollama-cuda/ - CUDA test container
Also adds build-preview-cli Makefile target.
* test: increase VM memory to 2GB to accommodate large initrd
The initrd now includes NVIDIA kernel modules, firmware, and driver
libraries (~238MB total). With 512MB VMs, the kernel couldn't unpack
the initrd into tmpfs without running out of space.
Increase test VM memory from 512MB to 2GB to provide sufficient room
for the initrd contents plus normal VM operation.
* remove slop test
* remove outdated comment
* fix MarkAttached bug
* remove preview script
* fix(configdisk): only set HAS_GPU=1 for actual GPU devices
The HAS_GPU flag was being set unconditionally when any device was
attached, regardless of device type. This would trigger NVIDIA module
loading in the VM init script even for non-GPU PCI devices.
Now iterates through attached devices and checks each device's type,
only setting HAS_GPU=1 if at least one device is DeviceTypeGPU.
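The corrected check amounts to a scan over the attached devices. In this minimal sketch, `DeviceTypeGPU` is the name used in the commit, while its string value, the `DeviceTypeGeneric` constant, the `Device` fields, and `hasGPU` are assumptions for illustration.

```go
package main

// DeviceType classifies a passthrough device.
type DeviceType string

const (
	// DeviceTypeGPU is named in the commit; the string value is assumed.
	DeviceTypeGPU DeviceType = "gpu"
	// DeviceTypeGeneric is a hypothetical type for non-GPU PCI devices.
	DeviceTypeGeneric DeviceType = "pci"
)

// Device is a hypothetical, trimmed-down device record.
type Device struct {
	ID   string
	Type DeviceType
}

// hasGPU reports whether at least one attached device is a GPU. Only then
// should the config disk carry HAS_GPU=1.
func hasGPU(devices []Device) bool {
	for _, d := range devices {
		if d.Type == DeviceTypeGPU {
			return true
		}
	}
	return false
}
```

With this check, an instance that has only a non-GPU PCI device attached no longer triggers NVIDIA module loading in the guest.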
* fix(devices): prevent false positive warnings for instances without GPU devices
detectSuspiciousVMMProcesses was using ListAllInstanceDevices to build the
set of known running instances, but that method only returns instances with
devices attached. This caused legitimate cloud-hypervisor processes for
instances without GPU passthrough to be incorrectly flagged as 'untracked'
with misleading advice to run gpu-reset.sh.
Fix: Call IsInstanceRunning directly for each discovered process instead of
pre-building a map from ListAllInstanceDevices. This correctly identifies
all running instances regardless of device attachment.
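The fixed logic can be sketched as a per-process liveness query. The `vmmProcess` type, the way a process maps to an instance ID, and the callback signature are all assumptions for illustration.

```go
package main

// vmmProcess is a hypothetical view of a discovered cloud-hypervisor
// process together with the instance ID derived from it.
type vmmProcess struct {
	PID        int
	InstanceID string
}

// untrackedProcesses returns the processes that do not belong to any running
// instance. It asks isInstanceRunning about each process instead of
// consulting a pre-built map of instances that have devices attached.
func untrackedProcesses(procs []vmmProcess, isInstanceRunning func(id string) bool) []vmmProcess {
	var untracked []vmmProcess
	for _, p := range procs {
		if !isInstanceRunning(p.InstanceID) {
			untracked = append(untracked, p)
		}
	}
	return untracked
}
```

An instance with no devices attached still answers true to the liveness query, so its process is no longer reported.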
* devices: add startup validation warnings for GPU prerequisites
Check and warn on startup if:
- IOMMU is not enabled (no groups in /sys/kernel/iommu_groups)
- VFIO modules not loaded (vfio_pci, vfio_iommu_type1)
- Huge pages not configured (info hint when devices exist)
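The first two checks can be approximated as below. This is a sketch, not the code added by the commit: the function names are hypothetical, and it assumes the module list comes from the text of `/proc/modules`, where the first field on each line is the module name.

```go
package main

import (
	"bufio"
	"os"
	"strings"
)

// missingModules returns the entries of required that do not appear as
// loaded modules in the given /proc/modules text.
func missingModules(procModules string, required []string) []string {
	loaded := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(procModules))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 0 {
			loaded[fields[0]] = true
		}
	}
	var missing []string
	for _, m := range required {
		if !loaded[m] {
			missing = append(missing, m)
		}
	}
	return missing
}

// iommuGroupCount counts the entries under dir, normally
// /sys/kernel/iommu_groups. An error or a count of zero suggests that the
// IOMMU is not enabled.
func iommuGroupCount(dir string) (int, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}
```

A caller would pass the contents of `/proc/modules` and the names `vfio_pci` and `vfio_iommu_type1`. Modules compiled into the kernel do not appear in that file, which is one reason to treat the result as a warning and not a hard failure.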
* instances: move detectSuspiciousVMMProcesses to liveness.go
This function is about instance lifecycle, not device management.
Moving it to the instances module where it belongs.
The implementation uses IsInstanceRunning (which queries all instances)
rather than ListAllInstanceDevices (which only returns instances with
devices) to avoid false positives for non-GPU VMs.
* system: use context loggers in initrd building
Replace fmt.Printf calls with proper context loggers so messages
appear in structured logs with consistent formatting.
---------
Co-authored-by: Rafael Garcia <[email protected]>
1 parent bd634e5 commit 4b0c8f3
File tree
46 files changed, +7224 -168 lines changed
- cmd/api
- api
- lib
- devices
- scripts
- testdata/ollama-cuda
- instances
- oapi
- paths
- providers
- system