| 1 | +# GPU Isolation Considerations |
| 2 | + |
| 3 | +Nimbus currently provisions GPU workloads via the Docker-based executor, but several hardening tasks remain before multi-tenant usage can be considered safe. |
| 4 | + |
| 5 | +## Current State |
| 6 | + |
| 7 | +- All GPUs are discovered via `nvidia-smi` and exposed to the executor through Docker’s `device_requests`. There is no MIG partitioning, per-job cgroup, or NVML scoping. |
| 8 | +- Containers receive `CUDA_VISIBLE_DEVICES` limited to the allocated devices, but that variable only filters what the CUDA runtime enumerates; NVML queries can still reveal global device information. |
| 9 | +- There is no admission control around MIG/MPS configuration; all jobs assume exclusive access. |
| 10 | + |
| 11 | +## Required Work |
| 12 | + |
| 13 | +1. **MIG & MPS Strategy** |
| 14 | + - Decide on sharing model (exclusive GPU vs MIG vs CUDA MPS). |
| 15 | + - Document supported configurations and required driver settings. |
| 16 | + |
| 17 | +2. **Per-job Isolation** |
| 18 | + - Configure `nvidia-container-runtime` with per-job device cgroups. |
| 19 | + - Restrict `/dev/nvidia*` device nodes to allocated instances only. |
| 20 | + - Implement NVML filtering (e.g., via container runtime args or LD_PRELOAD) to prevent topology leakage. |
| 21 | + |
| 22 | +3. **Scheduling & Labels** |
| 23 | + - Extend labels to express MIG profiles (e.g., `gpu:mig-1g.5gb`). |
| 24 | + - Ensure scheduler prevents over-commit by tracking available partitions. |
| 25 | + |
| 26 | +4. **Attestation & Monitoring** |
| 27 | + - Capture MIG/GDS state in agent telemetry. |
| 28 | + - Alert on unexpected configuration changes or high utilisation. |
| 29 | + |
| 30 | +5. **Testing** |
| 31 | + - Add integration tests that run concurrent GPU jobs and assert isolation (no job can see another job's devices, processes, or memory usage). |
| 32 | + - Include red-team scenarios (bus probing, NVML enumeration, PCI scans) to verify controls. |
| 33 | + |
| 34 | +Until these items are addressed, document that GPU workloads must run on dedicated hosts with no untrusted tenants. |
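
To make the over-commit requirement in item 3 concrete, the following is a minimal sketch of per-label partition accounting. It is not Nimbus code: `PartitionTracker`, its methods, and the capacity map are hypothetical names introduced for illustration, and the sketch assumes the scheduler already knows how many partitions of each MIG profile a host exposes.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionTracker:
    """Hypothetical helper: counts free MIG partitions per scheduling label."""

    # Label -> number of partitions of that profile on the host (assumed known).
    capacity: dict[str, int]
    # Job id -> label of the partition that job currently holds.
    allocations: dict[str, str] = field(default_factory=dict)

    def free(self, label: str) -> int:
        """Return how many partitions with this label are still unallocated."""
        used = sum(1 for held in self.allocations.values() if held == label)
        return self.capacity.get(label, 0) - used

    def acquire(self, job_id: str, label: str) -> bool:
        """Reserve one partition for the job; refuse if none is free."""
        if job_id in self.allocations or self.free(label) <= 0:
            return False
        self.allocations[job_id] = label
        return True

    def release(self, job_id: str) -> None:
        """Return the job's partition to the pool (no-op for unknown jobs)."""
        self.allocations.pop(job_id, None)
```

A scheduler built this way would call `acquire` before dispatching a job labelled, for example, `gpu:mig-1g.5gb`, and `release` when the job exits. Unknown labels have zero capacity, so they are rejected rather than silently admitted.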