Description
Hi,
As of right now the GPU operator only supports one sandbox workload mode per node, set through the `.spec.sandboxWorkloads.defaultWorkload` field or through an annotation on the node.
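For reference, the current per-node selection roughly looks like this. This is only a sketch: the default `cluster-policy` object name, the `nvidia.com/gpu.workload.config` node label key, and the `worker-1` node name are assumptions based on current operator releases.

```bash
# Cluster-wide default workload type (documented values: container, vm-passthrough, vm-vgpu)
kubectl patch clusterpolicy cluster-policy --type merge \
  -p '{"spec": {"sandboxWorkloads": {"enabled": true, "defaultWorkload": "container"}}}'

# Per-node override: the whole node is switched to a single mode at once
kubectl label node worker-1 nvidia.com/gpu.workload.config=vm-passthrough --overwrite
```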
Segregating vGPU driver nodes from "standard" driver nodes this way makes perfect sense, as the two drivers cannot coexist on the same kernel.
But it would also make sense to be able to mix containers and PCIe passthrough on the same node: N cards can be bound to vfio-pci (and exposed to virtualization engines such as KVM), while the other cards stay on the nvidia driver and are then "forwarded" to containers using nvidia-ctk.
I've validated (by modifying the operator resources) that this is possible; a dirty PoC works by doing these steps in order (a rough shell sketch follows the list):
- Set up the nvidia driver on the node
- Temporarily kill the `nvidia-persistenced` daemon (if present) to free its handles on the cards, so that the selected cards can be unbound from the nvidia driver
- Run the `/bin/vfio-manage.sh` script in the vfio-manager with `-d` (instead of `-all` currently) to bind the selected cards to the vfio driver
- Run the device plugin pod as usual to discover the VFIO-bound devices and expose them to KubeVirt
- Proceed as usual with the container toolkit pod setup
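For illustration, here is a minimal shell sketch of what these steps boil down to for a single card when done by hand; the PCI address is a placeholder and the `systemctl` calls assume `nvidia-persistenced` runs as a host systemd service, whereas the actual PoC performs the same actions through the operator's driver and vfio-manager containers.

```bash
# Hypothetical PCI address of the card to hand over to passthrough
GPU=0000:3b:00.0

# Stop nvidia-persistenced so it releases its handles on the device
systemctl stop nvidia-persistenced

# Unbind only the selected card from the nvidia driver
echo "$GPU" > /sys/bus/pci/drivers/nvidia/unbind

# Bind that card to vfio-pci via driver_override
modprobe vfio-pci
echo vfio-pci > "/sys/bus/pci/devices/$GPU/driver_override"
echo "$GPU" > /sys/bus/pci/drivers/vfio-pci/bind

# Restart persistenced for the cards that stay on the nvidia driver
systemctl start nvidia-persistenced
```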
This presents a net gain in flexibility when using multiple GPUs per node.
Concerning the implementation, we could imagine selecting what each device is used for either through a field in the ClusterPolicy object (a map per node) or through a special annotation on each node; both options are sketched below.
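To make that concrete, here is a hypothetical sketch of both options; neither exists today, and the `nvidia.com/gpu.workload.devices` annotation key and the `nodeDeviceWorkloads` field are made-up names purely to illustrate the shape of the API.

```bash
# Option A (hypothetical annotation): map PCI addresses to workload types per node
kubectl annotate node worker-1 --overwrite \
  nvidia.com/gpu.workload.devices='{"0000:3b:00.0":"vm-passthrough","0000:d8:00.0":"container"}'

# Option B (hypothetical ClusterPolicy field): a per-node device map under spec.sandboxWorkloads
kubectl patch clusterpolicy cluster-policy --type merge -p '
{"spec": {"sandboxWorkloads": {"nodeDeviceWorkloads":
  {"worker-1": {"0000:3b:00.0": "vm-passthrough", "0000:d8:00.0": "container"}}}}}'
```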
If this sounds like something that could benefit users of the GPU operator, I'd be willing to open a draft PR with a usable solution; let me know what you think.