
Conversation

karthikvetrivel commented Jan 7, 2026

Description

Prevents unnecessary GPU unbind/rebind operations during rolling updates of the vfio-manager DaemonSet. Currently, k8s-driver-manager unconditionally unbinds all GPUs from vfio-pci on startup, even when the desired state is already vfio-pci. This disrupts active VM workloads using GPU passthrough (KubeVirt, Kata Containers).

This fix checks the node's nvidia.com/gpu.workload.config label to determine whether the node is in VFIO mode (vm-passthrough or vm-vgpu). If so, it verifies that all GPUs are already bound to vfio-pci variants before proceeding with the unbind. If they are already in the desired state, the unbind operation is skipped entirely.
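A minimal, self-contained sketch of the skip check described above (not the PR's actual code: the helper names isBoundToVFIO and shouldSkipUnbind, the sysfs driver lookup, and the "driver name contains vfio" test are assumptions made for illustration):

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// isBoundToVFIO reports whether the PCI device at addr has a driver whose name
// looks like a vfio-pci variant. The sysfs readlink and the substring test are
// assumptions for this sketch, not necessarily the PR's exact logic.
func isBoundToVFIO(addr string) bool {
    link, err := os.Readlink(filepath.Join("/sys/bus/pci/devices", addr, "driver"))
    if err != nil {
        // No driver symlink means no driver is currently bound.
        return false
    }
    return strings.Contains(filepath.Base(link), "vfio")
}

// shouldSkipUnbind returns true only when every GPU is already bound to a
// vfio-pci variant, so a rolling update does not need to touch the devices.
func shouldSkipUnbind(gpuAddrs []string) bool {
    if len(gpuAddrs) == 0 {
        return false
    }
    for _, addr := range gpuAddrs {
        if !isBoundToVFIO(addr) {
            return false
        }
    }
    return true
}

func main() {
    gpus := []string{"0000:65:00.0"} // PCI address taken from the test logs below
    if shouldSkipUnbind(gpus) {
        fmt.Println("All GPUs already bound to vfio-pci variants, skipping unbind")
    } else {
        fmt.Println("Unbinding vfio-pci driver from all devices")
    }
}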

Testing

Scenario 1: Rolling update with no state change

  • Node label: nvidia.com/gpu.workload.config=vm-passthrough
  • GPU already bound to vfio-pci

Result: Unbind was skipped, GPU remained bound through rolling updates

time=2026-01-07T18:03:45Z level=info msg=All 1 GPUs are on vfio-pci variants
time=2026-01-07T18:03:45Z level=info msg=All GPUs already bound to vfio-pci variants, skipping unbind

Scenario 2: State transition (nvidia → vfio)

  • Node label: nvidia.com/gpu.workload.config=vm-passthrough
  • GPU unbound (no driver), which exercises the same state transition as moving from the NVIDIA driver to the vfio driver

Result: Unbind proceeded as expected

time=2026-01-07T18:37:07Z level=info msg=GPU 0000:65:00.0 is bound to  (not vfio)
time=2026-01-07T18:37:07Z level=info msg=Unbinding vfio-pci driver from all devices

karthikvetrivel marked this pull request as draft January 7, 2026 17:09
karthikvetrivel marked this pull request as ready for review January 7, 2026 20:56
}

func (dm *DriverManager) isVFIOWorkloadConfig() bool {
    workloadConfig, err := dm.kubeClient.GetNodeLabelValue(dm.config.nodeName, gpuWorkloadConfigLabelKey)
Collaborator

There is a hole in this implementation currently. It is possible for the vfio-manager pod to run on nodes that do not have the nvidia.com/gpu.workload.config=vm-passthrough label set. For example, if users have set sandboxWorkloads.defaultWorkload=vm-passthrough in ClusterPolicy, then vfio-manager will get deployed by default on GPU nodes if the nvidia.com/gpu.workload.config label is not present.

For this implementation to be complete, we would need to take the value of sandboxWorkloads.defaultWorkload into account, similar to what is done in the validator: https://github.com/NVIDIA/gpu-operator/blob/2104c0e1ff1893012c3d72f5c09f2c345bc4313c/cmd/nvidia-validator/main.go#L477

What are your thoughts on this alternative solution: in the gpu-operator, when we render the vfio-manager daemonset, we know it will only run for the vm-passthrough use case. Could we not just pass the "workload type" as an env var to the init container? The k8s-driver-manager code would then use the newly introduced env var to determine what to do with GPUs that are already bound to the vfio-pci driver.
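A rough sketch of that alternative, for discussion only (the env var name GPU_WORKLOAD_TYPE and its plumbing are assumptions, not an existing gpu-operator interface):

package main

import (
    "fmt"
    "os"
)

// Hypothetical env var that the gpu-operator would set on the k8s-driver-manager
// init container when rendering the vfio-manager daemonset.
const workloadTypeEnv = "GPU_WORKLOAD_TYPE"

// isVMPassthroughWorkload skips the node-label lookup entirely and relies on the
// operator telling the init container which workload type it was deployed for.
func isVMPassthroughWorkload() bool {
    return os.Getenv(workloadTypeEnv) == "vm-passthrough"
}

func main() {
    if isVMPassthroughWorkload() {
        fmt.Println("vm-passthrough node: leave GPUs already bound to vfio-pci alone")
    } else {
        fmt.Println("not a vm-passthrough node: run the normal unbind path")
    }
}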

        return false
    }

    return strings.HasPrefix(workloadConfig, "vm-")
Collaborator

We only want to return true when the workload config is vm-passthrough (not vm-vgpu)
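In code terms, the suggestion amounts to an exact comparison rather than a prefix match (sketch only; the package and constant names are illustrative):

package driver

// Match vm-passthrough exactly so that vm-vgpu nodes do not skip the unbind.
const gpuWorkloadConfigVMPassthrough = "vm-passthrough"

func isVMPassthrough(workloadConfig string) bool {
    return workloadConfig == gpuWorkloadConfigVMPassthrough
}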
