Kubernetes-native way to handle vGPU tokens and license validation.
NVIDIA vGPU guests fetch licenses using JWT client configuration tokens via the nvidia-gridd service. While the NVIDIA GPU Operator facilitates licensing for its own driver deployments via mounted ConfigMaps, it doesn't solve the problem for systems with pre-installed vGPU drivers. In these environments, there is no built-in mechanism to automatically update expired JWTs, forcing manual intervention on every node. This project addresses that gap by offering a Kubernetes-native solution to streamline the token update process for these pre-installed driver setups.
## Install tools
The vGPU token operator is deployed using a Helm chart, which can be found in `charts/`.
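For example, a basic install from a local checkout of this repository might look like the following (a sketch; the chart directory, release name, and namespace flags are assumptions):

```sh
# Install the chart from the local charts/ directory into the vgpu-system namespace
helm install vgpu-token-operator ./charts/vgpu-token-operator \
  --namespace vgpu-system \
  --create-namespace
```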
Absolutely required:

- Kubernetes cluster
- NVIDIA GPU Operator deployed

Things you probably have if you're looking at this project:

- A valid vGPU token with a corresponding license server
- vGPU drivers on the host hypervisor
- A VM image with the vGPU driver installed
To deploy the Helm chart to your cluster during development, run (NOTE: set `OCI_REPOSITORY` to a repository that you have push access to):

```sh
make helm-install-snapshot
```
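For example, `OCI_REPOSITORY` can be passed directly on the `make` command line (the registry path below is a placeholder):

```sh
make helm-install-snapshot OCI_REPOSITORY=ghcr.io/<your-org>/vgpu-token-operator
```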
- Create the secret.

  NOTE: It is critical that the key for the token value is `client_configuration_token.tok`. Otherwise, the mounts for the DaemonSet will fail.

  ```yaml
  apiVersion: v1
  kind: Secret
  metadata:
    name: client-config-token
    namespace: vgpu-system
  stringData:
    client_configuration_token.tok: "${VGPU_TOKEN_VALUE}"
  ```
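  Alternatively, the same secret can be created directly from the token file with `kubectl` (a sketch; the local file path is a placeholder):

  ```sh
  # Creates the secret with the required client_configuration_token.tok key
  kubectl create secret generic client-config-token \
    --namespace vgpu-system \
    --from-file=client_configuration_token.tok=/path/to/client_configuration_token.tok
  ```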
- Create the `VGPUToken` object, setting `tokenSecretRef` to the same name as the secret created above.

  ```yaml
  apiVersion: vgpu-token.nutanix.com/v1alpha1
  kind: VGPUToken
  metadata:
    name: vgpu-token
    namespace: vgpu-system
  spec:
    tokenSecretRef:
      name: client-config-token
  ```
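Both resources can then be applied with `kubectl` (a sketch; the manifest file names are placeholders, and the resource name used by `kubectl get` depends on how the CRD is registered):

```sh
kubectl apply -f client-config-token-secret.yaml
kubectl apply -f vgpu-token.yaml

# Confirm the custom resource exists (plural resource name assumed to be "vgputokens")
kubectl get vgputokens -n vgpu-system
```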
After creating these resources, the token secret should be mounted on the host at `/etc/nvidia/ClientConfigToken/client_configuration_token.tok`.
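One way to confirm the file landed on a node is an ephemeral node debug pod, which mounts the node's root filesystem under `/host` (a sketch; the node name is a placeholder, and the debug pod should be deleted afterwards):

```sh
kubectl debug node/<gpu-node-name> -it --image=busybox -- \
  ls -l /host/etc/nvidia/ClientConfigToken/
```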
Finally, we can verify licensing by creating a pod that runs `nvidia-smi -q` and checking the license status in its output:

```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: gpu-pod-
  labels:
    test: gpu-pod
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpu-pod
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
      command: ["nvidia-smi", "-q"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    "nvidia.com/gpu.present": "true"
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
```
In the pod logs, you should see that the product is licensed:

```
vGPU Software Licensed Product
    Product Name    : NVIDIA Virtual Compute Server
    License Status  : Licensed (Expiry: 2025-6-5 15:22:24 GMT)
```