@@ -164,7 +164,7 @@ To use Azure Linux, you specify the OS SKU by setting `os-sku` to `AzureLinux` d
 kind: DaemonSet
 metadata:
   name: nvidia-device-plugin-daemonset
-  namespace: gpu-resources
+  namespace: kube-system
 spec:
   selector:
     matchLabels:
@@ -173,40 +173,35 @@ To use Azure Linux, you specify the OS SKU by setting `os-sku` to `AzureLinux` d
     type: RollingUpdate
   template:
     metadata:
-      # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
-      # reserves resources for critical add-on pods so that they can be rescheduled after
-      # a failure. This annotation works in tandem with the toleration below.
-      annotations:
-        scheduler.alpha.kubernetes.io/critical-pod: ""
       labels:
         name: nvidia-device-plugin-ds
     spec:
       tolerations:
-      # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
-      # This, along with the annotation above marks this pod as a critical add-on.
-      - key: CriticalAddonsOnly
-        operator: Exists
       - key: nvidia.com/gpu
         operator: Exists
         effect: NoSchedule
-      - key: "sku"
-        operator: "Equal"
-        value: "gpu"
-        effect: "NoSchedule"
+      # Mark this pod as a critical add-on; when enabled, the critical add-on
+      # scheduler reserves resources for critical add-on pods so that they can
+      # be rescheduled after a failure.
+      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
+      priorityClassName: "system-node-critical"
       containers:
-      - image: mcr.microsoft.com/oss/nvidia/k8s-device-plugin:v0.14.1
+      - image: nvcr.io/nvidia/k8s-device-plugin:v0.15.0
         name: nvidia-device-plugin-ctr
+        env:
+          - name: FAIL_ON_INIT_ERROR
+            value: "false"
         securityContext:
           allowPrivilegeEscalation: false
           capabilities:
             drop: ["ALL"]
         volumeMounts:
-          - name: device-plugin
-            mountPath: /var/lib/kubelet/device-plugins
-      volumes:
         - name: device-plugin
-          hostPath:
-            path: /var/lib/kubelet/device-plugins
+          mountPath: /var/lib/kubelet/device-plugins
+      volumes:
+      - name: device-plugin
+        hostPath:
+          path: /var/lib/kubelet/device-plugins
 ```
 
 3. Create the DaemonSet and confirm the NVIDIA device plugin is created successfully using the [`kubectl apply`][kubectl-apply] command.
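The apply-and-confirm step above can be sketched with a few kubectl commands. This is a hedged example, not part of the diff: the manifest filename `nvidia-device-plugin-ds.yaml` is an assumption (use whatever name you saved the DaemonSet under), and the namespace and label match the manifest shown in this change.

```shell
# Assumed filename for the DaemonSet manifest saved from the step above.
kubectl apply -f nvidia-device-plugin-ds.yaml

# Wait for the DaemonSet rollout to finish in kube-system
# (the namespace used by the updated manifest).
kubectl rollout status daemonset/nvidia-device-plugin-daemonset -n kube-system

# Confirm the plugin pods are Running on each GPU node; the label
# name=nvidia-device-plugin-ds comes from the manifest's pod template.
kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds -o wide
```

Once the pods are Running, the nodes should advertise an allocatable `nvidia.com/gpu` resource, which you can check with `kubectl describe node <node-name>`.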
@@ -499,7 +494,7 @@ To see the GPU in action, you can schedule a GPU-enabled workload with the appro
 [kubectl-create]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#create
 [azure-pricing]: https://azure.microsoft.com/pricing/
 [azure-availability]: https://azure.microsoft.com/global-infrastructure/services/
-[nvidia-github]: https://github.com/NVIDIA/k8s-device-plugin
+[nvidia-github]: https://github.com/NVIDIA/k8s-device-plugin/blob/4b3d6b0a6613a3672f71ea4719fd8633eaafb4f3/deployments/static/nvidia-device-plugin.yml
 
 <!-- LINKS - internal -->
 [az-aks-create]: /cli/azure/aks#az_aks_create