content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
54 additions & 37 deletions
Original file line number
Diff line number
Diff line change
@@ -1,6 +1,8 @@
1
1
---
2
2
title: Device Plugins
3
-
description: Device plugins let you configure your cluster with support for devices or resources that require vendor-specific setup, such as GPUs, NICs, FPGAs, or non-volatile main memory.
3
+
description: >
4
+
Device plugins let you configure your cluster with support for devices or resources that require
5
+
vendor-specific setup, such as GPUs, NICs, FPGAs, or non-volatile main memory.
4
6
content_type: concept
5
7
weight: 20
6
8
---
@@ -33,12 +35,12 @@ service Registration {
33
35
A device plugin can register itself with the kubelet through this gRPC service.
34
36
During the registration, the device plugin needs to send:
35
37
36
-
* The name of its Unix socket.
37
-
* The Device Plugin API version against which it was built.
38
-
* The `ResourceName` it wants to advertise. Here `ResourceName` needs to follow the
in the plugin's [PodSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core).
161
162
162
163
If you choose the DaemonSet approach you can rely on Kubernetes to: place the device plugin's
163
164
Pod onto Nodes, to restart the daemon Pod after failure, and to help automate upgrades.
@@ -202,7 +203,8 @@ service PodResourcesLister {
202
203
203
204
The `List` endpoint provides information on resources of running pods, with details such as the
204
205
id of exclusively allocated CPUs, device id as it was reported by device plugins and id of
205
-
the NUMA node where these devices are allocated. Also, for NUMA-based machines, it contains the information about memory and hugepages reserved for a container.
206
+
the NUMA node where these devices are allocated. Also, for NUMA-based machines, it contains the
207
+
information about memory and hugepages reserved for a container.
206
208
207
209
```gRPC
208
210
// ListPodResourcesResponse is the response returned by List function
```
@@ -273,6 +275,7 @@ conjunction with the List() endpoint. The result obtained by `GetAllocatableReso
273
275
the same unless the underlying resources exposed to kubelet change. This happens rarely but when
274
276
it does (for example: hotplug/hotunplug, device health changes), client is expected to call
275
277
`GetAllocatableResources` endpoint.
278
+
276
279
However, calling `GetAllocatableResources` endpoint is not sufficient in case of cpu and/or memory
277
280
update and Kubelet needs to be restarted to reflect the correct resource capacity and allocatable.
`ContainerDevices` do expose the topology information declaring to which NUMA cells the device is affine.
299
-
The NUMA cells are identified using a opaque integer ID, which value is consistent to what device
300
-
plugins report [when they register themselves to the kubelet](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#device-plugin-integration-with-the-topology-manager).
`ContainerDevices` do expose the topology information declaring to which NUMA cells the device is
304
+
affine. The NUMA cells are identified using a opaque integer ID, which value is consistent to
305
+
what device plugins report
306
+
[when they register themselves to the kubelet](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#device-plugin-integration-with-the-topology-manager).
302
307
303
308
The gRPC service is served over a unix socket at `/var/lib/kubelet/pod-resources/kubelet.sock`.
304
309
Monitoring agents for device plugin resources can be deployed as a daemon, or as a DaemonSet.
@@ -308,15 +313,17 @@ DaemonSet, `/var/lib/kubelet/pod-resources` must be mounted as a
308
313
{{< glossary_tooltip term_id="volume" >}} in the device monitoring agent's
Support for the `PodResourcesLister service` requires `KubeletPodResources`[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled.
316
+
Support for the `PodResourcesLister service` requires `KubeletPodResources`
317
+
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled.
312
318
It is enabled by default starting with Kubernetes 1.15 and is v1 since Kubernetes 1.20.
313
319
314
320
## Device plugin integration with the Topology Manager
The Topology Manager is a Kubelet component that allows resources to be co-ordinated in a Topology aligned manner. In order to do this, the Device Plugin API was extended to include a `TopologyInfo` struct.
319
-
324
+
The Topology Manager is a Kubelet component that allows resources to be co-ordinated in a Topology
325
+
aligned manner. In order to do this, the Device Plugin API was extended to include a
326
+
`TopologyInfo` struct.
320
327
321
328
```gRPC
322
329
message TopologyInfo {
@@ -327,11 +334,17 @@ message NUMANode {
327
334
int64 ID = 1;
328
335
}
329
336
```
330
-
Device Plugins that wish to leverage the Topology Manager can send back a populated TopologyInfo struct as part of the device registration, along with the device IDs and the health of the device. The device manager will then use this information to consult with the Topology Manager and make resource assignment decisions.
331
337
332
-
`TopologyInfo` supports setting a `nodes` field to either `nil` or a list of NUMA nodes. This allows the Device Plugin to advertise a device that spans multiple NUMA nodes.
338
+
Device Plugins that wish to leverage the Topology Manager can send back a populated TopologyInfo
339
+
struct as part of the device registration, along with the device IDs and the health of the device.
340
+
The device manager will then use this information to consult with the Topology Manager and make
341
+
resource assignment decisions.
342
+
343
+
`TopologyInfo` supports setting a `nodes` field to either `nil` or a list of NUMA nodes. This
344
+
allows the Device Plugin to advertise a device that spans multiple NUMA nodes.
333
345
334
-
Setting `TopologyInfo` to `nil` or providing an empty list of NUMA nodes for a given device indicates that the Device Plugin does not have a NUMA affinity preference for that device.
346
+
Setting `TopologyInfo` to `nil` or providing an empty list of NUMA nodes for a given device
347
+
indicates that the Device Plugin does not have a NUMA affinity preference for that device.
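
The nil-versus-empty rule described above can be sketched in Go as follows. The `NUMANode` and `TopologyInfo` types mirror the messages shown in this diff but are local stand-ins defined for the example, not the generated Device Plugin API types. The `Device` struct, the `HasNUMAPreference` helper, and the device IDs are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// NUMANode and TopologyInfo are local stand-ins mirroring the gRPC messages
// shown above. They are not the generated Device Plugin API types.
type NUMANode struct {
	ID int64
}

type TopologyInfo struct {
	Nodes []*NUMANode
}

// Device is a hypothetical, simplified device record for this sketch.
type Device struct {
	ID       string
	Health   string
	Topology *TopologyInfo
}

// HasNUMAPreference reports whether the topology expresses a NUMA affinity.
// A nil TopologyInfo and an empty node list both mean "no preference".
func HasNUMAPreference(t *TopologyInfo) bool {
	return t != nil && len(t.Nodes) > 0
}

func main() {
	devices := []Device{
		// No topology at all: no NUMA affinity preference.
		{ID: "dev-a", Health: "Healthy", Topology: nil},
		// Empty node list: also no preference.
		{ID: "dev-b", Health: "Healthy", Topology: &TopologyInfo{}},
		// A device that spans NUMA nodes 0 and 1.
		{ID: "dev-c", Health: "Healthy", Topology: &TopologyInfo{
			Nodes: []*NUMANode{{ID: 0}, {ID: 1}},
		}},
	}
	for _, d := range devices {
		fmt.Printf("%s has NUMA preference: %t\n", d.ID, HasNUMAPreference(d.Topology))
	}
}
```

Treating both cases through one predicate keeps callers from having to distinguish a missing struct from an empty list, which the text above describes as equivalent.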
335
348
336
349
An example `TopologyInfo` struct populated for a device by a Device Plugin:
* The [KubeVirt device plugins](https://github.com/kubevirt/kubernetes-device-plugins) for
365
+
hardware-assisted virtualization
351
366
* The [NVIDIA GPU device plugin for Container-Optimized OS](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
352
367
* The [RDMA device plugin](https://github.com/hustcat/k8s-rdma-device-plugin)
353
368
* The [SocketCAN device plugin](https://github.com/collabora/k8s-socketcan)
354
369
* The [Solarflare device plugin](https://github.com/vikaschoudhary16/sfc-device-plugin)
355
370
* The [SR-IOV Network device plugin](https://github.com/intel/sriov-network-device-plugin)
356
371
* The [Xilinx FPGA device plugins](https://github.com/Xilinx/FPGA_as_a_Service/tree/master/k8s-device-plugin) for Xilinx FPGA devices
357
372
358
-
359
373
## {{% heading "whatsnext" %}}
360
374
361
-
362
-
* Learn about [scheduling GPU resources](/docs/tasks/manage-gpus/scheduling-gpus/) using device plugins
363
-
* Learn about [advertising extended resources](/docs/tasks/administer-cluster/extended-resource-node/) on a node
375
+
* Learn about [scheduling GPU resources](/docs/tasks/manage-gpus/scheduling-gpus/) using device
376
+
plugins
377
+
* Learn about [advertising extended resources](/docs/tasks/administer-cluster/extended-resource-node/)
378
+
on a node
364
379
* Learn about the [Topology Manager](/docs/tasks/administer-cluster/topology-manager/)
365
-
* Read about using [hardware acceleration for TLS ingress](/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/) with Kubernetes
380
+
* Read about using [hardware acceleration for TLS ingress](/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/)