Commit 9b62cca

Merge pull request #37217 from tengqm/tweak-device-plugins
Tweak line wrappings in the device-plugin concepts page
2 parents 924f40f + 9ad91eb commit 9b62cca

1 file changed: +54 -37 lines changed

content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md

Lines changed: 54 additions & 37 deletions
@@ -1,6 +1,8 @@
 ---
 title: Device Plugins
-description: Device plugins let you configure your cluster with support for devices or resources that require vendor-specific setup, such as GPUs, NICs, FPGAs, or non-volatile main memory.
+description: >
+Device plugins let you configure your cluster with support for devices or resources that require
+vendor-specific setup, such as GPUs, NICs, FPGAs, or non-volatile main memory.
 content_type: concept
 weight: 20
 ---
@@ -33,12 +35,12 @@ service Registration {
 A device plugin can register itself with the kubelet through this gRPC service.
 During the registration, the device plugin needs to send:
 
-* The name of its Unix socket.
-* The Device Plugin API version against which it was built.
-* The `ResourceName` it wants to advertise. Here `ResourceName` needs to follow the
-[extended resource naming scheme](/docs/concepts/configuration/manage-resources-containers/#extended-resources)
-as `vendor-domain/resourcetype`.
-(For example, an NVIDIA GPU is advertised as `nvidia.com/gpu`.)
+* The name of its Unix socket.
+* The Device Plugin API version against which it was built.
+* The `ResourceName` it wants to advertise. Here `ResourceName` needs to follow the
+[extended resource naming scheme](/docs/concepts/configuration/manage-resources-containers/#extended-resources)
+as `vendor-domain/resourcetype`.
+(For example, an NVIDIA GPU is advertised as `nvidia.com/gpu`.)
 
 Following a successful registration, the device plugin sends the kubelet the
 list of devices it manages, and the kubelet is then in charge of advertising those
@@ -133,12 +135,12 @@ The general workflow of a device plugin includes the following steps:
 path `/var/lib/kubelet/device-plugins/kubelet.sock`.
 
 * After successfully registering itself, the device plugin runs in serving mode, during which it keeps
-monitoring device health and reports back to the kubelet upon any device state changes.
-It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
-do device-specific preparation; for example, GPU cleanup or QRNG initialization.
-If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
-runtime configurations for accessing the allocated devices. The kubelet passes this information
-to the container runtime.
+monitoring device health and reports back to the kubelet upon any device state changes.
+It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
+do device-specific preparation; for example, GPU cleanup or QRNG initialization.
+If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
+runtime configurations for accessing the allocated devices. The kubelet passes this information
+to the container runtime.
 
 ### Handling kubelet restarts
 
@@ -156,8 +158,7 @@ The canonical directory `/var/lib/kubelet/device-plugins` requires privileged ac
 so a device plugin must run in a privileged security context.
 If you're deploying a device plugin as a DaemonSet, `/var/lib/kubelet/device-plugins`
 must be mounted as a {{< glossary_tooltip term_id="volume" >}}
-in the plugin's
-[PodSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core).
+in the plugin's [PodSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core).
 
 If you choose the DaemonSet approach you can rely on Kubernetes to: place the device plugin's
 Pod onto Nodes, to restart the daemon Pod after failure, and to help automate upgrades.
@@ -202,7 +203,8 @@ service PodResourcesLister {
 
 The `List` endpoint provides information on resources of running pods, with details such as the
 id of exclusively allocated CPUs, device id as it was reported by device plugins and id of
-the NUMA node where these devices are allocated. Also, for NUMA-based machines, it contains the information about memory and hugepages reserved for a container.
+the NUMA node where these devices are allocated. Also, for NUMA-based machines, it contains the
+information about memory and hugepages reserved for a container.
 
 ```gRPC
 // ListPodResourcesResponse is the response returned by List function
@@ -273,6 +275,7 @@ conjunction with the List() endpoint. The result obtained by `GetAllocatableReso
 the same unless the underlying resources exposed to kubelet change. This happens rarely but when
 it does (for example: hotplug/hotunplug, device health changes), client is expected to call
 `GetAlloctableResources` endpoint.
+
 However, calling `GetAllocatableResources` endpoint is not sufficient in case of cpu and/or memory
 update and Kubelet needs to be restarted to reflect the correct resource capacity and allocatable.
 {{< /note >}}
@@ -285,20 +288,22 @@ message AllocatableResourcesResponse {
 repeated int64 cpu_ids = 2;
 repeated ContainerMemory memory = 3;
 }
-
 ```
+
 Starting from Kubernetes v1.23, the `GetAllocatableResources` is enabled by default.
-You can disable it by turning off the
-`KubeletPodResourcesGetAllocatable` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
+You can disable it by turning off the `KubeletPodResourcesGetAllocatable`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
 
 Preceding Kubernetes v1.23, to enable this feature `kubelet` must be started with the following flag:
 
-`--feature-gates=KubeletPodResourcesGetAllocatable=true`
-
-`ContainerDevices` do expose the topology information declaring to which NUMA cells the device is affine.
-The NUMA cells are identified using a opaque integer ID, which value is consistent to what device
-plugins report [when they register themselves to the kubelet](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#device-plugin-integration-with-the-topology-manager).
+```
+--feature-gates=KubeletPodResourcesGetAllocatable=true
+```
 
+`ContainerDevices` do expose the topology information declaring to which NUMA cells the device is
+affine. The NUMA cells are identified using a opaque integer ID, which value is consistent to
+what device plugins report
+[when they register themselves to the kubelet](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#device-plugin-integration-with-the-topology-manager).
 
 The gRPC service is served over a unix socket at `/var/lib/kubelet/pod-resources/kubelet.sock`.
 Monitoring agents for device plugin resources can be deployed as a daemon, or as a DaemonSet.
@@ -308,15 +313,17 @@ DaemonSet, `/var/lib/kubelet/pod-resources` must be mounted as a
 {{< glossary_tooltip term_id="volume" >}} in the device monitoring agent's
 [PodSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core).
 
-Support for the `PodResourcesLister service` requires `KubeletPodResources` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled.
+Support for the `PodResourcesLister service` requires `KubeletPodResources`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled.
 It is enabled by default starting with Kubernetes 1.15 and is v1 since Kubernetes 1.20.
 
 ## Device plugin integration with the Topology Manager
 
 {{< feature-state for_k8s_version="v1.18" state="beta" >}}
 
-The Topology Manager is a Kubelet component that allows resources to be co-ordinated in a Topology aligned manner. In order to do this, the Device Plugin API was extended to include a `TopologyInfo` struct.
-
+The Topology Manager is a Kubelet component that allows resources to be co-ordinated in a Topology
+aligned manner. In order to do this, the Device Plugin API was extended to include a
+`TopologyInfo` struct.
 
 ```gRPC
 message TopologyInfo {
@@ -327,11 +334,17 @@ message NUMANode {
 int64 ID = 1;
 }
 ```
-Device Plugins that wish to leverage the Topology Manager can send back a populated TopologyInfo struct as part of the device registration, along with the device IDs and the health of the device. The device manager will then use this information to consult with the Topology Manager and make resource assignment decisions.
 
-`TopologyInfo` supports setting a `nodes` field to either `nil` or a list of NUMA nodes. This allows the Device Plugin to advertise a device that spans multiple NUMA nodes.
+Device Plugins that wish to leverage the Topology Manager can send back a populated TopologyInfo
+struct as part of the device registration, along with the device IDs and the health of the device.
+The device manager will then use this information to consult with the Topology Manager and make
+resource assignment decisions.
+
+`TopologyInfo` supports setting a `nodes` field to either `nil` or a list of NUMA nodes. This
+allows the Device Plugin to advertise a device that spans multiple NUMA nodes.
 
-Setting `TopologyInfo` to `nil` or providing an empty list of NUMA nodes for a given device indicates that the Device Plugin does not have a NUMA affinity preference for that device.
+Setting `TopologyInfo` to `nil` or providing an empty list of NUMA nodes for a given device
+indicates that the Device Plugin does not have a NUMA affinity preference for that device.
 
 An example `TopologyInfo` struct populated for a device by a Device Plugin:
 
@@ -346,20 +359,24 @@ pluginapi.Device{ID: "25102017", Health: pluginapi.Healthy, Topology:&pluginapi.
 Here are some examples of device plugin implementations:
 
 * The [AMD GPU device plugin](https://github.com/RadeonOpenCompute/k8s-device-plugin)
-* The [Intel device plugins](https://github.com/intel/intel-device-plugins-for-kubernetes) for Intel GPU, FPGA, QAT, VPU, SGX, DSA, DLB and IAA devices
-* The [KubeVirt device plugins](https://github.com/kubevirt/kubernetes-device-plugins) for hardware-assisted virtualization
+* The [Intel device plugins](https://github.com/intel/intel-device-plugins-for-kubernetes) for
+Intel GPU, FPGA, QAT, VPU, SGX, DSA, DLB and IAA devices
+* The [KubeVirt device plugins](https://github.com/kubevirt/kubernetes-device-plugins) for
+hardware-assisted virtualization
 * The [NVIDIA GPU device plugin for Container-Optimized OS](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
 * The [RDMA device plugin](https://github.com/hustcat/k8s-rdma-device-plugin)
 * The [SocketCAN device plugin](https://github.com/collabora/k8s-socketcan)
 * The [Solarflare device plugin](https://github.com/vikaschoudhary16/sfc-device-plugin)
 * The [SR-IOV Network device plugin](https://github.com/intel/sriov-network-device-plugin)
 * The [Xilinx FPGA device plugins](https://github.com/Xilinx/FPGA_as_a_Service/tree/master/k8s-device-plugin) for Xilinx FPGA devices
 
-
 ## {{% heading "whatsnext" %}}
 
-
-* Learn about [scheduling GPU resources](/docs/tasks/manage-gpus/scheduling-gpus/) using device plugins
-* Learn about [advertising extended resources](/docs/tasks/administer-cluster/extended-resource-node/) on a node
+* Learn about [scheduling GPU resources](/docs/tasks/manage-gpus/scheduling-gpus/) using device
+plugins
+* Learn about [advertising extended resources](/docs/tasks/administer-cluster/extended-resource-node/)
+on a node
 * Learn about the [Topology Manager](/docs/tasks/administer-cluster/topology-manager/)
-* Read about using [hardware acceleration for TLS ingress](/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/) with Kubernetes
+* Read about using [hardware acceleration for TLS ingress](/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/)
+with Kubernetes
+
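Reviewer note on the registration hunk: the page says a `ResourceName` must follow the `vendor-domain/resourcetype` scheme, with `nvidia.com/gpu` as the example. A minimal Go sketch of that shape check follows. The helper name `looksLikeExtendedResourceName` is hypothetical, and the check only covers the rough shape described in the text; it is an assumption-laden illustration, not the validation the kubelet actually performs.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeExtendedResourceName reports whether name has the rough
// "vendor-domain/resourcetype" shape described in the page.
// Hypothetical helper for illustration only; the kubelet's real
// validation is stricter and is not reproduced here.
func looksLikeExtendedResourceName(name string) bool {
	// Require exactly one "/" separating the two parts.
	parts := strings.Split(name, "/")
	if len(parts) != 2 {
		return false
	}
	domain, resource := parts[0], parts[1]
	if domain == "" || resource == "" {
		return false
	}
	// Reject names containing whitespace anywhere.
	return !strings.ContainsAny(name, " \t\n")
}

func main() {
	// Print the verdict for a few sample names.
	for _, name := range []string{"nvidia.com/gpu", "gpu", "nvidia.com/"} {
		fmt.Println(name, looksLikeExtendedResourceName(name))
	}
}
```

A plugin author could run a check like this before calling the registration service, so that an obviously malformed name fails fast on the plugin side.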