@@ -297,9 +297,9 @@ address limitations of the current approach for the following use cases:
 
 - *Device initialization*: When starting a workload that uses
   an accelerator like an FPGA, I’d like to have the accelerator
-  reconfigured or reprogrammed for the workload before the workload
-  itself starts. For security reasons, workloads should not be able to
-  reconfigure devices directly.
+  reconfigured or reprogrammed without having to deploy my application
+  with full hardware access and/or root privileges. Running applications
+  with fewer privileges is better for the overall security of the cluster.
 
   *Limitation*: Currently, it’s impossible to specify the desired
   device properties that are required for reconfiguring devices.
@@ -315,28 +315,22 @@ address limitations of the current approach for the following use cases:
 
   *Limitation*: Post-stop actions are not supported.
 
-- *Partial allocation*: When deploying a container I’d like to be able
-  to use part of the shareable device inside a container and other
-  containers should be able to use other free resources on the same
-  device.
-
-  *Limitation*: For example, newer generations of NVIDIA GPUs have a mode of
-  operation called MIG, that allow them to be sub-divided into a set of
-  mini-GPUs (called MIG devices) with varying amounts of memory and compute
-  resources provided by each. From a hardware-standpoint, configuring a GPU
-  into a set of MIG devices is highly-dynamic and creating a MIG device
-  tailored to the resource needs of a particular application is well
-  supported. However, with the current device plugin API, the only way to make
-  use of this feature is to pre-partition a GPU into a set of MIG devices and
-  advertise them to the kubelet in the same way a full / static GPU is
-  advertised. The user must then pick from this set of pre-partitioned MIG
-  devices instead of having one created for them on the fly based on their
-  particular resource constraints. Without the ability to create MIG devices
-  dynamically (i.e. at the time they are requested) the set of pre-defined MIG
-  devices must be carefully tuned to ensure that GPU resources do not go unused
-  because some of the pre-partioned devices are in low-demand. It also puts
-  the burden on the user to pick a particular MIG device type, rather than
-  declaring the resource constraints more abstractly.
+- *Partial allocation*: When workloads use only a portion of the device
+  capabilities, devices can be partitioned (e.g. using NVIDIA MIG or SR-IOV) to
+  better match workload needs. Sharing the devices in this way can greatly
+  increase hardware utilization and reduce costs.
+
+  *Limitation*: Currently, there is no API to request partial device
+  allocation. With the current device plugin API, devices need to be
+  pre-partitioned and advertised in the same way that full / static devices
+  are. Users must then select a pre-partitioned device instead of having one
+  created for them on the fly based on their particular resource
+  constraints. Without the ability to create devices dynamically (i.e. at the
+  time they are requested), the set of pre-defined devices must be carefully
+  tuned to ensure that device resources do not go unused because some of the
+  pre-partitioned devices are in low demand. It also puts the burden on the user
+  to pick a particular device type, rather than declaring the resource
+  constraints more abstractly.
 
 - *Optional allocation*: When deploying a workload I’d like to specify
   soft(optional) device requirements. If a device exists and it’s
@@ -873,8 +867,8 @@ parameters differently for reporting resource availability.
 
 This is the result of an attack against the resource driver, either from a
 container which uses a resource exposed by the driver, a compromised kubelet
-which interacts with the plugin, or through a successful attack against the
-node which led to root access.
+which interacts with the plugin, or due to the resource driver running on a node
+with a compromised root account.
 
 The resource driver plugin only needs read access to objects described in this
 KEP, so compromising it does not interfere with dynamic resource allocation for
@@ -1161,6 +1155,15 @@ resources.
 
 ### API
 
+```
+<<[UNRESOLVED @pohly @johnbelamaric ]>>
+Before 1.31, we need to re-evaluate the API, including, but not limited to:
+- Do we really need a separate ResourceClaim?
+- Does "Device" instead of "Resource" make the API easier to understand?
+- Avoid separate parameter objects if and when possible.
+<<[/UNRESOLVED]>>
+```
+
 The PodSpec gets extended. To minimize the changes in core/v1, all new types
 get defined in a new resource group. This makes it possible to revise those
 more experimental parts of the API in the future. The new fields in the