
Commit 278cd5c

klueska authored and pohly committed
dynamic resource allocation: more specific motivation for partial allocation
NVIDIA cards in MIG mode are limited by the current need to pre-partition the hardware.
1 parent bc3ccc7 commit 278cd5c


1 file changed (+17, -4 lines)

keps/sig-node/3063-dynamic-resource-allocation/README.md

Lines changed: 17 additions & 4 deletions
@@ -250,10 +250,23 @@ limitations of the current approach for the following use cases:
 containers should be able to use other free resources on the same
 device.

-*Limitation*: Current implementation of the device plugin doesn’t
-allow one to allocate part of the device because parameters are too limited
-and Kubernetes doesn't have enough information about the extended
-resources on a node to decide whether they can be shared.
+*Limitation*: For example, newer generations of NVIDIA GPUs have a mode of
+operation called MIG, which allows them to be sub-divided into a set of
+mini-GPUs (called MIG devices) with varying amounts of memory and compute
+resources provided by each. From a hardware standpoint, configuring a GPU
+into a set of MIG devices is highly dynamic, and creating a MIG device
+tailored to the resource needs of a particular application is well
+supported. However, with the current device plugin API, the only way to make
+use of this feature is to pre-partition a GPU into a set of MIG devices and
+advertise them to the kubelet in the same way a full / static GPU is
+advertised. The user must then pick from this set of pre-partitioned MIG
+devices instead of having one created for them on the fly based on their
+particular resource constraints. Without the ability to create MIG devices
+dynamically (i.e., at the time they are requested), the set of pre-defined MIG
+devices must be carefully tuned to ensure that GPU resources do not go unused
+because some of the pre-partitioned devices are in low demand. It also puts
+the burden on the user to pick a particular MIG device type, rather than
+declaring the resource constraints more abstractly.

 - *Optional allocation*: When deploying a workload I’d like to specify
 soft (optional) device requirements. If a device exists and it’s
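The utilization problem described in the added *Limitation* text can be illustrated with a toy model. The Go sketch below is hypothetical and is not part of the KEP, the device plugin API, or any NVIDIA tooling. It assumes a GPU made of 7 equal compute slices (loosely modelled on MIG-capable GPUs), ignores the real MIG placement rules, and uses invented names (`serveStatic`, `serveDynamic`). It compares a fixed pre-partitioning, where each request must match an advertised device type exactly, with creating devices on demand.

```go
package main

import "fmt"

// Hypothetical toy model: a GPU is treated as 7 equal compute slices.
// Real MIG also restricts which profiles can be placed where; ignored here.
const totalSlices = 7

// serveStatic assigns each request (size in slices) to an unused
// pre-partitioned device of exactly that size, mirroring a user who must
// pick one of the advertised MIG device types.
func serveStatic(partition []int, requests []int) int {
	free := make(map[int]int) // device size -> number of unused devices
	for _, size := range partition {
		free[size]++
	}
	served := 0
	for _, r := range requests {
		if free[r] > 0 {
			free[r]--
			served++
		}
	}
	return served
}

// serveDynamic creates a device of the requested size on demand, as long
// as enough unused slices remain on the GPU.
func serveDynamic(requests []int) int {
	remaining := totalSlices
	served := 0
	for _, r := range requests {
		if r <= remaining {
			remaining -= r
			served++
		}
	}
	return served
}

func main() {
	// Hypothetical pre-partitioning: 3 + 2 + 1 + 1 = 7 slices.
	partition := []int{3, 2, 1, 1}
	// Hypothetical demand: three 2-slice workloads and one 1-slice workload.
	requests := []int{2, 2, 2, 1}

	fmt.Println("static: ", serveStatic(partition, requests))
	fmt.Println("dynamic:", serveDynamic(requests))
}
```

With this demand, the static partition serves only 2 of the 4 requests and leaves the 3-slice device and one 1-slice device idle, while on-demand creation serves all 4 from the same 7 slices.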
