Commit 490f3bb

Documentation for the DRA Partitionable Devices feature
1 parent 23f39cd commit 490f3bb

2 files changed: +78 -1 lines changed

content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 61 additions & 1 deletion
@@ -258,7 +258,7 @@ real time changes of the state of the device.
When the feature is disabled, that field automatically gets cleared when storing the ResourceClaim.

A ResourceClaim device status is supported when it is possible, from a DRA driver, to update an
-existing ResourceClaim where the `status.devices` field is set.
+existing ResourceClaim where the `status.devices` field is set.

## Prioritized List

@@ -304,6 +304,59 @@ spec:
count: 2
```

## Partitionable Devices

{{< feature-state feature_gate_name="DRAPartitionableDevices" >}}

Devices represented in DRA don't necessarily have to be a single unit connected to a single machine,
but can also be a logical device composed of multiple devices connected to multiple machines. These
devices might consume overlapping resources of the underlying physical devices, meaning that when one
logical device is allocated, other devices will no longer be available.

In the ResourceSlice API, this is represented as a list of named CounterSets, each of which
contains a set of named counters. The counters represent the resources available on the physical
device that are used by the logical devices advertised through DRA.

Logical devices can specify the ConsumesCounters list. Each entry contains a reference to a CounterSet
and a set of named counters with the amounts they will consume. So for a device to be allocatable,
the referenced counter sets must have sufficient quantity for the counters referenced by the device.

Here is an example of two devices, each consuming 6Gi of memory from a shared counter with
8Gi of memory. Together the two devices would need 12Gi, so only one of them can be allocated at
any point in time. The scheduler handles this, and it is transparent to the consumer because the
ResourceClaim API is not affected.

```yaml
kind: ResourceSlice
apiVersion: resource.k8s.io/v1beta1
metadata:
  name: resourceslice
spec:
  nodeName: worker-1
  pool:
    name: pool
    generation: 1
    resourceSliceCount: 1
  driver: dra.example.com
  sharedCounters:
  - name: gpu-1-counters
    counters:
      memory:
        value: 8Gi
  devices:
  - name: device-1
    consumesCounters:
    - counterSet: gpu-1-counters
      counters:
        memory:
          value: 6Gi
  - name: device-2
    consumesCounters:
    - counterSet: gpu-1-counters
      counters:
        memory:
          value: 6Gi
```
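
The counter accounting behind this example can be illustrated with a small Python sketch. This is
not the scheduler's implementation; the `Device` class and the `fits` and `allocate` helpers are
hypothetical names introduced only to show how consuming 6Gi twice from an 8Gi counter plays out.

```python
from dataclasses import dataclass, field

GI = 1024 ** 3  # bytes in one Gi


@dataclass
class Device:
    """Hypothetical stand-in for a logical device in a ResourceSlice."""
    name: str
    # counter set name -> counter name -> amount consumed
    consumes: dict[str, dict[str, int]] = field(default_factory=dict)


def fits(device: Device, remaining: dict[str, dict[str, int]]) -> bool:
    """Return True if every counter the device consumes is still available."""
    for set_name, counters in device.consumes.items():
        available = remaining.get(set_name, {})
        for counter, amount in counters.items():
            if available.get(counter, 0) < amount:
                return False
    return True


def allocate(device: Device, remaining: dict[str, dict[str, int]]) -> bool:
    """Deduct the device's counters from the shared sets if they all fit."""
    if not fits(device, remaining):
        return False
    for set_name, counters in device.consumes.items():
        for counter, amount in counters.items():
            remaining[set_name][counter] -= amount
    return True


# Mirrors the ResourceSlice example: one shared 8Gi counter, two 6Gi devices.
shared = {"gpu-1-counters": {"memory": 8 * GI}}
device_1 = Device("device-1", {"gpu-1-counters": {"memory": 6 * GI}})
device_2 = Device("device-2", {"gpu-1-counters": {"memory": 6 * GI}})

first = allocate(device_1, shared)   # succeeds, leaving 2Gi in the counter
second = allocate(device_2, shared)  # fails, 2Gi is less than the 6Gi required
print(first, second)
```

In this sketch, whichever device is allocated first leaves too little memory in the counter for the
other one, which is the behavior described above.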

## Enabling dynamic resource allocation

Dynamic resource allocation is a *beta feature* which is off by default and only enabled when the
@@ -366,6 +419,13 @@ is enabled in the kube-apiserver and kube-scheduler. It also requires that the
`DynamicResourceAllocation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is enabled.

### Enabling Partitionable Devices

[Partitionable Devices](#partitionable-devices) is an *alpha feature*
and only enabled when the `DRAPartitionableDevices`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is enabled in the kube-apiserver and kube-scheduler.

## {{% heading "whatsnext" %}}

- For more information on the design, see the
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
---
title: DRAPartitionableDevices
content_type: feature_gate
_build:
  list: never
  render: false

stages:
  - stage: alpha
    defaultValue: false
    fromVersion: "1.33"
---
Enables support for requesting [Partitionable Devices](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#partitionable-devices)
for DRA. This lets drivers advertise multiple devices that map to the same resources
of a physical device.

This feature gate has no effect unless you also enable the `DynamicResourceAllocation` feature gate.
