Commit d58cf8c

Addressed comments
1 parent b40dc02 commit d58cf8c

File tree

1 file changed

+31
-28
lines changed
  • keps/sig-scheduling/5234-dra-resourceslice-mixins

keps/sig-scheduling/5234-dra-resourceslice-mixins/README.md

Lines changed: 31 additions & 28 deletions
@@ -70,7 +70,7 @@ SIG Architecture for cross-cutting KEPs).
7070
- [Risks and Mitigations](#risks-and-mitigations)
7171
- [ResourceSlice resources will be harder to understand](#resourceslice-resources-will-be-harder-to-understand)
7272
- [Flatting of ResourceSlices might be needed by all tools using the API](#flatting-of-resourceslices-might-be-needed-by-all-tools-using-the-api)
73-
- [More attributes, capacities and counters might worsen worst-case scheduling](#more-attributes-capacities-and-counters-might-worsen-worst-case-scheduling)
73+
- [Mixins and more counters might worsen worst-case scheduling](#mixins-and-more-counters-might-worsen-worst-case-scheduling)
7474
- [Design Details](#design-details)
7575
- [API](#api)
7676
- [Implementation](#implementation)
@@ -179,15 +179,18 @@ also limits the number of partitionable devices for a single physical device.
179179
- Enable a more compact way to define devices in ResourceSlices so duplication can
180180
be reduced and a larger number of devices can be published within a single
181181
ResourceSlice.
182-
- Enable defining devices with more attributes, capacities, and consumed counters.
183-
- Enable defining counter sets with more counters.
182+
- Enable defining counter sets with more counters and devices with more consumed
183+
counters.
184184

185185
### Non-Goals
186186

187187
- Not part of the plan for alpha: developing kubectl command or plugin to let
188188
users see the flattened device definitions. Mixins do make it harder to find
189189
the full definition for a specific device, so this might be added to the scope
190190
for Beta or GA.
191+
- Enable devices to have more than 32 attributes and capacities. Increasing this
192+
limit has implications for the CEL cost functions, so we are not looking to increase
193+
the limits as part of this KEP.
191194

192195
## Proposal
193196

@@ -259,7 +262,7 @@ tools and potential for implementations that differ in meaningful ways. This can
259262
be addressed by providing reusable libraries that can be leveraged by other
260263
tools.
261264

262-
#### More attributes, capacities and counters might worsen worst-case scheduling
265+
#### Mixins and more counters might worsen worst-case scheduling
263266

264267
This will not negatively affect the scheduling performance of existing
265268
ResourceSlice definitions, but DRA driver authors taking advantage of mixins should
@@ -323,9 +326,6 @@ type ResourceSliceMixins struct {
323326
// shared attributes and capacities that an actual device can "include"
324327
// to extend the set of attributes and capacities it already defines.
325328
//
326-
// The maximum number of attributes, capacity, and counters across all
327-
// mixins is 256.
328-
//
329329
// +optional
330330
// +listType=atomic
331331
Device []DeviceMixin
@@ -334,9 +334,6 @@ type ResourceSliceMixins struct {
334334
// consumption mixins, each of which contains a set of counters
335335
// that a device will consume from a counter set.
336336
//
337-
// The maximum number of attributes, capacity, and counters across all
338-
// mixins is 256.
339-
//
340337
// +optional
341338
// +listType=atomic
342339
DeviceCounterConsumption []DeviceCounterConsumptionMixin
@@ -345,9 +342,6 @@ type ResourceSliceMixins struct {
345342
// a collection of counters that a CounterSet can "include"
346343
// to extend the set of counters it already defines.
347344
//
348-
// The maximum number of attributes, capacity, and counters across all
349-
// mixins is 256.
350-
//
351345
// +optional
352346
// +listType=atomic
353347
CounterSet []CounterSetMixin
@@ -368,8 +362,10 @@ type DeviceMixin struct {
368362
// must be listed without the driver name as domain prefix in
369363
// their name. All others must be listed with their domain prefix.
370364
//
371-
// The maximum number of attributes, capacity, and counters across all
372-
// mixins is 256.
365+
// The maximum number of attributes and capacities across all devices
366+
// and device mixins in a ResourceSlice is 4096. When flattened, the
367+
// total number of attributes and capacities for each device can not
368+
// exceed 32.
373369
//
374370
// +optional
375371
Attributes map[QualifiedName]DeviceAttribute
@@ -381,8 +377,10 @@ type DeviceMixin struct {
381377
// must be listed without the driver name as domain prefix in
382378
// their name. All others must be listed with their domain prefix.
383379
//
384-
// The maximum number of attributes, capacity, and counters across all
385-
// mixins is 256.
380+
// The maximum number of attributes and capacities across all devices
381+
// and device mixins in a ResourceSlice is 4096. When flattened, the
382+
// total number of attributes and capacities for each device can not
383+
// exceed 32.
386384
//
387385
// +optional
388386
Capacity map[QualifiedName]DeviceCapacity
@@ -401,8 +399,8 @@ type DeviceCounterConsumptionMixin struct {
401399
// Counters defines a set of counters
402400
// that a device will consume from a counter set.
403401
//
404-
// The maximum number of attributes, capacity, and counters across all
405-
// mixins is 256.
402+
// The maximum number of counters across all device counter consumptions
403+
// and device counter consumption mixins in a ResourceSlice is 2048.
406404
//
407405
// +required
408406
Counters map[string]Counter
@@ -419,7 +417,8 @@ type CounterSetMixin struct {
419417
// Counters defines the set of counters for this mixin.
420418
// The name of each counter must be unique in that set and must be a DNS label.
421419
//
422-
// The maximum number of counters is 32.
420+
// The maximum number of counters across all counter sets and counter set
421+
// mixins in a ResourceSlice is 256.
423422
//
424423
// +required
425424
Counters map[string]Counter
@@ -473,12 +472,13 @@ type DeviceCounterConsumption struct {
473472

474473
### Implementation
475474

476-
The DRA scheduler plugin will flatten the counter sets and devices before
477-
going through the allocation process. This will happen as part of conversion
478-
from the `v1beta1` API to the types defined in
479-
[`k8s.io/dynamic-resource-allocation/api`](https://github.com/kubernetes/dynamic-resource-allocation/tree/master/api).
475+
The DRA scheduler will keep the mixin structure throughout the scheduling process
476+
as much as possible and avoid completely flattening the ResourceSlices. This is
477+
to avoid additional memory usage that might come as a result. For example, we
478+
plan to walk the mixins as part of the CEL variable lookup to avoid having to
479+
flatten the device representation.
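
The lookup described above could look roughly like the following. This is a minimal sketch, not the scheduler's actual code: the `Device` and `DeviceMixin` types are simplified stand-ins for the API types, `lookupAttribute` is a hypothetical helper, and the precedence order (the device's own value first, then later mixins before earlier ones) is an assumption made for illustration.

```go
package main

import "fmt"

// DeviceMixin is a simplified, hypothetical stand-in for the API type.
type DeviceMixin struct {
	Name       string
	Attributes map[string]string
}

// Device is a simplified, hypothetical stand-in for the API type.
type Device struct {
	Name       string
	Includes   []string          // names of device mixins, in listed order
	Attributes map[string]string // attributes set directly on the device
}

// lookupAttribute resolves a single attribute by walking the device and its
// mixins on demand, so no flattened copy of the device is ever built.
// Assumed precedence: the device's own value wins, then later mixins
// take priority over earlier ones.
func lookupAttribute(d Device, mixins map[string]DeviceMixin, key string) (string, bool) {
	if v, ok := d.Attributes[key]; ok {
		return v, true
	}
	for i := len(d.Includes) - 1; i >= 0; i-- {
		m, ok := mixins[d.Includes[i]]
		if !ok {
			continue // unknown reference; real validation would reject it earlier
		}
		if v, ok := m.Attributes[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	mixins := map[string]DeviceMixin{
		"gpu-common": {Name: "gpu-common", Attributes: map[string]string{"vendor": "example", "memory": "40Gi"}},
		"gpu-large":  {Name: "gpu-large", Attributes: map[string]string{"memory": "80Gi"}},
	}
	dev := Device{
		Name:       "gpu-0",
		Includes:   []string{"gpu-common", "gpu-large"},
		Attributes: map[string]string{"index": "0"},
	}
	for _, key := range []string{"index", "vendor", "memory", "missing"} {
		v, ok := lookupAttribute(dev, mixins, key)
		fmt.Printf("%s=%q found=%v\n", key, v, ok)
	}
}
```

The trade-off in this sketch is a few extra map lookups per variable access in exchange for not holding a flattened copy of every device in memory.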
480480

481-
If the mixins feature is disabled during this process, any devices or counter sets that
481+
If the mixins feature is disabled, any devices or counter sets that
482482
reference mixins will be dropped. This also means that all devices that reference
483483
a dropped counter set will also be dropped. The result is that the scheduler will not
484484
see those devices. From the user's point of view, the consequence is that the scheduler
@@ -507,6 +507,10 @@ We will still enforce some per-slice limits:
507507
* The number of mixins that can be referenced from each device, counter set or device counter consumption is 8.
508508
* The number of taints per device is 4.
509509

510+
We will also enforce one limit on the flattened device:
511+
* The combined number of attributes and capacities for a single device can not exceed 32. We do this
512+
to avoid increasing the cost of evaluating the CEL expressions for a device.
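
One way to picture the flattened-device limit is the check below. It is a sketch under stated assumptions rather than the real validation code: the types are simplified stand-ins, `flattenedCount` and `validateFlattened` are hypothetical helpers, and it assumes that a name defined both on the device and in a mixin counts once after flattening. Only the limit of 32 comes from the text above.

```go
package main

import "fmt"

// Limit on the flattened device, taken from the KEP text.
const maxFlattenedAttributesAndCapacities = 32

// DeviceMixin is a simplified, hypothetical stand-in for the API type.
type DeviceMixin struct {
	Attributes map[string]string
	Capacity   map[string]string
}

// Device is a simplified, hypothetical stand-in for the API type.
type Device struct {
	Name       string
	Includes   []string // names of device mixins
	Attributes map[string]string
	Capacity   map[string]string
}

// flattenedCount returns the number of distinct attribute names plus the
// number of distinct capacity names the device has once its mixins are
// applied. Assumption: an overridden name is counted only once.
func flattenedCount(d Device, mixins map[string]DeviceMixin) int {
	attrs := map[string]struct{}{}
	caps := map[string]struct{}{}
	for k := range d.Attributes {
		attrs[k] = struct{}{}
	}
	for k := range d.Capacity {
		caps[k] = struct{}{}
	}
	for _, name := range d.Includes {
		m := mixins[name] // zero value (no entries) if the mixin is unknown
		for k := range m.Attributes {
			attrs[k] = struct{}{}
		}
		for k := range m.Capacity {
			caps[k] = struct{}{}
		}
	}
	return len(attrs) + len(caps)
}

// validateFlattened reports an error if the flattened device exceeds the limit.
func validateFlattened(d Device, mixins map[string]DeviceMixin) error {
	if n := flattenedCount(d, mixins); n > maxFlattenedAttributesAndCapacities {
		return fmt.Errorf("device %q has %d attributes and capacities after flattening, limit is %d",
			d.Name, n, maxFlattenedAttributesAndCapacities)
	}
	return nil
}

func main() {
	mixins := map[string]DeviceMixin{
		"common": {
			Attributes: map[string]string{"vendor": "example", "model": "x1"},
			Capacity:   map[string]string{"memory": "40Gi"},
		},
	}
	dev := Device{
		Name:       "gpu-0",
		Includes:   []string{"common"},
		Attributes: map[string]string{"model": "x1-pro", "index": "0"},
	}
	fmt.Println("flattened count:", flattenedCount(dev, mixins))
	fmt.Println("validation error:", validateFlattened(dev, mixins))
}
```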
513+
510514
The limits on the number of counters across counter sets, mixins and device counter consumption in 1.33 for the
511515
Partitionable Devices KEP will be removed, as those are still in alpha.
512516
The limit of 32 on the number of attributes and capacities per device will be removed over the next 2 releases (1.34 and 1.35) to
@@ -841,9 +845,8 @@ Flattening the devices and counter sets will require slightly more work, but
841845
this is unlikely to have any meaningful impact on the time used for allocation.
842846

843847
It does allow DRA driver authors to create more complex devices, with a larger
844-
number of attributes, capacities and counters. It also allows for larger number
845-
of counters in the counter sets. This can worsen the worst-case scheduling
846-
performance.
848+
number of counters. It also allows for a larger number of counters in the counter
849+
sets. This can worsen the worst-case scheduling performance.
847850

848851
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
849852
