@@ -70,7 +70,7 @@ SIG Architecture for cross-cutting KEPs).
- [Risks and Mitigations](#risks-and-mitigations)
- [ResourceSlice resources will be harder to understand](#resourceslice-resources-will-be-harder-to-understand)
- [Flatting of ResourceSlices might be needed by all tools using the API](#flatting-of-resourceslices-might-be-needed-by-all-tools-using-the-api)
- - [More attributes, capacities and counters might worsen worst-case scheduling](#more-attributes-capacities-and-counters-might-worsen-worst-case-scheduling)
+ - [Mixins and more counters might worsen worst-case scheduling](#mixins-and-more-counters-might-worsen-worst-case-scheduling)
- [Design Details](#design-details)
- [API](#api)
- [Implementation](#implementation)
@@ -179,15 +179,18 @@ also limits the number of partitionable devices for a single physical device.
- Enable a more compact way to define devices in ResourceSlices so duplication can
be reduced and a larger number of devices can be published within a single
ResourceSlice.
- - Enable defining devices with more attributes, capacities, and consumed counters.
- - Enable defining counter sets with more counters.
+ - Enable defining counter sets with more counters and devices with more consumed
+ counters.

### Non-Goals

- Not part of the plan for alpha: developing a kubectl command or plugin to let
users see the flattened device definitions. Mixins do make it harder to find
the full definition for a specific device, so this might be added to the scope
for Beta or GA.
+ - Enable devices to have more than 32 attributes and capacities. Increasing this
+ limit has implications for the CEL cost functions, so we are not looking to increase
+ the limits as part of this KEP.

## Proposal

@@ -259,7 +262,7 @@ tools and potential for implementations that differ in meaningful ways. This can
be addressed by providing reusable libraries that can be leveraged by other
tools.

- #### More attributes, capacities and counters might worsen worst-case scheduling
+ #### Mixins and more counters might worsen worst-case scheduling

This will not negatively affect the scheduling performance of existing
ResourceSlice definitions, but DRA driver authors taking advantage of mixins should
@@ -323,9 +326,6 @@ type ResourceSliceMixins struct {
// shared attributes and capacities that an actual device can "include"
// to extend the set of attributes and capacities it already defines.
//
- // The maximum number of attributes, capacity, and counters across all
- // mixins is 256.
- //
// +optional
// +listType=atomic
Device []DeviceMixin
@@ -334,9 +334,6 @@ type ResourceSliceMixins struct {
// consumption mixins, each of which contains a set of counters
// that a device will consume from a counter set.
//
- // The maximum number of attributes, capacity, and counters across all
- // mixins is 256.
- //
// +optional
// +listType=atomic
DeviceCounterConsumption []DeviceCounterConsumptionMixin
@@ -345,9 +342,6 @@ type ResourceSliceMixins struct {
// a collection of counters that a CounterSet can "include"
// to extend the set of counters it already defines.
//
- // The maximum number of attributes, capacity, and counters across all
- // mixins is 256.
- //
// +optional
// +listType=atomic
CounterSet []CounterSetMixin
@@ -368,8 +362,10 @@ type DeviceMixin struct {
// must be listed without the driver name as domain prefix in
// their name. All others must be listed with their domain prefix.
//
- // The maximum number of attributes, capacity, and counters across all
- // mixins is 256.
+ // The maximum number of attributes and capacities across all devices
+ // and device mixins in a ResourceSlice is 4096. When flattened, the
+ // total number of attributes and capacities for each device can not
+ // exceed 32.
//
// +optional
Attributes map[QualifiedName]DeviceAttribute
@@ -381,8 +377,10 @@ type DeviceMixin struct {
// must be listed without the driver name as domain prefix in
// their name. All others must be listed with their domain prefix.
//
- // The maximum number of attributes, capacity, and counters across all
- // mixins is 256.
+ // The maximum number of attributes and capacities across all devices
+ // and device mixins in a ResourceSlice is 4096. When flattened, the
+ // total number of attributes and capacities for each device can not
+ // exceed 32.
//
// +optional
Capacity map[QualifiedName]DeviceCapacity
@@ -401,8 +399,8 @@ type DeviceCounterConsumptionMixin struct {
// Counters defines a set of counters
// that a device will consume from a counter set.
//
- // The maximum number of attributes, capacity, and counters across all
- // mixins is 256.
+ // The maximum number of counters across all device counter consumptions
+ // and device counter consumption mixins in a ResourceSlice is 2048.
//
// +required
Counters map[string]Counter
@@ -419,7 +417,8 @@ type CounterSetMixin struct {
// Counters defines the set of counters for this mixin.
// The name of each counter must be unique in that set and must be a DNS label.
//
- // The maximum number of counters is 32.
+ // The maximum number of counters across all counter sets and counter set
+ // mixins in a ResourceSlice is 256.
//
// +required
Counters map[string]Counter
@@ -473,12 +472,13 @@ type DeviceCounterConsumption struct {

### Implementation

- The DRA scheduler plugin will flatten the counter sets and devices before
- going through the allocation process. This will happen as part of conversion
- from the `v1beta1` API to the types defined in
- [`k8s.io/dynamic-resource-allocation/api`](https://github.com/kubernetes/dynamic-resource-allocation/tree/master/api).
+ The DRA scheduler will keep the mixin structure throughout the scheduling process
+ as much as possible and avoid completely flattening the ResourceSlices. This is
+ to avoid additional memory usage that might come as a result. For example, we
+ plan to walk the mixins as part of the CEL variable lookup to avoid having to
+ flatten the device representation.
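
Walking the mixins at lookup time, as described in the added paragraph, could look roughly like the sketch below. It is not the scheduler's implementation: the types are simplified stand-ins (string keys and values instead of `QualifiedName` and `DeviceAttribute`), the `Includes` field name is hypothetical, and the precedence rule (the device's own value first, then the last-listed mixin) is an assumption made for illustration.

```go
package main

import "fmt"

// Simplified stand-ins for the KEP types. The real API keys attributes by
// QualifiedName and stores DeviceAttribute values; strings keep the sketch short.
type DeviceMixin struct {
	Attributes map[string]string
}

type Device struct {
	Name       string
	Includes   []string // names of included device mixins (hypothetical field)
	Attributes map[string]string
}

// lookupAttribute resolves a single attribute on demand, so no flattened copy
// of the device is built.
// Assumed precedence: the device's own value wins, then the last-listed mixin.
func lookupAttribute(dev Device, mixins map[string]DeviceMixin, key string) (string, bool) {
	if v, ok := dev.Attributes[key]; ok {
		return v, true
	}
	for i := len(dev.Includes) - 1; i >= 0; i-- {
		m, ok := mixins[dev.Includes[i]]
		if !ok {
			continue // unknown mixin reference; validation is out of scope here
		}
		if v, ok := m.Attributes[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	mixins := map[string]DeviceMixin{
		"gpu-common": {Attributes: map[string]string{"vendor": "example", "model": "base"}},
		"gpu-large":  {Attributes: map[string]string{"model": "large"}},
	}
	dev := Device{
		Name:       "gpu-0",
		Includes:   []string{"gpu-common", "gpu-large"},
		Attributes: map[string]string{"index": "0"},
	}
	for _, key := range []string{"index", "vendor", "model", "missing"} {
		v, ok := lookupAttribute(dev, mixins, key)
		fmt.Printf("%s=%q found=%v\n", key, v, ok)
	}
}
```

The trade-off in this sketch is a few extra map lookups per attribute access in exchange for not holding a merged copy of every device in memory.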

- If the mixins feature is disabled during this process, any devices or counter sets that
+ If the mixins feature is disabled, any devices or counter sets that
reference mixins will be dropped. This also means that all devices that reference
a dropped counter set will also be dropped. The result is that the scheduler will not
see those devices. From the user's point of view, the consequence is that the scheduler
@@ -507,6 +507,10 @@ We will still enforce some per-slice limits:
* The number of mixins that can be referenced from each device, counter set or device counter consumption is 8.
* The number of taints per device is 4.

+ We will also enforce one limit on the flattened device:
+ * The combined number of attributes and capacities for a single device cannot exceed 32. We do this
+ to avoid increasing the cost of evaluating the CEL expressions for a device.
+
The limits on the number of counters across counter sets, mixins and device counter consumption in 1.33 for the
Partitionable Devices KEP will be removed, as those are still in alpha.
The limit of 32 on the number of attributes and capacities per device will be removed over the next 2 releases (1.34 and 1.35) to
@@ -841,9 +845,8 @@ Flattening the devices and counter sets will require slightly more work, but
this is unlikely to have any meaningful impact on the time used for allocation.

It does allow DRA driver authors to create more complex devices, with a larger
- number of attributes, capacities and counters. It also allows for larger number
- of counters in the counter sets. This can worsen the worst-case scheduling
- performance.
+ number of counters. It also allows for a larger number of counters in the counter
+ sets. This can worsen the worst-case scheduling performance.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?