@@ -6,6 +6,8 @@ title: Dynamic Resource Allocation
6
6
content_type: concept
7
7
weight: 65
8
8
api_metadata:
9
+ - apiVersion: "resource.k8s.io/v1alpha3"
10
+   kind: "DeviceTaintRule"
9
11
- apiVersion: "resource.k8s.io/v1beta1"
10
12
  kind: "ResourceClaim"
11
13
- apiVersion: "resource.k8s.io/v1beta1"
@@ -14,6 +16,14 @@ api_metadata:
14
16
  kind: "DeviceClass"
15
17
- apiVersion: "resource.k8s.io/v1beta1"
16
18
  kind: "ResourceSlice"
19
+ - apiVersion: "resource.k8s.io/v1beta2"
20
+   kind: "ResourceClaim"
21
+ - apiVersion: "resource.k8s.io/v1beta2"
22
+   kind: "ResourceClaimTemplate"
23
+ - apiVersion: "resource.k8s.io/v1beta2"
24
+   kind: "DeviceClass"
25
+ - apiVersion: "resource.k8s.io/v1beta2"
26
+   kind: "ResourceSlice"
17
27
---
18
28
19
29
<!-- overview -->
@@ -48,8 +58,8 @@ v{{< skew currentVersion>}}, check the documentation for that version of Kuberne
48
58
49
59
## API
50
60
51
- The `resource.k8s.io/v1beta1`
52
- {{< glossary_tooltip text="API group" term_id="api-group" >}} provides these types:
61
+ The `resource.k8s.io/v1beta1` and `resource.k8s.io/v1beta2`
62
+ {{< glossary_tooltip text="API groups" term_id="api-group" >}} provide these types:
53
63
54
64
ResourceClaim
55
65
: Describes a request for access to resources in the cluster,
@@ -71,9 +81,13 @@ DeviceClass
71
81
in a ResourceClaim must reference exactly one DeviceClass.
72
82
73
83
ResourceSlice
74
- : Used by DRA drivers to publish information about resources
84
+ : Used by DRA drivers to publish information about resources (typically devices)
75
85
that are available in the cluster.
76
86
87
+ DeviceTaintRule
88
+ : Used by admins or control plane components to add device taints
89
+ to the devices described in ResourceSlices.
90
+
77
91
All parameters that select devices are defined in the ResourceClaim and
78
92
DeviceClass with in-tree types. Configuration parameters can be embedded there.
79
93
Which configuration parameters are valid depends on the DRA driver -- Kubernetes
@@ -94,15 +108,16 @@ Here is an example for a fictional resource driver. Two ResourceClaim objects
94
108
will get created for this Pod and each container gets access to one of them.
95
109
96
110
``` yaml
97
- apiVersion : resource.k8s.io/v1beta1
111
+ apiVersion : resource.k8s.io/v1beta2
98
112
kind : DeviceClass
99
- name : resource.example.com
113
+ metadata :
114
+ name : resource.example.com
100
115
spec :
101
116
selectors :
102
117
- cel :
103
118
expression : device.driver == "resource-driver.example.com"
104
119
---
105
- apiVersion : resource.k8s.io/v1beta1
120
+ apiVersion : resource.k8s.io/v1beta2
106
121
kind : ResourceClaimTemplate
107
122
metadata :
108
123
name : large-black-cat-claim-template
@@ -111,13 +126,14 @@ spec:
111
126
devices :
112
127
requests :
113
128
- name : req-0
114
- deviceClassName : resource.example.com
115
- selectors :
116
- - cel :
117
- expression : |-
118
- device.attributes["resource-driver.example.com"].color == "black" &&
119
- device.attributes["resource-driver.example.com"].size == "large"
120
- –--
129
+ exactly :
130
+ deviceClassName : resource.example.com
131
+ selectors :
132
+ - cel :
133
+ expression : |-
134
+ device.attributes["resource-driver.example.com"].color == "black" &&
135
+ device.attributes["resource-driver.example.com"].size == "large"
136
+ ---
121
137
apiVersion : v1
122
138
kind : Pod
123
139
metadata :
@@ -219,7 +235,7 @@ admin access grants access to in-use devices and may enable additional
219
235
permissions when making the device available in a container:
220
236
221
237
` ` ` yaml
222
- apiVersion: resource.k8s.io/v1beta1
238
+ apiVersion: resource.k8s.io/v1beta2
223
239
kind: ResourceClaimTemplate
224
240
metadata:
225
241
name: large-black-cat-claim-template
@@ -228,9 +244,10 @@ spec:
228
244
devices:
229
245
requests:
230
246
- name: req-0
231
- deviceClassName: resource.example.com
232
- allocationMode: All
233
- adminAccess: true
247
+ exactly:
248
+ deviceClassName: resource.example.com
249
+ allocationMode: All
250
+ adminAccess: true
234
251
` ` `
235
252
236
253
If this feature is disabled, the `adminAccess` field will be removed
@@ -277,7 +294,7 @@ allocated if it is available. But if it is not and two small white devices are a
277
294
the pod will still be able to run.
278
295
279
296
` ` ` yaml
280
- apiVersion: resource.k8s.io/v1beta1
297
+ apiVersion: resource.k8s.io/v1beta2
281
298
kind: ResourceClaimTemplate
282
299
metadata:
283
300
name: prioritized-list-claim-template
@@ -327,7 +344,7 @@ handles this and it is transparent to the consumer as the ResourceClaim API is n
327
344
328
345
` ` ` yaml
329
346
kind: ResourceSlice
330
- apiVersion: resource.k8s.io/v1beta1
347
+ apiVersion: resource.k8s.io/v1beta2
331
348
metadata:
332
349
name: resourceslice
333
350
spec:
@@ -347,21 +364,110 @@ spec:
347
364
consumesCounters:
348
365
- counterSet: gpu-1-counters
349
366
counters:
350
- memory:
367
+ memory:
351
368
value: 6Gi
352
369
- name: device-2
353
370
consumesCounters:
354
371
- counterSet: gpu-1-counters
355
372
counters:
356
- memory:
373
+ memory:
357
374
value: 6Gi
358
375
` ` `
359
376
377
+ ## Device taints and tolerations
378
+
379
+ {{< feature-state feature_gate_name="DRADeviceTaints" >}}
380
+
381
+ Device taints are similar to node taints: a taint has a string key, a string
382
+ value, and an effect. The effect is applied to the ResourceClaim which is
383
+ using a tainted device and to all Pods referencing that ResourceClaim.
384
+ The "NoSchedule" effect prevents scheduling those Pods.
385
+ Tainted devices are ignored when trying to allocate a ResourceClaim
386
+ because using them would prevent scheduling of Pods.
387
+
388
+ The "NoExecute" effect implies "NoSchedule" and in addition causes eviction
389
+ of all Pods which have been scheduled already. This eviction is implemented
390
+ in the device taint eviction controller in kube-controller-manager by
391
+ deleting affected Pods.
392
+
393
+ ResourceClaims can tolerate taints. If a taint is tolerated, its effect does
394
+ not apply. An empty toleration matches all taints. A toleration can be limited to
395
+ certain effects and/or match certain key/value pairs. A toleration can check
396
+ that a certain key exists, regardless of which value it has, or it can check
397
+ for specific values of a key.
398
+ For more information on this matching see the
399
+ [node taint concepts](/docs/concepts/scheduling-eviction/taint-and-toleration#concepts).
400
+
401
+ Eviction can be delayed by tolerating a taint for a certain duration.
402
+ That delay starts at the time when a taint gets added to a device, which is recorded in a field
403
+ of the taint.
404
+
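As a rough sketch of the toleration rules described above, the following ResourceClaimTemplate tolerates one specific taint for a limited time. It assumes that tolerations are set per request in a `tolerations` list under `exactly`, with `key`, `operator`, `value`, `effect`, and `tolerationSeconds` fields modeled on node tolerations. The template name and the taint key and value are hypothetical. Check the API reference for your Kubernetes version before relying on these field names.

```yaml
apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaimTemplate
metadata:
  name: tolerant-claim-template   # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: req-0
        exactly:
          deviceClassName: resource.example.com
          # Assumed field layout, modeled on node tolerations.
          tolerations:
          - key: dra.example.com/unhealthy   # hypothetical taint key
            operator: Equal
            value: Broken
            effect: NoExecute
            # Tolerate the taint for five minutes, counted from the
            # time the taint was added to the device.
            tolerationSeconds: 300
```

Replacing `operator: Equal` and `value` with `operator: Exists` would, under the same assumptions, tolerate the key regardless of its value.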
405
+ Taints also apply, as described above, to ResourceClaims that allocate "all" devices on a node.
406
+ All devices must be untainted or all of their taints must be tolerated.
407
+ Allocating a device with admin access (described [above](#admin-access))
408
+ is not exempt either. An admin using that mode must explicitly tolerate all taints
409
+ to access tainted devices.
410
+
411
+ Taints can be added to devices in two different ways:
412
+
413
+ ### Taints set by the driver
414
+
415
+ A DRA driver can add taints to the device information that it publishes in ResourceSlices.
416
+ Consult the documentation of a DRA driver to learn whether the driver uses taints and what
417
+ their keys and values are.
418
+
419
+ ### Taints set by an admin
420
+
421
+ An admin or a control plane component can taint devices without having to tell
422
+ the DRA driver to include taints in its device information in ResourceSlices. They do that by
423
+ creating DeviceTaintRules. Each DeviceTaintRule adds one taint to devices which
424
+ match the device selector. Without such a selector, no devices are tainted. This
425
+ makes it harder to accidentally evict all pods using ResourceClaims when leaving out
426
+ the selector by mistake.
427
+
428
+ Devices can be selected by giving the name of a DeviceClass, driver, pool,
429
+ and/or device. The DeviceClass selects all devices that are selected by the
430
+ selectors in that DeviceClass. With just the driver name, an admin can taint
431
+ all devices managed by that driver, for example while doing some kind of
432
+ maintenance of that driver across the entire cluster. Adding a pool name can
433
+ limit the taint to a single node, if the driver manages node-local devices.
434
+
435
+ Finally, adding the device name can select one specific device. The device name
436
+ and pool name can also be used alone, if desired. For example, drivers for node-local
437
+ devices are encouraged to use the node name as their pool name. Then tainting with
438
+ that pool name automatically taints all devices on a node.
439
+
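To illustrate narrowing a taint to a single node, here is a minimal sketch of a DeviceTaintRule that combines a driver name with a pool name. It assumes that the device selector has a `pool` field next to `driver`, and that the fictional driver uses the node name as its pool name. The rule name, node name, and taint key are hypothetical.

```yaml
apiVersion: resource.k8s.io/v1alpha3
kind: DeviceTaintRule
metadata:
  name: node-maintenance   # hypothetical name
spec:
  deviceSelector:
    driver: dra.example.com
    # Assumes the driver publishes one pool per node,
    # named after the node.
    pool: worker-1   # hypothetical node name
  taint:
    key: dra.example.com/maintenance   # hypothetical taint key
    value: Planned
    # Keep new Pods away without evicting running ones.
    effect: NoSchedule
```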
440
+ Drivers might use stable names like "gpu-0" that hide which specific device is
441
+ currently assigned to that name. To support tainting a specific hardware
442
+ instance, CEL selectors can be used in a DeviceTaintRule to match a vendor-specific
443
+ unique ID attribute, if the driver supports one for its hardware.
444
+
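A sketch of such a rule might look like the following. It assumes that the device selector accepts a `selectors` list with the same `cel` shape that DeviceClass uses, and that the fictional driver publishes a `uuid` attribute. The rule name, attribute name, and ID value are hypothetical.

```yaml
apiVersion: resource.k8s.io/v1alpha3
kind: DeviceTaintRule
metadata:
  name: taint-one-hardware-instance   # hypothetical name
spec:
  deviceSelector:
    # Assumed field: CEL selectors shaped like those in a DeviceClass.
    selectors:
    - cel:
        # "uuid" is a hypothetical vendor-specific attribute.
        expression: device.attributes["dra.example.com"].uuid == "GPU-1234"
  taint:
    key: dra.example.com/unhealthy
    value: Broken
    effect: NoExecute
```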
445
+ The taint applies as long as the DeviceTaintRule exists. It can be modified and
446
+ removed at any time. Here is one example of a DeviceTaintRule for a fictional
447
+ DRA driver:
448
+
449
+ ```yaml
450
+ apiVersion: resource.k8s.io/v1alpha3
451
+ kind: DeviceTaintRule
452
+ metadata:
453
+   name: example
454
+ spec:
455
+   # The entire hardware installation for this
456
+   # particular driver is broken.
457
+   # Evict all pods and don't schedule new ones.
458
+   deviceSelector:
459
+     driver: dra.example.com
460
+   taint:
461
+     key: dra.example.com/unhealthy
462
+     value: Broken
463
+     effect: NoExecute
464
+ ```
465
+
360
466
## Enabling dynamic resource allocation
361
467
362
468
Dynamic resource allocation is a *beta feature* which is off by default and only enabled when the
363
469
`DynamicResourceAllocation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
364
- and the `resource.k8s.io/v1beta1` {{< glossary_tooltip text="API group" term_id="api-group" >}}
470
+ and the `resource.k8s.io/v1beta1` and `resource.k8s.io/v1beta2` {{< glossary_tooltip text="API groups" term_id="api-group" >}}
365
471
are enabled. For details on that, see the `--feature-gates` and `--runtime-config`
366
472
[kube-apiserver parameters](/docs/reference/command-line-tools-reference/kube-apiserver/).
367
473
kube-scheduler, kube-controller-manager and kubelet also need the feature gate.
@@ -426,6 +532,13 @@ and only enabled when the `DRAPartitionableDevices`
426
532
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
427
533
is enabled in the kube-apiserver and kube-scheduler.
428
534
535
+ ### Enabling device taints and tolerations
536
+
537
+ [Device taints and tolerations](#device-taints-and-tolerations) is an *alpha feature* and only enabled when the
538
+ `DRADeviceTaints` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
539
+ is enabled in the kube-apiserver, kube-controller-manager and kube-scheduler. To use DeviceTaintRules, the
540
+ `resource.k8s.io/v1alpha3` API version must be enabled.
541
+
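As one possible way to apply these settings, the fragment below sketches the relevant kube-apiserver flags. It assumes a cluster where kube-apiserver runs as a static Pod, so the manifest layout and container name are illustrative only. The same `DRADeviceTaints` feature gate would also need to be set for kube-controller-manager and kube-scheduler.

```yaml
# Fragment of a kube-apiserver static Pod manifest (assumed setup).
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Enable DRA itself plus device taints and tolerations.
    - --feature-gates=DynamicResourceAllocation=true,DRADeviceTaints=true
    # Serve the beta DRA API versions and the alpha version
    # that contains DeviceTaintRule.
    - --runtime-config=resource.k8s.io/v1beta1=true,resource.k8s.io/v1beta2=true,resource.k8s.io/v1alpha3=true
```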
429
542
## {{% heading "whatsnext" %}}
430
543
431
544
- For more information on the design, see the