@@ -9,18 +9,28 @@ weight: 65
<!-- overview -->

+ Core Dynamic Resource Allocation with structured parameters:
+
{{< feature-state feature_gate_name="DynamicResourceAllocation" >}}

+ Dynamic Resource Allocation with control plane controller:
+
+ {{< feature-state feature_gate_name="DRAControlPlaneController" >}}
+
Dynamic resource allocation is an API for requesting and sharing resources
between pods and containers inside a pod. It is a generalization of the
- persistent volumes API for generic resources. Third-party resource drivers are
- responsible for tracking and allocating resources, with additional support
- provided by Kubernetes via _structured parameters_ (introduced in Kubernetes 1.30).
- When a driver uses structured parameters, Kubernetes handles scheduling
- and resource allocation without having to communicate with the driver.
+ persistent volumes API for generic resources. Typically those resources
+ are devices like GPUs.
+
+ Third-party resource drivers are
+ responsible for tracking and preparing resources, with allocation of
+ resources handled by Kubernetes via _structured parameters_ (introduced in Kubernetes 1.30).
Different kinds of resources support arbitrary parameters for defining requirements and
initialization.

+ When a driver provides a _control plane controller_, the driver itself
+ handles allocation in cooperation with the Kubernetes scheduler.
+
## {{% heading "prerequisites" %}}

Kubernetes v{{< skew currentVersion >}} includes cluster-level API support for
@@ -34,63 +44,47 @@ check the documentation for that version of Kubernetes.

## API

- The `resource.k8s.io/v1alpha2` {{< glossary_tooltip text="API group"
+ The `resource.k8s.io/v1alpha3` {{< glossary_tooltip text="API group"
term_id="api-group" >}} provides these types:

- ResourceClass
- : Defines which resource driver handles a certain kind of
-   resource and provides common parameters for it. ResourceClasses
-   are created by a cluster administrator when installing a resource
-   driver.
-
ResourceClaim
- : Defines a particular resource instance that is required by a
-   workload. Created by a user (lifecycle managed manually, can be shared
-   between different Pods) or for individual Pods by the control plane based on
-   a ResourceClaimTemplate (automatic lifecycle, typically used by just one
-   Pod).
+ : Describes a request for access to resources in the cluster,
+   for use by workloads. For example, if a workload needs an accelerator device
+   with specific properties, this is how that request is expressed. The status
+   stanza tracks whether this claim has been satisfied and what specific
+   resources have been allocated.

ResourceClaimTemplate
: Defines the spec and some metadata for creating
  ResourceClaims. Created by a user when deploying a workload.
+   The per-Pod ResourceClaims are then created and removed by Kubernetes
+   automatically.
+
+ DeviceClass
+ : Contains pre-defined selection criteria for certain devices and
+   configuration for them. DeviceClasses are created by a cluster administrator
+   when installing a resource driver. Each request to allocate a device
+   in a ResourceClaim must reference exactly one DeviceClass.

PodSchedulingContext
: Used internally by the control plane and resource drivers
  to coordinate pod scheduling when ResourceClaims need to be allocated
- for a Pod.
+ for a Pod and those ResourceClaims use a control plane controller.

ResourceSlice
: Used with structured parameters to publish information about resources
  that are available in the cluster.

- ResourceClaimParameters
- : Contain the parameters for a ResourceClaim which influence scheduling,
-   in a format that is understood by Kubernetes (the "structured parameter
-   model"). Additional parameters may be embedded in an opaque
-   extension, for use by the vendor driver when setting up the underlying
-   resource.
-
- ResourceClassParameters
- : Similar to ResourceClaimParameters, the ResourceClassParameters provides
-   a type for ResourceClass parameters which is understood by Kubernetes.
-
- Parameters for ResourceClass and ResourceClaim are stored in separate objects,
- typically using the type defined by a {{< glossary_tooltip
- term_id="CustomResourceDefinition" text="CRD" >}} that was created when
- installing a resource driver.
-
- The developer of a resource driver decides whether they want to handle these
- parameters in their own external controller or instead rely on Kubernetes to
- handle them through the use of structured parameters. A
+ The developer of a resource driver decides whether they want to handle
+ allocation themselves with a control plane controller or instead rely on allocation
+ through Kubernetes with structured parameters. A
custom controller provides more flexibility, but cluster autoscaling is not
going to work reliably for node-local resources. Structured parameters enable
cluster autoscaling, but might not satisfy all use-cases.

- When a driver uses structured parameters, it is still possible to let the
- end-user specify parameters with vendor-specific CRDs. When doing so, the
- driver needs to translate those
- custom parameters into the in-tree types. Alternatively, a driver may also
- document how to use the in-tree types directly.
+ When a driver uses structured parameters, all parameters that select devices
+ are defined in the ResourceClaim and DeviceClass with in-tree types. Configuration
+ parameters can be embedded there as arbitrary JSON objects.
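
For illustration, the sketch below shows how such opaque configuration might look in a ResourceClaim. The driver name, parameter contents, and exact `config` field layout are assumptions based on the `resource.k8s.io/v1alpha3` API and should be checked against the API reference:

```yaml
# Sketch only: a ResourceClaim that embeds vendor configuration as an
# arbitrary JSON object. Kubernetes stores the "parameters" object unchanged;
# only the named driver interprets it.
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: example-claim
spec:
  devices:
    requests:
    - name: req-0
      deviceClassName: resource.example.com
    config:
    - requests: ["req-0"]
      opaque:
        driver: resource-driver.example.com
        parameters:
          # Arbitrary, driver-defined JSON; the shape shown here is invented.
          apiVersion: resource-driver.example.com/v1
          kind: DeviceConfig
          sharing: exclusive
```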

The `core/v1` `PodSpec` defines ResourceClaims that are needed for a Pod in a
`resourceClaims` field. Entries in that list reference either a ResourceClaim
@@ -107,29 +101,29 @@ Here is an example for a fictional resource driver. Two ResourceClaim objects
will get created for this Pod and each container gets access to one of them.

```yaml
- apiVersion: resource.k8s.io/v1alpha2
- kind: ResourceClass
+ apiVersion: resource.k8s.io/v1alpha3
+ kind: DeviceClass
name: resource.example.com
- driverName: resource-driver.example.com
- ---
- apiVersion: cats.resource.example.com/v1
- kind: ClaimParameters
- name: large-black-cat-claim-parameters
spec:
-   color: black
-   size: large
+   selectors:
+   - cel:
+       expression: device.driver == "resource-driver.example.com"
---
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaimTemplate
metadata:
  name: large-black-cat-claim-template
spec:
  spec:
-     resourceClassName: resource.example.com
-     parametersRef:
-       apiGroup: cats.resource.example.com
-       kind: ClaimParameters
-       name: large-black-cat-claim-parameters
+     devices:
+       requests:
+       - name: req-0
+         deviceClassName: resource.example.com
+         selectors:
+         - cel:
+             expression: |-
+               device.attributes["resource-driver.example.com"].color == "black" &&
+               device.attributes["resource-driver.example.com"].size == "large"
---
apiVersion: v1
kind: Pod
@@ -151,16 +145,14 @@ spec:
      - name: cat-1
  resourceClaims:
  - name: cat-0
-     source:
-       resourceClaimTemplateName: large-black-cat-claim-template
+     resourceClaimTemplateName: large-black-cat-claim-template
  - name: cat-1
-     source:
-       resourceClaimTemplateName: large-black-cat-claim-template
+     resourceClaimTemplateName: large-black-cat-claim-template
```
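
A Pod can also reference a manually created ResourceClaim directly instead of a template. The following sketch assumes a pre-existing claim named `shared-cat-claim`; only the `resourceClaimName` field differs from the example above:

```yaml
# Sketch: reference an existing ResourceClaim instead of generating one
# from a ResourceClaimTemplate. The claim is shared by every Pod that names it.
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-shared-cat
spec:
  containers:
  - name: container0
    image: ubuntu:24.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-0
  resourceClaims:
  - name: cat-0
    resourceClaimName: shared-cat-claim
```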

## Scheduling

- ### Without structured parameters
+ ### With control plane controller

In contrast to native resources (CPU, RAM) and extended resources (managed by a
device plugin, advertised by kubelet), without structured parameters
@@ -171,12 +163,7 @@ responsible for that. They mark ResourceClaims as "allocated" once resources
for it are reserved. This also then tells the scheduler where in the cluster a
ResourceClaim is available.

- ResourceClaims can get allocated as soon as they are created ("immediate
- allocation"), without considering which Pods will use them. The default is to
- delay allocation until a Pod gets scheduled which needs the ResourceClaim
- (i.e. "wait for first consumer").
-
- In that mode, the scheduler checks all ResourceClaims needed by a Pod and
+ When a pod gets scheduled, the scheduler checks all ResourceClaims needed by a Pod and
creates a PodSchedulingContext object where it informs the resource drivers
responsible for those ResourceClaims about nodes that the scheduler considers
suitable for the Pod. The resource drivers respond by excluding nodes that
@@ -213,12 +200,16 @@ responsibility of allocating resources to a ResourceClaim whenever a pod needs
them. It does so by retrieving the full list of available resources from
ResourceSlice objects, tracking which of those resources have already been
allocated to existing ResourceClaims, and then selecting from those resources
- that remain. The exact resources selected are subject to the constraints
- provided in any ResourceClaimParameters or ResourceClassParameters associated
- with the ResourceClaim.
+ that remain.
+
+ The only kind of supported resources at the moment are devices. A device
+ instance has a name and several attributes and capacities. Devices get selected
+ through CEL expressions which check those attributes and capacities. In
+ addition, the set of selected devices also can be restricted to sets which meet
+ certain constraints.
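
To make this more concrete, the rough sketch below shows how a driver might publish one such device in a ResourceSlice. The node name, device name and attributes are invented, and the exact field layout of the `resource.k8s.io/v1alpha3` ResourceSlice should be verified against the API reference:

```yaml
# Rough sketch: one device on one node, with attributes that the CEL
# selectors in the example above could match on.
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSlice
metadata:
  name: worker-1-resource-driver-example
spec:
  driver: resource-driver.example.com
  nodeName: worker-1
  pool:
    name: worker-1
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: cat-0
    basic:
      attributes:
        color:
          string: black
        size:
          string: large
```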

The chosen resource is recorded in the ResourceClaim status together with any
- vendor-specific parameters, so when a pod is about to start on a node, the
+ vendor-specific configuration, so when a pod is about to start on a node, the
resource driver on the node has all the information it needs to prepare the
resource.

@@ -279,21 +270,25 @@ the `.spec.nodeName` field and to use a node selector instead.
Dynamic resource allocation is an *alpha feature* and only enabled when the
`DynamicResourceAllocation` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/) and the
- `resource.k8s.io/v1alpha2` {{< glossary_tooltip text="API group"
+ `resource.k8s.io/v1alpha3` {{< glossary_tooltip text="API group"
term_id="api-group" >}} are enabled. For details on that, see the
`--feature-gates` and `--runtime-config` [kube-apiserver
parameters](/docs/reference/command-line-tools-reference/kube-apiserver/).
kube-scheduler, kube-controller-manager and kubelet also need the feature gate.

+ When a resource driver uses a control plane controller, then the
+ `DRAControlPlaneController` feature gate has to be enabled in addition to
+ `DynamicResourceAllocation`.
+
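
As a sketch of what that can look like on the kube-apiserver command line (standard `--feature-gates` and `--runtime-config` syntax; adapt it to however your control plane is deployed):

```shell
# Enable the alpha API group and both feature gates on kube-apiserver.
# DRAControlPlaneController is only needed for drivers that use a
# control plane controller; all other flags for your cluster stay as-is.
kube-apiserver \
  --feature-gates=DynamicResourceAllocation=true,DRAControlPlaneController=true \
  --runtime-config=resource.k8s.io/v1alpha3=true
```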
A quick check whether a Kubernetes cluster supports the feature is to list
DeviceClass objects with:

```shell
- kubectl get resourceclasses
+ kubectl get deviceclasses
```

If your cluster supports dynamic resource allocation, the response is either a
- list of ResourceClass objects or:
+ list of DeviceClass objects or:

```
No resources found
@@ -302,9 +297,14 @@ No resources found

If not supported, this error is printed instead:

```
- error: the server doesn't have a resource type "resourceclasses"
+ error: the server doesn't have a resource type "deviceclasses"
```

+ A control plane controller is supported when it is possible to create a
+ ResourceClaim where the `spec.controller` field is set. When the
+ `DRAControlPlaneController` feature is disabled, that field automatically
+ gets cleared when storing the ResourceClaim.
+
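
A minimal sketch of such a probe, reusing the hypothetical driver and DeviceClass names from the example above:

```yaml
# Sketch: create this claim, then read it back. If spec.controller is still
# set, the DRAControlPlaneController feature is enabled; if the field was
# cleared on storage, it is disabled.
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: controller-mode-check
spec:
  controller: resource-driver.example.com
  devices:
    requests:
    - name: req-0
      deviceClassName: resource.example.com
```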
The default configuration of kube-scheduler enables the "DynamicResources"
plugin if and only if the feature gate is enabled and when using
the v1 configuration API. Custom configurations may have to be modified to
@@ -316,5 +316,6 @@ be installed. Please refer to the driver's documentation for details.
## {{% heading "whatsnext" %}}

For more information on the design, see the
- [Dynamic Resource Allocation KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md)
- and the [Structured Parameters KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters).
+ [Dynamic Resource Allocation with Structured Parameters](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters)
+ and the
+ [Dynamic Resource Allocation with Control Plane Controller](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md) KEPs.