Commit a8f156b

Merge pull request #46816 from pohly/dra-1.31
DRA documentation for 1.31
2 parents fd52687 + 5b40c51 commit a8f156b

3 files changed, +96 -79 lines changed
content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 78 additions & 77 deletions
@@ -9,18 +9,28 @@ weight: 65
 
 <!-- overview -->
 
+Core Dynamic Resource Allocation with structured parameters:
+
 {{< feature-state feature_gate_name="DynamicResourceAllocation" >}}
 
+Dynamic Resource Allocation with control plane controller:
+
+{{< feature-state feature_gate_name="DRAControlPlaneController" >}}
+
 Dynamic resource allocation is an API for requesting and sharing resources
 between pods and containers inside a pod. It is a generalization of the
-persistent volumes API for generic resources. Third-party resource drivers are
-responsible for tracking and allocating resources, with additional support
-provided by Kubernetes via _structured parameters_ (introduced in Kubernetes 1.30).
-When a driver uses structured parameters, Kubernetes handles scheduling
-and resource allocation without having to communicate with the driver.
+persistent volumes API for generic resources. Typically those resources
+are devices like GPUs.
+
+Third-party resource drivers are
+responsible for tracking and preparing resources, with allocation of
+resources handled by Kubernetes via _structured parameters_ (introduced in Kubernetes 1.30).
 Different kinds of resources support arbitrary parameters for defining requirements and
 initialization.
 
+When a driver provides a _control plane controller_, the driver itself
+handles allocation in cooperation with the Kubernetes scheduler.
+
 ## {{% heading "prerequisites" %}}
 
 Kubernetes v{{< skew currentVersion >}} includes cluster-level API support for
@@ -34,63 +44,47 @@ check the documentation for that version of Kubernetes.
 
 ## API
 
-The `resource.k8s.io/v1alpha2` {{< glossary_tooltip text="API group"
+The `resource.k8s.io/v1alpha3` {{< glossary_tooltip text="API group"
 term_id="api-group" >}} provides these types:
 
-ResourceClass
-: Defines which resource driver handles a certain kind of
-resource and provides common parameters for it. ResourceClasses
-are created by a cluster administrator when installing a resource
-driver.
-
 ResourceClaim
-: Defines a particular resource instance that is required by a
-workload. Created by a user (lifecycle managed manually, can be shared
-between different Pods) or for individual Pods by the control plane based on
-a ResourceClaimTemplate (automatic lifecycle, typically used by just one
-Pod).
+: Describes a request for access to resources in the cluster,
+for use by workloads. For example, if a workload needs an accelerator device
+with specific properties, this is how that request is expressed. The status
+stanza tracks whether this claim has been satisfied and what specific
+resources have been allocated.
 
 ResourceClaimTemplate
 : Defines the spec and some metadata for creating
 ResourceClaims. Created by a user when deploying a workload.
+The per-Pod ResourceClaims are then created and removed by Kubernetes
+automatically.
+
+DeviceClass
+: Contains pre-defined selection criteria for certain devices and
+configuration for them. DeviceClasses are created by a cluster administrator
+when installing a resource driver. Each request to allocate a device
+in a ResourceClaim must reference exactly one DeviceClass.
 
 PodSchedulingContext
 : Used internally by the control plane and resource drivers
 to coordinate pod scheduling when ResourceClaims need to be allocated
-for a Pod.
+for a Pod and those ResourceClaims use a control plane controller.
 
 ResourceSlice
 : Used with structured parameters to publish information about resources
 that are available in the cluster.
 
-ResourceClaimParameters
-: Contain the parameters for a ResourceClaim which influence scheduling,
-in a format that is understood by Kubernetes (the "structured parameter
-model"). Additional parameters may be embedded in an opaque
-extension, for use by the vendor driver when setting up the underlying
-resource.
-
-ResourceClassParameters
-: Similar to ResourceClaimParameters, the ResourceClassParameters provides
-a type for ResourceClass parameters which is understood by Kubernetes.
-
-Parameters for ResourceClass and ResourceClaim are stored in separate objects,
-typically using the type defined by a {{< glossary_tooltip
-term_id="CustomResourceDefinition" text="CRD" >}} that was created when
-installing a resource driver.
-
-The developer of a resource driver decides whether they want to handle these
-parameters in their own external controller or instead rely on Kubernetes to
-handle them through the use of structured parameters. A
+The developer of a resource driver decides whether they want to handle
+allocation themselves with a control plane controller or instead rely on allocation
+through Kubernetes with structured parameters. A
 custom controller provides more flexibility, but cluster autoscaling is not
 going to work reliably for node-local resources. Structured parameters enable
 cluster autoscaling, but might not satisfy all use-cases.
 
-When a driver uses structured parameters, it is still possible to let the
-end-user specify parameters with vendor-specific CRDs. When doing so, the
-driver needs to translate those
-custom parameters into the in-tree types. Alternatively, a driver may also
-document how to use the in-tree types directly.
+When a driver uses structured parameters, all parameters that select devices
+are defined in the ResourceClaim and DeviceClass with in-tree types. Configuration
+parameters can be embedded there as arbitrary JSON objects.
 
 The `core/v1` `PodSpec` defines ResourceClaims that are needed for a Pod in a
 `resourceClaims` field. Entries in that list reference either a ResourceClaim
@@ -107,29 +101,29 @@ Here is an example for a fictional resource driver. Two ResourceClaim objects
 will get created for this Pod and each container gets access to one of them.
 
 ```yaml
-apiVersion: resource.k8s.io/v1alpha2
-kind: ResourceClass
+apiVersion: resource.k8s.io/v1alpha3
+kind: DeviceClass
 name: resource.example.com
-driverName: resource-driver.example.com
----
-apiVersion: cats.resource.example.com/v1
-kind: ClaimParameters
-name: large-black-cat-claim-parameters
 spec:
-  color: black
-  size: large
+  selectors:
+  - cel:
+      expression: device.driver == "resource-driver.example.com"
 ---
 apiVersion: resource.k8s.io/v1alpha2
 kind: ResourceClaimTemplate
 metadata:
   name: large-black-cat-claim-template
 spec:
   spec:
-    resourceClassName: resource.example.com
-    parametersRef:
-      apiGroup: cats.resource.example.com
-      kind: ClaimParameters
-      name: large-black-cat-claim-parameters
+    devices:
+      requests:
+      - name: req-0
+        deviceClassName: resource.example.com
+        selectors:
+        - cel:
+            expression: |-
+              device.attributes["resource-driver.example.com"].color == "black" &&
+              device.attributes["resource-driver.example.com"].size == "large"
 –--
 apiVersion: v1
 kind: Pod
@@ -151,16 +145,14 @@ spec:
       - name: cat-1
   resourceClaims:
   - name: cat-0
-    source:
-      resourceClaimTemplateName: large-black-cat-claim-template
+    resourceClaimTemplateName: large-black-cat-claim-template
   - name: cat-1
-    source:
-      resourceClaimTemplateName: large-black-cat-claim-template
+    resourceClaimTemplateName: large-black-cat-claim-template
 ```
 
 ## Scheduling
 
-### Without structured parameters
+### With control plane controller
 
 In contrast to native resources (CPU, RAM) and extended resources (managed by a
 device plugin, advertised by kubelet), without structured parameters
@@ -171,12 +163,7 @@ responsible for that. They mark ResourceClaims as "allocated" once resources
 for it are reserved. This also then tells the scheduler where in the cluster a
 ResourceClaim is available.
 
-ResourceClaims can get allocated as soon as they are created ("immediate
-allocation"), without considering which Pods will use them. The default is to
-delay allocation until a Pod gets scheduled which needs the ResourceClaim
-(i.e. "wait for first consumer").
-
-In that mode, the scheduler checks all ResourceClaims needed by a Pod and
+When a pod gets scheduled, the scheduler checks all ResourceClaims needed by a Pod and
 creates a PodScheduling object where it informs the resource drivers
 responsible for those ResourceClaims about nodes that the scheduler considers
 suitable for the Pod. The resource drivers respond by excluding nodes that
@@ -213,12 +200,16 @@ responsibility of allocating resources to a ResourceClaim whenever a pod needs
 them. It does so by retrieving the full list of available resources from
 ResourceSlice objects, tracking which of those resources have already been
 allocated to existing ResourceClaims, and then selecting from those resources
-that remain. The exact resources selected are subject to the constraints
-provided in any ResourceClaimParameters or ResourceClassParameters associated
-with the ResourceClaim.
+that remain.
+
+The only kind of supported resources at the moment are devices. A device
+instance has a name and several attributes and capacities. Devices get selected
+through CEL expressions which check those attributes and capacities. In
+addition, the set of selected devices also can be restricted to sets which meet
+certain constraints.
 
 The chosen resource is recorded in the ResourceClaim status together with any
-vendor-specific parameters, so when a pod is about to start on a node, the
+vendor-specific configuration, so when a pod is about to start on a node, the
 resource driver on the node has all the information it needs to prepare the
 resource.
 
@@ -279,21 +270,25 @@ the `.spec.nodeName` field and to use a node selector instead.
 Dynamic resource allocation is an *alpha feature* and only enabled when the
 `DynamicResourceAllocation` [feature
 gate](/docs/reference/command-line-tools-reference/feature-gates/) and the
-`resource.k8s.io/v1alpha2` {{< glossary_tooltip text="API group"
+`resource.k8s.io/v1alpha3` {{< glossary_tooltip text="API group"
 term_id="api-group" >}} are enabled. For details on that, see the
 `--feature-gates` and `--runtime-config` [kube-apiserver
 parameters](/docs/reference/command-line-tools-reference/kube-apiserver/).
 kube-scheduler, kube-controller-manager and kubelet also need the feature gate.
 
+When a resource driver uses a control plane controller, then the
+`DRAControlPlaneController` feature gate has to be enabled in addition to
+`DynamicResourceAllocation`.
+
 A quick check whether a Kubernetes cluster supports the feature is to list
 ResourceClass objects with:
 
 ```shell
-kubectl get resourceclasses
+kubectl get deviceclasses
 ```
 
 If your cluster supports dynamic resource allocation, the response is either a
-list of ResourceClass objects or:
+list of DeviceClass objects or:
 
 ```
 No resources found
@@ -302,9 +297,14 @@ No resources found
 If not supported, this error is printed instead:
 
 ```
-error: the server doesn't have a resource type "resourceclasses"
+error: the server doesn't have a resource type "deviceclasses"
 ```
 
+A control plane controller is supported when it is possible to create a
+ResourceClaim where the `spec.controller` field is set. When the
+`DRAControlPlaneController` feature is disabled, that field automatically
+gets cleared when storing the ResourceClaim.
+
 The default configuration of kube-scheduler enables the "DynamicResources"
 plugin if and only if the feature gate is enabled and when using
 the v1 configuration API. Custom configurations may have to be modified to
@@ -316,5 +316,6 @@ be installed. Please refer to the driver's documentation for details.
 ## {{% heading "whatsnext" %}}
 
 - For more information on the design, see the
-[Dynamic Resource Allocation KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md)
-and the [Structured Parameters KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters).
+[Structured Parameters with Structured Parameters](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters)
+and the
+[Dynamic Resource Allocation with Control Plane Controller](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md) KEPs.
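
For reference, the example manifests touched by this file's diff can be read as one file in their post-change form. This is a sketch, not part of the commit, and it rests on several assumptions. The DeviceClass name is placed under `metadata`, whereas the diff shows a top-level `name:`. The ResourceClaimTemplate uses `v1alpha3`, whereas the unchanged context line still says `v1alpha2`. The document separator before the Pod is written as a plain `---`. The Pod's metadata and containers fall between the hunks shown, so the names, image, and command below are hypothetical placeholders.

```yaml
# Sketch only; see the assumptions listed above.
apiVersion: resource.k8s.io/v1alpha3
kind: DeviceClass
metadata:
  name: resource.example.com        # assumption: moved under metadata
spec:
  selectors:
  - cel:
      expression: device.driver == "resource-driver.example.com"
---
apiVersion: resource.k8s.io/v1alpha3   # assumption: the diff context shows v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: large-black-cat-claim-template
spec:
  spec:
    devices:
      requests:
      - name: req-0
        deviceClassName: resource.example.com
        selectors:
        - cel:
            expression: |-
              device.attributes["resource-driver.example.com"].color == "black" &&
              device.attributes["resource-driver.example.com"].size == "large"
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-cats               # hypothetical name
spec:
  containers:
  - name: container0                # hypothetical container
    image: ubuntu:20.04             # hypothetical image
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-0                 # refers to spec.resourceClaims[0]
  - name: container1                # hypothetical container
    image: ubuntu:20.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-1                 # refers to spec.resourceClaims[1]
  resourceClaims:
  - name: cat-0
    resourceClaimTemplateName: large-black-cat-claim-template
  - name: cat-1
    resourceClaimTemplateName: large-black-cat-claim-template
```

Each entry in `resourceClaims` names the template directly, without the former `source:` wrapper, and each container selects one claim by name.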
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+title: DRAControlPlaneController
+content_type: feature_gate
+_build:
+  list: never
+  render: false
+
+stages:
+  - stage: alpha
+    defaultValue: false
+    fromVersion: "1.26"
+---
+Enables support for resources with custom parameters and a lifecycle
+that is independent of a Pod. Allocation of resources is handled
+by a resource driver's control plane controller.

content/en/docs/reference/command-line-tools-reference/feature-gates/dynamic-resource-allocation.md

Lines changed: 3 additions & 2 deletions
@@ -8,7 +8,8 @@ _build:
 stages:
   - stage: alpha
     defaultValue: false
-    fromVersion: "1.26"
+    fromVersion: "1.30"
 ---
 Enables support for resources with custom parameters and a lifecycle
-that is independent of a Pod.
+that is independent of a Pod. Allocation of resources is handled
+by the Kubernetes scheduler based on "structured parameters".
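
The two feature gates described by these files, together with the `resource.k8s.io/v1alpha3` API group, have to be switched on per component. A minimal sketch of one way to do that follows. It assumes a kubeadm-managed cluster and the `kubeadm.k8s.io/v1beta3` configuration format, neither of which is mentioned in the commit. The flag names `--feature-gates` and `--runtime-config` are the ones the documentation change refers to.

```yaml
# Sketch only: assumes kubeadm with the v1beta3 config API.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    # DRAControlPlaneController is only needed for drivers that
    # allocate through their own control plane controller.
    feature-gates: "DynamicResourceAllocation=true,DRAControlPlaneController=true"
    runtime-config: "resource.k8s.io/v1alpha3=true"
controllerManager:
  extraArgs:
    feature-gates: "DynamicResourceAllocation=true,DRAControlPlaneController=true"
scheduler:
  extraArgs:
    feature-gates: "DynamicResourceAllocation=true,DRAControlPlaneController=true"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: true
```

Clusters that use only structured parameters can drop `DRAControlPlaneController=true` from every component.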
