
Commit a5bdf2c

Merge pull request #51091 from shannonxtreme/dra-docs-improve
Add foundational docs improvements for DRA
2 parents 36e9647 + 44c0c10 commit a5bdf2c


16 files changed (+955, -238 lines)


content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 327 additions & 238 deletions
Large diffs are not rendered by default.
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
---
title: Common Expression Language
id: cel
date: 2025-06-04
full_link: https://cel.dev
short_description: >
  An expression language that's designed to be safe for executing user code.
tags:
- extension
- fundamental
aka:
- CEL
---
A general-purpose expression language that's designed to be fast, portable, and
safe to execute.

<!--more-->

In Kubernetes, CEL can be used to run queries and perform fine-grained
filtering. For example, you can use CEL expressions with
[dynamic admission control](/docs/reference/access-authn-authz/extensible-admission-controllers/)
to filter for specific fields in requests, and with
[dynamic resource allocation (DRA)](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
to select resources based on specific attributes.
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
title: Device
id: device
date: 2025-05-13
short_description: >
  Any resource that's directly or indirectly attached to your cluster's nodes,
  like GPUs or circuit boards.

tags:
- extension
- fundamental
---
One or more
{{< glossary_tooltip text="infrastructure resources" term_id="infrastructure-resource" >}}
that are directly or indirectly attached to your
{{< glossary_tooltip text="nodes" term_id="node" >}}.

<!--more-->

Devices might be commercial products like GPUs, or custom hardware like
[ASIC boards](https://en.wikipedia.org/wiki/Application-specific_integrated_circuit).
Attached devices usually require device drivers that let Kubernetes
{{< glossary_tooltip text="Pods" term_id="pod" >}} access the devices.
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
title: DeviceClass
id: deviceclass
date: 2025-05-26
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/#deviceclass
short_description: >
  A category of devices in the cluster. Users can claim specific
  devices in a DeviceClass.
tags:
- extension
---
A category of {{< glossary_tooltip text="devices" term_id="device" >}} in the
cluster that can be used with dynamic resource allocation (DRA).

<!--more-->

Administrators or device owners use DeviceClasses to define a set of devices
that can be claimed and used in workloads. Devices are claimed by creating
{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}}
that filter for specific device parameters in a DeviceClass.

For more information, see
[Dynamic Resource Allocation](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#deviceclass).
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
---
title: Dynamic Resource Allocation
id: dra
date: 2025-05-13
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/
short_description: >
  A Kubernetes feature for requesting and sharing resources, like hardware
  accelerators, among Pods.

aka:
- DRA
tags:
- extension
---
A Kubernetes feature that lets you request and share resources among Pods.
These resources are often attached
{{< glossary_tooltip text="devices" term_id="device" >}} like hardware
accelerators.

<!--more-->

With DRA, device drivers and cluster admins define device _classes_ that are
available to _claim_ in workloads. Kubernetes allocates matching devices to
specific claims and places the corresponding Pods on nodes that can access the
allocated devices.
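For illustration, here's a minimal sketch of a device class that an admin might define. The snippet only writes the manifest to a local file. It assumes the `resource.k8s.io/v1beta1` API version used by Kubernetes v1.32 and a hypothetical driver named `driver.example.com`; the class name `example-device-class` is also illustrative. The CEL selector limits the class to devices that this driver publishes.

```shell
# Sketch: write a hypothetical DeviceClass manifest to a local file.
# Assumes the resource.k8s.io/v1beta1 API; the class and driver names are made up.
cat > example-deviceclass.yaml <<'EOF'
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: example-device-class
spec:
  selectors:
  - cel:
      # A device belongs to this class only if the expression evaluates to true.
      expression: 'device.driver == "driver.example.com"'
EOF
```

An admin could then create the class with `kubectl apply -f example-deviceclass.yaml`, and workloads would refer to it by name in their claims.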
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
title: ResourceClaim
id: resourceclaim
date: 2025-05-26
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceclaims-templates
short_description: >
  Describes the resources that a workload needs, such as devices. ResourceClaims
  can request devices from DeviceClasses.

tags:
- workload
---
Describes the resources that a workload needs, such as
{{< glossary_tooltip text="devices" term_id="device" >}}. ResourceClaims are
used in
[dynamic resource allocation (DRA)](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)
to provide Pods with access to a specific resource.

<!--more-->

ResourceClaims can be created by workload operators or generated by Kubernetes
based on a
{{< glossary_tooltip text="ResourceClaimTemplate" term_id="resourceclaimtemplate" >}}.
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
---
title: ResourceClaimTemplate
id: resourceclaimtemplate
date: 2025-05-26
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceclaims-templates
short_description: >
  Defines a template for Kubernetes to create ResourceClaims. Used to provide
  per-Pod access to separate, similar resources.

tags:
- workload
---
Defines a template that Kubernetes uses to create
{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}}.
ResourceClaimTemplates are used in
[dynamic resource allocation (DRA)](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)
to provide _per-Pod access to separate, similar resources_.

<!--more-->

When a ResourceClaimTemplate is referenced in a workload specification,
Kubernetes automatically creates ResourceClaim objects based on the template.
Each ResourceClaim is bound to a specific Pod. When the Pod terminates,
Kubernetes deletes the corresponding ResourceClaim.
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
---
title: ResourceSlice
id: resourceslice
date: 2025-05-26
full_link: /docs/reference/kubernetes-api/workload-resources/resource-slice-v1beta1/
short_description: >
  Represents one or more infrastructure resources, like devices, in a pool of
  similar resources.

tags:
- workload
---
Represents one or more infrastructure resources, such as
{{< glossary_tooltip text="devices" term_id="device" >}}, that are attached to
nodes. Drivers create and manage ResourceSlices in the cluster. ResourceSlices
are used for
[dynamic resource allocation (DRA)](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/).

<!--more-->

When a Pod that uses a
{{< glossary_tooltip text="ResourceClaim" term_id="resourceclaim" >}} is
scheduled, Kubernetes uses ResourceSlices to find nodes that have access to
resources that can satisfy the claim. Kubernetes allocates resources to the
ResourceClaim and schedules the Pod onto a node that can access the resources.
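To make this concrete, the following sketch writes the kind of ResourceSlice that a driver might publish for a single GPU on one node. You normally don't create ResourceSlices yourself. Everything in it is hypothetical (the node, pool, and device names, the `type` attribute, and the `memory` capacity), and the layout assumes the `resource.k8s.io/v1beta1` API, where each device's attributes and capacity are nested under `basic`.

```shell
# Sketch: write an illustrative ResourceSlice, as a driver might publish it.
# All names and values are hypothetical; assumes resource.k8s.io/v1beta1.
cat > example-resourceslice.yaml <<'EOF'
apiVersion: resource.k8s.io/v1beta1
kind: ResourceSlice
metadata:
  name: node-1-driver.example.com
spec:
  driver: driver.example.com
  # The devices in this slice are reachable from this node only.
  nodeName: node-1
  pool:
    name: node-1
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: gpu-0
    basic:
      # Attributes and capacity are what CEL selectors in claims match against.
      attributes:
        type:
          string: gpu
      capacity:
        memory:
          value: 64Gi
EOF
```

A claim selector such as `device.attributes["driver.example.com"].type == "gpu"` would match `gpu-0`, because attribute names without a domain prefix are treated as part of the driver's domain.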
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
---
title: "Assign Devices to Pods and Containers"
description: Assign infrastructure resources to your Kubernetes workloads.
weight: 30
---
Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
---
title: Allocate Devices to Workloads with DRA
content_type: task
min-kubernetes-server-version: v1.32
weight: 20
---
{{< feature-state feature_gate_name="DynamicResourceAllocation" >}}

<!-- overview -->

This page shows you how to allocate devices to your Pods by using
_dynamic resource allocation (DRA)_. These instructions are for workload
operators. Before reading this page, familiarize yourself with how DRA works and
with DRA terminology like
{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}} and
{{< glossary_tooltip text="ResourceClaimTemplates" term_id="resourceclaimtemplate" >}}.
For more information, see
[Dynamic Resource Allocation (DRA)](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/).

<!-- body -->

## About device allocation with DRA {#about-device-allocation-dra}

As a workload operator, you can _claim_ devices for your workloads by creating
ResourceClaims or ResourceClaimTemplates. When you deploy your workload,
Kubernetes and the device drivers find available devices, allocate them to your
Pods, and place the Pods on nodes that can access those devices.

<!-- prerequisites -->

## {{% heading "prerequisites" %}}

{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}

* Ensure that your cluster admin has set up DRA, attached devices, and installed
  drivers. For more information, see
  [Set Up DRA in a Cluster](/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster).

<!-- steps -->

## Identify devices to claim {#identify-devices}

Your cluster administrator or the device drivers create
_{{< glossary_tooltip term_id="deviceclass" text="DeviceClasses" >}}_ that
define categories of devices. You can claim devices by using
{{< glossary_tooltip term_id="cel" >}} to filter for specific device properties.

Get a list of DeviceClasses in the cluster:

```shell
kubectl get deviceclasses
```

The output is similar to the following:

```
NAME                 AGE
driver.example.com   16m
```

If you get a permission error, you might not have permission to list
DeviceClasses. Check with your cluster administrator or with the driver provider
for available device properties.

## Claim resources {#claim-resources}

You can request resources from a DeviceClass by using
{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}}. To
create a ResourceClaim, do one of the following:

* Manually create a ResourceClaim if you want multiple Pods to share access to
  the same devices, or if you want a claim to exist beyond the lifetime of a
  Pod.
* Use a
  {{< glossary_tooltip text="ResourceClaimTemplate" term_id="resourceclaimtemplate" >}}
  to let Kubernetes generate and manage per-Pod ResourceClaims. Create a
  ResourceClaimTemplate if you want every Pod to have access to separate devices
  that have similar configurations. For example, you might want simultaneous
  access to devices for Pods in a Job that uses
  [parallel execution](/docs/concepts/workloads/controllers/job/#parallel-jobs).

If you directly reference a specific ResourceClaim in a Pod, that ResourceClaim
must already exist in the cluster. If a referenced ResourceClaim doesn't exist,
the Pod remains in a pending state until the ResourceClaim is created. You can
reference an auto-generated ResourceClaim in a Pod, but this isn't recommended
because auto-generated ResourceClaims are bound to the lifetime of the Pod that
triggered the generation.

To create a workload that claims resources, select one of the following options:

{{< tabs name="claim-resources" >}}
{{% tab name="ResourceClaimTemplate" %}}

Review the following example manifest:

{{% code_sample file="dra/resourceclaimtemplate.yaml" %}}

This manifest creates a ResourceClaimTemplate that requests devices in the
`example-device-class` DeviceClass that match both of the following parameters:

* Devices that have a `driver.example.com/type` attribute with a value of
  `gpu`.
* Devices that have `64Gi` of capacity.

To create the ResourceClaimTemplate, run the following command:

```shell
kubectl apply -f https://k8s.io/examples/dra/resourceclaimtemplate.yaml
```
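As a sketch of how the two parameters above can be written as one CEL selector, the following snippet writes a comparable ResourceClaimTemplate to a local file. It isn't the example file itself: the template name `my-resource-claim-template`, the request name `gpu`, and the `memory` capacity name are hypothetical, and the layout assumes the `resource.k8s.io/v1beta1` API.

```shell
# Sketch: write a hypothetical ResourceClaimTemplate to a local file.
# Assumes resource.k8s.io/v1beta1; the template, request, and capacity names are made up.
cat > my-resourceclaimtemplate.yaml <<'EOF'
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: my-resource-claim-template
spec:
  # The nested spec is copied into each ResourceClaim that Kubernetes generates.
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: example-device-class
        selectors:
        - cel:
            # Match GPU-type devices that report exactly 64Gi of memory.
            expression: |-
              device.attributes["driver.example.com"].type == "gpu" &&
              device.capacity["driver.example.com"].memory.compareTo(quantity("64Gi")) == 0
EOF
```

Replacing `== 0` with `>= 0` in the capacity comparison would match devices with at least 64Gi instead of exactly 64Gi.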

{{% /tab %}}
{{% tab name="ResourceClaim" %}}

Review the following example manifest:

{{% code_sample file="dra/resourceclaim.yaml" %}}

This manifest creates a ResourceClaim that requests devices in the
`example-device-class` DeviceClass that match both of the following parameters:

* Devices that have a `driver.example.com/type` attribute with a value of
  `gpu`.
* Devices that have `64Gi` of capacity.

To create the ResourceClaim, run the following command:

```shell
kubectl apply -f https://k8s.io/examples/dra/resourceclaim.yaml
```
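The same two parameters can also be split into separate selectors, because a device must satisfy every entry in `selectors` to be considered for the request. The following sketch writes such a ResourceClaim to a local file; unlike in a template, the device requests sit directly under the claim's `spec`. The claim name `my-resource-claim`, the request name `gpu`, and the `memory` capacity name are hypothetical, and the layout assumes the `resource.k8s.io/v1beta1` API.

```shell
# Sketch: write a hypothetical ResourceClaim to a local file.
# Assumes resource.k8s.io/v1beta1; the claim, request, and capacity names are made up.
cat > my-resourceclaim.yaml <<'EOF'
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: my-resource-claim
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: example-device-class
      # A device is considered only if it satisfies all of these selectors.
      selectors:
      - cel:
          expression: 'device.attributes["driver.example.com"].type == "gpu"'
      - cel:
          expression: 'device.capacity["driver.example.com"].memory.compareTo(quantity("64Gi")) == 0'
EOF
```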

{{% /tab %}}
{{< /tabs >}}

## Request devices in workloads using DRA {#request-devices-workloads}

To request device allocation, specify a ResourceClaim or a ResourceClaimTemplate
in the `resourceClaims` field of the Pod specification. Then, request a specific
claim by name in the `resources.claims` field of a container in that Pod.
You can specify multiple entries in the `resourceClaims` field and use specific
claims in different containers.
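To show how the two fields fit together, the following sketch writes a minimal Pod manifest to a local file. It assumes that a ResourceClaimTemplate named `example-resource-claim-template` and a ResourceClaim named `example-resource-claim` already exist; both names are hypothetical, and `registry.k8s.io/pause:3.10` is only a placeholder image.

```shell
# Sketch: write a hypothetical Pod manifest that consumes two claims.
# Assumes a ResourceClaimTemplate and a ResourceClaim with these (made-up) names exist.
cat > example-dra-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: example-dra-pod
spec:
  # Pod-level claim entries; the entry names are local to this Pod.
  resourceClaims:
  - name: separate-gpu-claim
    # Kubernetes generates a ResourceClaim for this Pod from the template.
    resourceClaimTemplateName: example-resource-claim-template
  - name: shared-gpu-claim
    # Refers to an existing ResourceClaim that other Pods can also reference.
    resourceClaimName: example-resource-claim
  containers:
  - name: container0
    image: registry.k8s.io/pause:3.10
    resources:
      # Container-level references use the Pod-level entry names above.
      claims:
      - name: separate-gpu-claim
  - name: container1
    image: registry.k8s.io/pause:3.10
    resources:
      claims:
      - name: shared-gpu-claim
EOF
```

Only the claims that a container lists in `resources.claims` are made available to that container, so each container here gets access to exactly one claim.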

1. Review the following example Job:

   {{% code_sample file="dra/dra-example-job.yaml" %}}

   Each Pod in this Job has the following properties:

   * Makes a claim named `separate-gpu-claim`, which references a
     ResourceClaimTemplate, and a claim named `shared-gpu-claim`, which
     references a ResourceClaim, available to containers.
   * Runs the following containers:
     * `container0` requests the devices from the `separate-gpu-claim`
       ResourceClaimTemplate.
     * `container1` and `container2` share access to the devices from the
       `shared-gpu-claim` ResourceClaim.

1. Create the Job:

   ```shell
   kubectl apply -f https://k8s.io/examples/dra/dra-example-job.yaml
   ```

## Clean up {#clean-up}

To delete the Kubernetes objects that you created in this task, follow these
steps:

1. Delete the example Job:

   ```shell
   kubectl delete -f https://k8s.io/examples/dra/dra-example-job.yaml
   ```

1. To delete your resource claims, run one of the following commands:

   * Delete the ResourceClaimTemplate:

     ```shell
     kubectl delete -f https://k8s.io/examples/dra/resourceclaimtemplate.yaml
     ```
   * Delete the ResourceClaim:

     ```shell
     kubectl delete -f https://k8s.io/examples/dra/resourceclaim.yaml
     ```

## {{% heading "whatsnext" %}}

* [Learn more about DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
