Commit 72beda6

Merge pull request #51603 from windsonsea/assces
[zh] Add assign-resources/allocate-devices-dra.md
2 parents 18f8762 + 4dfe69f

5 files changed: +395 −0 lines
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@

```markdown
---
title: "向 Pod 和容器分配设备"
description: 向你的 Kubernetes 工作负载分配基础设施资源。
weight: 30
---
<!--
title: "Assign Devices to Pods and Containers"
description: Assign infrastructure resources to your Kubernetes workloads.
weight: 30
-->
```
Lines changed: 319 additions & 0 deletions
@@ -0,0 +1,319 @@
---
title: 使用 DRA 为工作负载分配设备
content_type: task
min-kubernetes-server-version: v1.32
weight: 20
---
<!--
title: Allocate Devices to Workloads with DRA
content_type: task
min-kubernetes-server-version: v1.32
weight: 20
-->

{{< feature-state feature_gate_name="DynamicResourceAllocation" >}}

<!-- overview -->

<!--
This page shows you how to allocate devices to your Pods by using
_dynamic resource allocation (DRA)_. These instructions are for workload
operators. Before reading this page, familiarize yourself with how DRA works and
with DRA terminology like
{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}} and
{{< glossary_tooltip text="ResourceClaimTemplates" term_id="resourceclaimtemplate" >}}.
For more information, see
[Dynamic Resource Allocation (DRA)](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/).
-->
本文介绍如何使用**动态资源分配(DRA)**为 Pod 分配设备。
这些说明面向工作负载运维人员。在阅读本文之前,请先了解 DRA 的工作原理以及相关术语,例如
{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}} 和
{{< glossary_tooltip text="ResourceClaimTemplates" term_id="resourceclaimtemplate" >}}。
更多信息参阅[动态资源分配(DRA)](/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)。

<!-- body -->

<!--
## About device allocation with DRA {#about-device-allocation-dra}

As a workload operator, you can _claim_ devices for your workloads by creating
ResourceClaims or ResourceClaimTemplates. When you deploy your workload,
Kubernetes and the device drivers find available devices, allocate them to your
Pods, and place the Pods on nodes that can access those devices.
-->
## 关于使用 DRA 分配设备 {#about-device-allocation-dra}

作为工作负载运维人员,你可以通过创建 ResourceClaim 或 ResourceClaimTemplate
**申领**工作负载所需的设备。当你部署工作负载时,Kubernetes 和设备驱动会找到可用的设备,
将其分配给 Pod,并将 Pod 调度到可访问这些设备的节点上。

<!-- prerequisites -->

## {{% heading "prerequisites" %}}

{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}

<!--
* Ensure that your cluster admin has set up DRA, attached devices, and installed
  drivers. For more information, see
  [Set Up DRA in a Cluster](/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster).
-->
* 请确保集群管理员已设置好 DRA,挂接了设备并安装了驱动程序。
  详情请参见[在集群中设置 DRA](/zh-cn/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster)。

<!-- steps -->

<!--
## Identify devices to claim {#identify-devices}

Your cluster administrator or the device drivers create
_{{< glossary_tooltip term_id="deviceclass" text="DeviceClasses" >}}_ that
define categories of devices. You can claim devices by using
{{< glossary_tooltip term_id="cel" >}} to filter for specific device properties.

Get a list of DeviceClasses in the cluster:
-->
## 寻找可申领的设备 {#identify-devices}

你的集群管理员或设备驱动程序会创建定义设备类别的
{{< glossary_tooltip term_id="deviceclass" text="DeviceClass" >}}。你可以使用
{{< glossary_tooltip term_id="cel" >}} 表达式筛选特定的设备属性,从而申领设备。

获取集群中的 DeviceClass 列表:

```shell
kubectl get deviceclasses
```

<!--
The output is similar to the following:
-->
输出类似如下:

```
NAME                 AGE
driver.example.com   16m
```

<!--
If you get a permission error, you might not have access to get DeviceClasses.
Check with your cluster administrator or with the driver provider for available
device properties.
-->
如果你遇到权限错误,你可能无权获取 DeviceClass。
请与你的集群管理员或驱动提供商联系,了解可用的设备属性。
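
如上所述,申领设备时可以用 CEL 表达式按设备属性筛选。作为示意
(`driver.example.com` 及其属性仅为假设,实际可用的属性名取决于你的驱动),
选择算符片段的形式大致如下:

```yaml
# 示意:按设备属性筛选的 CEL 选择算符片段
# 注意:driver.example.com 为假设的驱动名称,请以实际驱动公布的属性为准
selectors:
- cel:
    expression: device.attributes["driver.example.com"].type == "gpu"
```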

<!--
## Claim resources {#claim-resources}

You can request resources from a DeviceClass by using
{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}}. To
create a ResourceClaim, do one of the following:
-->
## 申领资源 {#claim-resources}

你可以通过
{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}}
请求某个 DeviceClass 的资源。要创建 ResourceClaim,可以采用以下方式之一:

<!--
* Manually create a ResourceClaim if you want multiple Pods to share access to
  the same devices, or if you want a claim to exist beyond the lifetime of a
  Pod.
* Use a
  {{< glossary_tooltip text="ResourceClaimTemplate" term_id="resourceclaimtemplate" >}}
  to let Kubernetes generate and manage per-Pod ResourceClaims. Create a
  ResourceClaimTemplate if you want every Pod to have access to separate devices
  that have similar configurations. For example, you might want simultaneous
  access to devices for Pods in a Job that uses
  [parallel execution](/docs/concepts/workloads/controllers/job/#parallel-jobs).
-->
* 如果你希望多个 Pod 共享相同设备,或希望申领在 Pod 生命期结束后仍然存在,可以手动创建 ResourceClaim。
* 使用
  {{< glossary_tooltip text="ResourceClaimTemplate" term_id="resourceclaimtemplate" >}},
  让 Kubernetes 为每个 Pod 生成并管理 ResourceClaim。如果你希望每个 Pod
  访问独立的、具有类似配置的设备,你可以创建 ResourceClaimTemplate。例如,
  在使用[并行执行](/zh-cn/docs/concepts/workloads/controllers/job/#parallel-jobs)的
  Job 中,你可能希望多个 Pod 同时访问设备。

<!--
If you directly reference a specific ResourceClaim in a Pod, that ResourceClaim
must already exist in the cluster. If a referenced ResourceClaim doesn't exist,
the Pod remains in a pending state until the ResourceClaim is created. You can
reference an auto-generated ResourceClaim in a Pod, but this isn't recommended
because auto-generated ResourceClaims are bound to the lifetime of the Pod that
triggered the generation.

To create a workload that claims resources, select one of the following options:
-->
如果你在 Pod 中直接引用了特定的 ResourceClaim,该 ResourceClaim 必须已存在于集群中。否则,
Pod 会保持在 Pending 状态,直到该申领被创建。你可以在 Pod 中引用自动生成的 ResourceClaim,
但不推荐这样做,因为自动生成的 ResourceClaim 的生命期被绑定到了触发生成它的 Pod。
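
作为示意(假设集群中已存在名为 `example-resource-claim` 的 ResourceClaim,
Pod 和容器名称也仅为示例),直接引用已有申领的 Pod 大致如下:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod    # 假设的 Pod 名称,仅为示意
spec:
  containers:
  - name: ctr
    image: ubuntu:24.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: gpu      # 引用下面 resourceClaims 中的条目名称
  resourceClaims:
  - name: gpu
    resourceClaimName: example-resource-claim  # 该 ResourceClaim 必须已存在,否则 Pod 保持 Pending
```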

要创建申领资源的工作负载,请选择以下选项之一:

{{< tabs name="claim-resources" >}}
{{% tab name="ResourceClaimTemplate" %}}

<!--
Review the following example manifest:
-->
查看以下示例清单:

{{% code_sample file="dra/resourceclaimtemplate.yaml" %}}

<!--
This manifest creates a ResourceClaimTemplate that requests devices in the
`example-device-class` DeviceClass that match both of the following parameters:

* Devices that have a `driver.example.com/type` attribute with a value of
  `gpu`.
* Devices that have `64Gi` of capacity.

To create the ResourceClaimTemplate, run the following command:
-->
此清单会创建一个 ResourceClaimTemplate,它请求属于 `example-device-class`
DeviceClass、且同时满足以下两个参数的设备:

* 属性 `driver.example.com/type` 的值为 `gpu`
* 容量为 `64Gi`

创建 ResourceClaimTemplate 的命令如下:

```shell
kubectl apply -f https://k8s.io/examples/dra/resourceclaimtemplate.yaml
```

{{% /tab %}}
{{% tab name="ResourceClaim" %}}

<!--
Review the following example manifest:
-->
查看以下示例清单:

{{% code_sample file="dra/resourceclaim.yaml" %}}

<!--
This manifest creates a ResourceClaim that requests devices in the
`example-device-class` DeviceClass that match both of the following parameters:

* Devices that have a `driver.example.com/type` attribute with a value of
  `gpu`.
* Devices that have `64Gi` of capacity.

To create the ResourceClaim, run the following command:
-->
此清单会创建一个 ResourceClaim,请求属于 `example-device-class`
DeviceClass、且同时满足以下两个参数的设备:

* 属性 `driver.example.com/type` 的值为 `gpu`
* 容量为 `64Gi`

创建 ResourceClaim 的命令如下:

```shell
kubectl apply -f https://k8s.io/examples/dra/resourceclaim.yaml
```

{{% /tab %}}
{{< /tabs >}}

<!--
## Request devices in workloads using DRA {#request-devices-workloads}

To request device allocation, specify a ResourceClaim or a ResourceClaimTemplate
in the `resourceClaims` field of the Pod specification. Then, request a specific
claim by name in the `resources.claims` field of a container in that Pod.
You can specify multiple entries in the `resourceClaims` field and use specific
claims in different containers.
-->
## 使用 DRA 在工作负载中请求设备 {#request-devices-workloads}

要请求设备分配,请在 Pod 规约的 `resourceClaims` 字段中指定 ResourceClaim
或 ResourceClaimTemplate,然后在容器的 `resources.claims` 字段中按名称请求具体的资源申领。
你可以在 `resourceClaims` 中列出多个条目,并在不同容器中使用特定的申领。

<!--
1. Review the following example Job:
-->
1. 查看以下 Job 示例:

   {{% code_sample file="dra/dra-example-job.yaml" %}}

   <!--
   Each Pod in this Job has the following properties:

   * Makes a ResourceClaimTemplate named `separate-gpu-claim` and a
     ResourceClaim named `shared-gpu-claim` available to containers.
   * Runs the following containers:
     * `container0` requests the devices from the `separate-gpu-claim`
       ResourceClaimTemplate.
     * `container1` and `container2` share access to the devices from the
       `shared-gpu-claim` ResourceClaim.
   -->
   此 Job 中的每个 Pod 具备以下属性:

   * 提供名为 `separate-gpu-claim` 的 ResourceClaimTemplate 和名为
     `shared-gpu-claim` 的 ResourceClaim 给容器使用。
   * 运行以下容器:

     * `container0` 请求 `separate-gpu-claim` ResourceClaimTemplate 中定义的设备。
     * `container1` 和 `container2` 共享对 `shared-gpu-claim` ResourceClaim 中设备的访问。

<!--
1. Create the Job:
-->
2. 创建 Job:

   ```shell
   kubectl apply -f https://k8s.io/examples/dra/dra-example-job.yaml
   ```

<!--
## Clean up {#clean-up}

To delete the Kubernetes objects that you created in this task, follow these
steps:

1. Delete the example Job:
-->
## 清理 {#clean-up}

要删除本任务中创建的 Kubernetes 对象,请按照以下步骤操作:

1. 删除示例 Job:

   ```shell
   kubectl delete -f https://k8s.io/examples/dra/dra-example-job.yaml
   ```

<!--
1. To delete your resource claims, run one of the following commands:

   * Delete the ResourceClaimTemplate:
-->
2. 运行以下其中一条命令来删除你的资源申领:

   * 删除 ResourceClaimTemplate:

     ```shell
     kubectl delete -f https://k8s.io/examples/dra/resourceclaimtemplate.yaml
     ```

   <!--
   * Delete the ResourceClaim:
   -->
   * 删除 ResourceClaim:

     ```shell
     kubectl delete -f https://k8s.io/examples/dra/resourceclaim.yaml
     ```

## {{% heading "whatsnext" %}}

<!--
* [Learn more about DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
-->
* [进一步了解 DRA](/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-dra-job
spec:
  completions: 10
  parallelism: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: container0
        image: ubuntu:24.04
        command: ["sleep", "9999"]
        resources:
          claims:
          - name: separate-gpu-claim
      - name: container1
        image: ubuntu:24.04
        command: ["sleep", "9999"]
        resources:
          claims:
          - name: shared-gpu-claim
      - name: container2
        image: ubuntu:24.04
        command: ["sleep", "9999"]
        resources:
          claims:
          - name: shared-gpu-claim
      resourceClaims:
      - name: separate-gpu-claim
        resourceClaimTemplateName: example-resource-claim-template
      - name: shared-gpu-claim
        resourceClaimName: example-resource-claim
```
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@

```yaml
apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaim
metadata:
  name: example-resource-claim
spec:
  devices:
    requests:
    - name: single-gpu-claim
      exactly:
        deviceClassName: example-device-class
        allocationMode: All
        selectors:
        - cel:
            expression: |-
              device.attributes["driver.example.com"].type == "gpu" &&
              device.capacity["driver.example.com"].memory == quantity("64Gi")
```
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@

```yaml
apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaimTemplate
metadata:
  name: example-resource-claim-template
spec:
  spec:
    devices:
      requests:
      - name: gpu-claim
        exactly:
          deviceClassName: example-device-class
          selectors:
          - cel:
              expression: |-
                device.attributes["driver.example.com"].type == "gpu" &&
                device.capacity["driver.example.com"].memory == quantity("64Gi")
```
