|
| 1 | +--- |
| 2 | +title: "Set Up DRA in a Cluster" |
| 3 | +content_type: task |
| 4 | +min-kubernetes-server-version: v1.32 |
| 5 | +weight: 10 |
| 6 | +--- |
| 7 | +<!-- |
| 8 | +title: "Set Up DRA in a Cluster" |
| 9 | +content_type: task |
| 10 | +min-kubernetes-server-version: v1.32 |
| 11 | +weight: 10 |
| 12 | +--> |
| 13 | + |
| 14 | +{{< feature-state feature_gate_name="DynamicResourceAllocation" >}} |
| 15 | + |
| 16 | +<!-- overview --> |
| 17 | + |
| 18 | +<!-- |
| 19 | +This page shows you how to configure _dynamic resource allocation (DRA)_ in a |
| 20 | +Kubernetes cluster by enabling API groups and configuring classes of devices. |
| 21 | +These instructions are for cluster administrators. |
| 22 | +--> |
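| 23 | +This page shows you how to configure **dynamic resource allocation (DRA)** in a Kubernetes cluster by enabling API groups and configuring classes of devices. |
| 24 | +These instructions are for cluster administrators. |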
| 25 | + |
| 26 | +<!-- body --> |
| 27 | + |
| 28 | +<!-- |
| 29 | +## About DRA {#about-dra} |
| 30 | +--> |
| 31 | +## About DRA {#about-dra} |
| 32 | + |
| 33 | +{{< glossary_definition term_id="dra" length="all" >}} |
| 34 | + |
| 35 | +<!-- |
| 36 | +Ensure that you're familiar with how DRA works and with DRA terminology like |
| 37 | +{{< glossary_tooltip text="DeviceClasses" term_id="deviceclass" >}}, |
| 38 | +{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}}, and |
| 39 | +{{< glossary_tooltip text="ResourceClaimTemplates" term_id="resourceclaimtemplate" >}}. |
| 40 | +For details, see |
| 41 | +[Dynamic Resource Allocation (DRA)](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/). |
| 42 | +--> |
| 43 | +Ensure that you're familiar with how DRA works and with DRA terminology like |
| 44 | +{{< glossary_tooltip text="DeviceClasses" term_id="deviceclass" >}}, |
| 45 | +{{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}}, and |
| 46 | +{{< glossary_tooltip text="ResourceClaimTemplates" term_id="resourceclaimtemplate" >}}. |
| 47 | +For details, see [Dynamic Resource Allocation (DRA)](/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation/). |
| 48 | + |
| 49 | +<!-- prerequisites --> |
| 50 | + |
| 51 | +## {{% heading "prerequisites" %}} |
| 52 | + |
| 53 | +{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}} |
| 54 | + |
| 55 | +<!-- |
| 56 | +* Directly or indirectly attach devices to your cluster. To avoid potential |
| 57 | + issues with drivers, wait until you set up the DRA feature for your |
| 58 | + cluster before you install drivers. |
| 59 | +--> |
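| 60 | +* Directly or indirectly attach devices to your cluster. To avoid potential issues with drivers, wait until you set up the DRA feature for your cluster before you install drivers. |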
| 61 | + |
| 62 | +<!-- steps --> |
| 63 | + |
| 64 | +<!-- |
| 65 | +## Enable the DRA API groups {#enable-dra} |
| 66 | + |
| 67 | +To let Kubernetes allocate resources to your Pods with DRA, complete the |
| 68 | +following configuration steps: |
| 69 | + |
| 70 | +1. Enable the `DynamicResourceAllocation` |
| 71 | + [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) |
| 72 | + on all of the following components: |
| 73 | +--> |
| 74 | +## Enable the DRA API groups {#enable-dra} |
| 75 | + |
| 76 | +To let Kubernetes allocate resources to your Pods with DRA, complete the following configuration steps: |
| 77 | + |
| 78 | +1. Enable the `DynamicResourceAllocation` |
| 79 | + [feature gate](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/) on all of the following components: |
| 80 | + |
| 81 | + * `kube-apiserver` |
| 82 | + * `kube-controller-manager` |
| 83 | + * `kube-scheduler` |
| 84 | + * `kubelet` |
| 85 | + |
| 86 | +<!-- |
| 87 | +1. Enable the following |
| 88 | + {{< glossary_tooltip text="API groups" term_id="api-group" >}}: |
| 89 | + |
| 90 | + * `resource.k8s.io/v1beta1`: required for DRA to function. |
| 91 | + * `resource.k8s.io/v1beta2`: optional, recommended improvements to the user |
| 92 | + experience. |
| 93 | + |
| 94 | + For more information, see |
| 95 | + [Enabling or disabling API groups](/docs/reference/using-api/#enabling-or-disabling). |
| 96 | +--> |
| 97 | +2. Enable the following {{< glossary_tooltip text="API groups" term_id="api-group" >}}: |
| 98 | + |
| 99 | + * `resource.k8s.io/v1beta1`: required for DRA to function. |
| 100 | + * `resource.k8s.io/v1beta2`: optional; recommended because it improves the user experience. |
| 101 | + |
| 102 | + For more information, see [Enabling or disabling API groups](/zh-cn/docs/reference/using-api/#enabling-or-disabling). |
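
The two steps above come down to a few component settings. The following is a minimal sketch, assuming that you pass flags directly to the control plane components and configure the kubelet through a KubeletConfiguration file. `--feature-gates` and `--runtime-config` are the standard component flags. The helper name `dra_component_flags`, `OUT_DIR`, and the file name `kubelet-dra.yaml` are hypothetical, and the way you apply these settings (kubeadm, static Pod manifests, a managed service) depends on your cluster.

```shell
#!/usr/bin/env bash
# Sketch: print the flags that enable DRA on each control plane component and
# write a kubelet configuration fragment.
# dra_component_flags, OUT_DIR, and kubelet-dra.yaml are hypothetical names.
set -euo pipefail

DRA_FEATURE_GATES="DynamicResourceAllocation=true"
# v1beta1 is required for DRA; v1beta2 is optional but recommended.
DRA_RUNTIME_CONFIG="resource.k8s.io/v1beta1=true,resource.k8s.io/v1beta2=true"

# Print the extra command-line flags for one control plane component.
dra_component_flags() {
  local component="$1"
  case "$component" in
    kube-apiserver)
      # Only the API server serves the API groups, so only it needs --runtime-config.
      echo "--feature-gates=${DRA_FEATURE_GATES} --runtime-config=${DRA_RUNTIME_CONFIG}"
      ;;
    kube-controller-manager|kube-scheduler)
      echo "--feature-gates=${DRA_FEATURE_GATES}"
      ;;
    *)
      echo "unknown component: ${component}" >&2
      return 1
      ;;
  esac
}

for c in kube-apiserver kube-controller-manager kube-scheduler; do
  echo "${c} $(dra_component_flags "$c")"
done

# The kubelet reads feature gates from its configuration file.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
cat > "${OUT_DIR}/kubelet-dra.yaml" <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: true
EOF
echo "wrote ${OUT_DIR}/kubelet-dra.yaml"
```

Merge the `featureGates` entry into the kubelet's existing configuration file instead of replacing that file, and restart each component so that the changed settings take effect.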
| 103 | + |
| 104 | +<!-- |
| 105 | +## Verify that DRA is enabled {#verify} |
| 106 | + |
| 107 | +To verify that the cluster is configured correctly, try to list DeviceClasses: |
| 108 | +--> |
| 109 | +## Verify that DRA is enabled {#verify} |
| 110 | + |
| 111 | +To verify that the cluster is configured correctly, try to list DeviceClasses: |
| 112 | + |
| 113 | +```shell |
| 114 | +kubectl get deviceclasses |
| 115 | +``` |
| 116 | + |
| 117 | +<!-- |
| 118 | +If the component configuration was correct, the output is similar to the |
| 119 | +following: |
| 120 | +--> |
| 121 | +If the component configuration was correct, the output is similar to the following: |
| 122 | + |
| 123 | +``` |
| 124 | +No resources found |
| 125 | +``` |
| 126 | + |
| 127 | +<!-- |
| 128 | +If DRA isn't correctly configured, the output of the preceding command is |
| 129 | +similar to the following: |
| 130 | +--> |
| 131 | +If DRA isn't correctly configured, the output of the preceding command is similar to the following: |
| 132 | + |
| 133 | +``` |
| 134 | +error: the server doesn't have a resource type "deviceclasses" |
| 135 | +``` |
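
If you run this check from a script, testing the exit status is more reliable than matching the message text: `kubectl get deviceclasses` exits with status 0 even when it reports that no resources were found, and exits with a non-zero status when the resource type isn't served. The following is a minimal sketch; the function names `dra_api_enabled` and `dra_api_versions` are hypothetical helpers, not kubectl subcommands.

```shell
#!/usr/bin/env bash
# Sketch: check whether the API server serves the DRA API.
# dra_api_enabled and dra_api_versions are hypothetical helper names.

dra_api_enabled() {
  # Discard the output: an empty list still exits 0,
  # while an unknown resource type makes kubectl exit non-zero.
  kubectl get deviceclasses >/dev/null 2>&1
}

# Print the resource.k8s.io versions that the API server serves, one per line.
dra_api_versions() {
  kubectl api-versions | grep '^resource\.k8s\.io/' || true
}

if dra_api_enabled; then
  echo "DRA API is enabled; served versions:"
  dra_api_versions
else
  echo "DRA API is not available; try the troubleshooting steps" >&2
fi
```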
| 136 | + |
| 137 | +<!-- |
| 138 | +Try the following troubleshooting steps: |
| 139 | + |
| 140 | +1. Ensure that the `kube-scheduler` component has the `DynamicResourceAllocation` |
| 141 | + feature gate enabled *and* uses the |
| 142 | + [v1 configuration API](/docs/reference/config-api/kube-scheduler-config.v1/). |
| 143 | + If you use a custom configuration, you might need to perform additional steps |
| 144 | + to enable the `DynamicResources` plugin. |
| 145 | +1. Restart the `kube-apiserver` component and the `kube-controller-manager` |
| 146 | + component to propagate the API group changes. |
| 147 | +--> |
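| 148 | +Try the following troubleshooting steps: |
| 149 | + |
| 150 | +1. Ensure that the `kube-scheduler` component has the `DynamicResourceAllocation` feature gate enabled *and* uses the |
| 151 | + [v1 configuration API](/zh-cn/docs/reference/config-api/kube-scheduler-config.v1/). |
| 152 | + If you use a custom configuration, you might need to perform additional steps to enable the `DynamicResources` plugin. |
| 153 | + |
| 154 | +2. Restart the `kube-apiserver` and `kube-controller-manager` components to propagate the API group changes. |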
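
For the first troubleshooting step, a custom scheduler configuration that enables the plugin explicitly might look like the following. This is a minimal sketch, assuming that the plugin is registered under the name `DynamicResources` and that you pass the file to `kube-scheduler` with `--config`. `OUT_DIR` and the file name are hypothetical, and fields that a real configuration usually also sets (such as `clientConnection.kubeconfig`) are omitted.

```shell
#!/usr/bin/env bash
# Sketch: write a kube-scheduler v1 configuration that enables the DRA plugin
# for the default profile. OUT_DIR and the file name are hypothetical;
# clientConnection and other settings are omitted for brevity.
set -euo pipefail

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
SCHED_CONFIG="${OUT_DIR}/kube-scheduler-dra.yaml"

cat > "${SCHED_CONFIG}" <<'EOF'
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    multiPoint:
      enabled:
      - name: DynamicResources
EOF

# Start kube-scheduler with the feature gate and this file, for example:
#   kube-scheduler --config="${SCHED_CONFIG}" \
#     --feature-gates=DynamicResourceAllocation=true
echo "wrote ${SCHED_CONFIG}"
```

If you already maintain a scheduler configuration, merge the `plugins` stanza into it rather than replacing the file.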
| 155 | + |
| 156 | +<!-- |
| 157 | +## Install device drivers {#install-drivers} |
| 158 | + |
| 159 | +After you enable DRA for your cluster, you can install the drivers for your |
| 160 | +attached devices. For instructions, check the documentation of your device |
| 161 | +owner or the project that maintains the device drivers. The drivers that you |
| 162 | +install must be compatible with DRA. |
| 163 | + |
| 164 | +To verify that your installed drivers are working as expected, list |
| 165 | +ResourceSlices in your cluster: |
| 166 | +--> |
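| 167 | +## Install device drivers {#install-drivers} |
| 168 | + |
| 169 | +After you enable DRA for your cluster, you can install the drivers for your attached devices. |
| 170 | +For instructions, check the documentation of your device owner or of the project that maintains the device drivers. The drivers that you install must be compatible with DRA. |
| 171 | + |
| 172 | +To verify that your installed drivers are working as expected, list the ResourceSlices in your cluster: |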
| 173 | + |
| 174 | +```shell |
| 175 | +kubectl get resourceslices |
| 176 | +``` |
| 177 | + |
| 178 | +<!-- |
| 179 | +The output is similar to the following: |
| 180 | +--> |
| 181 | +The output is similar to the following: |
| 182 | + |
| 183 | +``` |
| 184 | +NAME                                               NODE               DRIVER               POOL                           AGE |
| 185 | +cluster-1-device-pool-1-driver.example.com-lqx8x   cluster-1-node-1   driver.example.com   cluster-1-device-pool-1-r1gc   7s |
| 186 | +cluster-1-device-pool-2-driver.example.com-29t7b   cluster-1-node-2   driver.example.com   cluster-1-device-pool-2-446z   8s |
| 187 | +``` |
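
To check the slices that a single driver published, for example from a script, you can filter the list on the `spec.driver` field. The following is a minimal sketch that uses a kubectl JSONPath template and `awk`. The function name `slices_for_driver` is hypothetical, and `driver.example.com` is the example driver name used on this page.

```shell
#!/usr/bin/env bash
# Sketch: print "<slice name> <node name>" for each ResourceSlice published by one driver.
# slices_for_driver is a hypothetical helper, not a kubectl subcommand.
slices_for_driver() {
  local driver="$1"
  # Emit one tab-separated line per slice: name, node name (may be empty), driver.
  kubectl get resourceslices \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\t"}{.spec.driver}{"\n"}{end}' |
    awk -F '\t' -v d="$driver" '$3 == d { print $1, $2 }'
}

# No output means that the driver hasn't published any ResourceSlices yet.
slices_for_driver driver.example.com
```

Tab separation keeps the node column in place for slices that aren't tied to a single node, where `spec.nodeName` is empty.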
| 188 | + |
| 189 | +<!-- |
| 190 | +## Create DeviceClasses {#create-deviceclasses} |
| 191 | + |
| 192 | +You can define categories of devices that your application operators can |
| 193 | +claim in workloads by creating |
| 194 | +{{< glossary_tooltip text="DeviceClasses" term_id="deviceclass" >}}. Some device |
| 195 | +driver providers might also instruct you to create DeviceClasses during driver |
| 196 | +installation. |
| 197 | +--> |
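| 198 | +## Create DeviceClasses {#create-deviceclasses} |
| 199 | + |
| 200 | +You can define categories of devices that your application operators can claim in workloads by creating |
| 201 | +{{< glossary_tooltip text="DeviceClasses" term_id="deviceclass" >}}. |
| 202 | +Some device driver providers might also instruct you to create DeviceClasses |
| 203 | +during driver installation. |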
| 204 | + |
| 205 | +<!-- |
| 206 | +The ResourceSlices that your driver publishes contain information about the |
| 207 | +devices that the driver manages, such as capacity, metadata, and attributes. You |
| 208 | +can use {{< glossary_tooltip term_id="cel" >}} to filter for properties in your |
| 209 | +DeviceClasses, which can make finding devices easier for your workload |
| 210 | +operators. |
| 211 | + |
| 212 | +1. To find the device properties that you can select in DeviceClasses by using |
| 213 | + CEL expressions, get the specification of a ResourceSlice: |
| 214 | +--> |
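| 215 | +The ResourceSlices that your driver publishes contain information about the devices that the driver manages, such as capacity, metadata, and attributes. You can use |
| 216 | +{{< glossary_tooltip term_id="cel" >}} to filter for properties in your DeviceClasses, |
| 217 | +which can make finding devices easier for your workload operators. |
| 218 | + |
| 219 | +1. To find the device properties that you can select in DeviceClasses by using CEL expressions, get the specification of a ResourceSlice: |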
| 220 | + |
| 221 | + ```shell |
| 222 | + kubectl get resourceslice <resourceslice-name> -o yaml |
| 223 | + ``` |
| 224 | + |
| 225 | + <!-- |
| 226 | + The output is similar to the following: |
| 227 | + --> |
| 228 | + |
| 229 | + The output is similar to the following: |
| 230 | + |
| 231 | + <!-- |
| 232 | + # lines omitted for clarity |
| 233 | + --> |
| 234 | + |
| 235 | + ```yaml |
| 236 | + apiVersion: resource.k8s.io/v1beta1 |
| 237 | + kind: ResourceSlice |
| 238 | + # lines omitted for clarity |
| 239 | + spec: |
| 240 | + devices: |
| 241 | + - basic: |
| 242 | + attributes: |
| 243 | + type: |
| 244 | + string: gpu |
| 245 | + capacity: |
| 246 | + memory: |
| 247 | + value: 64Gi |
| 248 | + name: gpu-0 |
| 249 | + - basic: |
| 250 | + attributes: |
| 251 | + type: |
| 252 | + string: gpu |
| 253 | + capacity: |
| 254 | + memory: |
| 255 | + value: 64Gi |
| 256 | + name: gpu-1 |
| 257 | + driver: driver.example.com |
| 258 | + nodeName: cluster-1-node-1 |
| 259 | + # 为简洁省略部分内容 |
| 260 | + ``` |
| 261 | + |
| 262 | + <!-- |
| 263 | + You can also check the driver provider's documentation for available |
| 264 | + properties and values. |
| 265 | + --> |
| 266 | + |
| 267 | + You can also check the driver provider's documentation for available properties and values. |
| 268 | + |
| 269 | +<!-- |
| 270 | +1. Review the following example DeviceClass manifest, which selects any device |
| 271 | + that's managed by the `driver.example.com` device driver: |
| 272 | +--> |
| 273 | +2. Review the following example DeviceClass manifest, which selects any device that's managed by the `driver.example.com` device driver: |
| 274 | + |
| 275 | + {{% code_sample file="dra/deviceclass.yaml" %}} |
| 276 | + |
| 277 | +<!-- |
| 278 | +1. Create the DeviceClass in your cluster: |
| 279 | +--> |
| 280 | +3. Create the DeviceClass in your cluster: |
| 281 | + |
| 282 | + ```shell |
| 283 | + kubectl apply -f https://k8s.io/examples/dra/deviceclass.yaml |
| 284 | + ``` |
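
Beyond matching the driver name, a DeviceClass selector can use the attributes and capacity that appear in the ResourceSlice. The following minimal sketch writes a DeviceClass that matches only `driver.example.com` devices whose `type` attribute is `gpu` and that have at least 32Gi of `memory`, based on the example ResourceSlice output on this page. The class name `example-gpu-class`, the 32Gi threshold, `OUT_DIR`, and the file name are hypothetical. In the CEL expression, attributes and capacities are looked up under the driver's domain, which is `driver.example.com` here.

```shell
#!/usr/bin/env bash
# Sketch: write a DeviceClass that filters on ResourceSlice attributes and capacity.
# The class name, OUT_DIR, the file name, and the 32Gi threshold are hypothetical.
set -euo pipefail

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
CLASS_FILE="${OUT_DIR}/example-gpu-class.yaml"

cat > "${CLASS_FILE}" <<'EOF'
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: example-gpu-class
spec:
  selectors:
  - cel:
      expression: |-
        device.driver == "driver.example.com" &&
        device.attributes["driver.example.com"].type == "gpu" &&
        device.capacity["driver.example.com"].memory.compareTo(quantity("32Gi")) >= 0
EOF

# Create the class with: kubectl apply -f "${CLASS_FILE}"
echo "wrote ${CLASS_FILE}"
```

Narrow classes like this one let workload operators request a device by class name instead of repeating the same CEL expression in every ResourceClaim.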
| 285 | + |
| 286 | +<!-- |
| 287 | +## Clean up {#clean-up} |
| 288 | + |
| 289 | +To delete the DeviceClass that you created in this task, run the following |
| 290 | +command: |
| 291 | +--> |
| 292 | +## Clean up {#clean-up} |
| 293 | + |
| 294 | +To delete the DeviceClass that you created in this task, run the following command: |
| 295 | + |
| 296 | +```shell |
| 297 | +kubectl delete -f https://k8s.io/examples/dra/deviceclass.yaml |
| 298 | +``` |
| 299 | + |
| 300 | +## {{% heading "whatsnext" %}} |
| 301 | + |
| 302 | +<!-- |
| 303 | +* [Learn more about DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation) |
| 304 | +* [Allocate Devices to Workloads with DRA](/docs/tasks/configure-pod-container/assign-resources/allocate-devices-dra) |
| 305 | +--> |
| 306 | +* [Learn more about DRA](/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation) |
| 307 | +* [Allocate Devices to Workloads with DRA](/zh-cn/docs/tasks/configure-pod-container/assign-resources/allocate-devices-dra) |