|
1 | 1 | ---
|
2 | 2 | title: "在集群中设置 DRA"
|
3 | 3 | content_type: task
|
4 |
| -min-kubernetes-server-version: v1.32 |
| 4 | +min-kubernetes-server-version: v1.34 |
5 | 5 | weight: 10
|
6 | 6 | ---
|
7 | 7 | <!--
|
8 | 8 | title: "Set Up DRA in a Cluster"
|
9 | 9 | content_type: task
|
10 |
| -min-kubernetes-server-version: v1.32 |
| 10 | +min-kubernetes-server-version: v1.34 |
11 | 11 | weight: 10
|
12 | 12 | -->
|
13 | 13 |
|
@@ -62,44 +62,30 @@ For details, see
|
62 | 62 | <!-- steps -->
|
63 | 63 |
|
64 | 64 | <!--
|
65 |
| -## Enable the DRA API groups {#enable-dra} |
| 65 | +## Optional: enable legacy DRA API groups {#enable-dra} |
66 | 66 |
|
67 |
| -To let Kubernetes allocate resources to your Pods with DRA, complete the |
68 |
| -following configuration steps: |
| 67 | +DRA graduated to stable in Kubernetes 1.34 and is enabled by default. |
| 68 | +Some older DRA drivers or workloads might still need the |
| 69 | +v1beta1 API from Kubernetes 1.30 or v1beta2 from Kubernetes 1.32. |
| 70 | +If and only if support for those is desired, then enable the following |
| 71 | +{{< glossary_tooltip text="API groups" term_id="api-group" >}}: |
69 | 72 |
|
70 |
| -1. Enable the `DynamicResourceAllocation` |
71 |
| - [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) |
72 |
| - on all of the following components: |
73 |
| ---> |
74 |
| -## 启用 DRA API 组 {#enable-dra} |
75 |
| - |
76 |
| -若要让 Kubernetes 能够使用 DRA 为你的 Pod 分配资源,需完成以下配置步骤: |
77 |
| - |
78 |
| -1. 在所有以下组件中启用 `DynamicResourceAllocation` |
79 |
| - [特性门控](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/): |
80 |
| - |
81 |
| - * `kube-apiserver` |
82 |
| - * `kube-controller-manager` |
83 |
| - * `kube-scheduler` |
84 |
| - * `kubelet` |
| 73 | + * `resource.k8s.io/v1beta1` |
| 74 | + * `resource.k8s.io/v1beta2` |
85 | 75 |
|
86 |
| -<!-- |
87 |
| -1. Enable the following |
88 |
| - {{< glossary_tooltip text="API groups" term_id="api-group" >}}: |
89 |
| -
|
90 |
| - * `resource.k8s.io/v1beta1`: required for DRA to function. |
91 |
| - * `resource.k8s.io/v1beta2`: optional, recommended improvements to the user |
92 |
| - experience. |
93 |
| - |
94 |
| - For more information, see |
95 |
| - [Enabling or disabling API groups](/docs/reference/using-api/#enabling-or-disabling). |
| 76 | +For more information, see |
| 77 | +[Enabling or disabling API groups](/docs/reference/using-api/#enabling-or-disabling). |
96 | 78 | -->
|
97 |
| -2. 启用以下 {{< glossary_tooltip text="API 组" term_id="api-group" >}}: |
| 79 | +DRA 在 Kubernetes 1.34 中进阶至 Stable 并默认启用。 |
| 80 | +一些较旧的 DRA 驱动或工作负载可能仍需要 Kubernetes 1.30 的 v1beta1 API |
| 81 | +或 Kubernetes 1.32 的 v1beta2 API。 |
| 82 | +当且仅当需要支持这些时,才启用以下 |
| 83 | +{{< glossary_tooltip text="API 组" term_id="api-group" >}}: |
98 | 84 |
|
99 |
| - * `resource.k8s.io/v1beta1`:DRA 所必需。 |
100 |
| - * `resource.k8s.io/v1beta2`:可选,推荐启用以提升用户体验。 |
| 85 | + * `resource.k8s.io/v1beta1` |
| 86 | + * `resource.k8s.io/v1beta2` |
101 | 87 |
|
102 |
| - 更多信息请参阅[启用或禁用 API 组](/zh-cn/docs/reference/using-api/#enabling-or-disabling)。 |
| 88 | +更多信息请参阅[启用或禁用 API 组](/zh-cn/docs/reference/using-api/#enabling-or-disabling)。 |
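As a hedged illustration of the step above: on clusters where `kube-apiserver` runs as a static Pod (an assumption; kubeadm-style clusters typically keep the manifest at `/etc/kubernetes/manifests/kube-apiserver.yaml`), the two legacy API versions could be enabled through the `--runtime-config` flag. The fragment below is a sketch, not part of this change, and omits all other fields and flags.

```yaml
# Fragment of a kube-apiserver static Pod manifest (assumed layout).
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Enable only the legacy DRA API versions that your drivers still need.
    - --runtime-config=resource.k8s.io/v1beta1=true,resource.k8s.io/v1beta2=true
```

After the API server restarts with this setting, the older versions should be served alongside `resource.k8s.io/v1`.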
103 | 89 |
|
104 | 90 | <!--
|
105 | 91 | ## Verify that DRA is enabled {#verify}
|
@@ -137,21 +123,22 @@ error: the server doesn't have a resource type "deviceclasses"
|
137 | 123 | <!--
|
138 | 124 | Try the following troubleshooting steps:
|
139 | 125 |
|
140 |
| -1. Ensure that the `kube-scheduler` component has the `DynamicResourceAllocation` |
141 |
| - feature gate enabled *and* uses the |
142 |
| - [v1 configuration API](/docs/reference/config-api/kube-scheduler-config.v1/). |
143 |
| - If you use a custom configuration, you might need to perform additional steps |
144 |
| - to enable the `DynamicResource` plugin. |
145 |
| -1. Restart the `kube-apiserver` component and the `kube-controller-manager` |
146 |
| - component to propagate the API group changes. |
| 126 | +1. Reconfigure and restart the `kube-apiserver` component. |
| 127 | +
|
| 128 | +1. If the complete `.spec.resourceClaims` field gets removed from Pods, or if |
| 129 | + Pods get scheduled without considering the ResourceClaims, then verify |
| 130 | + that the `DynamicResourceAllocation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is not turned off |
| 131 | + for kube-apiserver, kube-controller-manager, kube-scheduler or the kubelet. |
147 | 132 | -->
|
148 | 133 | 你可以尝试以下排查步骤:
|
149 | 134 |
|
150 |
| -1. 确保 `kube-scheduler` 组件已启用 `DynamicResourceAllocation` 特性门控,并且使用的是 |
151 |
| - [v1 配置 API](/zh-cn/docs/reference/config-api/kube-scheduler-config.v1/)。 |
152 |
| - 如果你使用自定义配置,你可能还需额外启用 `DynamicResource` 插件。 |
| 135 | +1. 重新配置并重启 `kube-apiserver` 组件。 |
153 | 136 |
|
154 |
| -2. 重启 `kube-apiserver` 和 `kube-controller-manager` 组件,以传播 API 组变更。 |
| 137 | +2. 如果从 Pod 中完全删除了 `.spec.resourceClaims` 字段, |
| 138 | + 或者 Pod 在不考虑 ResourceClaim 的情况下被调度, |
| 139 | + 那么请验证 `DynamicResourceAllocation` **特性门控**在 |
| 140 | + kube-apiserver、kube-controller-manager、kube-scheduler |
| 141 | + 或 kubelet 组件中是否被关闭。 |
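For the feature gate check in the second troubleshooting step, the following is a minimal sketch of a setting that would turn DRA off on a node. It assumes the kubelet reads a `KubeletConfiguration` file; the control plane components take the equivalent `--feature-gates=DynamicResourceAllocation=false` command-line flag. The fragment is illustrative and not part of this change.

```yaml
# Hypothetical kubelet configuration file fragment.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # An explicit "false" here turns DRA off on this node.
  DynamicResourceAllocation: false
```

If an entry like this is present, removing it or setting it to `true` restores the default behavior.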
155 | 142 |
|
156 | 143 | <!--
|
157 | 144 | ## Install device drivers {#install-drivers}
|
@@ -186,6 +173,19 @@ cluster-1-device-pool-1-driver.example.com-lqx8x cluster-1-node-1 driver
|
186 | 173 | cluster-1-device-pool-2-driver.example.com-29t7b cluster-1-node-2 driver.example.com cluster-1-device-pool-2-446z 8s
|
187 | 174 | ```
|
188 | 175 |
|
| 176 | +<!-- |
| 177 | +Try the following troubleshooting steps: |
| 178 | +
|
| 179 | +1. Check the health of the DRA driver and look for error messages about |
| 180 | + publishing ResourceSlices in its log output. The vendor of the driver |
| 181 | + may have further instructions about installation and troubleshooting. |
| 182 | +--> |
| 183 | +尝试以下故障排查步骤: |
| 184 | + |
| 185 | +1. 检查 DRA 驱动的健康状况,并在其日志输出中查找关于发布 ResourceSlice |
| 186 | + 的错误消息。驱动的供应商可能有关于安装和故障排除的进一步指示。 |
| 187 | + |
| 188 | + |
189 | 189 | <!--
|
190 | 190 | ## Create DeviceClasses {#create-deviceclasses}
|
191 | 191 |
|
@@ -233,27 +233,25 @@ operators.
|
233 | 233 | -->
|
234 | 234 |
|
235 | 235 | ```yaml
|
236 |
| - apiVersion: resource.k8s.io/v1beta1 |
| 236 | + apiVersion: resource.k8s.io/v1 |
237 | 237 | kind: ResourceSlice
|
238 | 238 | # 为简洁省略部分内容
|
239 | 239 | spec:
|
240 | 240 |   devices:
|
241 |
| -   - basic: |
242 |
| -       attributes: |
243 |
| -         type: |
244 |
| -           string: gpu |
245 |
| -       capacity: |
246 |
| -         memory: |
247 |
| -           value: 64Gi |
248 |
| -     name: gpu-0 |
249 |
| -   - basic: |
250 |
| -       attributes: |
251 |
| -         type: |
252 |
| -           string: gpu |
253 |
| -       capacity: |
254 |
| -         memory: |
255 |
| -           value: 64Gi |
256 |
| -     name: gpu-1 |
| 241 | +   - attributes: |
| 242 | +       type: |
| 243 | +         string: gpu |
| 244 | +     capacity: |
| 245 | +       memory: |
| 246 | +         value: 64Gi |
| 247 | +     name: gpu-0 |
| 248 | +   - attributes: |
| 249 | +       type: |
| 250 | +         string: gpu |
| 251 | +     capacity: |
| 252 | +       memory: |
| 253 | +         value: 64Gi |
| 254 | +     name: gpu-1 |
257 | 255 |   driver: driver.example.com
|
258 | 256 |   nodeName: cluster-1-node-1
|
259 | 257 | # 为简洁省略部分内容
|