Commit 79e5222

[zh-cn]sync set-up-dra-cluster
Signed-off-by: xin.li <[email protected]>
1 parent 66d2642 commit 79e5222

1 file changed (+60, -62 lines changed)

content/zh-cn/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster.md

Lines changed: 60 additions & 62 deletions
````diff
@@ -1,13 +1,13 @@
 ---
 title: "Set Up DRA in a Cluster"
 content_type: task
-min-kubernetes-server-version: v1.32
+min-kubernetes-server-version: v1.34
 weight: 10
 ---
 <!--
 title: "Set Up DRA in a Cluster"
 content_type: task
-min-kubernetes-server-version: v1.32
+min-kubernetes-server-version: v1.34
 weight: 10
 -->
 
````

````diff
@@ -62,44 +62,30 @@ For details, see
 <!-- steps -->
 
 <!--
-## Enable the DRA API groups {#enable-dra}
+## Optional: enable legacy DRA API groups {#enable-dra}
 
-To let Kubernetes allocate resources to your Pods with DRA, complete the
-following configuration steps:
+DRA graduated to stable in Kubernetes 1.34 and is enabled by default.
+Some older DRA drivers or workloads might still need the
+v1beta1 API from Kubernetes 1.30 or v1beta2 from Kubernetes 1.32.
+If and only if support for those is desired, then enable the following
+{{< glossary_tooltip text="API groups" term_id="api-group" >}}:
 
-1. Enable the `DynamicResourceAllocation`
-   [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-   on all of the following components:
--->
-## Enable the DRA API groups {#enable-dra}
-
-To let Kubernetes use DRA to allocate resources to your Pods, complete the following configuration steps:
-
-1. On all of the following components, enable the `DynamicResourceAllocation`
-   [feature gate](/zh-cn/docs/reference/command-line-tools-reference/feature-gates/):
-
-   * `kube-apiserver`
-   * `kube-controller-manager`
-   * `kube-scheduler`
-   * `kubelet`
+* `resource.k8s.io/v1beta1`
+* `resource.k8s.io/v1beta2`
 
-<!--
-1. Enable the following
-   {{< glossary_tooltip text="API groups" term_id="api-group" >}}:
-
-   * `resource.k8s.io/v1beta1`: required for DRA to function.
-   * `resource.k8s.io/v1beta2`: optional, recommended improvements to the user
-     experience.
-
-   For more information, see
-   [Enabling or disabling API groups](/docs/reference/using-api/#enabling-or-disabling).
+For more information, see
+[Enabling or disabling API groups](/docs/reference/using-api/#enabling-or-disabling).
 -->
-2. Enable the following {{< glossary_tooltip text="API groups" term_id="api-group" >}}:
+DRA graduated to Stable in Kubernetes 1.34 and is enabled by default.
+Some older DRA drivers or workloads might still need the v1beta1 API from Kubernetes 1.30
+or the v1beta2 API from Kubernetes 1.32.
+If and only if support for these is needed, enable the following
+{{< glossary_tooltip text="API groups" term_id="api-group" >}}:
 
-   * `resource.k8s.io/v1beta1`: required by DRA.
-   * `resource.k8s.io/v1beta2`: optional; recommended for an improved user experience.
+* `resource.k8s.io/v1beta1`
+* `resource.k8s.io/v1beta2`
 
-   For more information, see [Enabling or disabling API groups](/zh-cn/docs/reference/using-api/#enabling-or-disabling).
+For more information, see [Enabling or disabling API groups](/zh-cn/docs/reference/using-api/#enabling-or-disabling).
 
 <!--
 ## Verify that DRA is enabled {#verify}
````

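As one hedged illustration of the optional step in this hunk: on a cluster bootstrapped with kubeadm (an assumption; other installers pass kube-apiserver flags differently), the legacy API versions could be served by setting the kube-apiserver `--runtime-config` flag through a `kubeadm.k8s.io/v1beta4` ClusterConfiguration. This is a sketch with the rest of the configuration omitted, not a complete cluster config.

```yaml
# Sketch only: assumes a kubeadm-managed control plane using the
# kubeadm.k8s.io/v1beta4 format, where extraArgs is a list of name/value pairs.
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
apiServer:
  extraArgs:
  # Serve the legacy DRA API versions in addition to resource.k8s.io/v1.
  # Leave this entry out entirely if no driver or workload needs them.
  - name: runtime-config
    value: "resource.k8s.io/v1beta1=true,resource.k8s.io/v1beta2=true"
```

Without kubeadm, the same value goes directly on the kube-apiserver command line as `--runtime-config=resource.k8s.io/v1beta1=true,resource.k8s.io/v1beta2=true`; the kube-apiserver has to be restarted before the change takes effect.
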
````diff
@@ -137,21 +123,22 @@ error: the server doesn't have a resource type "deviceclasses"
 <!--
 Try the following troubleshooting steps:
 
-1. Ensure that the `kube-scheduler` component has the `DynamicResourceAllocation`
-   feature gate enabled *and* uses the
-   [v1 configuration API](/docs/reference/config-api/kube-scheduler-config.v1/).
-   If you use a custom configuration, you might need to perform additional steps
-   to enable the `DynamicResource` plugin.
-1. Restart the `kube-apiserver` component and the `kube-controller-manager`
-   component to propagate the API group changes.
+1. Reconfigure and restart the `kube-apiserver` component.
+
+1. If the complete `.spec.resourceClaims` field gets removed from Pods, or if
+   Pods get scheduled without considering the ResourceClaims, then verify
+   that the `DynamicResourceAllocation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is not turned off
+   for kube-apiserver, kube-controller-manager, kube-scheduler or the kubelet.
 -->
 You can try the following troubleshooting steps:
 
-1. Ensure that the `kube-scheduler` component has the `DynamicResourceAllocation` feature gate enabled and uses the
-   [v1 configuration API](/zh-cn/docs/reference/config-api/kube-scheduler-config.v1/).
-   If you use a custom configuration, you might also need to enable the `DynamicResource` plugin.
+1. Reconfigure and restart the `kube-apiserver` component.
 
-2. Restart the `kube-apiserver` and `kube-controller-manager` components to propagate the API group changes.
+2. If the `.spec.resourceClaims` field is removed from Pods entirely,
+   or if Pods are scheduled without considering ResourceClaims,
+   then check whether the `DynamicResourceAllocation` **feature gate**
+   has been turned off in the kube-apiserver, kube-controller-manager, kube-scheduler,
+   or kubelet component.
 
 <!--
 ## Install device drivers {#install-drivers}
````

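The new troubleshooting step in this hunk asks you to confirm that the `DynamicResourceAllocation` feature gate has not been turned off. For the kubelet, one place such an override can live is the `featureGates` map of a KubeletConfiguration file; the fragment below is a hedged sketch of an explicit setting. The control plane components take the same gate through their `--feature-gates` command-line flag instead (for example `--feature-gates=DynamicResourceAllocation=true`).

```yaml
# Sketch: a KubeletConfiguration fragment; all other fields omitted.
# DynamicResourceAllocation is enabled by default from Kubernetes 1.34,
# so an explicit entry is only needed to undo an earlier "false" override.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # If this reads "false", DRA is disabled in this node's kubelet.
  DynamicResourceAllocation: true
```

Searching node configuration files for `DynamicResourceAllocation: false`, and control plane manifests for `DynamicResourceAllocation=false`, is a quick way to spot such an override.
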
````diff
@@ -186,6 +173,19 @@ cluster-1-device-pool-1-driver.example.com-lqx8x cluster-1-node-1 driver
 cluster-1-device-pool-2-driver.example.com-29t7b cluster-1-node-2 driver.example.com cluster-1-device-pool-2-446z 8s
 ```
 
+<!--
+Try the following troubleshooting steps:
+
+1. Check the health of the DRA driver and look for error messages about
+   publishing ResourceSlices in its log output. The vendor of the driver
+   may have further instructions about installation and troubleshooting.
+-->
+Try the following troubleshooting steps:
+
+1. Check the health of the DRA driver, and look in its log output for error messages about
+   publishing ResourceSlices. The driver vendor might have further instructions about installation and troubleshooting.
+
+
 <!--
 ## Create DeviceClasses {#create-deviceclasses}
 
````

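This hunk ends where the "Create DeviceClasses" section begins. As a hedged illustration of what that section leads to, the sketch below defines a DeviceClass for the example driver `driver.example.com` shown in the ResourceSlice listing above, selecting devices whose `type` attribute is `gpu`. The class name `example-gpu` is hypothetical; real drivers usually document or ship their own DeviceClasses.

```yaml
# Sketch: a DeviceClass matching the example driver's GPUs.
# "example-gpu" is a hypothetical name chosen for illustration.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: example-gpu
spec:
  selectors:
  - cel:
      # Attributes published without a domain prefix are looked up
      # under the driver's own domain in CEL expressions.
      expression: >-
        device.driver == "driver.example.com" &&
        device.attributes["driver.example.com"].type == "gpu"
```
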
````diff
@@ -233,27 +233,25 @@ operators.
 -->
 
 ```yaml
-apiVersion: resource.k8s.io/v1beta1
+apiVersion: resource.k8s.io/v1
 kind: ResourceSlice
 # Some content omitted for brevity
 spec:
   devices:
-  - basic:
-      attributes:
-        type:
-          string: gpu
-      capacity:
-        memory:
-          value: 64Gi
-    name: gpu-0
-  - basic:
-      attributes:
-        type:
-          string: gpu
-      capacity:
-        memory:
-          value: 64Gi
-    name: gpu-1
+  - attributes:
+      type:
+        string: gpu
+    capacity:
+      memory:
+        value: 64Gi
+    name: gpu-0
+  - attributes:
+      type:
+        string: gpu
+    capacity:
+      memory:
+        value: 64Gi
+    name: gpu-1
   driver: driver.example.com
   nodeName: cluster-1-node-1
 # Some content omitted for brevity
````

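With the flattened `resource.k8s.io/v1` device layout shown in this hunk, a workload would typically request one of these GPUs through a ResourceClaimTemplate referenced from the Pod's `.spec.resourceClaims`. The sketch below assumes that a DeviceClass named `example-gpu` exists and selects the driver's GPUs; that class name, the object names `single-gpu` and `gpu-pod`, and the container image are placeholders. In `resource.k8s.io/v1`, a request for a specific device class is written under `exactly`.

```yaml
# Sketch: request one device from an assumed DeviceClass "example-gpu".
# All object names and the image below are placeholders.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: example-gpu
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  # If this field disappears after the Pod is created, check that the
  # DynamicResourceAllocation feature gate has not been turned off.
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    resources:
      claims:
      - name: gpu
```
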
