Commit fe75643

Merge pull request #44961 from windsonsea/gpuyh
[zh] Add translations to scheduling-gpus.md
2 parents 825784c + 46c701b commit fe75643

1 file changed: +95 -14 lines changed

content/zh-cn/docs/tasks/manage-gpus/scheduling-gpus.md

Lines changed: 95 additions & 14 deletions
@@ -100,22 +100,29 @@ spec:
 ```

 <!--
-## Clusters containing different types of GPUs
+## Manage clusters with different types of GPUs

 If different nodes in your cluster have different types of GPUs, then you
 can use [Node Labels and Node Selectors](/docs/tasks/configure-pod-container/assign-pods-nodes/)
 to schedule pods to appropriate nodes.

 For example:
 -->
-## Clusters containing different types of GPUs {#clusters-containing-different-types-of-gpus}
+## Manage clusters with different types of GPUs {#manage-clusters-with-different-types-of-gpus}

 If different nodes in your cluster have different types of NVIDIA GPUs,
 then you can use [Node Labels and Node Selectors](/zh-cn/docs/tasks/configure-pod-container/assign-pods-nodes/) to
 schedule Pods to appropriate nodes.

 For example:

+<!--
+```shell
+# Label your nodes with the accelerator type they have.
+kubectl label nodes node1 accelerator=example-gpu-x100
+kubectl label nodes node2 accelerator=other-gpu-k915
+```
+-->
 ```shell
 # Label your nodes with the accelerator type they have
 kubectl label nodes node1 accelerator=example-gpu-x100
@@ -134,18 +141,92 @@ a different label key if you prefer.
 ## Automatic node labelling {#node-labeller}

 <!--
-If you're using AMD GPU devices, you can deploy
-[Node Labeller](https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller).
-Node Labeller is a {{< glossary_tooltip text="controller" term_id="controller" >}} that automatically
-labels your nodes with GPU device properties.
+As an administrator, you can automatically discover and label all your GPU-enabled nodes
+by deploying Kubernetes [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) (NFD).
+NFD detects the hardware features that are available on each node in a Kubernetes cluster.
+Typically, NFD is configured to advertise those features as node labels, but NFD can also add extended resources, annotations, and node taints.
+NFD is compatible with all [supported versions](/releases/version-skew-policy/#supported-versions) of Kubernetes.
+By default, NFD creates the [feature labels](https://kubernetes-sigs.github.io/node-feature-discovery/master/usage/features.html) for the detected features.
+Administrators can leverage NFD to also taint nodes with specific features, so that only pods that request those features can be scheduled on those nodes.
+-->
+As an administrator, you can deploy Kubernetes
+[Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) (NFD)
+to automatically discover and label all your GPU-enabled nodes. NFD detects the hardware features that are available on each node in a Kubernetes cluster.
+Typically, NFD is configured to advertise those features as node labels, but NFD can also add extended resources, annotations, and node taints.
+NFD is compatible with all [supported versions](/zh-cn/releases/version-skew-policy/#supported-versions) of Kubernetes.
+By default, NFD creates [feature labels](https://kubernetes-sigs.github.io/node-feature-discovery/master/usage/features.html) for the detected features.
+Administrators can also use NFD to taint nodes that have specific features, so that only Pods that request those features can be scheduled on those nodes.
+
+<!--
+You also need a plugin for NFD that adds appropriate labels to your nodes; these might be generic
+labels or they could be vendor specific. Your GPU vendor may provide a third-party
+plugin for NFD; check their documentation for more details.
+-->
+You also need a plugin for NFD that adds appropriate labels to your nodes;
+these labels might be generic or vendor specific. Your GPU vendor may provide a third-party plugin for NFD;
+check their documentation for more details.
+
+<!--
+{{< highlight yaml "linenos=false,hl_lines=6-18" >}}
+apiVersion: v1
+kind: Pod
+metadata:
+  name: example-vector-add
+spec:
+  # You can use Kubernetes node affinity to schedule this Pod onto a node
+  # that provides the kind of GPU that its container needs in order to work
+  affinity:
+    nodeAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        nodeSelectorTerms:
+        - matchExpressions:
+          - key: "gpu.gpu-vendor.example/installed-memory"
+            operator: Gt # (greater than)
+            values: ["40535"]
+          - key: "feature.node.kubernetes.io/pci-10.present" # NFD Feature label
+            values: ["true"] # (optional) only schedule on nodes with PCI device 10
+  restartPolicy: OnFailure
+  containers:
+    - name: example-vector-add
+      image: "registry.example/example-vector-add:v42"
+      resources:
+        limits:
+          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
+{{< /highlight >}}
+-->
+{{< highlight yaml "linenos=false,hl_lines=6-18" >}}
+apiVersion: v1
+kind: Pod
+metadata:
+  name: example-vector-add
+spec:
+  # You can use Kubernetes node affinity to schedule this Pod onto a node that provides the kind of GPU its container needs
+  affinity:
+    nodeAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        nodeSelectorTerms:
+        - matchExpressions:
+          - key: "gpu.gpu-vendor.example/installed-memory"
+            operator: Gt # (greater than)
+            values: ["40535"]
+          - key: "feature.node.kubernetes.io/pci-10.present" # NFD feature label
+            values: ["true"] # (optional) only schedule on nodes with PCI device 10
+  restartPolicy: OnFailure
+  containers:
+    - name: example-vector-add
+      image: "registry.example/example-vector-add:v42"
+      resources:
+        limits:
+          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
+{{< /highlight >}}
+
+<!--
+#### GPU vendor implementations

-Similar functionality for NVIDIA is provided by
-[GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md).
+- [Intel](https://intel.github.io/intel-device-plugins-for-kubernetes/cmd/gpu_plugin/README.html)
+- [NVIDIA](https://github.com/NVIDIA/gpu-feature-discovery/#readme)
 -->
-If you're using AMD GPU devices, you can deploy
-[Node Labeller](https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller).
-It is a {{< glossary_tooltip text="controller" term_id="controller" >}} that
-automatically labels your nodes with GPU device properties.
+#### GPU vendor implementations

-For NVIDIA GPUs, [GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md)
-provides similar functionality.
+- [Intel](https://intel.github.io/intel-device-plugins-for-kubernetes/cmd/gpu_plugin/README.html)
+- [NVIDIA](https://github.com/NVIDIA/gpu-feature-discovery/#readme)
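
The first hunk is cut off before it shows how a Pod consumes the `accelerator` label. As a minimal sketch, assuming a node has already been labelled `accelerator=example-gpu-x100` as in that hunk, a Pod could select it with a plain `nodeSelector`. The Pod name `example-node-selector` is hypothetical and not part of this commit; the image and the extended resource name are the placeholders used in the manifest above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-node-selector  # hypothetical name, not part of this commit
spec:
  restartPolicy: OnFailure
  # Only nodes labelled accelerator=example-gpu-x100 are eligible for this Pod.
  nodeSelector:
    accelerator: example-gpu-x100
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"  # placeholder image from the manifest above
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1  # placeholder extended resource name
```

A `nodeSelector` only supports exact label matches. The node affinity form in the diff is needed for comparisons such as `Gt`. In that form, the Kubernetes API requires every `matchExpressions` entry to carry an `operator`, so the `pci-10.present` expression would likely need one (for example `operator: In`) before the manifest is accepted.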

0 commit comments

Comments
 (0)