Commit 703f24a

Merge pull request #39898 from garymm/patch-1
Reference nvidia gpu feature discovery
2 parents fd3e8ba + 4beceb1 commit 703f24a

1 file changed

+2
-62
lines changed
content/en/docs/tasks/manage-gpus/scheduling-gpus.md

Lines changed: 2 additions & 62 deletions
@@ -88,65 +88,5 @@ If you're using AMD GPU devices, you can deploy
 Node Labeller is a {{< glossary_tooltip text="controller" term_id="controller" >}} that automatically
 labels your nodes with GPU device properties.
 
-At the moment, that controller can add labels for:
-
-* Device ID (-device-id)
-* VRAM Size (-vram)
-* Number of SIMD (-simd-count)
-* Number of Compute Unit (-cu-count)
-* Firmware and Feature Versions (-firmware)
-* GPU Family, in two letters acronym (-family)
-  * SI - Southern Islands
-  * CI - Sea Islands
-  * KV - Kaveri
-  * VI - Volcanic Islands
-  * CZ - Carrizo
-  * AI - Arctic Islands
-  * RV - Raven
-
-```shell
-kubectl describe node cluster-node-23
-```
-
-```
-Name:               cluster-node-23
-Roles:              <none>
-Labels:             beta.amd.com/gpu.cu-count.64=1
-                    beta.amd.com/gpu.device-id.6860=1
-                    beta.amd.com/gpu.family.AI=1
-                    beta.amd.com/gpu.simd-count.256=1
-                    beta.amd.com/gpu.vram.16G=1
-                    kubernetes.io/arch=amd64
-                    kubernetes.io/os=linux
-                    kubernetes.io/hostname=cluster-node-23
-Annotations:        node.alpha.kubernetes.io/ttl: 0
-
-```
-
-With the Node Labeller in use, you can specify the GPU type in the Pod spec:
-
-```yaml
-apiVersion: v1
-kind: Pod
-metadata:
-  name: cuda-vector-add
-spec:
-  restartPolicy: OnFailure
-  containers:
-    - name: cuda-vector-add
-      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
-      image: "registry.k8s.io/cuda-vector-add:v0.1"
-      resources:
-        limits:
-          nvidia.com/gpu: 1
-  affinity:
-    nodeAffinity:
-      requiredDuringSchedulingIgnoredDuringExecution:
-        nodeSelectorTerms:
-          - matchExpressions:
-            - key: beta.amd.com/gpu.family.AI # Arctic Islands GPU family
-              operator: Exists
-```
-
-This ensures that the Pod will be scheduled to a node that has the GPU type
-you specified.
+Similar functionality for NVIDIA is provided by
+[GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md).
