Commit 114fa30

Merge pull request #44915 from ArangoGutierrez/automatedlabels
Document NFD for GPU Labeling
2 parents 61144f8 + 07b14de commit 114fa30

1 file changed: +43 -7 lines changed

content/en/docs/tasks/manage-gpus/scheduling-gpus.md

Lines changed: 43 additions & 7 deletions
@@ -64,7 +64,7 @@ spec:
           gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
 ```
 
-## Clusters containing different types of GPUs
+## Manage clusters with different types of GPUs
 
 If different nodes in your cluster have different types of GPUs, then you
 can use [Node Labels and Node Selectors](/docs/tasks/configure-pod-container/assign-pods-nodes/)
@@ -83,10 +83,46 @@ a different label key if you prefer.
 
 ## Automatic node labelling {#node-labeller}
 
-If you're using AMD GPU devices, you can deploy
-[Node Labeller](https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller).
-Node Labeller is a {{< glossary_tooltip text="controller" term_id="controller" >}} that automatically
-labels your nodes with GPU device properties.
+As an administrator, you can automatically discover and label all your GPU-enabled nodes
+by deploying Kubernetes [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) (NFD).
+NFD detects the hardware features that are available on each node in a Kubernetes cluster.
+Typically, NFD is configured to advertise those features as node labels, but NFD can also add extended resources, annotations, and node taints.
+NFD is compatible with all [supported versions](/releases/version-skew-policy/#supported-versions) of Kubernetes.
+By default, NFD creates the [feature labels](https://kubernetes-sigs.github.io/node-feature-discovery/master/usage/features.html) for the detected features.
+Administrators can leverage NFD to also taint nodes with specific features, so that only pods that request those features can be scheduled on those nodes.
 
-Similar functionality for NVIDIA is provided by
-[GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md).
+You also need a plugin for NFD that adds appropriate labels to your nodes; these might be generic
+labels or they could be vendor specific. Your GPU vendor may provide a third-party
+plugin for NFD; check their documentation for more details.
+
+{{< highlight yaml "linenos=false,hl_lines=6-18" >}}
+apiVersion: v1
+kind: Pod
+metadata:
+  name: example-vector-add
+spec:
+  # You can use Kubernetes node affinity to schedule this Pod onto a node
+  # that provides the kind of GPU that its container needs in order to work
+  affinity:
+    nodeAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        nodeSelectorTerms:
+        - matchExpressions:
+          - key: "gpu.gpu-vendor.example/installed-memory"
+            operator: Gt # (greater than)
+            values: ["40535"]
+          - key: "feature.node.kubernetes.io/pci-10.present" # NFD Feature label
+            values: ["true"] # (optional) only schedule on nodes with PCI device 10
+  restartPolicy: OnFailure
+  containers:
+    - name: example-vector-add
+      image: "registry.example/example-vector-add:v42"
+      resources:
+        limits:
+          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
+{{< /highlight >}}
+
+#### GPU vendor implementations
+
+- [Intel](https://intel.github.io/intel-device-plugins-for-kubernetes/cmd/gpu_plugin/README.html)
+- [NVIDIA](https://github.com/NVIDIA/gpu-feature-discovery/#readme)
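
For reviewers reading the affinity rule in the added Pod manifest, the following is a minimal Python sketch of how such a rule is evaluated against node labels. It is not the scheduler's implementation. It only models the documented semantics: expressions inside one `matchExpressions` list must all match, any one entry of `nodeSelectorTerms` is enough, and `Gt`/`Lt` compare a single integer value. The function names and the sample node labels are hypothetical. Note that the second expression in the manifest has no `operator` field, which the Kubernetes API requires; treating a missing operator as `In` is purely an assumption of this sketch.

```python
def expression_matches(labels, expr):
    """Evaluate one matchExpressions entry against a node's label dict."""
    key = expr["key"]
    # Assumption for illustration only: the real API requires `operator`.
    operator = expr.get("operator", "In")
    values = expr.get("values", [])
    present = key in labels

    if operator == "In":
        return present and labels[key] in values
    if operator == "NotIn":
        return not present or labels[key] not in values
    if operator == "Exists":
        return present
    if operator == "DoesNotExist":
        return not present
    if operator in ("Gt", "Lt"):
        # Gt/Lt take exactly one value; both sides are parsed as integers.
        if not present or len(values) != 1:
            return False
        try:
            label_value, bound = int(labels[key]), int(values[0])
        except ValueError:
            return False
        return label_value > bound if operator == "Gt" else label_value < bound
    raise ValueError(f"unknown operator: {operator}")


def term_matches(labels, term):
    """All expressions within one term must match (logical AND)."""
    return all(expression_matches(labels, e) for e in term.get("matchExpressions", []))


def node_matches(labels, terms):
    """A node qualifies if any term matches (logical OR)."""
    return any(term_matches(labels, t) for t in terms)


# The nodeSelectorTerms from the manifest, written as plain Python data.
terms = [{
    "matchExpressions": [
        {"key": "gpu.gpu-vendor.example/installed-memory", "operator": "Gt", "values": ["40535"]},
        {"key": "feature.node.kubernetes.io/pci-10.present", "values": ["true"]},
    ]
}]

# Hypothetical node label sets.
big_node = {
    "gpu.gpu-vendor.example/installed-memory": "81920",
    "feature.node.kubernetes.io/pci-10.present": "true",
}
small_node = {
    "gpu.gpu-vendor.example/installed-memory": "16384",
    "feature.node.kubernetes.io/pci-10.present": "true",
}
unlabelled_node = {"gpu.gpu-vendor.example/installed-memory": "81920"}

print(node_matches(big_node, terms))
print(node_matches(small_node, terms))
print(node_matches(unlabelled_node, terms))
```

Under these assumptions only `big_node` qualifies: `small_node` fails the `Gt` comparison, and `unlabelled_node` lacks the NFD feature label.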
