Commit a469c85

Merge pull request kubernetes#2924 from AaronKalair/add-more-documentation
Add some more documentation to clarify how labels and GPUs work with the
2 parents d5c57ae + 19a78f6 commit a469c85

1 file changed: +36 -0 lines changed

cluster-autoscaler/cloudprovider/aws/README.md

Lines changed: 36 additions & 0 deletions
@@ -162,6 +162,42 @@ If you'd like to scale node groups from 0, an `autoscaling:DescribeLaunchConfigu
}
```

### Gotchas

* Without these tags, if a node group creates nodes with taints that the pending pod does not tolerate, the cluster autoscaler only learns about the taints after it has scaled up the group and sees the tainted node. It then caches this information and takes it into account in subsequent scaling operations. As a result, the first scale-up request behaves differently from later ones, which can be confusing.

* The device plugin on nodes that provide GPU resources takes a little while to advertise the GPU resource to the API server, so the cluster autoscaler may unnecessarily scale up again. See the guidance below for how to avoid this.
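
The second gotcha can be pictured as a small decision rule. The sketch below is an illustration only, not the cluster autoscaler's actual code: `classify_gpu_node` is a hypothetical helper, and its two inputs (the value of the accelerator label described under GPU Node Groups and the node's allocatable GPU count) are assumed to have been read from the node beforehand, for example with `kubectl`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, for illustration only.
# $1: value of the node's accelerator label (empty if the label is not set)
# $2: allocatable GPU count reported by the node (empty or non-numeric if
#     the device plugin has not advertised anything yet)
classify_gpu_node() {
  local label="$1"
  local allocatable="$2"

  # Treat anything that is not a plain non-negative integer as zero.
  case "$allocatable" in
    ''|*[!0-9]*) allocatable=0 ;;
  esac

  if [ "$allocatable" -gt 0 ]; then
    echo "gpu-ready"        # GPUs are advertised and schedulable
  elif [ -n "$label" ]; then
    echo "gpu-pending"      # labelled as a GPU node, GPUs not advertised yet
  else
    echo "no-gpu-visible"   # nothing indicates that this node will offer GPUs
  fi
}
```

For instance, `classify_gpu_node nvidia-tesla-k80 0` prints `gpu-pending`. Without the label, the same freshly booted node would fall into the last branch, which is the situation that can lead to an extra scale-up.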

## GPU Node Groups

If you launch a pod that requests a GPU in its resource requirements, you must add the following node label to the node (for example, via the kubelet arguments).

### Cluster AutoScaler Version < 1.15.x

```bash
--node-labels=cloud.google.com/gke-accelerator=<GPU TYPE YOU ARE USING>
```

For example, on an AWS P2 (`p2.*`) instance:

```bash
--kubelet-extra-args '--node-labels=cloud.google.com/gke-accelerator=nvidia-tesla-k80'
```

### Cluster AutoScaler Version >= 1.15.x

```bash
--node-labels=k8s.amazonaws.com/accelerator=<GPU TYPE YOU ARE USING>
```

For example, on an AWS P2 (`p2.*`) instance:

```bash
--kubelet-extra-args '--node-labels=k8s.amazonaws.com/accelerator=nvidia-tesla-k80'
```

The label is needed because the GPU resource does not become available immediately after the instance is ready. Without the label, the cluster autoscaler assumes that no suitable GPU resource is available and adds another node.
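
The version rule above can be captured in a small shell helper, for example in a node bootstrap script. This is a minimal sketch: `gpu_label_key` and `kubelet_gpu_args` are hypothetical names, not part of the cluster autoscaler or of any AWS tooling, and the version string is assumed to look like `1.15.0` or `v1.15.0`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only.

# Print the accelerator label key for a given cluster autoscaler version.
gpu_label_key() {
  local version="${1#v}"        # accept "v1.15.0" as well as "1.15.0"
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"    # text after the first dot
  local minor="${rest%%.*}"     # second version component

  if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 15 ]; }; then
    echo "k8s.amazonaws.com/accelerator"
  else
    echo "cloud.google.com/gke-accelerator"
  fi
}

# Print the kubelet --node-labels argument for a version and GPU type.
kubelet_gpu_args() {
  local version="$1"
  local gpu_type="$2"
  echo "--node-labels=$(gpu_label_key "$version")=${gpu_type}"
}
```

Calling `kubelet_gpu_args 1.15.0 nvidia-tesla-k80` prints `--node-labels=k8s.amazonaws.com/accelerator=nvidia-tesla-k80`, which can then be passed via `--kubelet-extra-args` as in the examples above.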

## Using AutoScalingGroup MixedInstancesPolicy

> Note: The minimum version of cluster autoscaler to support MixedInstancesPolicy is v1.14.x.
