@@ -77,6 +77,7 @@ SIG Architecture for cross-cutting KEPs).
- [User Stories](#user-stories)
  - [Cluster add-on development](#cluster-add-on-development)
  - [Cluster configuration](#cluster-configuration)
+  - [Integration with cluster autoscaling](#integration-with-cluster-autoscaling)
  - [Partial GPU allocation](#partial-gpu-allocation)
  - [Network-attached accelerator](#network-attached-accelerator)
  - [Combined setup of different hardware functions](#combined-setup-of-different-hardware-functions)
@@ -113,6 +114,7 @@ SIG Architecture for cross-cutting KEPs).
- [Cluster Autoscaler](#cluster-autoscaler)
  - [Generic plugin enhancements](#generic-plugin-enhancements)
  - [DRA scheduler plugin extension mechanism](#dra-scheduler-plugin-extension-mechanism)
+  - [Handling claims without vendor code](#handling-claims-without-vendor-code)
  - [Building a custom Cluster Autoscaler binary](#building-a-custom-cluster-autoscaler-binary)
- [kubelet](#kubelet)
  - [Managing resources](#managing-resources)
@@ -432,6 +434,15 @@ parametersRef:
  name: acme-gpu-init
```

+#### Integration with cluster autoscaling
+
+As a cloud provider, I want to support GPUs as part of a hosted Kubernetes
+environment, including cluster autoscaling. I ensure that the kernel is
+configured as required by the hardware and that the container runtime supports
+CDI. I review the Go code provided by the vendor for simulating cluster scaling
+and build it into a customized cluster autoscaler binary that supports my cloud
+infrastructure.
+
#### Partial GPU allocation

As a user, I want to use a GPU as an accelerator, but don't need exclusive access
@@ -1930,7 +1941,7 @@ progress.

When [Cluster
Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#cluster-autoscaler)
-encounters a pod that uses a resource claim, the autoscaler needs assistance by
+encounters a pod that uses a resource claim for node-local resources, the autoscaler needs assistance from
the resource driver for that claim to make the right decisions. Without that
assistance, the autoscaler might scale up the wrong node group (resource is
provided by nodes in another group) or not scale up (pod is pending because of
@@ -1944,6 +1955,9 @@ vendor code through some RPC mechanism, as WASM plugin, or some generic
code which just needs to be parameterized for specific hardware could be added
later in separate KEPs.

+Such vendor code is *not* needed for network-attached resources. Adding or
+removing nodes does not change availability of such resources.
+
The in-tree DRA scheduler plugin is still active. It handles the generic checks
like "can this allocated claim be reserved for this pod" and only calls out to
vendor code when it comes to decisions that only the vendor can handle, like
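The split described above — generic checks in the in-tree DRA scheduler plugin, class-specific decisions behind an extension point — could look roughly like the sketch below. All names here (`VendorPlugin`, `SimulateAllocation`, `draPlugin`, `fakeVendor`) are illustrative assumptions, not the actual KEP or autoscaler API:

```go
package main

import "fmt"

// VendorPlugin is a hypothetical shape for the vendor-provided simulation
// code; the real extension mechanism is defined by the KEP, not here.
type VendorPlugin interface {
	// SimulateAllocation reports whether a claim of the given class could
	// be allocated on the given (possibly simulated) node.
	SimulateAllocation(claimClass, nodeName string) bool
	// NodeIsReady reports whether the vendor driver would be operational
	// on a newly created node.
	NodeIsReady(nodeName string) bool
}

// draPlugin stands in for the in-tree DRA scheduler plugin, which keeps a
// registry of vendor plugins keyed by resource class name.
type draPlugin struct {
	vendors map[string]VendorPlugin
}

// Filter performs the generic checks itself and only defers to vendor code
// for the class-specific allocation decision.
func (p *draPlugin) Filter(claimClass, nodeName string) bool {
	vendor, ok := p.vendors[claimClass]
	if !ok {
		// No vendor code registered: assume allocation always works.
		return true
	}
	return vendor.SimulateAllocation(claimClass, nodeName)
}

// fakeVendor pretends that only node-1 provides the hardware.
type fakeVendor struct{}

func (fakeVendor) SimulateAllocation(claimClass, nodeName string) bool {
	return nodeName == "node-1"
}

func (fakeVendor) NodeIsReady(nodeName string) bool { return true }

func main() {
	plugin := &draPlugin{vendors: map[string]VendorPlugin{
		"gpu.example.com": fakeVendor{},
	}}
	fmt.Println(plugin.Filter("gpu.example.com", "node-1"))   // true: vendor accepts
	fmt.Println(plugin.Filter("gpu.example.com", "node-2"))   // false: vendor rejects
	fmt.Println(plugin.Filter("other.example.com", "node-2")) // true: no vendor code, fall back
}
```

The fallback branch in `Filter` corresponds to the behavior when no vendor code is available for a class: the plugin accepts the node rather than blocking the simulation.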
@@ -2123,6 +2137,26 @@ claim while the autoscaler goes through its binpacking simulation.
Finally, `NodeIsReady` of each vendor plugin is called to implement the
scheduler plugin's own `NodeIsReady`.

+#### Handling claims without vendor code
+
+When the DRA scheduler plugin does not have specific vendor code for a certain
+resource class, it falls back to the assumption that resources are unlimited,
+i.e. allocation will always succeed. This is how volume provisioning is
+currently handled during cluster autoscaling.
+
+If a pod is not getting scheduled because a resource claim cannot be allocated
+by the real DRA driver, the pod will look schedulable to the autoscaler, so it
+will not spin up new nodes for it, which is the right decision.
+
+If a pod is not getting scheduled because some other resource requirement is
+not satisfied, the autoscaler will simulate a scale-up and may pick an
+arbitrary node pool, because the DRA scheduler plugin will accept all of those
+nodes.
+
+During scale down, moving a running pod to a different node is assumed to work,
+so that scenario is covered as well.
+
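The fallback behavior can be sketched as a tiny decision function; `podRequirements` and `simulateScaleUp` are invented names for illustration, not autoscaler APIs:

```go
package main

import "fmt"

// podRequirements models the two conditions that matter for the fallback:
// without vendor code, resource claims are assumed to be allocatable, so only
// the remaining requirements drive the scale-up decision.
type podRequirements struct {
	hasUnallocatedClaim bool // a claim the real DRA driver failed to allocate
	fitsExistingNodes   bool // CPU, memory etc. satisfied by current nodes
}

// simulateScaleUp reports whether the autoscaler would add a node for a
// pending pod under the "resources are unlimited" fallback.
func simulateScaleUp(pod podRequirements) bool {
	// The claim is deliberately ignored: allocation is assumed to succeed,
	// mirroring how volume provisioning is treated today.
	return !pod.fitsExistingNodes
}

func main() {
	// Pending only because of an unallocatable claim: the pod looks
	// schedulable, so no new nodes are created (the right decision).
	fmt.Println(simulateScaleUp(podRequirements{hasUnallocatedClaim: true, fitsExistingNodes: true})) // false

	// Pending because of CPU/memory: scale-up proceeds as usual.
	fmt.Println(simulateScaleUp(podRequirements{fitsExistingNodes: false})) // true
}
```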
#### Building a custom Cluster Autoscaler binary
Vendors are encouraged to include an "init" package in their driver