Commit b9c55d8

DRA: user story for autoscaling and fallback code
1 parent d4cb60c commit b9c55d8

File tree

  • keps/sig-node/3063-dynamic-resource-allocation

1 file changed, +35 -1 lines changed
keps/sig-node/3063-dynamic-resource-allocation/README.md

Lines changed: 35 additions & 1 deletion
@@ -77,6 +77,7 @@ SIG Architecture for cross-cutting KEPs).
 - [User Stories](#user-stories)
 - [Cluster add-on development](#cluster-add-on-development)
 - [Cluster configuration](#cluster-configuration)
+- [Integration with cluster autoscaling](#integration-with-cluster-autoscaling)
 - [Partial GPU allocation](#partial-gpu-allocation)
 - [Network-attached accelerator](#network-attached-accelerator)
 - [Combined setup of different hardware functions](#combined-setup-of-different-hardware-functions)
@@ -113,6 +114,7 @@ SIG Architecture for cross-cutting KEPs).
 - [Cluster Autoscaler](#cluster-autoscaler)
 - [Generic plugin enhancements](#generic-plugin-enhancements)
 - [DRA scheduler plugin extension mechanism](#dra-scheduler-plugin-extension-mechanism)
+- [Handling claims without vendor code](#handling-claims-without-vendor-code)
 - [Building a custom Cluster Autoscaler binary](#building-a-custom-cluster-autoscaler-binary)
 - [kubelet](#kubelet)
 - [Managing resources](#managing-resources)
@@ -432,6 +434,15 @@ parametersRef:
 name: acme-gpu-init
 ```
 
+#### Integration with cluster autoscaling
+
+As a cloud provider, I want to support GPUs as part of a hosted Kubernetes
+environment, including cluster autoscaling. I ensure that the kernel is
+configured as required by the hardware and that the container runtime supports
+CDI. I review the Go code provided by the vendor for simulating cluster scaling
+and build it into a customized cluster autoscaler binary that supports my cloud
+infrastructure.
+
 #### Partial GPU allocation
 
 As a user, I want to use a GPU as accelerator, but don't need exclusive access
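The user story added above has the cloud provider compile vendor-supplied Go simulation code into a custom autoscaler binary. A minimal sketch of that wiring pattern, assuming a hypothetical registry filled by vendor `init()` packages (none of these identifiers come from the actual autoscaler or KEP APIs):

```go
package main

import "fmt"

// ScalingSimulator is what a vendor's Go code might implement so the
// autoscaler can simulate scale-up without real hardware. Invented name.
type ScalingSimulator interface {
	DriverName() string
	// NodeTypeHasResource reports whether new nodes of the given type
	// would provide the vendor's resource.
	NodeTypeHasResource(nodeType string) bool
}

// simulators maps resource driver name -> vendor simulation code. A vendor
// package compiled into the custom binary would call Register from init().
var simulators = map[string]ScalingSimulator{}

func Register(s ScalingSimulator) { simulators[s.DriverName()] = s }

// exampleGPUSimulator is a toy vendor implementation for illustration.
type exampleGPUSimulator struct{}

func (exampleGPUSimulator) DriverName() string { return "gpu.example.com" }
func (exampleGPUSimulator) NodeTypeHasResource(nodeType string) bool {
	return nodeType == "gpu-node-pool"
}

func main() {
	Register(exampleGPUSimulator{})
	s := simulators["gpu.example.com"]
	fmt.Println(s.NodeTypeHasResource("gpu-node-pool")) // true
	fmt.Println(s.NodeTypeHasResource("cpu-node-pool")) // false
}
```

In practice the registration would happen in vendor `init()` functions pulled in via blank imports, which is why reviewing that code before building the binary (as the user story describes) matters.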
@@ -1930,7 +1941,7 @@ progress.
 
 When [Cluster
 Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#cluster-autoscaler)
-encounters a pod that uses a resource claim, the autoscaler needs assistance by
+encounters a pod that uses a resource claim for node-local resources, the autoscaler needs assistance by
 the resource driver for that claim to make the right decisions. Without that
 assistance, the autoscaler might scale up the wrong node group (resource is
 provided by nodes in another group) or not scale up (pod is pending because of
@@ -1944,6 +1955,9 @@ vendor code through some RPC mechanism, as WASM plugin, or some generic
 code which just needs to be parameterized for specific hardware could be added
 later in separate KEPs.
 
+Such vendor code is *not* needed for network-attached resources. Adding or
+removing nodes does not change availability of such resources.
+
 The in-tree DRA scheduler plugin is still active. It handles the generic checks
 like "can this allocated claim be reserved for this pod" and only calls out to
 vendor code when it comes to decisions that only the vendor can handle, like
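The division of labor in this hunk (generic checks stay in-tree, vendor code is consulted only for vendor-specific decisions, and network-attached resources bypass vendor code entirely) can be sketched roughly as follows; all type and function names here are invented for illustration:

```go
package main

import "fmt"

// Claim is a simplified stand-in for a resource claim as seen during the
// autoscaler's simulation.
type Claim struct {
	DriverName      string
	NetworkAttached bool
	ReservedByOther bool // claim already reserved for a different pod
}

// vendorCheck stands in for the per-driver vendor callout.
type vendorCheck func(nodeType string) bool

// claimOKOnSimulatedNode mirrors the control flow described in the text.
func claimOKOnSimulatedNode(c Claim, vendors map[string]vendorCheck, nodeType string) bool {
	// Generic, in-tree check: "can this allocated claim be reserved
	// for this pod?"
	if c.ReservedByOther {
		return false
	}
	// Network-attached resources need no vendor simulation code:
	// adding or removing nodes does not change their availability.
	if c.NetworkAttached {
		return true
	}
	// Vendor-specific decision: delegate to the vendor's code.
	if check, ok := vendors[c.DriverName]; ok {
		return check(nodeType)
	}
	// No vendor code registered: optimistically accept the node.
	return true
}

func main() {
	vendors := map[string]vendorCheck{
		"gpu.example.com": func(nodeType string) bool { return nodeType == "gpu-pool" },
	}
	fmt.Println(claimOKOnSimulatedNode(Claim{DriverName: "gpu.example.com"}, vendors, "cpu-pool")) // false
	fmt.Println(claimOKOnSimulatedNode(Claim{NetworkAttached: true}, vendors, "cpu-pool"))         // true
}
```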
@@ -2123,6 +2137,26 @@ claim while the autoscaler goes through it's binpacking simulation.
 Finally, `NodeIsReady` of each vendor plugin is called to implement the
 scheduler plugin's own `NodeIsReady`.
 
+#### Handling claims without vendor code
+
+When the DRA scheduler plugin does not have specific vendor code for a certain
+resource class, it falls back to the assumption that resources are unlimited,
+i.e. allocation will always work. This is how volume provisioning is currently
+handled during cluster autoscaling.
+
+If a pod is not getting scheduled because a resource claim cannot be allocated
+by the real DRA driver, to the autoscaler it will look like the pod should be
+schedulable and therefore it will not spin up new nodes for it, which is the
+right decision.
+
+If a pod is not getting scheduled because some other resource requirement is
+not satisfied, the autoscaler will simulate scale up and can pick some
+arbitrary node pool because the DRA scheduler plugin will accept all of those
+nodes.
+
+During scale down, moving a running pod to a different node is assumed to work,
+so that scenario also works.
+
 #### Building a custom Cluster Autoscaler binary
 
 Vendors are encouraged to include an "init" package in their driver
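The "Handling claims without vendor code" fallback added in this hunk can be condensed into a small sketch. This is illustrative only, with invented identifiers, assuming that when no vendor plugin is registered for a claim's resource class the simulation treats the resource as unlimited (the same optimistic assumption the text says is used for volume provisioning today):

```go
package main

import "fmt"

// simPlugin is a toy stand-in for per-vendor simulation code.
type simPlugin interface {
	ClaimFits(nodeType string) bool
}

// claimFits returns the simulated allocation result for a claim.
func claimFits(driver string, plugins map[string]simPlugin, nodeType string) bool {
	p, ok := plugins[driver]
	if !ok {
		// No vendor code: treat the resource as unlimited, so every
		// simulated node is accepted. A pod that is pending only
		// because the real driver cannot allocate then looks
		// schedulable, and the autoscaler correctly does not scale up.
		return true
	}
	return p.ClaimFits(nodeType)
}

// gpuSim is a toy vendor plugin for illustration.
type gpuSim struct{}

func (gpuSim) ClaimFits(nodeType string) bool { return nodeType == "gpu-pool" }

func main() {
	plugins := map[string]simPlugin{"gpu.example.com": gpuSim{}}
	// Unknown driver: fallback accepts any node pool.
	fmt.Println(claimFits("fpga.example.com", plugins, "cpu-pool")) // true
	// Known driver: the vendor code decides.
	fmt.Println(claimFits("gpu.example.com", plugins, "cpu-pool")) // false
}
```

The side effect described in the text follows directly from the `return true` branch: for classes without vendor code, every node pool passes the DRA check, so scale-up triggered by other unmet resource requirements may pick an arbitrary pool.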
