Commit 991e4e9

storage capacity: document autoscaler support
1 parent: 787e551

File tree (1 file changed: +48 −3 lines)

  • keps/sig-storage/1472-storage-capacity-tracking


keps/sig-storage/1472-storage-capacity-tracking/README.md

Lines changed: 48 additions & 3 deletions
@@ -1189,9 +1189,54 @@ based on storage capacity:

Unchanged context:

> to available storage and thus could run on a new node, the
> simulation may decide otherwise.

Removed:

> It may be possible to solve this by pre-configuring some information
> (local storage capacity of future nodes and their CSI topology). This
> needs to be explored further.

Added:
This gets further complicated by the independent development of CSI drivers,
the autoscaler, and the cloud provider: the autoscaler and the cloud provider
don't know which kinds of volumes a CSI driver will be able to make available
on nodes, because that logic is implemented inside the CSI driver. The CSI
driver, in turn, knows nothing about hardware that hasn't been provisioned
yet and nothing about autoscaling.

This problem can be solved by the cluster administrator: they can determine
how much storage new nodes will make available, for example by running
experiments, and then configure the cluster so that this information is
available to the autoscaler. With the existing CSIStorageCapacity API this
can be done as follows:

- When creating a fictional Node object from an existing Node in a node
  group, the autoscaler must modify the topology labels of the CSI driver(s)
  in the cluster so that they define a new topology segment. For example,
  `topology.hostpath.csi/node=aks-workerpool.*` has to be replaced with
  `topology.hostpath.csi/node=aks-workerpool-template`. Because these labels
  are opaque to the autoscaler, the cluster administrator must configure
  these transformations, for example via regular-expression search/replace.
- For scale-up from zero, a label like
  `topology.hostpath.csi/node=aks-workerpool-template` must be added to the
  configuration of the node pool.
- For each storage class, the cluster administrator can then create
  CSIStorageCapacity objects that provide the capacity information for these
  fictional topology segments.
- When the volume binding plugin for the scheduler runs inside the
  autoscaler, it works exactly as in the scheduler and will accept nodes
  where the manually created CSIStorageCapacity objects indicate that
  sufficient storage is (or rather, will be) available.
- Because the CSI driver will not run immediately on new nodes, the
  autoscaler has to wait for it before considering the node ready. If it
  doesn't do that, it might incorrectly scale up further, because storage
  capacity checks will fail for a new, unused node until the CSI driver
  provides CSIStorageCapacity objects for it. This can be implemented
  generically for all CSI drivers by adding a readiness check to the
  autoscaler that compares the existing CSIStorageCapacity objects against
  the expected ones for the fictional node.
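The label rewrite described in the first step could be sketched as a
configured regular-expression search/replace. This is a hypothetical
illustration, not code from the autoscaler: the label key and pool names come
from the example above, everything else is an assumption.

```python
import re

# Administrator-configured transformations, keyed by topology label:
# (pattern, replacement) turns a real node's label value into the value
# of the fictional template segment for its node group.
TRANSFORMS = {
    "topology.hostpath.csi/node": (r"^aks-workerpool.*$",
                                   "aks-workerpool-template"),
}

def transform_labels(labels: dict) -> dict:
    """Rewrite CSI topology labels of a real node for a fictional node."""
    result = dict(labels)
    for key, (pattern, replacement) in TRANSFORMS.items():
        if key in result:
            result[key] = re.sub(pattern, replacement, result[key])
    return result

node_labels = {
    "topology.hostpath.csi/node": "aks-workerpool-12345",
    "kubernetes.io/os": "linux",  # labels without a transform stay unchanged
}
print(transform_labels(node_labels))
# {'topology.hostpath.csi/node': 'aks-workerpool-template',
#  'kubernetes.io/os': 'linux'}
```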

A proof-of-concept of this approach is available in
https://github.com/kubernetes/autoscaler/pull/3887 and has been used
successfully to scale an Azure cluster up and down with csi-driver-host-path
as the CSI driver.
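For illustration, a manually created CSIStorageCapacity object for the
fictional topology segment above might look like this. The object name,
namespace, storage class, and capacity value are assumptions chosen for the
example; the API version depends on the Kubernetes release (`v1beta1` or
`v1` in the `storage.k8s.io` group).

```yaml
# Hypothetical example; name, namespace, storageClassName and capacity
# are placeholders.
apiVersion: storage.k8s.io/v1beta1
kind: CSIStorageCapacity
metadata:
  name: workerpool-template-fast
  namespace: kube-system
# Selects the fictional topology segment defined by the rewritten label.
nodeTopology:
  matchLabels:
    topology.hostpath.csi/node: aks-workerpool-template
storageClassName: fast
# Storage that a new node in this pool is expected to make available.
capacity: 100Gi
```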

The approach above preserves the separation between the different
components. Simpler solutions may be possible by adding support for specific
CSI drivers to custom autoscaler binaries or to operators that control the
cluster setup.
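The generic readiness check mentioned in the last bullet could be sketched as
follows. Modeling CSIStorageCapacity coverage as (topology segment, storage
class) pairs is an assumption made for illustration; a real implementation
would list the objects through the Kubernetes API and apply the same label
transformations as above before comparing.

```python
def node_storage_ready(expected: set, existing: set) -> bool:
    """A new node counts as ready, from the storage perspective, only once
    the CSI driver has published CSIStorageCapacity objects covering every
    (topology segment, storage class) pair for which objects were manually
    created for the fictional template node."""
    # Set inclusion: nothing that is expected may be missing.
    return expected <= existing

# Pairs derived from the manually created objects for the fictional node.
expected = {("aks-workerpool-template", "fast")}

# Right after scale-up, the driver has published nothing for the node yet:
print(node_storage_ready(expected, set()))  # False

# Once the CSI driver has created a matching object, the node is ready:
print(node_storage_ready(expected, {("aks-workerpool-template", "fast")}))
# True
```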

### Alternative solutions
