---
layout: blog
title: "Storage Capacity Tracking reaches GA in Kubernetes 1.24"
date: 2022-05-06
slug: storage-capacity-ga
---

**Authors:** Patrick Ohly (Intel)

The v1.24 release of Kubernetes brings [storage capacity](/docs/concepts/storage/storage-capacity/)
tracking as a generally available feature.

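Drivers do not publish this information by default. As a sketch of how a CSI
driver deployment opts in (using the CSI hostpath driver name as an example),
the `CSIDriver` object sets `storageCapacity: true`; in addition, the
external-provisioner sidecar has to be started with capacity publishing
enabled (the `--enable-capacity` flag in recent external-provisioner
releases):

```yaml
# Sketch of a CSIDriver object that opts in to storage capacity
# tracking. "hostpath.csi.k8s.io" stands in for a real driver name.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: hostpath.csi.k8s.io
spec:
  storageCapacity: true
```
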
## Problems we have solved

As explained in more detail in the [previous blog post about this
feature](/blog/2021/04/14/local-storage-features-go-beta/), storage capacity
tracking allows a CSI driver to publish information about remaining
capacity. The kube-scheduler then uses that information to pick suitable nodes
for a Pod when that Pod has volumes that still need to be provisioned.

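The information is published as `CSIStorageCapacity` objects, one per storage
class and topology segment, which with GA are served from the
`storage.k8s.io/v1` API. A minimal sketch of such an object; the topology
label, storage class name, and sizes are hypothetical examples:

```yaml
# One CSIStorageCapacity object per storage class and topology segment,
# created and kept up-to-date by the driver's external-provisioner.
# The label, class name, and sizes below are made-up examples.
apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
metadata:
  name: example-capacity
  namespace: kube-system
nodeTopology:
  matchLabels:
    topology.hostpath.csi/node: node-1
storageClassName: example-storage-class
capacity: 100Gi
maximumVolumeSize: 50Gi
```

Once published, these objects can be listed with
`kubectl get csistoragecapacities --all-namespaces`.
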
Without this information, a Pod may get stuck without ever being scheduled onto
a suitable node: kube-scheduler has to choose blindly and may always end up
picking a node for which the volume cannot be provisioned, because the
underlying storage system managed by the CSI driver does not have sufficient
capacity left.

Because the storage capacity information published by a CSI driver is used at a
later time, when it might no longer be up-to-date, it can still happen that a
node is picked that doesn't work out after all. Volume provisioning recovers
from that by informing the scheduler that it needs to try again with a
different node.

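This retry loop is visible on the PersistentVolumeClaim itself: with delayed
binding, kube-scheduler records its choice in the
`volume.kubernetes.io/selected-node` annotation, and when provisioning on that
node fails, the external-provisioner removes the annotation so that the
scheduler picks again. A sketch with hypothetical names:

```yaml
# PVC during delayed binding: kube-scheduler has recorded its node
# choice in the annotation below. If provisioning fails on that node,
# the external-provisioner clears the annotation to trigger
# rescheduling. All names here are made-up examples.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
  annotations:
    volume.kubernetes.io/selected-node: node-1
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: example-storage-class
  resources:
    requests:
      storage: 10Gi
```
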
[Load
tests](https://github.com/kubernetes-csi/csi-driver-host-path/blob/master/docs/storage-capacity-tracking.md)
that were repeated for the promotion to GA confirmed that with storage capacity
tracking, Pods can consume all storage in a cluster, whereas without it, Pods
got stuck.

## Problems we have *not* solved

Recovery from a failed volume provisioning attempt has one known limitation: if a Pod
uses two volumes and only one of them could be provisioned, then all future
scheduling decisions are limited by the already provisioned volume. If that
volume is local to a node and the other volume cannot be provisioned there, the
Pod is stuck. This problem pre-dates storage capacity tracking, and while the
additional information makes it less likely to occur, it cannot be avoided in
all cases, except of course by only using one volume per Pod.

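For illustration, consider a Pod like the following sketch (all names are
hypothetical): once `pvc-a` has been provisioned as node-local storage on some
node, the Pod is pinned to that node, and if `pvc-b` cannot also be
provisioned there, the Pod remains unschedulable.

```yaml
# Hypothetical Pod with two volumes. If "pvc-a" gets provisioned as
# node-local storage, all future scheduling is limited to that node;
# if "pvc-b" cannot also be provisioned there, the Pod is stuck.
apiVersion: v1
kind: Pod
metadata:
  name: two-volumes-example
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    volumeMounts:
    - name: data-a
      mountPath: /data-a
    - name: data-b
      mountPath: /data-b
  volumes:
  - name: data-a
    persistentVolumeClaim:
      claimName: pvc-a
  - name: data-b
    persistentVolumeClaim:
      claimName: pvc-b
```
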
An idea for solving this was proposed in a [KEP
draft](https://github.com/kubernetes/enhancements/pull/1703): volumes that were
provisioned and haven't been used yet cannot have any valuable data and
therefore could be freed and provisioned again elsewhere. SIG Storage is
looking for interested developers who want to continue working on this.

Also not solved is support in Cluster Autoscaler for Pods with volumes. For CSI
drivers with storage capacity tracking, a prototype was developed and discussed
in [a PR](https://github.com/kubernetes/autoscaler/pull/3887). It was meant to
work with arbitrary CSI drivers, but that flexibility made it hard to configure
and slowed down scale-up operations: because the autoscaler was unable to
simulate volume provisioning, it only scaled the cluster by one node at a time,
which was seen as insufficient.

Therefore that PR was not merged and a different approach with tighter coupling
between autoscaler and CSI driver will be needed. For this, a better
understanding is needed of which local storage CSI drivers are used in
combination with cluster autoscaling. Should this lead to a new KEP, then users
will have to try out an implementation in practice before it can move to beta
or GA. So please reach out to SIG Storage if you have an interest in this
topic.

## Acknowledgements

Thanks a lot to the members of the community who have contributed to this
feature or given feedback, including members of [SIG
Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling),
[SIG
Autoscaling](https://github.com/kubernetes/community/tree/master/sig-autoscaling),
and of course [SIG
Storage](https://github.com/kubernetes/community/tree/master/sig-storage)!