---
layout: blog
title: "Storage Capacity Tracking reaches GA in Kubernetes 1.24"
date: 2022-05-06
slug: storage-capacity-ga
---

**Authors:** Patrick Ohly (Intel)

The v1.24 release of Kubernetes brings [storage capacity](/docs/concepts/storage/storage-capacity/)
tracking as a generally available feature.

## Problems we have solved

As explained in more detail in the [previous blog post about this
feature](/blog/2021/04/14/local-storage-features-go-beta/), storage capacity
tracking allows a CSI driver to publish information about remaining
capacity. The kube-scheduler then uses that information to pick suitable nodes
for a Pod when that Pod has volumes that still need to be provisioned.
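
To make this more concrete, here is a minimal sketch of the opt-in configuration,
using a hypothetical CSI driver name `hostpath.csi.example.org`: the driver's
`CSIDriver` object has to set `storageCapacity: true`, and the scheduler consults
the published capacity for volumes whose StorageClass uses the
`WaitForFirstConsumer` volume binding mode.

```yaml
# Opt-in by the CSI driver deployment. The driver name is illustrative.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: hostpath.csi.example.org
spec:
  # Tell kube-scheduler that this driver publishes CSIStorageCapacity objects.
  storageCapacity: true
---
# Capacity is only taken into account for volumes that get provisioned
# after a node has been chosen for the Pod that uses them.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-local-storage
provisioner: hostpath.csi.example.org
volumeBindingMode: WaitForFirstConsumer
```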

Without this information, a Pod may get stuck without ever being scheduled onto
a suitable node: kube-scheduler has to choose blindly and always ends up
picking a node for which the volume cannot be provisioned because the
underlying storage system managed by the CSI driver does not have sufficient
capacity left.

Because CSI drivers publish storage capacity information that gets used at a
later time when it might not be up-to-date anymore, it can still happen that a
node is picked that doesn't work out after all. Volume provisioning recovers
from that by informing the scheduler that it needs to try again with a
different node.
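
The published information can be inspected at any time with
`kubectl get csistoragecapacities --all-namespaces`. As an illustration, a single
published object for the hypothetical driver above might look roughly like this;
all names and values are made up, and the reported capacity is a snapshot that
can already be outdated by the time the scheduler acts on it:

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
metadata:
  # Typically created by the external-provisioner sidecar with a generated name,
  # in the namespace where the CSI driver runs.
  name: csisc-52abc
  namespace: kube-system
storageClassName: example-local-storage
# Which nodes this capacity figure applies to.
nodeTopology:
  matchLabels:
    topology.hostpath.csi/node: worker-1
# Remaining capacity for new volumes of this storage class on those nodes.
capacity: 92Gi
```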

[Load
tests](https://github.com/kubernetes-csi/csi-driver-host-path/blob/master/docs/storage-capacity-tracking.md)
that were repeated for the promotion to GA confirmed that, with storage capacity
tracking, all storage in a cluster can be consumed by Pods, whereas Pods got
stuck without it.

## Problems we have *not* solved

Recovery from a failed volume provisioning attempt has one known limitation: if a Pod
uses two volumes and only one of them could be provisioned, then all future
scheduling decisions are limited by the already provisioned volume. If that
volume is local to a node and the other volume cannot be provisioned there, the
Pod is stuck. This problem pre-dates storage capacity tracking, and while the
additional information makes it less likely to occur, it cannot be avoided in
all cases, except of course by only using one volume per Pod.
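
As a hypothetical illustration of that limitation, consider a Pod that uses two
PersistentVolumeClaims with different storage classes (all names here are made
up): once the node-local volume has been provisioned on some node, the Pod can
only ever be scheduled onto that node, even if the second volume cannot be
provisioned there.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-local
spec:
  # Node-local storage, provisioned once a node has been selected.
  storageClassName: example-local-storage
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-other
spec:
  # A different storage class that may not be usable on every node.
  storageClassName: some-other-storage
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: two-volume-pod
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    volumeMounts:
    - name: local
      mountPath: /data-local
    - name: other
      mountPath: /data-other
  volumes:
  - name: local
    persistentVolumeClaim:
      claimName: data-local
  - name: other
    persistentVolumeClaim:
      claimName: data-other
```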

An idea for solving this was proposed in a [KEP
draft](https://github.com/kubernetes/enhancements/pull/1703): volumes that were
provisioned and haven't been used yet cannot have any valuable data and
therefore could be freed and provisioned again elsewhere. SIG Storage is
looking for interested developers who want to continue working on this.

Also not solved is support in Cluster Autoscaler for Pods with volumes. For CSI
drivers with storage capacity tracking, a prototype was developed and discussed
in [a PR](https://github.com/kubernetes/autoscaler/pull/3887). It was meant to
work with arbitrary CSI drivers, but that flexibility made it hard to configure
and slowed down scale-up operations: because the autoscaler was unable to simulate
volume provisioning, it only scaled the cluster by one node at a time, which
was seen as insufficient.

Therefore that PR was not merged, and a different approach with tighter coupling
between autoscaler and CSI driver will be needed. This requires a better
understanding of which local storage CSI drivers are used in
combination with cluster autoscaling. Should this lead to a new KEP, then users
will have to try out an implementation in practice before it can move to beta
or GA. So please reach out to SIG Storage if you have an interest in this
topic.

## Acknowledgements

Thanks a lot to the members of the community who have contributed to this
feature or given feedback, including members of [SIG
Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling),
[SIG
Autoscaling](https://github.com/kubernetes/community/tree/master/sig-autoscaling),
and of course [SIG
Storage](https://github.com/kubernetes/community/tree/master/sig-storage)!
