Commit c70953d: Merge pull request #51573 from gnufied/add-blog-expansion-failure
Add blog for recover from volume expansion PR

1 file changed (+98 -0): content/en/blog/_posts/2025-07-10-recover-failed-expansion
---
layout: blog
title: "Kubernetes v1.34: Recovery From Volume Expansion Failure (GA)"
date: 2025-0X-XXT09:00:00-08:00
draft: true
slug: kubernetes-v1-34-recover-expansion-failure
author: >
  [Hemant Kumar](https://github.com/gnufied) (Red Hat)
---

Have you ever made a typo when expanding your persistent volumes in Kubernetes? Meant to specify `2TB`
but specified `20TiB`? This seemingly innocuous problem was kinda hard to fix - and it took the project almost 5 years to solve.
[Automated recovery from storage expansion](/docs/concepts/storage/persistent-volumes/#recovering-from-failure-when-expanding-volumes) has been around for a while in beta; however, with the v1.34 release, we have graduated it to
**general availability**.

While it was always possible to recover from failed volume expansions manually, doing so usually required cluster-admin access and was tedious (see the aforementioned link for more information).

What if you make a mistake and realize it immediately?
With Kubernetes v1.34, as long as the expansion to the previously requested
size hasn't finished, you can reduce the requested size of the PersistentVolumeClaim (PVC), and Kubernetes will
automatically work to correct it. Any quota consumed by the failed expansion will be returned to you, and the associated PersistentVolume will be resized to the
latest size you specified.

I'll walk through an example of how all of this works.

## Reducing PVC size to recover from failed expansion

Imagine that you are running out of disk space for one of your database servers, and you want to expand the PVC from the previously
specified `10TB` to `100TB` - but you make a typo and specify `1000TB`.

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1000TB # newly specified size - but incorrect!
```

Now, you may be out of disk space on your disk array, or you may simply have run out of allocated quota with your cloud provider. Either way, assume that the expansion to `1000TB` is never going to succeed.

In Kubernetes v1.34, you can simply correct your mistake and request a new PVC size
that is smaller than the mistaken one, provided it is still larger than the original size
of the actual PersistentVolume.

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100TB # Corrected size; has to be greater than 10TB.
                     # You cannot shrink the volume below its actual size.
```
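
If you would rather patch the claim than edit and re-apply the manifest, here is a minimal sketch, assuming the `myclaim` PVC from the example above already exists in your current namespace:

```bash
# Patch the claim in place with the corrected size.
kubectl patch pvc myclaim --type=merge \
  -p '{"spec":{"resources":{"requests":{"storage":"100TB"}}}}'

# Watch the claim while Kubernetes works to correct the failed expansion.
kubectl get pvc myclaim --watch
```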

This requires no admin intervention. Even better, any surplus Kubernetes quota that you temporarily consumed will be automatically returned.

This fault recovery mechanism does have a caveat: whatever new size you specify for the PVC, it **must** still be higher than the original size in `.status.capacity`.
Since Kubernetes doesn't support shrinking PersistentVolume objects, you can never go below the size that was originally allocated for your PVC request.
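
To check that floor before correcting a request, you can read the claim's recorded capacity; a quick sketch, again using the example claim name:

```bash
# The currently allocated size; a corrected request must stay above this.
kubectl get pvc myclaim -o jsonpath='{.status.capacity.storage}'
```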

## Improved error handling and observability of volume expansion

Implementing what might look like a relatively minor change also required us to almost
completely redo how volume expansion works under the hood in Kubernetes.
There are new API fields available on PVC objects which you can monitor to observe the progress of volume expansion.

### Improved observability of in-progress expansion

You can query `.status.allocatedResourceStatus['storage']` of a PVC to monitor the progress of a volume expansion operation.
For a typical block volume, this should transition between `ControllerResizeInProgress`, `NodeResizePending` and `NodeResizeInProgress`, and become nil/empty when volume expansion has finished.

If, for some reason, expansion to the requested size is not feasible, the field should accordingly report states such as `ControllerResizeInfeasible` or `NodeResizeInfeasible`.

You can also observe the size towards which Kubernetes is working by watching `pvc.status.allocatedResources`.
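
For example, here is a sketch of how you might poll both fields with kubectl, using the claim name from the earlier example:

```bash
# Print the per-resource resize status and the size Kubernetes is working towards.
kubectl get pvc myclaim \
  -o jsonpath='{.status.allocatedResourceStatus}{"\n"}{.status.allocatedResources}{"\n"}'
```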

### Improved error handling and reporting

Kubernetes now retries failed volume expansions at a slower rate, making fewer requests to both the storage system and the Kubernetes apiserver.

Errors observed during volume expansion are now reported as conditions on PVC objects and, unlike events, persist. Kubernetes will now populate `pvc.status.conditions` with the error keys `ControllerResizeError` or `NodeResizeError` when volume expansion fails.
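
A quick sketch of how you could inspect those conditions after a failure, assuming the same example claim:

```bash
# Conditions persist on the PVC object, unlike events.
kubectl get pvc myclaim -o jsonpath='{.status.conditions}'

# Or use describe for a human-readable view of conditions and recent events.
kubectl describe pvc myclaim
```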

### Fixes for long-standing bugs in resizing workflows

This feature also allowed us to fix long-standing bugs in the resizing workflow, such as [Kubernetes issue #115294](https://github.com/kubernetes/kubernetes/issues/115294).
If you observe anything broken, please report your bugs to [https://github.com/kubernetes/kubernetes/issues](https://github.com/kubernetes/kubernetes/issues/new/choose), along with details about how to reproduce the problem.

Working on this feature through its lifecycle was challenging, and it wouldn't have been possible to reach GA
without feedback from [@msau42](https://github.com/msau42), [@jsafrane](https://github.com/jsafrane) and [@xing-yang](https://github.com/xing-yang).

All of the contributors who worked on this also appreciate the input provided by [@thockin](https://github.com/thockin) and [@liggitt](https://github.com/liggitt) at various Kubernetes contributor summits.
