---
layout: blog
title: "Kubernetes v1.34: Recovery From Volume Expansion Failure (GA)"
date: 2025-0X-XXT09:00:00-08:00
draft: true
slug: kubernetes-v1-34-recover-expansion-failure
---

While it was always possible to recover from failed volume expansions manually, it usually required cluster-admin access and was tedious to do (see the aforementioned link for more information).

What if you make a mistake and realize it immediately? With Kubernetes v1.34, you can reduce the requested size of a PersistentVolumeClaim (PVC): as long as the expansion to the previously requested size hasn't finished, you can amend the requested size and Kubernetes will automatically work to correct it. Any quota consumed by the failed expansion is returned to you, and the associated PersistentVolume should be resized to the latest size you specified.

I'll walk through an example of how all of this works.

## Reducing PVC size to recover from failed expansion

Imagine that you are running out of disk space for one of your database servers, and you want to expand the PVC from the previously specified `10TB` to `100TB`, but you make a typo and specify `1000TB`.
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  # ...
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1000TB # newly specified size - but incorrect!
```

Now, you may be out of disk space on your disk array, or you may simply have run out of allocated quota with your cloud provider. Either way, assume that the expansion to `1000TB` is never going to succeed.

In Kubernetes v1.34, you can simply correct your mistake and request a new PVC size that is smaller than the mistaken one, provided it is still larger than the original size of the actual PersistentVolume.
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  # ...
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100TB # Corrected size; has to be greater than 10TB.
                     # You cannot shrink the volume below its actual size.
```
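
If you prefer to make the correction from the command line, a `kubectl patch` one-liner like the following should work. This is just a sketch; the PVC name `db-data` is a hypothetical placeholder:

```bash
# Lower the requested size from the mistaken 1000TB back to 100TB.
# "db-data" is a hypothetical PVC name - substitute your own.
kubectl patch pvc db-data --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"100TB"}}}}'
```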
This requires no admin intervention. Even better, any surplus Kubernetes quota that you temporarily consumed will be automatically returned.
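
If your namespace has a ResourceQuota covering storage requests, you can verify that the consumed amount drops back once the failed expansion is rolled back. A quick check, assuming a quota object exists in a hypothetical namespace `my-namespace`:

```bash
# Shows hard limits and currently used amounts, including the
# requests.storage consumed by PVCs in the namespace.
kubectl describe resourcequota -n my-namespace
```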
This fault recovery mechanism does have a caveat: whatever new size you specify for the PVC, it **must** still be higher than the original size in `.status.capacity`. Since Kubernetes doesn't support shrinking your PV objects, you can never go below the size that was originally allocated for your PVC request.
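
To see the lower bound for a corrected request, you can read the volume's actual size from the PVC status before amending the spec; again, `db-data` is a placeholder name:

```bash
# Prints the real, currently allocated size of the volume -
# the corrected request must stay above this value.
kubectl get pvc db-data -o jsonpath='{.status.capacity.storage}'
```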
## Improved error handling and observability of volume expansion

There are new API fields available in PVC objects which you can monitor to observe the progress of volume expansion.
### Improved observability of in-progress expansion
You can query `.status.allocatedResourceStatus['storage']` of a PVC to monitor the progress of a volume expansion operation. For a typical block volume, this should transition between `ControllerResizeInProgress`, `NodeResizePending` and `NodeResizeInProgress`, and become nil/empty when volume expansion has finished.
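
For example, you could watch this field with `kubectl` while an expansion is underway (the PVC name is again a placeholder):

```bash
# Watch the PVC and print the resize status on each change.
# An empty result means no expansion is in progress.
kubectl get pvc db-data -w \
  -o jsonpath='{.status.allocatedResourceStatus}{"\n"}'
```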
If, for some reason, volume expansion to the requested size is not feasible, the field should accordingly report states like `ControllerResizeInfeasible` or `NodeResizeInfeasible`.

You can also observe the size towards which Kubernetes is working by watching `pvc.status.allocatedResources`.
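
Both fields are plain status fields, so a one-off query works too; for instance, with the same placeholder PVC name:

```bash
# The size Kubernetes is currently working towards, which may
# differ from spec.resources.requests while a resize is pending.
kubectl get pvc db-data \
  -o jsonpath='{.status.allocatedResources.storage}{"\n"}'
```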
### Improved error handling and reporting

Kubernetes should now retry your failed volume expansions at a slower rate, and it should make fewer requests to both the storage system and the Kubernetes apiserver.

Errors observed during volume expansion are now reported as conditions on PVC objects and should persist, unlike events. Kubernetes will now populate `pvc.status.conditions` with error keys `ControllerResizeError` or `NodeResizeError` when volume expansion fails.
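
You can inspect these conditions directly; for example, with the hypothetical `db-data` PVC:

```bash
# List any resize-related error conditions recorded on the PVC.
# kubectl describe pvc db-data also surfaces these under "Conditions".
kubectl get pvc db-data -o jsonpath='{.status.conditions}{"\n"}'
```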
### Fixes for long-standing bugs in resizing workflows

This feature has also allowed us to fix long-standing bugs in the resizing workflow, such as [Kubernetes issue #115294](https://github.com/kubernetes/kubernetes/issues/115294). If you observe anything broken, please report your bugs to [https://github.com/kubernetes/kubernetes/issues](https://github.com/kubernetes/kubernetes/issues/new/choose), along with details about how to reproduce the problem.

Working on this feature through its lifecycle was challenging, and it wouldn't have been possible to reach GA without feedback from [@msau42](https://github.com/msau42), [@jsafrane](https://github.com/jsafrane) and [@xing-yang](https://github.com/xing-yang).