Commit fda969b

gnufied and lmktfy committed
Apply suggestions from code review
Co-authored-by: Tim Bannister <[email protected]>
1 parent 8f19e7c · commit fda969b


content/en/blog/_posts/2025-07-10-recover-failed-expansion/index.md

Lines changed: 24 additions & 14 deletions
@@ -1,6 +1,6 @@
---
layout: blog
-title: "Kubernetes v1.34: Recover from volume expansion failure"
+title: "Kubernetes v1.34: Recovery From Volume Expansion Failure (GA)"
date: 2025-0X-XXT09:00:00-08:00
draft: true
slug: kubernetes-v1-34-recover-expansion-failure
@@ -15,14 +15,18 @@ but specified `20TiB`? This seemingly innocuous problem was kinda hard to fix -

While it was always possible to recover from failing volume expansions manually, it usually required cluster-admin access and was tedious to do (see the aforementioned link for more information).

-With v1.34 users should be able to reduce requested size of the persistentvolume claim(PVC) and as long as
-expansion to previously requested size didn't finish, users can correct the size requested and Kubernetes will automatically work to correct it. Any quota consumed by failed expansion will be returned to the user and PVC should be resized to newly specified size.
+What if you make a mistake and then realize it immediately?
+With Kubernetes v1.34, you should be able to reduce the requested size of the PersistentVolumeClaim (PVC) and, as long as the expansion to the previously requested
+size hadn't finished, you can amend the size requested. Kubernetes will
+automatically work to correct it. Any quota consumed by the failed expansion will be returned to you, and the associated PersistentVolume should be resized to the
+latest size you specified.

-Lets walk through an example of how all of this works.
+I'll walk through an example of how all of this works.

## Reducing PVC size to recover from failed expansion

-Lets say you are running out of disk space on your database server and you want to expand the PVC from previously specified `10TB` to `100TB` but made a typo and specified `1000TB`.
+Imagine that you are running out of disk space for one of your database servers, and you want to expand the PVC from the previously
+specified `10TB` to `100TB` - but you make a typo and specify `1000TB`.

```yaml
kind: PersistentVolumeClaim
@@ -34,12 +38,14 @@ spec:
  - ReadWriteOnce
  resources:
    requests:
-      storage: 1000TB --> newly specified size with Typo
+      storage: 1000TB # newly specified size - but incorrect!
```

-Now, you may be out of disk space on your disk array or simply ran out of allocated quota on your cloud-provider and expansion to `1000TB` is never going to succeed.
+Now, you may be out of disk space on your disk array, or may simply have run out of allocated quota on your cloud provider. But, assume that expansion to `1000TB` is never going to succeed.

-In Kubernetes v1.34, you can simply correct your mistake and request *reduced* pvc size.
+In Kubernetes v1.34, you can simply correct your mistake and request a new PVC size
+that is smaller than the mistake, provided it is still larger than the original size
+of the actual PersistentVolume.

```yaml
kind: PersistentVolumeClaim
@@ -51,12 +57,14 @@ spec:
  - ReadWriteOnce
  resources:
    requests:
-      storage: 100TB --> Fixed new size, has to be greater than 10TB.
+      storage: 100TB # Corrected size; has to be greater than 10TB.
+      # You cannot shrink the volume below its actual size.
```

-This requires no admin intervention and whatever Kubernetes quota you consumed will be automatically returned.
+This requires no admin intervention. Even better, any surplus Kubernetes quota that you temporarily consumed will be automatically returned.

-This feature does have a caveat that, whatever new size you specify for the PVC, it **MUST** be still higher than what was original size in `.status.capacity`. It should be noted that, since Kubernetes doesn't support shriking your PV objects, you can never go below size that was originally allocatd for your PVC request.
+This fault recovery mechanism does have a caveat: whatever new size you specify for the PVC, it **must** still be higher than the original size in `.status.capacity`.
+Since Kubernetes doesn't support shrinking your PV objects, you can never go below the size that was originally allocated for your PVC request.
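
To illustrate that caveat (an editorial sketch, not part of this commit): the lower bound comes from the size recorded in the PVC's `.status.capacity`, so for the example above the relevant fields relate roughly as shown below. The sizes are taken from the post; everything else is abbreviated.

```yaml
# Illustrative only: the corrected request compared against the recorded capacity.
spec:
  resources:
    requests:
      storage: 100TB   # corrected request; may be lowered from 1000TB
status:
  capacity:
    storage: 10TB      # size actually provisioned; a corrected request must not go below this
```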

## Improved error handling and observability of volume expansion

@@ -66,21 +74,23 @@ There are new API fields available in PVC objects which you can monitor to obser

### Improved observability of in-progress expansion

-Users can use `pvc.status.allocatedResourceStatus['storage']` to monitor progress of their volume expansion operation. For a typical block volume, this should transition between `ControllerResizeInProgress`, `NodeResizePending` and `NodeResizeInProgress` and become nil/empty when volume expansion is finished.
+You can query `.status.allocatedResourceStatus['storage']` of a PVC to monitor the progress of a volume expansion operation.
+For a typical block volume, this should transition between `ControllerResizeInProgress`, `NodeResizePending` and `NodeResizeInProgress`, and become nil/empty when volume expansion has finished.

If, for some reason, volume expansion to the requested size is not feasible, the status should instead report states such as `ControllerResizeInfeasible` or `NodeResizeInfeasible`.

You can also observe the size towards which Kubernetes is working by watching `pvc.status.allocatedResources`.
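
To make that concrete, here is a rough sketch (an editorial illustration, not part of this commit) of how those status fields might look on a PVC while an expansion towards `100TB` is still in progress; the field names come from the post, but the exact values shown are made up.

```yaml
# Illustrative only: PVC status during an in-progress expansion.
status:
  capacity:
    storage: 10TB                          # current size of the underlying volume
  allocatedResources:
    storage: 100TB                         # the size Kubernetes is working towards
  allocatedResourceStatus:
    storage: ControllerResizeInProgress    # later NodeResizePending / NodeResizeInProgress, then cleared
```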

### Improved error handling and reporting

-Kubernetes should now retry your failed volume expansions at slower rate, it should make fewer requests to both cloudprovider and Kubernetes apiserver.
+Kubernetes should now retry your failed volume expansions at a slower rate, making fewer requests to both the storage system and the Kubernetes apiserver.

Errors observed during volume expansion are now reported as conditions on PVC objects and, unlike events, should persist. Kubernetes will now populate `pvc.status.conditions` with the error keys `ControllerResizeError` or `NodeResizeError` when volume expansion fails.
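
Similarly, a hedged sketch (again, not part of this commit) of how such an error might surface in `pvc.status.conditions`; the layout follows the usual Kubernetes condition shape and the keys named above, but the exact fields and wording may differ.

```yaml
# Illustrative only: a PVC condition reporting a failed expansion attempt.
status:
  conditions:
  - type: ControllerResizeError      # or NodeResizeError, depending on where the failure happened
    status: "True"
    message: "example message: resize failed because the requested size exceeds available capacity"
```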

### Fixes long-standing bugs in resizing workflows

-This feature also has allowed us to fix long standing bugs in resizing workflow such as - https://github.com/kubernetes/kubernetes/issues/115294 . If you observe anything broken please report your bugs to https://github.com/kubernetes/kubernetes/issues .
+This feature has also allowed us to fix long-standing bugs in the resizing workflow, such as [Kubernetes issue #115294](https://github.com/kubernetes/kubernetes/issues/115294).
+If you observe anything broken, please report your bugs to [https://github.com/kubernetes/kubernetes/issues](https://github.com/kubernetes/kubernetes/issues/new/choose), along with details about how to reproduce the problem.

Working on this feature through its lifecycle was challenging and it wouldn't have been possible to reach GA
without feedback from [@msau42](https://github.com/msau42), [@jsafrane](https://github.com/jsafrane) and [@xing-yang](https://github.com/xing-yang).
