
Commit 408a6e6

Add more information for new design
1 parent 7e3c9b5 commit 408a6e6


2 files changed: +21 -0 lines changed


keps/sig-storage/1790-recover-resize-failure/README.md

Lines changed: 21 additions & 0 deletions
@@ -9,6 +9,7 @@
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Making allocatedResourceStatus not change unnecessarily for every error in 1.31](#making-allocatedresourcestatus-not-change-unnecessarily-for-every-error-in-131)
- [Making resizeStatus more general in v1.28](#making-resizestatus-more-general-in-v128)
- [Implementation](#implementation)
- [User flow stories](#user-flow-stories)
@@ -108,6 +109,26 @@ As part of this proposal, we are mainly proposing three changes:
- NodeExpansionFailed // state set when expansion has failed in kubelet with a terminal error. Transient errors don't set NodeExpansionFailed.
3. Update quota code to use `max(pvc.Spec.Resources, pvc.Status.AllocatedResources)` when evaluating usage for PVC.

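As a minimal sketch of the quota rule in item 3, the hypothetical helper below charges whichever of the requested and allocated sizes is larger. It assumes sizes are plain byte counts (the real quota code compares `resource.Quantity` values) and that `pvc.Status.AllocatedResources` holds the size the resizer last started expanding to. Because of the `max`, capacity that was already allocated keeps counting against quota even if the user lowers `pvc.Spec.Resources` afterwards.

```go
package main

import "fmt"

// pvcStorageUsage is a hypothetical helper mirroring the proposed rule
// max(pvc.Spec.Resources, pvc.Status.AllocatedResources). Sizes are plain
// byte counts here; the real quota code works with resource.Quantity values.
func pvcStorageUsage(requested, allocated int64) int64 {
	if allocated > requested {
		return allocated
	}
	return requested
}

func main() {
	const gi = int64(1) << 30

	// The user raised the request to 100Gi (recorded as allocated), expansion
	// failed, and the user then lowered the request to 20Gi: quota still sees 100Gi.
	fmt.Println(pvcStorageUsage(20*gi, 100*gi) / gi) // prints 100

	// When the request exceeds what has been allocated so far, the request is charged.
	fmt.Println(pvcStorageUsage(50*gi, 10*gi) / gi) // prints 50
}
```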
### Making allocatedResourceStatus not change unnecessarily for every error in 1.31

We are trying to reduce the number of state changes that can happen when volume expansion fails in either the kubelet or the external-resizer.

We are considering the following gRPC error codes as "infeasible":
- INVALID_ARGUMENT
- OUT_OF_RANGE
- NOT_FOUND

In the external-resizer, if `ControllerExpandVolume` fails with any of the error codes above, controller expansion will be marked as failed and resizing will be retried at a slower rate. For all other errors, an event will be generated and a condition will be added to the PVC indicating that expansion has failed, but the state change will not be recorded in `allocatedResourceStatus`.

On the node side, `allocatedResourceStatus` will be updated to record a failed expansion only if:
- `NodeExpandVolume` failed with one of the "infeasible" error codes listed above.
- `NodeExpandVolume` failed with a final error and there is a pending change to the PVC's requested size from the user.

This will allow the external-resizer to recover safely from node expansion failures too.

![New flow kubelet](<./Expanding volume - Kubelet Loop.png>)
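The error-handling rules in this section can be sketched as a few predicates. This is a reading of the rules above, not the external-resizer or kubelet code: `Code` is a local stand-in for `google.golang.org/grpc/codes.Code` (same numeric values) so the example has no dependencies, real code would typically obtain the code with `status.Code(err)` from `google.golang.org/grpc/status`, and the function names and the `finalErr`/`pendingSizeChange` inputs are hypothetical.

```go
package main

import "fmt"

// Code mirrors a few google.golang.org/grpc/codes values (same numbers) so the
// sketch needs no external module.
type Code uint32

const (
	InvalidArgument Code = 3
	NotFound        Code = 5
	OutOfRange      Code = 11
	Internal        Code = 13
	Unavailable     Code = 14
)

// isInfeasible reports whether a failed expansion returned one of the
// "infeasible" codes listed in this section.
func isInfeasible(c Code) bool {
	switch c {
	case InvalidArgument, OutOfRange, NotFound:
		return true
	}
	return false
}

// recordControllerFailure reports whether a failed ControllerExpandVolume call
// changes allocatedResourceStatus. Any other error only yields an event and a
// PVC condition.
func recordControllerFailure(c Code) bool {
	return isInfeasible(c)
}

// recordNodeFailure reports whether a failed NodeExpandVolume call is recorded
// in allocatedResourceStatus. finalErr and pendingSizeChange are hypothetical
// inputs; how the kubelet derives them is not shown here.
func recordNodeFailure(c Code, finalErr, pendingSizeChange bool) bool {
	return isInfeasible(c) || (finalErr && pendingSizeChange)
}

func main() {
	fmt.Println(recordControllerFailure(OutOfRange))      // true
	fmt.Println(recordControllerFailure(Unavailable))     // false
	fmt.Println(recordNodeFailure(Internal, true, true))  // true
	fmt.Println(recordNodeFailure(Internal, true, false)) // false
}
```

Keeping the infeasible check in a single helper means the controller and node paths cannot disagree about which codes count as infeasible.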
### Making resizeStatus more general in v1.28

After [some discussion](https://github.com/kubernetes/kubernetes/pull/116335#issuecomment-1624566731) with sig-storage folks, we are proposing that we rename `pvc.Status.ResizeStatus` to `pvc.Status.AllocatedResourceStatus` and make it a map.
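A minimal sketch of the shape this rename implies, assuming the map is keyed by resource name (for example `storage`) with one resize state per resource. The type names below are hypothetical stand-ins for the API types; only `NodeExpansionFailed` comes from this KEP.

```go
package main

import "fmt"

// Hypothetical stand-ins for the API types; only the shape (a map keyed by
// resource name) is taken from the proposal.
type ResourceName string
type ClaimResourceStatus string

const NodeExpansionFailed ClaimResourceStatus = "NodeExpansionFailed"

// PersistentVolumeClaimStatus shows only the renamed field.
type PersistentVolumeClaimStatus struct {
	// AllocatedResourceStatus replaces the single ResizeStatus value with
	// one status per resource, e.g. "storage".
	AllocatedResourceStatus map[ResourceName]ClaimResourceStatus
}

func main() {
	status := PersistentVolumeClaimStatus{
		AllocatedResourceStatus: map[ResourceName]ClaimResourceStatus{
			"storage": NodeExpansionFailed,
		},
	}
	s, ok := status.AllocatedResourceStatus["storage"]
	fmt.Println(s, ok) // NodeExpansionFailed true

	// In this sketch, a resource with no entry has no resize state recorded.
	_, ok = status.AllocatedResourceStatus["hypothetical-resource"]
	fmt.Println(ok) // false
}
```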
