@@ -107,28 +108,68 @@ As part of this proposal, we are mainly proposing three changes:
107
108
- NodeExpansionFailed // state set when expansion has failed in kubelet with a terminal error. Transient errors don't set NodeExpansionFailed.
108
109
3. Update quota code to use `max(pvc.Spec.Resources, pvc.Status.AllocatedResources)` when evaluating usage for PVC.
109
110
111
+
### Making resizeStatus more general in v1.27
112
+
113
+
After some discussion with sig-storage folks, and to accommodate changes coming from https://github.com/kubernetes/enhancements/issues/3751, we are proposing to rename `pvc.Status.ResizeStatus` to `pvc.Status.AllocatedResourceStatus` and make it a map.
We propose that by relaxing validation on PVC update to allow users to reduce `pvc.Spec.Resources`, it becomes possible to cancel previously issued expansion requests or retry expansion with a lower value if the previous request has not been successful. In general, we know that volume plugins are designed to never perform actual shrinking of the volume, for both in-tree and CSI volumes. Moreover, if a previously issued expansion has been successful and the user
113
154
reduces the PVC request size, for both CSI and in-tree plugins they are designed to return a successful response with NO-OP. So, reducing requested size will be a safe operation and will never result in data loss or actual shrinking of volume.
114
155
115
156
We do, however, have a problem with quota calculation: if a previously issued expansion succeeded but was not recorded (or only partially recorded) in the api-server and the user reduces the requested size of the PVC, then the quota controller will treat it as an actual shrinking of the volume and incorrectly reduce the storage counted against the user. Since we know the actual size of the volume only after performing expansion (either on the node or in the controller), allowing quota to be reduced on PVC size reduction would allow a user to abuse the quota system.
116
157
117
-
To solve aforementioned problem - we propose that, a new field will be added to PVC, called `pvc.Status.AllocatedResources`. When user expands the PVC, and when expansion-controller starts volume expansion - it will set `pvc.Status.AllocatedResources` to user requested value in `pvc.Spec.Resources` before performing expansion and it will set `pvc.Status.ResizeStatus` to `ControllerExpansionInProgress`. The quota calculation will be updated to use `max(pvc.Spec.Resources, pvc.Status.AllocatedResources)` which will ensure that abusing quota will not be possible.
158
+
To solve the aforementioned problem, we propose adding a new field to PVC called `pvc.Status.AllocatedResources`. When the user expands the PVC and the expansion-controller starts volume expansion, it will set `pvc.Status.AllocatedResources` to the user-requested value in `pvc.Spec.Resources` before performing expansion, and it will set `pvc.Status.AllocatedResourceStatus[storage]` to `ControllerResizeInProgress`. The quota calculation will be updated to use `max(pvc.Spec.Resources, pvc.Status.AllocatedResources)`, which ensures that abusing quota is not possible.
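The quota rule above can be illustrated with a minimal Go sketch. It is not the actual quota evaluator: sizes are modeled as plain `int64` byte counts rather than Kubernetes quantities, and `quotaUsage` and `gib` are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// gib is one gibibyte in bytes (illustrative constant, not a Kubernetes API).
const gib int64 = 1 << 30

// quotaUsage is a hypothetical helper that returns the size charged against
// quota for a PVC: max(pvc.Spec.Resources, pvc.Status.AllocatedResources).
// A nil allocated value models an unset pvc.Status.AllocatedResources.
func quotaUsage(specRequest int64, allocated *int64) int64 {
	if allocated != nil && *allocated > specRequest {
		return *allocated
	}
	return specRequest
}

func main() {
	allocated := 100 * gib

	// The user lowers the request to 20Gi while a 100Gi expansion is still
	// recorded in AllocatedResources: quota keeps charging 100Gi.
	fmt.Println(quotaUsage(20*gib, &allocated) / gib) // prints 100

	// With no recorded allocation, only the requested size is charged.
	fmt.Println(quotaUsage(20*gib, nil) / gib) // prints 20
}
```

Because the charged size never drops below the recorded allocation, lowering `pvc.Spec.Resources` alone cannot release quota.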
118
159
119
-
Resizing operation in external resize controller will always work towards full-filling size recorded in `pvc.Status.AllocatedResources` and only when previous operation has finished(i.e `pvc.Status.ResizeStatus` is nil) or when previous operation has failed with a terminal error - it will use new user requested value from `pvc.Spec.Resources`.
160
+
The resizing operation in the external resize controller will always work towards fulfilling the size recorded in `pvc.Status.AllocatedResources`. Only when the previous operation has finished (i.e. `pvc.Status.AllocatedResourceStatus[storage]` is nil) or has failed with a terminal error will it use the new user-requested value from `pvc.Spec.Resources`.
120
161
121
-
Kubelet on the other hand will only expand volumes for which `pvc.Status.ResizeStatus` is in `NodeExpansionPending` or `NodeExpansionInProgress` state and `pv.Spec.Cap > pvc.Status.Cap`. If a volume expansion fails in kubelet with a terminal error(which will set `NodeExpansionFailed` state) - then it must wait for resize controller in external-resizer to reconcile the state and put it back in `NodeExpansionPending`.
162
+
Kubelet, on the other hand, will only expand volumes for which `pvc.Status.AllocatedResourceStatus[storage]` is in the `NodeResizePending` or `NodeResizeInProgress` state and `pv.Spec.Cap > pvc.Status.Cap`. If a volume expansion fails in kubelet with a terminal error (which will set the `NodeResizeFailed` state), then it must wait for the resize controller in external-resizer to reconcile the state and put it back in `NodeResizePending`.
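As a rough sketch of the kubelet-side check described above, the following Go snippet models the status as a plain string and capacities as `int64` values. `shouldNodeExpand` is a hypothetical helper for illustration, not kubelet code.

```go
package main

import "fmt"

// Status values named in the proposal text, modeled here as plain strings.
const (
	nodeResizePending    = "NodeResizePending"
	nodeResizeInProgress = "NodeResizeInProgress"
	nodeResizeFailed     = "NodeResizeFailed"
)

// shouldNodeExpand is a hypothetical helper: node expansion is attempted only
// when the status is NodeResizePending or NodeResizeInProgress and the PV
// capacity is larger than the capacity recorded in the PVC status.
func shouldNodeExpand(status string, pvCapacity, pvcCapacity int64) bool {
	eligible := status == nodeResizePending || status == nodeResizeInProgress
	return eligible && pvCapacity > pvcCapacity
}

func main() {
	fmt.Println(shouldNodeExpand(nodeResizePending, 100, 10)) // prints true
	// After a terminal failure, kubelet waits for the resize controller to
	// move the status back to NodeResizePending.
	fmt.Println(shouldNodeExpand(nodeResizeFailed, 100, 10)) // prints false
}
```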
122
163
123
164
When user reduces `pvc.Spec.Resources`, expansion-controller will set `pvc.Status.AllocatedResources` to lower value only if one of the following is true:
124
165
125
-
1. If `pvc.Status.ResizeStatus` is `ControllerExpansionFailed` (indicating that previous expansion to last known `allocatedResources` failed with a final error) and previous control-plane has not succeeded.
126
-
2. If `pvc.Status.ResizeStatus` is `NodeExpansionFailed` and SP supports node-only expansion (indicating that previous expansion to last known `allocatedResources` failed on node with a final error).
127
-
3. If `pvc.Status.ResizeStatus` is `nil` or `empty` and previous `ControllerExpandVolume` has not succeeded.
166
+
1. If `pvc.Status.AllocatedResourceStatus[storage]` is `ControllerResizeFailed` (indicating that previous expansion to last known `allocatedResources` failed with a final error) and the previous control-plane expansion has not succeeded.
167
+
2. If `pvc.Status.AllocatedResourceStatus[storage]` is `NodeResizeFailed` and SP supports node-only expansion (indicating that previous expansion to last known `allocatedResources` failed on node with a final error).
168
+
3. If `pvc.Status.AllocatedResourceStatus[storage]` is `nil` or `empty` and previous `ControllerExpandVolume` has not succeeded.
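The three conditions above can be sketched as a single predicate. This is an illustrative Go model only: `resizeState` and `canLowerAllocated` are hypothetical names, and the two boolean fields stand in for information the expansion-controller would have to determine on its own.

```go
package main

import "fmt"

// Status values named in the proposal text, modeled here as plain strings.
const (
	controllerResizeInProgress = "ControllerResizeInProgress"
	controllerResizeFailed     = "ControllerResizeFailed"
	nodeResizeFailed           = "NodeResizeFailed"
)

// resizeState is a hypothetical summary of what the controller knows.
type resizeState struct {
	status                       string // AllocatedResourceStatus[storage]; "" means nil or empty
	controllerExpansionSucceeded bool   // previous control-plane expansion succeeded
	nodeOnlyExpansion            bool   // SP supports node-only expansion
}

// canLowerAllocated reports whether AllocatedResources may be set to a lower
// value, following the three conditions listed in the text.
func canLowerAllocated(s resizeState) bool {
	switch s.status {
	case controllerResizeFailed, "":
		// Conditions 1 and 3: allowed only if control-plane expansion has not succeeded.
		return !s.controllerExpansionSucceeded
	case nodeResizeFailed:
		// Condition 2: allowed only for node-only expansion.
		return s.nodeOnlyExpansion
	default:
		// Any in-progress or pending state keeps the last allocated size.
		return false
	}
}

func main() {
	fmt.Println(canLowerAllocated(resizeState{status: controllerResizeFailed}))     // prints true
	fmt.Println(canLowerAllocated(resizeState{status: controllerResizeInProgress})) // prints false
}
```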
128
169
129
170

130
171
131
-
**Note**: Whenever resize controller or kubelet modifies `pvc.Status` (such as when setting both `AllocatedResources` and `ResizeStatus`) - it is expected that all changes to `pvc.Status` are submitted as part of same patch request to avoid race conditions.
172
+
**Note**: Whenever the resize controller or kubelet modifies `pvc.Status` (such as when setting both `AllocatedResources` and `pvc.Status.AllocatedResourceStatus`), it is expected that all changes to `pvc.Status` are submitted as part of the same patch request to avoid race conditions.
132
173
133
174
The complete expansion and recovery flow of both control-plane and kubelet is documented in the attached PDF, which describes the complete volume expansion flow via state diagrams and is much more exhaustive than the text above.
134
175
@@ -142,9 +183,9 @@ The complete expansion and recovery flow of both control-plane and kubelet is do
142
183
- User increases 10Gi PVC to 100Gi by changing `pvc.spec.resources.requests["storage"] = "100Gi"`.
143
184
- Quota controller uses `max(pvc.Status.AllocatedResources, pvc.Spec.Resources)` and adds `90Gi` to used quota.
144
185
- Expansion controller starts expanding the volume and sets `pvc.Status.AllocatedResources` to `100Gi`.
145
-
- Expansion controller also sets `pvc.Status.ResizeStatus` to `ControllerExpansionInProgress`.
186
+
- Expansion controller also sets `pvc.Status.AllocatedResourceStatus['storage']` to `ControllerResizeInProgress`.
146
187
- Expansion to 100Gi fails and hence `pv.Spec.Capacity` and `pvc.Status.Capacity` stay at 10Gi.
147
-
- Expansion controller sets `pvc.Status.ResizeStatus` to `ControllerExpansionFailed`.
188
+
- Expansion controller sets `pvc.Status.AllocatedResourceStatus['storage']` to `ControllerResizeFailed`.
148
189
- User requests size to 20Gi.
149
190
- Expansion controller notices that previous expansion to last known `allocatedResources` failed, so it sets new `allocatedResources` to `20Gi`.
150
191
- Expansion succeeds and `pvc.Status.Capacity` and `pv.Spec.Capacity` report new size as `20Gi`.
@@ -154,31 +195,31 @@ The complete expansion and recovery flow of both control-plane and kubelet is do
154
195
- User increases 10Gi PVC to 100Gi by changing `pvc.spec.resources.requests["storage"] = "100Gi"`
155
196
- Quota controller uses `max(pvc.Status.AllocatedResources, pvc.Spec.Resources)` and adds `90Gi` to used quota.
156
197
- Expansion controller starts expanding the volume and sets `pvc.Status.AllocatedResources` to `100Gi`.
157
-
- Expansion controller also sets `pvc.Status.ResizeStatus` to `ControllerExpansionInProgress`.
198
+
- Expansion controller also sets `pvc.Status.AllocatedResourceStatus['storage']` to `ControllerResizeInProgress`.
158
199
- Since expansion operations in control-plane are NO-OP, expansion in control-plane succeeds and `pv.Spec` is set to `100Gi`.
159
-
- Expansion controller also sets `pvc.Status.ResizeStatus` to `NodeExpansionPending`.
160
-
- Expansion starts on the node and kubelet sets `pvc.Status.ResizeStatus` to `NodeExpansionInProgress`.
200
+
- Expansion controller also sets `pvc.Status.AllocatedResourceStatus['storage']` to `NodeResizePending`.
201
+
- Expansion starts on the node and kubelet sets `pvc.Status.AllocatedResourceStatus['storage']` to `NodeResizeInProgress`.
161
202
- Expansion fails on the node with a final error.
162
-
- Kubelet sets `pvc.Status.ResizeStatus` to `NodeExpansionFailed`.
163
-
- Since pvc has `pvc.Status.ResizeStatus` set to `NodeExpansionFailed` - kubelet will stop retrying node expansion.
164
-
- At this point Kubelet will wait for `pvc.Status.ResizeStatus` to be `NodeExpansionPending`.
203
+
- Kubelet sets `pvc.Status.AllocatedResourceStatus['storage']` to `NodeResizeFailed`.
204
+
- Since pvc has `pvc.Status.AllocatedResourceStatus['storage']` set to `NodeResizeFailed` - kubelet will stop retrying node expansion.
205
+
- At this point Kubelet will wait for `pvc.Status.AllocatedResourceStatus['storage']` to be `NodeResizePending`.
165
206
- User requests size to 20Gi.
166
207
- Expansion controller starts expanding the volume and sees that last expansion failed on the node and driver does not have control-plane expansion.
167
208
- Expansion controller sets `pvc.Status.AllocatedResources` to `20Gi`.
168
-
- Expansion controller also sets `pvc.Status.ResizeStatus` to `ControllerExpansionInProgress`.
209
+
- Expansion controller also sets `pvc.Status.AllocatedResourceStatus['storage']` to `ControllerResizeInProgress`.
169
210
- Since expansion operations in control-plane are NO-OP, expansion in control-plane succeeds and `pv.Spec` is set to `20Gi`.
170
211
- Expansion succeeds on the node with latest `allocatedResources` and `pvc.Status.Size` is set to `20Gi`.
171
-
- Expansion controller also sets `pvc.Status.ResizeStatus` to `NodeExpansionPending`.
212
+
- Expansion controller also sets `pvc.Status.AllocatedResourceStatus['storage']` to `NodeResizePending`.
172
213
- Kubelet can now retry expansion and expansion on node succeeds.
173
-
- Kubelet sets `pvc.Status.ResizeStatus` to empty string and `pvc.Status.Capacity` to new value.
214
+
- Kubelet sets `pvc.Status.AllocatedResourceStatus['storage']` to empty string and `pvc.Status.Capacity` to new value.
174
215
- Quota controller sees a reduction in used quota because `max(pvc.Spec.Resources, pvc.Status.AllocatedResources)` is 20Gi.
175
216
176
217
177
218
##### Case 3 (Malicious user)
178
219
- User increases 10Gi PVC to 100Gi by changing `pvc.spec.resources.requests["storage"] = "100Gi"`
179
220
- Quota controller uses `max(pvc.Status.AllocatedResources, pvc.Spec.Resources)` and adds `90Gi` to used quota.
180
221
- Expansion controller slowly starts expanding the volume and sets `pvc.Status.AllocatedResources` to `100Gi` (before expanding).
181
-
- Expansion controller also sets `pvc.Status.ResizeStatus` to `ControllerExpansionInProgress`.
222
+
- Expansion controller also sets `pvc.Status.AllocatedResourceStatus['storage']` to `ControllerResizeInProgress`.
182
223
- At this point, `pv.Spec.Capacity` and `pvc.Status.Capacity` stay at 10Gi until the resize is finished.
183
224
- While the storage backend is re-sizing the volume, user requests size 20Gi by changing `pvc.spec.resources.requests["storage"] = "20Gi"`
184
225
- Expansion controller notices that previous expansion to last known `allocatedResources` is still in-progress.
@@ -193,7 +234,7 @@ The complete expansion and recovery flow of both control-plane and kubelet is do
193
234
- User expands the PVC to 100GB.
194
235
- Quota controller uses `max(pvc.Status.AllocatedResources, pvc.Spec.Resources)` and adds `89.9GB` to used quota.
195
236
- Expansion controller starts expanding the volume and sets `pvc.Status.AllocatedResources` to `100GB` (before expanding).
196
-
- Expansion controller also sets `pvc.Status.ResizeStatus` to `ControllerExpansionInProgress`.
237
+
- Expansion controller also sets `pvc.Status.AllocatedResourceStatus['storage']` to `ControllerResizeInProgress`.
197
238
- At this point, `pv.Spec.Capacity` and `pvc.Status.Capacity` stay at 10.1GB until the resize is finished.
198
239
- While resize is in progress, expansion controller crashes and loses state.
199
240
- User reduces the size of PVC to 10.5GB.
@@ -207,11 +248,11 @@ The complete expansion and recovery flow of both control-plane and kubelet is do
207
248
- User increases 10Gi PVC to 100Gi by changing `pvc.spec.resources.requests["storage"] = "100Gi"`
208
249
- Quota controller uses `max(pvc.Status.AllocatedResources, pvc.Spec.Resources)` and adds `90Gi` to used quota.
209
250
- Expansion controller slowly starts expanding the volume and sets `pvc.Status.AllocatedResources` to `100Gi` (before expanding).
210
-
- Expansion controller also sets `pvc.Status.ResizeStatus` to `ControllerExpansionInProgress`.
251
+
- Expansion controller also sets `pvc.Status.AllocatedResourceStatus['storage']` to `ControllerResizeInProgress`.
211
252
- At this point, `pv.Spec.Capacity` and `pvc.Status.Capacity` stay at 10Gi until the resize is finished.
212
253
- While the storage backend is re-sizing the volume, user requests size 200Gi by changing `pvc.spec.resources.requests["storage"] = "200Gi"`
213
254
- Quota controller uses `max(pvc.Status.AllocatedResources, pvc.Spec.Resources)` and adds `100Gi` to used quota.
214
-
- Since `pvc.Status.ResizeStatus` is in `ControllerExpansionInProgress` - expansion controller still chooses last `pvc.Status.AllocatedResources` as new size.
255
+
- Since `pvc.Status.AllocatedResourceStatus['storage']` is in `ControllerResizeInProgress` - expansion controller still chooses last `pvc.Status.AllocatedResources` as new size.
215
256
- User reduces size back to `20Gi`.
216
257
- Quota controller uses `max(pvc.Status.AllocatedResources, pvc.Spec.Resources)` and *returns* `100Gi` to used quota.
217
258
- Expansion controller notices that previous expansion to last known `allocatedResources` is still in-progress.
@@ -261,7 +302,7 @@ The complete expansion and recovery flow of both control-plane and kubelet is do
261
302
so it should not break any of existing automation. This means that if `pvc.Status.AllocatedResources` is available it will be
262
303
used for calculating quota.
263
304
264
-
To facilitate older kubelet - external resize controller will set `pvc.Status.ResizeStatus` to "''" after entire expansion process is complete. This will ensure that `ResizeStatus` is updated
305
+
To accommodate older kubelets, the external resize controller will set `pvc.Status.AllocatedResourceStatus[storage]` to an empty string after the entire expansion process is complete. This will ensure that `AllocatedResourceStatus` is updated
265
306
after expansion is complete even with older kubelet. No recovery from expansion failure will be possible in this case and the workaround will be removed once feature goes GA.
266
307
267
308
One more thing to keep in mind is - enabling this feature in kubelet while keeping it disabled in external-resizer will cause
@@ -406,15 +447,15 @@ _This section must be completed when targeting beta graduation to a release._
406
447
* **What are other known failure modes?**
407
448
For each of them fill in the following information by copying the below template:
408
449
- No recovery is possible if volume has been expanded on control-plane and only failing on node.
409
-
- Detection: Expansion is stuck with `ResizeStatus` - `NodeExpansionPending` or `NodeExpansionFailed`.
450
+
- Detection: Expansion is stuck with `AllocatedResourceStatus[storage]` - `NodeResizePending` or `NodeResizeFailed`.
410
451
- Mitigations: This should not affect any of existing PVCs but this was already broken in some sense and if volume has been
411
452
expanded in control-plane then we can't allow users to shrink their PVCs because that would violate the quota.
412
-
- Diagnostics: Expansion is stuck with `ResizeStatus` - `NodeExpansionPending` or `NodeExpansionFailed`.
453
+
- Diagnostics: Expansion is stuck with `AllocatedResourceStatus['storage']` - `NodeResizePending` or `NodeResizeFailed`.
413
454
- Testing: There are some unit tests for this failure mode.
414
455
415
456
* **What steps should be taken if SLOs are not being met to determine the problem?**
416
457
If an admin notices an increase in expansion failure operations via aforementioned metrics or
417
-
by observing `pvc.Status.ResizeStatus` then:
458
+
by observing `pvc.Status.AllocatedResourceStatus['storage']` then:
418
459
- Check events on the PVC and observe why PVC expansion is failing.
419
460
- Gather logs from kubelet and external-resizer and check for problems.
420
461
@@ -425,6 +466,7 @@ _This section must be completed when targeting beta graduation to a release._
425
466
## Implementation History
426
467
427
468
- 2020-01-27 Initial KEP pull request submitted
469
+
- 2023-02-03 Changing the APIs of `pvc.Status.ResizeStatus` by renaming it to `pvc.Status.AllocatedResourceStatus` and converting it to a map.