keps/sig-storage/1790-recover-resize-failure/README.md
Lines changed: 63 additions & 61 deletions
@@ -224,8 +224,7 @@ The complete expansion and recovery flow of both control-plane and kubelet is do
 ### Risks and Mitigations
 
 - Once expansion is initiated, the lowering of requested size is only allowed upto a value *greater* than `pvc.Status`. It is not possible to entirely go back to previously requested size. This should not be a problem however in-practice because user can retry expansion with slightly higher value than `pvc.Status` and still recover from previously failing expansion request.
-
-
+
 ## Graduation Criteria
 
 * *Alpha* in 1.23 behind `RecoverExpansionFailure` feature gate with set to a default of `false`.
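
The constraint in the Risks and Mitigations bullet above can be illustrated with a minimal sketch. This is not the API server's validation code: the helper name `can_set_requested_size` is hypothetical, sizes are plain byte counts rather than Kubernetes quantities, and `status_size` is assumed to stand in for the size recorded in `pvc.Status`.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def can_set_requested_size(current_request: int, status_size: int, new_request: int) -> bool:
    """Return True if new_request is an acceptable requested size.

    Hypothetical helper for illustration only. All sizes are byte counts,
    and status_size stands in for the size recorded in pvc.Status.
    """
    if new_request >= current_request:
        # Keeping or raising the request is ordinary expansion.
        return True
    # Lowering is allowed only down to a value strictly greater than status.
    return new_request > status_size


if __name__ == "__main__":
    status = 10 * GIB      # size recorded in status
    failing = 1000 * GIB   # request that the backend cannot satisfy
    for candidate in (5 * GIB, 10 * GIB, 11 * GIB, 2000 * GIB):
        allowed = can_set_requested_size(failing, status, candidate)
        print(f"{candidate // GIB}Gi allowed: {allowed}")
```

Under these assumptions, a user whose 1000Gi request is failing on a 10Gi volume cannot return to exactly 10Gi, but can retry with 11Gi, which matches the recovery path the bullet describes.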
@@ -258,12 +257,12 @@ The complete expansion and recovery flow of both control-plane and kubelet is do
 of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
 
 * **Does enabling the feature change any default behavior?**
-Allow users to reduce size of pvc in `pvc.spec.resources`. In general this was not permitted before
+Allow users to reduce size of pvc in `pvc.spec.resources`. In general this was not permitted before,
 so it should not break any of existing automation. This means that if `pvc.Status.AllocatedResources` is available it will be
 used for calculating quota.
 
-To facilitate older kubelets - external resize controller will set `pvc.Status.ResizeStatus` to "''" after entire expansion process is complete. This will ensure that `ResizeStatus` is updated
-after expansion is complete even with older kubelets. No recovery from expansion failure will be possible in this case and the workaround will be removed once feature goes GA.
+To facilitate older kubelet - external resize controller will set `pvc.Status.ResizeStatus` to "''" after entire expansion process is complete. This will ensure that `ResizeStatus` is updated
+after expansion is complete even with older kubelet. No recovery from expansion failure will be possible in this case and the workaround will be removed once feature goes GA.
 
 One more thing to keep in mind is - enabling this feature in kubelet while keeping it disabled in external-resizer will cause
 all volume expansions operations to get stuck(similar thing will happen when feature moves to beta and kubelet is newer but external-resizer sidecar is older).
@@ -288,73 +287,70 @@ after expansion is complete even with older kubelets. No recovery from expansion
 
 ### Rollout, Upgrade and Rollback Planning
 
-_This section must be completed when targeting beta graduation to a release._
-
 * **How can a rollout fail? Can it impact already running workloads?**
 This change should not impact existing workloads and requires user interaction via reducing pvc capacity.
 
 * **What specific metrics should inform a rollback?**
+No specific metric but if expansion of PVCs are being stuck (can be verified from `pvc.Status.Conditions`)
+then user should plan a rollback.
 
 * **Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested?**
-Describe manual testing that was done and the outcomes.
-Longer term, we may want to require automated upgrade/rollback tests, but we
-are missing a bunch of machinery and tooling and do that now.
+We have not fully tested upgrade and rollback but as part of beta process we will have it tested.
 
 * **Is the rollout accompanied by any deprecations and/or removals of features,
 APIs, fields of API types, flags, etc.?**
-Even if applying deprecation policies, they may still surprise some users.
+This feature deprecates no existing functionality.
 
 ### Monitoring requirements
 
 _This section must be completed when targeting beta graduation to a release._
 
 * **How can an operator determine if the feature is in use by workloads?**
-Ideally, this should be a metrics. Operations against Kubernetes API (e.g.
-checking if there are objects with field X set) may be last resort. Avoid
-logs or events for this purpose.
-
+Any volume that has been recovered will emit a metric: `operation_operation_volume_recovery_total{state='success', volume_name='pvc-abce'}`.
+
 * **What are the SLIs (Service Level Indicators) an operator can use to