diff --git a/keps/sig-storage/3751-volume-attributes-class/README.md b/keps/sig-storage/3751-volume-attributes-class/README.md
index 4b9c6816431..59a16b32615 100644
--- a/keps/sig-storage/3751-volume-attributes-class/README.md
+++ b/keps/sig-storage/3751-volume-attributes-class/README.md
@@ -35,6 +35,11 @@
 - [Delete PVC](#delete-pvc)
 - [Modify PVC](#modify-pvc)
 - [Implementation & Handling Failure](#implementation--handling-failure)
+ - [Handling of non-final errors](#handling-of-non-final-errors)
+ - [Handling of final errors](#handling-of-final-errors)
+ - [Transition from VAC(A) to VAC(B)](#transition-from-vaca-to-vacb)
+ - [Transition from nil-VAC to VAC(A)](#transition-from-nil-vac-to-vaca)
+ - [Handling of infeasible errors](#handling-of-infeasible-errors)
 - [Test Plan](#test-plan)
 - [Prerequisite testing updates](#prerequisite-testing-updates)
 - [Unit tests](#unit-tests)
@@ -690,6 +695,43 @@ ModifyVolume is only allowed on bound PVCs. Under the ModifyVolume call, it will
 ### Implementation & Handling Failure
 
 VolumeAttributesClass parameters can be considered as best-effort parameters, the CSI driver should report the status of bad parameters as INVALID_ARGUMENT and the volume would fall back to a workable default configuration.
+It is expected that the CSI driver will not partially apply parameters if one or more of them are invalid. We are proposing a CSI spec change to tighten the wording for this: https://github.com/container-storage-interface/spec/pull/597
+
+In general, Kubernetes sidecars classify all CSI errors into three classes:
+
+- Non-final errors (such as `DeadlineExceeded`), which indicate a transient failure, for example a timeout. The CSI driver may already have a volume modification in progress.
+- Final errors (such as `Internal`), which indicate a definitive error from the CSI driver. This typically means that the CSI driver is no longer processing the request once the error is returned.
+- Infeasible errors (such as `InvalidArgument`), which are a subset of final errors indicating that the request itself is invalid and will never succeed.
+
+#### Handling of non-final errors
+
+In general, `external-resizer` will not attempt to modify a volume to a new VAC if modification to the previously applied VAC is failing with a non-final error.
+
+This policy safeguards against potential quota abuse that could occur if users time their requests strategically, for example by changing the VAC while the CSI driver may still have an earlier modification in progress.
+`external-resizer` will permit a transition to a new VAC only if the transition to the previous VAC has succeeded or failed with a final error. This is one of the main reasons the `targetVolumeAttributesClassName` field is required in the PVC's status.
+
+In other words, on non-final errors `external-resizer` will keep working towards `targetVolumeAttributesClassName`, regardless of any user-specified change to `.spec.volumeAttributesClassName`.
+
+#### Handling of final errors
+
+##### Transition from VAC(A) to VAC(B)
+
+If modification of a volume from VAC(A) to VAC(B) is failing with a final error and the user wishes to either cancel it or move to a different VAC, they MUST first set the PVC's VAC back to A. Only after the transition back to the original VAC(A) succeeds is the user allowed to move to a different VAC.
+
+##### Transition from nil-VAC to VAC(A)
+
+If modification of a volume from no VAC (nil) to VAC(A) is failing with a final, but not infeasible, error, `external-resizer` will keep trying to reconcile to VAC(A), regardless of any user-initiated changes to `.spec.volumeAttributesClassName`. Only after the transition to VAC(A) succeeds is the user allowed to move the PVC to a different VAC.
+
+#### Handling of infeasible errors
+
+If volume modification to a VAC is failing with an infeasible error, the user can either set the VAC back to the previous value recorded in `status.currentVolumeAttributesClassName`, or set it to `nil` if no VAC was previously specified. In both cases, `external-resizer` will stop trying to reconcile the volume modification.
+
+Note that if the PVC already has a `currentVolumeAttributesClassName` in its status, setting the VAC to `nil` is not allowed.
+
+Users can also set the PVC to a different VAC if the transition to a VAC fails with an infeasible error. This is allowed on the assumption that the volume was not modified when the previous VAC application failed with an infeasible error.
+
+![Error recovery flow](./modify-volume.png)
+
 
 ### Test Plan
 
diff --git a/keps/sig-storage/3751-volume-attributes-class/modify-volume.png b/keps/sig-storage/3751-volume-attributes-class/modify-volume.png
new file mode 100755
index 00000000000..07892c8ddb8
Binary files /dev/null and b/keps/sig-storage/3751-volume-attributes-class/modify-volume.png differ
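The error-handling rules added in this diff can be condensed into a small decision function. The Python sketch below is illustrative only and is not the `external-resizer` implementation: `ErrorClass`, `classify`, `PVCState` and `next_target` are hypothetical names, `None` stands for a nil VAC, and grouping `Canceled`, `Unavailable`, `ResourceExhausted` and `Aborted` with `DeadlineExceeded` as non-final is an assumption based on common CSI sidecar practice. The text does not say what happens when, after a final error from VAC(A) to VAC(B), the user picks a third VAC; the sketch assumes reconciliation towards the recorded target simply continues. It also assumes the nil-VAC restriction is enforced whenever the spec is consulted, not only after infeasible errors.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class ErrorClass(Enum):
    """Hypothetical error classes mirroring the three classes in the KEP text."""
    NON_FINAL = "non-final"
    FINAL = "final"
    INFEASIBLE = "infeasible"  # a subset of final errors


# Assumption: only DeadlineExceeded (non-final), Internal (final) and
# InvalidArgument (infeasible) come from the KEP; the rest of the non-final
# set follows common CSI sidecar practice.
NON_FINAL_CODES = {"Canceled", "DeadlineExceeded", "Unavailable", "ResourceExhausted", "Aborted"}
INFEASIBLE_CODES = {"InvalidArgument"}


def classify(code: str) -> ErrorClass:
    """Map a gRPC status code name returned by ModifyVolume to an error class."""
    if code in INFEASIBLE_CODES:
        return ErrorClass.INFEASIBLE
    if code in NON_FINAL_CODES:
        return ErrorClass.NON_FINAL
    return ErrorClass.FINAL  # every other code is treated as final


@dataclass
class PVCState:
    """Hypothetical, simplified view of the PVC fields involved; None means nil."""
    spec_vac: Optional[str]                  # .spec.volumeAttributesClassName
    current_vac: Optional[str]               # status.currentVolumeAttributesClassName
    target_vac: Optional[str]                # targetVolumeAttributesClassName in status
    last_error: Optional[ErrorClass] = None  # None: last transition succeeded


def next_target(pvc: PVCState) -> Tuple[Optional[str], bool]:
    """Return (VAC to reconcile towards, whether to keep calling ModifyVolume)."""
    if pvc.last_error is ErrorClass.NON_FINAL:
        # A modification may still be in progress: ignore spec changes.
        return pvc.target_vac, True
    if pvc.last_error is ErrorClass.FINAL:
        if pvc.current_vac is not None and pvc.spec_vac == pvc.current_vac:
            # VAC(A) -> VAC(B) failed and the user set the PVC back to A.
            return pvc.current_vac, True
        # nil -> VAC(A), or (assumption) any spec other than a rollback:
        # keep retrying the recorded target.
        return pvc.target_vac, True
    if pvc.spec_vac is None and pvc.current_vac is not None:
        # Assumption: enforced whenever the spec is consulted.
        raise ValueError("VAC cannot be set to nil once a current VAC is recorded")
    if pvc.last_error is ErrorClass.INFEASIBLE and pvc.spec_vac == pvc.current_vac:
        # Rolled back to the current VAC (or to nil): stop reconciling.
        return pvc.current_vac, False
    # After success or an infeasible error, follow the spec if it differs
    # from the current VAC.
    return pvc.spec_vac, pvc.spec_vac != pvc.current_vac
```

For example, with a current VAC of `A`, a target of `B` and a `DeadlineExceeded` error, `next_target` keeps returning `B` even if the spec is changed to `C`, matching the non-final rule; the same PVC after an `InvalidArgument` error would move to `C`.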