Commit f533296

Update document about error handling
1 parent 889a7c9 commit f533296

keps/sig-storage/3751-volume-attributes-class/README.md

Lines changed: 33 additions & 0 deletions
@@ -35,6 +35,9 @@
3535
- [Delete PVC](#delete-pvc)
3636
- [Modify PVC](#modify-pvc)
3737
- [Implementation & Handling Failure](#implementation--handling-failure)
38+
- [Handling of non-final errors](#handling-of-non-final-errors)
39+
- [Handling of final errors](#handling-of-final-errors)
40+
- [Handling of infeasible errors](#handling-of-infeasible-errors)
3841
- [Test Plan](#test-plan)
3942
- [Prerequisite testing updates](#prerequisite-testing-updates)
4043
- [Unit tests](#unit-tests)
@@ -690,6 +693,36 @@ ModifyVolume is only allowed on bound PVCs. Under the ModifyVolume call, it will
690693
### Implementation & Handling Failure
691694

692695
VolumeAttributesClass parameters can be considered best-effort parameters: the CSI driver should report bad parameters as INVALID_ARGUMENT, and the volume would fall back to a workable default configuration.
696+
The CSI driver is expected not to apply parameters partially if one or more parameters are invalid. We are proposing a CSI spec change to tighten the wording for this - https://github.com/container-storage-interface/spec/pull/597
697+
698+
In general, Kubernetes sidecars classify all CSI errors into three classes:
699+
700+
- Non-final errors (such as `DeadlineExceeded`), which indicate a transient error, for example a timeout or some other temporary failure. The CSI driver may already have a volume modification in progress.
701+
- Final errors (such as `Internal`), which indicate a definitive error from the CSI driver. This typically means the CSI driver is no longer processing the request once the error is returned.
702+
- Infeasible errors (such as `InvalidArgument`), which are a subset of final errors indicating that the request itself is invalid and will never succeed.
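
The three classes above can be sketched as a small classifier. This is only an illustration, not the sidecar's actual code: the `ErrorClass` type and the `classify` and `isFinal` helpers are hypothetical names, codes are passed as plain strings to keep the sketch dependency-free, and only the three codes named in the list come from this document. Treating every other code as final is an assumption made for brevity.

```go
package main

// ErrorClass is a hypothetical label for the three classes described above.
type ErrorClass int

const (
	NonFinal   ErrorClass = iota // transient; the driver may still be working
	Final                        // definitive; the driver has stopped processing
	Infeasible                   // final, and the request can never succeed
)

// classify maps a gRPC status code name to an ErrorClass.
// Only the three codes handled explicitly are taken from the text; the
// default branch (everything else is Final) is an assumption of this sketch.
func classify(code string) ErrorClass {
	switch code {
	case "DeadlineExceeded":
		return NonFinal
	case "InvalidArgument":
		return Infeasible
	default:
		return Final
	}
}

// isFinal reports whether a class is final. Infeasible counts as final
// because it is described as a subset of final errors.
func isFinal(c ErrorClass) bool {
	return c == Final || c == Infeasible
}
```

Keeping `Infeasible` as its own value, rather than a flag on `Final`, lets callers distinguish "stop retrying this request" from "this request may be replaced by a new one".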
703+
704+
#### Handling of non-final errors
705+
706+
In general, `external-resizer` will not attempt modification to a new VAC if modification to the previously applied VAC is failing with a non-final error.
707+
708+
This policy safeguards against potential quota abuse that can occur if users time their requests strategically.
709+
`external-resizer` will permit transition to a new VAC only if the transition to the previous VAC has succeeded or failed with a final error. This is one of the main reasons the `targetVolumeAttributesClassName` field is required in the PVC's status.
710+
711+
In other words, for non-final errors `external-resizer` will keep working towards `targetVolumeAttributesClassName`, regardless of any user-specified change in `.spec.volumeAttributesClassName`.
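
As a rough illustration of this rule, the choice of which VAC to work towards might look like the sketch below. The `pvcState` struct and `nextTarget` function are hypothetical and pared down; they are not the `external-resizer` types.

```go
package main

// pvcState is a hypothetical, reduced view of the PVC fields involved.
type pvcState struct {
	SpecVAC         string // user-requested VAC from the PVC spec
	TargetVAC       string // targetVolumeAttributesClassName recorded in status
	LastErrNonFinal bool   // true if the last attempt ended in a non-final error
}

// nextTarget returns the VAC the resizer should work towards.
// While a previous target is failing with a non-final error, that target is
// kept and the spec change is ignored; otherwise the spec value is used.
func nextTarget(p pvcState) string {
	if p.TargetVAC != "" && p.LastErrNonFinal {
		return p.TargetVAC
	}
	return p.SpecVAC
}
```

For example, with `SpecVAC: "gold"`, `TargetVAC: "silver"` and a non-final last error, the sketch keeps returning `"silver"` until the attempt succeeds or fails with a final error.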
712+
713+
#### Handling of final errors
714+
715+
If volume modification to a VAC is failing with a final error, users can try rolling forward to a new VAC. This resets `targetVolumeAttributesClassName` once `external-resizer` starts processing the request. If the user sets the VAC to `nil` or empty while the previous modification to a VAC failed with a final error, `external-resizer`
716+
should keep working towards reconciling to the previously specified VAC, now recorded in `targetVolumeAttributesClassName`.
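
A minimal sketch of the final-error case, assuming a hypothetical helper `targetAfterFinalError` that receives the spec VAC as a pointer (so `nil` and the empty string can both be expressed) and the previously recorded target:

```go
package main

// targetAfterFinalError returns the VAC to reconcile towards after the
// previous modification failed with a final error.
// A nil or empty spec VAC does not cancel the work: the previously recorded
// target is kept. Any other value rolls forward to the new VAC.
func targetAfterFinalError(specVAC *string, recordedTarget string) string {
	if specVAC == nil || *specVAC == "" {
		return recordedTarget
	}
	return *specVAC
}
```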
717+
718+
#### Handling of infeasible errors
719+
720+
If volume modification to a VAC is failing with an infeasible error, users can either set the VAC to the previous value recorded in `status.currentVolumeAttributesClass`, or set it to `nil` if no VAC was specified before. In both cases, `external-resizer` will stop trying to reconcile the volume modification.
721+
722+
Please note that if the PVC already had a `currentVolumeAttributesClass` in its status, setting the VAC to `nil` is not allowed.
723+
724+
If one or more partial volume modifications already happened on the volume, they will not be undone when the user resets the VAC, because for infeasible errors no `ControllerModifyVolume` call is made on reset. This mechanism exists only to prevent perpetual calls to `ControllerModifyVolume` for volume modifications that are never going to succeed. Storage providers and users are recommended to roll forward to a different VAC if the desired behaviour is resetting all `mutable_parameters` to some pre-specified value.
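
The reset rules for infeasible errors can be sketched as two small checks. Both function names are hypothetical and the pointers stand in for optional string fields; this is an illustration of the rules as stated here, not the controller's or the API server's actual validation code.

```go
package main

// stopReconciling reports whether, after an infeasible error, the user has
// reset the spec VAC so that reconciliation should stop: either back to the
// current VAC from status, or to nil when no VAC was ever applied.
// No ControllerModifyVolume call is implied by a true result.
func stopReconciling(specVAC, currentVAC *string) bool {
	if currentVAC == nil {
		return specVAC == nil
	}
	return specVAC != nil && *specVAC == *currentVAC
}

// resetToNilAllowed reports whether setting the spec VAC to nil is permitted:
// only when the PVC status has no current VAC recorded.
func resetToNilAllowed(currentVAC *string) bool {
	return currentVAC == nil
}
```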
725+
693726

694727
### Test Plan
695728
