Skip to content

Update document about error handling #5462

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 33 additions & 0 deletions keps/sig-storage/3751-volume-attributes-class/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,9 @@
- [Delete PVC](#delete-pvc)
- [Modify PVC](#modify-pvc)
- [Implementation & Handling Failure](#implementation--handling-failure)
- [Handling of non-final errors](#handling-of-non-final-errors)
- [Handling of final errors](#handling-of-final-errors)
- [Handling of infeasible errors](#handling-of-infeasible-errors)
- [Test Plan](#test-plan)
- [Prerequisite testing updates](#prerequisite-testing-updates)
- [Unit tests](#unit-tests)
Expand Down Expand Up @@ -690,6 +693,36 @@ ModifyVolume is only allowed on bound PVCs. Under the ModifyVolume call, it will
### Implementation & Handling Failure

VolumeAttributesClass parameters can be considered as best-effort parameters, the CSI driver should report the status of bad parameters as INVALID_ARGUMENT and the volume would fall back to a workable default configuration.
It is expected that CSI driver will not apply partial application of parameters if one or more parameters are invalid. We are proposing CSI spec change to tighten the wording for this - https://github.com/container-storage-interface/spec/pull/597

In general Kubernetes sidecars classify all CSI errors in three different classes. Namely:

- Non-final errors (such as `DeadlineExceeded`), which indicate a transient error, which may be because of timeout or some other temporary failure. The CSI driver may have already volume modification in-progress.
- Final errors (such as `Internal`), which indicate a definitive error from CSI driver and this typically means CSI driver is no longer processing this request after error is returned.
- Infeasible Errors (e.g., `InvalidArgument`): This is a subset of final errors indicating the request itself is invalid and will never succeed.

#### Handling of non-final errors

In general `external-resizer` will not attempt modification to new VAC, if modification to previous applied VAC is failing with some kind of non-final error.

This policy safeguards against potential quota abuse that can occur if users time their requests strategically.
`external-resizer` will only permit transition to new VAC, only if transition to previous VAC has succeeded or failed with a final error. This is one of the main reasons - `targetVolumeAttributesClassName` field is required in pvc's status.

In other words, `external-resizer` will keep working towards `targetVolumeAttributesClassName` for non-final errors regardless of user specified change in `.spec.volumeAttributeClassName`.

#### Handling of final errors

If volume modification to a VAC is failing with a final error, then users can try rolling forward to a new VAC. This will reset `targetVolumeAttributesClassName` once external-resizer starts processing the request. If user sets VAC to `nil` or `empty` while previous modification to a VAC failed with a final error, then external-resizer
should keep working towards reconciling to previously specified VAC, now recorded in `targetVolumeAttributesClassName`.

#### Handling of infeasible errors

If volume modification to a VAC is failing with infeasible error, then users can either set VAC to previously specified value in `status.currentVolumeAttributesClass` or set to `nil` if no VAC was specified. In both the cases, external-resizer will stop trying to reconcile the volume modification.

Please note if PVC already had a `currentVolumeAttributesClass` in its status, then setting VAC to `nil` is not allowed.

It is possible that if there were one or more partial volume modifications that happened before on the volume, they will not be undone when this happens because for infeasible errors no `ControllerModifyVolume` will be called when user resets the VAC. This mechanism exists only to prevent perpetual call to `ControllerModifyVolume` for volume modifications which are never going to succeed. Storage providers and users are recommended to roll forward to different VAC, if desired behaviour is resetting the VAC to some pre-specified value for all `mutable_parameters`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a developer of CSI driver, and a cluster admin of our infra, I still cannot accept this.

they will not be undone

This means, when I specify my volume to have 2000 IOPS, and PVC.status tells me the reconcile finishes, but my volume may actually have only 1000 IOPS. And I can never observe the abnormal from Kubernetes API, until something more serious goes wrong:

  • If the performance is higher than expected, it will incur extra cost
  • If the performance is lower than expected, it can result in unexpected latency to workload, even catastrophic system failure
  • If we add topology integration to VAC, it also means then PV nodeAffinity can be out-of-sync, which will cause Pod pending or stuck due to scheduled to wrong node.
  • It is also subject to potential quota abuse

This mechanism exists only to prevent perpetual call

This does not make sense. After VAC is rolled back, if the volume is already at the desired state, SP should just return OK and do nothing. There is no reason the call will be perpetual. If the volume is actually partially modified, and cannot be rolled back by SP, it is better to let user notice this, rather than just hide it.

We never end the reconcile process with an failed gRPC call. e.g.

  • We only delete VolumeAttachment if ControllerUnpublishVolume returns OK.
  • We only clear PVC.Status.AllocatedResourceStatuses if ControllerExpandVolume returns OK.

So we should do the same, only clear PVC.Status.ModifyVolumeStatus if ControllerModifyVolume returns OK, and never cancel modification.

Storage providers and users are recommended to roll forward to different VAC, if desired behaviour is resetting the VAC to some pre-specified value for all mutable_parameters.

In Kubernetes, spec specifies the desired state, not action. Which ever state the user specifies, we should try to bring the underlying system to the specified state. It would be ridiculous if two VAC specifies the same state, but only one of them will work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is being made in compliance with the CSI spec change container-storage-interface/spec#597. "Infeasible" in this KEP refers to the errors in the CSI spec such as NOT_FOUND and INVALID_ARGUMENT for which the SP must not have made any changes. The proposed sidecar behavior depends on the SP's spec compliance for safety in this case. If the SP returns a final non-infeasible error (such as INTERNAL) then the wording for final errors applies, and it's up to cluster operators to ensure that any new VAC will overwrite any partially applied settings from an older VAC.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment at container-storage-interface/spec#597 (comment)
Basically, if some other errors happened before INVALID_ARGUMENT, we still not sure about the volume state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're taking the stance that, if that happens then the SP has violated the spec, either intentionally or through a bug. In either case, undefined behavior could occur. It is incumbent on the SP to prevent the case you describe (returning INVALID_ARGUMENT after making some changes) in order to remain spec complaint. We realize this is a tightening of the spec, which is why we want to do it while this RPC is still tagged alpha.

Copy link

@huww98 huww98 Aug 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we are discussing the same case as you replied at container-storage-interface/spec#597 (comment)

I'd state the issue here again. Say initially the PVC has VAC A, and user modified it to B:

retry case

  1. CO calls ControllerModifyVolume(B)
  2. the request failed, maybe due to process restart, or network error, etc
  3. CO retries ControllerModifyVolume(B)
  4. SP return INVALID_ARGUMENT

With the tightened spec, SP cannot have made any change between 3 and 4. But SP can still make change between 1 and 2, and not violating the tightened spec. So assuming the volume is not changed at step 4 is not safe for CO.

A -> B -> C case

  1. CO calls ControllerModifyVolume(B)
  2. the request failed with final error
  3. User changed his mind and set VAC name to C
  4. CO calls ControllerModifyVolume(C)
  5. SP return INVALID_ARGUMENT

Similar to the previous case, SP can still make change between 1 and 2. Now if user try to revert to A, as proposed in this PR, "(partial volume modifications) will not be undone". So the volume will get into the state I described previously: the parameters of A is not applied, but user cannot detect this from Kubernetes API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd state the issue here again. Say initially the PVC has VAC A, and user modified it to B:

retry case

1. CO calls ControllerModifyVolume(B)
2. the request failed, maybe due to process restart, or network error, etc
3. CO retries ControllerModifyVolume(B)
4. SP return INVALID_ARGUMENT

With the tightened spec, SP cannot have made any change between 3 and 4. But SP can still make change between 1 and 2, and not violating the tightened spec. So assuming the volume is not changed at step 4 is not safe for CO.

If the SP returns INVALID_ARGUMENT in (4), then the there must have been some parameter in B which was invalid. What is invalid should not change over time. That's part of what we mean by infeasible -- it's that the passage of time won't change whether the request succeeds or not. Assuming the SP adheres to this logic, then we know nothing could have changed between 1 and 2 either, regardless of the error code returned.

A -> B -> C case

1. CO calls ControllerModifyVolume(B)
2. the request failed with final error
3. User changed his mind and set VAC name to C
4. CO calls ControllerModifyVolume(C)
5. SP return INVALID_ARGUMENT

Similar to the previous case, SP can still make change between 1 and 2. Now if user try to revert to A, as proposed in this PR, "(partial volume modifications) will not be undone". So the volume will get into the state I described previously: the parameters of A is not applied, but user cannot detect this from Kubernetes API.

In this case, C contained something the SP couldn't understand, and so the SP made no changes between (2) and (5). The actual state of the volume is somewhere between A and B, and the status reflects A. I'm acutally unsure which quota is consumed. I assume it's at least quota from A, and maybe also B and or C (although it can't be all 3).

In any case, this is an admin error, and undefined results are expected if an admin puts bad parameters into a VAC and then does this kind of manipulation. I'm not arguing that the results are perfect or desirable in this case, I'm arguing that it's safe from intentional exploitation (assuming the admin creates sane VACs and quotas), and that it's as good as we can do considering that nothing in the CO remembers the old state, so only forward progress is possible.

Copy link

@huww98 huww98 Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undefined results are expected if an admin puts bad parameters into a VAC

Whether the parameters are valid may depends on many aspects, such as the region/zone that the volume locate at, the current status of the volume, etc. It is not something admin can completely avoided in advance.

it's as good as we can do considering that nothing in the CO remembers the old state, so only forward progress is possible.

Please take a look at my proposal at https://docs.google.com/document/d/1VebyLSRcngn3_9wqaF5OUmxu60gwX7dN8euXRUqJSXA/edit?usp=sharing

Basically, I'm proposing to call ControllerModifyVolume(A) again in this case, to at least ensure the parameters explicitly specified for A are applied.

Copy link

@huww98 huww98 Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the SP adheres to this logic, then we know nothing could have changed between 1 and 2 either, regardless of the error code returned.

Yes, I agree that SP should do this. But assuming is not something we can rely on. Lets image that SP can do this:

changed, err := modify_part_1()
if err != nil { ... }

err = modify_part_2()
if err != nil {
    if changed { return INTERNAL }
    else { return INVALID_PARAMETER }
}

At first attempt, modify_part_1 succeeded, but modify_part_2 does not. at second attempt, SP would return INVALID_PARAMETER. I agree this is not recommended. But it is compliant to CSI spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"With the tightened spec, SP cannot have made any change between 3 and 4. But SP can still make change between 1 and 2, and not violating the tightened spec. So assuming the volume is not changed at step 4 is not safe for CO."

If 2 is a non final error, we do not let a new modification to C happen.



### Test Plan

Expand Down