Commit 1a045a2

Add VAC binding details
1 parent bd26125 commit 1a045a2

1 file changed

+75
-1
lines changed
  • keps/sig-storage/3751-volume-attributes-class

keps/sig-storage/3751-volume-attributes-class/README.md

Lines changed: 75 additions & 1 deletion
@@ -25,6 +25,9 @@
2525
- [4. Add new CSI API ControllerModifyVolume, when there is a change of VolumeAttributesClass in PVC, external-resizer triggers a ControllerModifyVolume operation against a CSI endpoint. A Controller Plugin MUST implement this RPC call if it has MODIFY_VOLUME capability.](#4-add-new-csi-api-controllermodifyvolume-when-there-is-a-change-of-volumeattributesclass-in-pvc-external-resizer-triggers-a-controllermodifyvolume-operation-against-a-csi-endpoint-a-controller-plugin-must-implement-this-rpc-call-if-it-has-modify_volume-capability)
2626
- [5. Add new operation metrics for ModifyVolume operations](#5-add-new-operation-metrics-for-modifyvolume-operations)
2727
- [Design Details](#design-details)
28+
- [Binding of PV and PVC](#binding-of-pv-and-pvc)
29+
- [Find a PV matching the PVC](#find-a-pv-matching-the-pvc)
30+
- [Perform the binding](#perform-the-binding)
2831
- [VolumeAttributesClass Deletion Protection](#volumeattributesclass-deletion-protection)
2932
- [Create VolumeAttributesClass](#create-volumeattributesclass)
3033
- [Delete VolumeAttributesClass](#delete-volumeattributesclass)
@@ -435,6 +438,69 @@ Operation metrics from [csiOperationsLatencyMetric](https://github.com/kubernete
435438

436439
## Design Details
437440

441+
#### Binding of PV and PVC
442+
443+
Creating the bidirectional binding between a PV and PVC is delicate, because
444+
there is no transaction support in the Kubernetes API to do the whole thing
445+
atomically. Binding with volume attributes adds to this process but does not
446+
fundamentally change the 4 steps performed. To support the VolumeAttributesClass
447+
the binding process will become the following.
448+
449+
A key assumption is that the PVC needs to be immutable until binding is
450+
complete. This is to avoid race conditions that could, among other things,
451+
violate quota. For example, suppose there are two VACs, `fast` and `slow`. The
452+
latter is cheap and unrestricted, but `fast` is subject to a limited quota. It
453+
is not possible to downgrade `fast` to `slow`. If
454+
VAC could be changed before binding is complete, a user could create a PVC with
455+
`fast` and change it to `slow` during the binding process. This race with
456+
binding could end up with a `fast` PV bound to the PVC with a `slow` VAC;
457+
because it is not possible to downgrade, the PV will remain `fast`. But since VAC
458+
quota only looks at the PVC's VAC, the user is now able to create additional
459+
`fast` volumes beyond their quota.
460+
461+
An alternative of requiring VAC to match at all steps of binding means that if
462+
the VAC is changed by mistake, a PVC and PV could be incompletely bound (for
463+
example, succeeding at steps 1-3 below but failing at step 4). There is no way to undo
464+
binding, so even if the PVC is deleted the PV would be unavailable until the
465+
cluster administrator manually intervenes.
466+
467+
##### Find a PV matching the PVC
468+
469+
In the matching process, a PVC is only matched with a PV if their VACs match:
470+
they are the same class, or are both unspecified. Either a nil or an empty VAC
471+
is considered unspecified, so for example a PVC with a nil VAC **can** match
472+
with a PV whose VAC is the empty string.
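
The matching rule above can be sketched in Go as follows. This is a minimal illustration, not the controller's actual code: the helper names `vacName` and `vacMatches` are hypothetical, and the VAC is modeled as an optional `*string` on the assumption that both objects carry it as a nullable name.

```go
package main

import "fmt"

// vacName normalizes an optional VolumeAttributesClass name.
// Both nil and the empty string mean "unspecified".
// (Hypothetical helper, introduced only for this sketch.)
func vacName(name *string) string {
	if name == nil {
		return ""
	}
	return *name
}

// vacMatches reports whether a PVC's VAC and a PV's VAC match:
// either they name the same class, or both are unspecified.
func vacMatches(pvcVAC, pvVAC *string) bool {
	return vacName(pvcVAC) == vacName(pvVAC)
}

func main() {
	empty, fast, slow := "", "fast", "slow"
	fmt.Println(vacMatches(nil, &empty))  // true: nil and "" are both unspecified
	fmt.Println(vacMatches(&fast, &fast)) // true: same class
	fmt.Println(vacMatches(&fast, &slow)) // false: different classes
	fmt.Println(vacMatches(nil, &fast))   // false: unspecified vs. specified
}
```

Normalizing before comparing keeps the nil and empty-string cases from needing separate branches.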
473+
474+
The matching process then becomes the following.
475+
476+
If the PVC is pre-bound to a PV, check if node affinity is satisfied, and the
477+
PVC VAC matches the PV VAC. If so select it, otherwise no PV is selected.
478+
479+
Otherwise, iterate through all unbound PVs. Check if the storage class and
480+
volume attributes class of the PV match those of the PVC, and the capacity is equal to
481+
or greater than that requested by the PVC. If so, add it to the list of
482+
candidates. After all unbound PVs have been examined, select the candidate with
483+
the smallest capacity.
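
As a rough illustration of the unbound-PV path described above, the following Go sketch filters candidates by storage class, VAC, and capacity, then picks the smallest one. The `volume` and `claim` types and the `findMatchingVolume` function are simplified stand-ins invented for this example; the pre-bound path and the node affinity check are intentionally left out, and an empty `VAC` string stands for "unspecified".

```go
package main

import "fmt"

// volume and claim are simplified, hypothetical stand-ins for a PV and a PVC.
type volume struct {
	Name         string
	StorageClass string
	VAC          string // "" means unspecified
	CapacityGiB  int64
	Bound        bool
}

type claim struct {
	StorageClass string
	VAC          string // "" means unspecified
	RequestGiB   int64
}

// findMatchingVolume returns the smallest unbound volume whose storage class
// and VAC match the claim and whose capacity covers the request, or nil if
// no volume qualifies.
func findMatchingVolume(c claim, pvs []volume) *volume {
	var best *volume
	for i := range pvs {
		pv := &pvs[i]
		if pv.Bound || pv.StorageClass != c.StorageClass || pv.VAC != c.VAC {
			continue // already bound, or class mismatch
		}
		if pv.CapacityGiB < c.RequestGiB {
			continue // too small for the request
		}
		if best == nil || pv.CapacityGiB < best.CapacityGiB {
			best = pv // smallest candidate seen so far
		}
	}
	return best
}

func main() {
	pvs := []volume{
		{Name: "pv-a", StorageClass: "std", VAC: "fast", CapacityGiB: 100},
		{Name: "pv-b", StorageClass: "std", VAC: "fast", CapacityGiB: 20},
		{Name: "pv-c", StorageClass: "std", VAC: "slow", CapacityGiB: 10},
		{Name: "pv-d", StorageClass: "std", VAC: "fast", CapacityGiB: 5},
	}
	c := claim{StorageClass: "std", VAC: "fast", RequestGiB: 10}
	if pv := findMatchingVolume(c, pvs); pv != nil {
		fmt.Println(pv.Name) // pv-b
	}
}
```

Here `pv-c` has the best-fitting capacity but the wrong VAC, so it is skipped and the claim stays on `fast` volumes only.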
484+
485+
##### Perform the binding
486+
487+
1. Bind the PV to the PVC by updating `pv.Spec.ClaimRef` to point to the PVC.
488+
1. Update the PV volume phase to Bound.
489+
1. Bind the PVC to the PV by updating `pvc.Spec.VolumeName` to point to the PV.
490+
1. Update the PVC claim status, including current volume attributes class, phase
491+
and capacity.
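
The four steps can be pictured with the in-memory sketch below. It is only a model under stated assumptions: the `pv` and `pvc` structs and their field names (including `CurrentVAC`) are hypothetical simplifications, and in a real cluster each step is a separate API update that can fail independently, which is exactly why the binding is not atomic.

```go
package main

import "fmt"

// pv and pvc are hypothetical, flattened stand-ins for the real API objects.
type pv struct {
	Name, ClaimRef, Phase, VAC string
	CapacityGiB                int64
}

type pvc struct {
	Name, VolumeName, Phase, CurrentVAC string
	StatusCapacityGiB                   int64
}

// bind applies the four binding steps in order. In a real controller each
// step would be its own API write; here they are plain assignments.
func bind(v *pv, c *pvc) {
	v.ClaimRef = c.Name   // step 1: point the PV at the PVC
	v.Phase = "Bound"     // step 2: mark the PV as Bound
	c.VolumeName = v.Name // step 3: point the PVC at the PV
	// step 4: update the PVC status, including the current VAC,
	// phase, and capacity.
	c.CurrentVAC = v.VAC
	c.Phase = "Bound"
	c.StatusCapacityGiB = v.CapacityGiB
}

func main() {
	v := pv{Name: "pv-1", Phase: "Available", VAC: "fast", CapacityGiB: 20}
	c := pvc{Name: "claim-1", Phase: "Pending"}
	bind(&v, &c)
	fmt.Println(v.ClaimRef, v.Phase)                     // claim-1 Bound
	fmt.Println(c.VolumeName, c.Phase, c.CurrentVAC)     // pv-1 Bound fast
}
```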
492+
493+
The changes from when there was no VolumeAttributesClass are:
494+
* Change the matching algorithm to be aware of volume attributes.
495+
* Change the last step in binding to update the PVC status indicating the
496+
current volume attributes.
497+
498+
Otherwise the algorithm is unchanged.
499+
500+
As discussed above, this assumes the PVC VAC is immutable until it is bound to a
501+
PV. If a PVC cannot be bound due to an otherwise matching PV having the wrong
502+
VAC, the PVC must be deleted and re-created.
503+
438504
#### VolumeAttributesClass Deletion Protection
439505

440506
While a VolumeAttributesClass is referenced by any PVC, we will prevent the object from being deleted by adding a finalizer([reference](https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/admission/storage/storageobjectinuseprotection/admission.go)).
@@ -554,7 +620,15 @@ Deleting a PVC will trigger a list PVCs call and decide if we need to remove the
554620

555621
![VolumeAttributesClass Update Flow](./VolumeAttributesClass-Flow.png)
556622

557-
Since VolumeAttributesClass is **immutable**, to update the parameters, the end user can modify the PVC object to set a different VolumeAttributesClass. If the existing VolumeAttributesClass cannot satisfy the end user’s use case, the end user needs to contact the cluster administrator to create a new VolumeAttributesClass.
623+
Since VolumeAttributesClass is **immutable**, to update the parameters, the end
624+
user can modify the PVC object to set a different VolumeAttributesClass. If the
625+
existing VolumeAttributesClass cannot satisfy the end user’s use case, the end
626+
user needs to contact the cluster administrator to create a new
627+
VolumeAttributesClass. The VAC cannot be changed until the PVC is bound to a PV
628+
(see [binding](#binding-of-pv-and-pvc), above). This is enforced in API
629+
validation in the same way as storage capacity; see
630+
[ValidatePersistentVolumeClaimUpdate](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/validation.go#L2389).
631+
558632

559633
**Watching changes in the PVC object**, if the PVC’s VolumeAttributesClass changes, it will trigger a ModifyVolume call.
560634
