|
25 | 25 | - [4. Add new CSI API ControllerModifyVolume, when there is a change of VolumeAttributesClass in PVC, external-resizer triggers a ControllerModifyVolume operation against a CSI endpoint. A Controller Plugin MUST implement this RPC call if it has MODIFY_VOLUME capability.](#4-add-new-csi-api-controllermodifyvolume-when-there-is-a-change-of-volumeattributesclass-in-pvc-external-resizer-triggers-a-controllermodifyvolume-operation-against-a-csi-endpoint-a-controller-plugin-must-implement-this-rpc-call-if-it-has-modify_volume-capability)
|
26 | 26 | - [5. Add new operation metrics for ModifyVolume operations](#5-add-new-operation-metrics-for-modifyvolume-operations)
|
27 | 27 | - [Design Details](#design-details)
|
| 28 | + - [Binding of PV and PVC](#binding-of-pv-and-pvc) |
| 29 | + - [Find a PV matching the PVC](#find-a-pv-matching-the-pvc) |
| 30 | + - [Perform the binding](#perform-the-binding) |
28 | 31 | - [VolumeAttributesClass Deletion Protection](#volumeattributesclass-deletion-protection)
|
29 | 32 | - [Create VolumeAttributesClass](#create-volumeattributesclass)
|
30 | 33 | - [Delete VolumeAttributesClass](#delete-volumeattributesclass)
|
@@ -435,6 +438,69 @@ Operation metrics from [csiOperationsLatencyMetric](https://github.com/kubernete
|
435 | 438 |
|
436 | 439 | ## Design Details
|
437 | 440 |
|
| 441 | +#### Binding of PV and PVC |
| 442 | + |
| 443 | +Creating the bidirectional binding between a PV and PVC is delicate, because |
| 444 | +there is no transaction support in the Kubernetes API to do the whole thing |
| 445 | +atomically. Binding with volume attributes adds to this process but does not |
| 446 | +fundamentally change the 4 steps performed. To support VolumeAttributesClass, |
| 447 | +the binding process becomes the following. |
| 448 | + |
| 449 | +A key assumption is that the PVC's VAC needs to be immutable until binding is |
| 450 | +complete. This is to avoid race conditions that could, among other things, |
| 451 | +violate quota. For example, suppose there are two VACs, `fast` and `slow`. The |
| 452 | +latter is cheap and unrestricted, but `fast` is subject to a limited quota. It |
| 453 | +is not possible to downgrade a volume from `fast` to `slow`. If the |
| 454 | +VAC could be changed before binding is complete, a user could create a PVC with |
| 455 | +`fast` and change it to `slow` during the binding process. This race with |
| 456 | +binding could end up with a `fast` PV bound to a PVC with a `slow` VAC; |
| 457 | +because downgrading is not possible, the PV would remain `fast`. But since VAC |
| 458 | +quota only looks at the PVC's VAC, the user would then be able to create additional |
| 459 | +`fast` volumes beyond their quota. |
| 460 | + |
| 461 | +The alternative, requiring the VAC to match at every step of binding, means that if |
| 462 | +the VAC is changed by mistake, a PVC and PV could be incompletely bound (for |
| 463 | +example, steps 1-3 below succeed but step 4 fails). There is no way to undo |
| 464 | +binding, so even if the PVC is deleted the PV would be unavailable until the |
| 465 | +cluster administrator manually intervenes. |
| 466 | + |
| 467 | +##### Find a PV matching the PVC |
| 468 | + |
| 469 | +In the matching process, a PVC is only matched with a PV if their VACs match: |
| 470 | +they are the same class, or are both unspecified. Either a nil or an empty VAC |
| 471 | +is considered unspecified, so for example a PVC with a nil VAC **can** match |
| 472 | +with a PV whose VAC is the empty string. |
| 473 | + |
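The VAC matching rule above can be sketched as a small helper. This is a minimal illustration, not the controller's actual code: `vacName` and `vacMatches` are hypothetical names, and the VAC is assumed to be carried as an optional string.

```go
package main

import "fmt"

// vacName normalizes an optional VolumeAttributesClass name.
// Both nil and the empty string mean "unspecified". (Hypothetical helper.)
func vacName(name *string) string {
	if name == nil {
		return ""
	}
	return *name
}

// vacMatches reports whether a PVC VAC and a PV VAC are the same class,
// or are both unspecified. (Hypothetical helper.)
func vacMatches(pvcVAC, pvVAC *string) bool {
	return vacName(pvcVAC) == vacName(pvVAC)
}

func main() {
	empty, fast, slow := "", "fast", "slow"
	fmt.Println(vacMatches(nil, &empty))  // true: nil and "" are both unspecified
	fmt.Println(vacMatches(&fast, &fast)) // true: same class
	fmt.Println(vacMatches(&fast, &slow)) // false: different classes
	fmt.Println(vacMatches(nil, &fast))   // false: unspecified vs. specified
}
```

Normalizing before comparing keeps the nil and empty-string cases from being handled separately at each call site.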
| 474 | +The matching process then becomes the following. |
| 475 | + |
| 476 | +If the PVC is pre-bound to a PV, check that node affinity is satisfied and that the |
| 477 | +PVC VAC matches the PV VAC. If so, select it; otherwise no PV is selected. |
| 478 | + |
| 479 | +Otherwise, iterate through all unbound PVs. Check if the storage class and |
| 480 | +volume attributes class of the PV match the PVC, and the capacity is equal to |
| 481 | +or greater than that requested by the PVC. If so, add it to the list of |
| 482 | +candidates. After all unbound PVs have been examined, select the candidate with |
| 483 | +the smallest capacity. |
| 484 | + |
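The matching process described above can be sketched as follows. The `PV` and `PVC` types here are simplified, hypothetical stand-ins for the real API objects, `NodeAffinityOK` is a placeholder for the real node affinity check, and an empty `VAC` string is assumed to mean "unspecified" (nil already normalized to empty).

```go
package main

import "fmt"

// PV and PVC are simplified, hypothetical stand-ins for the API objects.
type PV struct {
	Name           string
	StorageClass   string
	VAC            string // "" means unspecified
	CapacityBytes  int64
	Bound          bool
	NodeAffinityOK bool // placeholder for the real node affinity check
}

type PVC struct {
	StorageClass string
	VAC          string // "" means unspecified
	RequestBytes int64
	VolumeName   string // non-empty when the PVC is pre-bound to a PV
}

// findMatchingPV returns the PV selected for the claim, or nil if none qualifies.
func findMatchingPV(claim PVC, volumes []PV) *PV {
	// Pre-bound claim: only the named PV is considered.
	if claim.VolumeName != "" {
		for i := range volumes {
			pv := &volumes[i]
			if pv.Name != claim.VolumeName {
				continue
			}
			if pv.NodeAffinityOK && pv.VAC == claim.VAC {
				return pv
			}
			return nil
		}
		return nil
	}

	// Otherwise pick the smallest unbound PV that satisfies the claim.
	var best *PV
	for i := range volumes {
		pv := &volumes[i]
		if pv.Bound || pv.StorageClass != claim.StorageClass || pv.VAC != claim.VAC {
			continue
		}
		if pv.CapacityBytes < claim.RequestBytes {
			continue
		}
		if best == nil || pv.CapacityBytes < best.CapacityBytes {
			best = pv
		}
	}
	return best
}

func main() {
	volumes := []PV{
		{Name: "pv-a", StorageClass: "standard", VAC: "fast", CapacityBytes: 100, NodeAffinityOK: true},
		{Name: "pv-b", StorageClass: "standard", VAC: "fast", CapacityBytes: 50, NodeAffinityOK: true},
		{Name: "pv-c", StorageClass: "standard", VAC: "slow", CapacityBytes: 40, NodeAffinityOK: true},
	}
	claim := PVC{StorageClass: "standard", VAC: "fast", RequestBytes: 30}
	if pv := findMatchingPV(claim, volumes); pv != nil {
		fmt.Println("selected", pv.Name) // selected pv-b
	}
}
```

In this sketch `pv-c` is smaller than `pv-b` but is skipped because its VAC differs from the claim's, which is the VAC-aware part of the match.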
| 485 | +##### Perform the binding |
| 486 | + |
| 487 | +1. Bind the PV to the PVC by updating `pv.Spec.ClaimRef` to point to the PVC. |
| 488 | +1. Update the PV volume phase to Bound. |
| 489 | +1. Bind the PVC to the PV by updating `pvc.Spec.VolumeName` to point to the PV. |
| 490 | +1. Update the PVC claim status, including current volume attributes class, phase |
| 491 | +and capacity. |
| 492 | + |
| 493 | +Compared with binding before VolumeAttributesClass existed, the changes are: |
| 494 | +* Change the matching algorithm to be aware of volume attributes. |
| 495 | +* Change the last step in binding to update the PVC status indicating the |
| 496 | + current volume attributes. |
| 497 | + |
| 498 | +Otherwise the algorithm is unchanged. |
| 499 | + |
| 500 | +As discussed above, this assumes the PVC VAC is immutable until the PVC is bound to a |
| 501 | +PV. If a PVC cannot be bound due to an otherwise matching PV having the wrong |
| 502 | +VAC, the PVC must be deleted and re-created. |
| 503 | + |
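The four binding steps listed above can be sketched as separate writes, which shows why a failure part-way through leaves the PV and PVC incompletely bound. The types and the `bindingSteps` function are hypothetical simplifications: in the real API `ClaimRef` is an object reference rather than a name, and each step would be a separate API update.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the API objects.
type PV struct {
	Name, ClaimRef, Phase, VAC string
	CapacityBytes              int64
}

type PVC struct {
	Name, VolumeName, Phase, CurrentVAC string
	CapacityBytes                       int64
}

// bindingSteps returns the four binding steps in order. Each closure stands
// in for one independent write; nothing makes the sequence atomic.
func bindingSteps(pv *PV, pvc *PVC) []func() {
	return []func(){
		func() { pv.ClaimRef = pvc.Name },   // 1. point the PV at the PVC
		func() { pv.Phase = "Bound" },       // 2. update the PV phase
		func() { pvc.VolumeName = pv.Name }, // 3. point the PVC at the PV
		func() { // 4. update the PVC status, including the current VAC
			pvc.CurrentVAC = pv.VAC
			pvc.Phase = "Bound"
			pvc.CapacityBytes = pv.CapacityBytes
		},
	}
}

func main() {
	pv := &PV{Name: "pv-a", Phase: "Available", VAC: "fast", CapacityBytes: 100}
	pvc := &PVC{Name: "claim-1", Phase: "Pending"}
	for i, step := range bindingSteps(pv, pvc) {
		step()
		fmt.Printf("after step %d: pv=%+v pvc=%+v\n", i+1, *pv, *pvc)
	}
}
```

Stopping after step 3 in this sketch leaves the PV marked `Bound` while the PVC status is still `Pending`, which is the partial state the text warns about.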
438 | 504 | #### VolumeAttributesClass Deletion Protection
|
439 | 505 |
|
440 | 506 | While a VolumeAttributesClass is referenced by any PVC, we will prevent the object from being deleted by adding a finalizer `kubernetes.io/vac-protection`. It's a best effort to prevent users from making mistakes. It may not be accurate in all cases.
|
@@ -559,7 +625,15 @@ Deleting a PVC will trigger a list PVCs call and decide if we need to remove the
|
559 | 625 |
|
560 | 626 | 
|
561 | 627 |
|
562 |
| -Since VolumeAttributesClass is **immutable**, to update the parameters, the end user can modify the PVC object to set a different VolumeAttributesClass. If the existing VolumeAttributesClass cannot satisfy the end user’s use case, the end user needs to contact the cluster administrator to create a new VolumeAttributesClass. |
| 628 | +Since VolumeAttributesClass is **immutable**, to update the parameters, the end |
| 629 | +user can modify the PVC object to set a different VolumeAttributesClass. If the |
| 630 | +existing VolumeAttributesClass cannot satisfy the end user’s use case, the end |
| 631 | +user needs to contact the cluster administrator to create a new |
| 632 | +VolumeAttributesClass. The VAC cannot be changed until the PVC is bound to a PV |
| 633 | +(see [binding](#binding-of-pv-and-pvc), above). This is enforced in API |
| 634 | +validation in the same way as storage capacity; see |
| 635 | +[ValidatePersistentVolumeClaimUpdate](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/validation.go#L2389). |
| 636 | + |
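The rule that the VAC cannot change before binding could be checked on update roughly as below. This is a sketch under stated assumptions, not the contents of `ValidatePersistentVolumeClaimUpdate`: the `PVC` type, `validateVACUpdate`, and the use of the claim phase as the "bound" signal are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// PVC is a simplified, hypothetical stand-in for the API object.
type PVC struct {
	VAC   string // "" means unspecified
	Phase string // "Pending" or "Bound" in this sketch
}

var errVACImmutableUntilBound = errors.New(
	"volume attributes class cannot be changed until the claim is bound")

// validateVACUpdate rejects a VAC change on a claim that is not yet bound.
// (Hypothetical helper; assumes the phase reflects binding completion.)
func validateVACUpdate(oldPVC, newPVC PVC) error {
	if oldPVC.VAC == newPVC.VAC {
		return nil // no VAC change, nothing to check
	}
	if oldPVC.Phase != "Bound" {
		return errVACImmutableUntilBound
	}
	return nil
}

func main() {
	pending := PVC{VAC: "fast", Phase: "Pending"}
	bound := PVC{VAC: "fast", Phase: "Bound"}

	// Rejected: the claim is still pending.
	fmt.Println(validateVACUpdate(pending, PVC{VAC: "slow", Phase: "Pending"}))
	// Allowed: the claim is bound, so the change can trigger a modification.
	fmt.Println(validateVACUpdate(bound, PVC{VAC: "slow", Phase: "Bound"}))
}
```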
563 | 637 |
|
564 | 638 | **Watching changes in the PVC object**, if the PVC’s VolumeAttributesClass changes, it will trigger a ModifyVolume call.
|
565 | 639 |
|