keps/sig-storage/3751-volume-attributes-class/README.md
Lines changed: 71 additions and 27 deletions
@@ -82,15 +82,15 @@
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
 - [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [X] (R) KEP approvers have approved the KEP status as `implementable`
 - [X] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-- [ ] e2e Tests for all Beta API Operations (endpoints)
-- [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [ ] (R) Graduation criteria is in place
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [X] e2e Tests for all Beta API Operations (endpoints) - [dashboard](https://testgrid.k8s.io/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-kind-beta-features&width=90&include-filter-by-regex=VolumeAttributesClass)
+- [X] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [X] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [X] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Production readiness review completed
+- [X] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
 - [X] "Implementation History" section is up-to-date for milestone
 - [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
@@ -240,9 +240,9 @@ The CSI create request will be extended to add mutable parameters. A new Control
 
 #### Default VolumeAttributesClass
 
-A default VolumeAttributesClass can be specified for the Kubernetes cluster. This default VolumeAttributesClass is then used to dynamically provision storage for PersistentVolumeClaims that do not require any specific VolumeAttributesClass. A cluster admin can use annotation to manage default VolumeAttributesClass. The default VolumeAttributesClass has an annotation volumeattributesclass.kubernetes.io/is-default-class set to true. Any other value or absence of the annotation is interpreted as false.
+For GA, the VolumeAttributesClass feature does not support a default VolumeAttributesClass. This is because there is already a natural default for VolumeAttributesClass: no VolumeAttributesClass associated with the PersistentVolumeClaim. Furthermore, with a default, there would be added overhead for cluster operators in making sure a cluster's default StorageClass and default VolumeAttributesClass are compatible.
 
-Note: For Kubernetes versions ≤ v1.31, the VolumeAttributesClass feature does not support a default VolumeAttributesClass. This is because there is already a natural default for VolumeAttributesClass: no VolumeAttributesClass associated with the PersistentVolumeClaim. Furthermore, with a default, there would be added overhead for cluster operators in making sure a cluster's default StorageClass and default VolumeAttributesClass are compatible. Use-cases and support for Default VolumeAttributesClass will be re-evaluated during this feature's beta in Kubernetes v1.31.
+For future design, a default VolumeAttributesClass can be specified for the Kubernetes cluster. This default VolumeAttributesClass is then used to dynamically provision storage for PersistentVolumeClaims that do not require any specific VolumeAttributesClass.
 
 #### Pre-provisioned Volume
 
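The removed paragraph defines the default-class rule precisely: only the annotation `volumeattributesclass.kubernetes.io/is-default-class` set to `true` marks the default, and any other value or a missing annotation means false. A minimal Go sketch of that rule follows. `isDefaultClass` is a hypothetical helper written for illustration, not Kubernetes code, and it assumes the comparison is exact and case-sensitive. Since the added text says GA does not support a default VolumeAttributesClass, this models only the earlier design.

```go
package main

// defaultClassAnnotation is the annotation key named in the removed design text.
const defaultClassAnnotation = "volumeattributesclass.kubernetes.io/is-default-class"

// isDefaultClass reports whether a VolumeAttributesClass's annotations mark it as
// the cluster default under the removed rule: only the exact value "true" counts.
// Hypothetical helper for illustration; assumes a case-sensitive comparison.
func isDefaultClass(annotations map[string]string) bool {
	// Indexing a nil map or a missing key yields "", which is not "true",
	// so an absent annotation is treated as false.
	return annotations[defaultClassAnnotation] == "true"
}
```

For example, a class annotated with `"yes"` or `"1"` would not be treated as the default under this reading.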
@@ -695,10 +695,10 @@ VolumeAttributesClass parameters can be considered as best-effort parameters, th
 
 * Basic unit tests for performance and quota system.
 * API conformance tests
-* E2E tests with happy tests in the [K8s storage framework](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/storage/testsuites) for different drivers testing
-* E2E tests using mock driver to cause failure on create, update and recovering cases
-- VAC protection controller with large lists of PVCs (2000)
-- Creating a large amount of PVCs (2000) using the same VolumeAttributesClass
+- VAC protection controller with large lists of PVCs (500)
+- Creating a large amount of PVCs (500) using the same VolumeAttributesClass
+
+Stress test by EBS CSI Driver:
+
+Scale concurrently modifying 500 volumes via VAC. Patched 5 PVCs per second with new VAC and waited for all volumes to modify.
+
+Tested against resizer built from kubernetes-csi/external-resizer#487 and EBS CSI Driver v1.44 on EKS 1.33. Used aws-ebs-csi-driver/hack/ebs-scale-test modification test.
+
+Resizer CPU peaked at 0.33 cores and Mem at 43 Mb
+
+Additional metrics and in gist: https://gist.github.com/AndrewSirenko/24ab0e9b3e66d279b3406e4e26264835
@@ -767,7 +779,15 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 
 - Beta in 1.31: Since this feature is an extension of the external-resizer/external-provisioner usage flow, we are going to move this to beta with enhanced e2e and test coverage. Test cases are covered in sessions above: ``e2e tests``, ``Integration tests`` etc. Controllers will handle VolumeAttributesClass feature gates being on by default, but beta API itself being disabled on cluster by default.
 - Involve 3 different CSI drivers to participate in testing
-- Stress test before GA
+- Rollback and stress test before GA
+- All functionality completed
+- All security enforcement completed
+- All monitoring requirements completed
+- All testing requirements completed
+- All known pre-release issues and gaps resolved
+- Resource quota with scope implementation soaked in 1.33 release
+- Added rollback support based on feedbacks
+- Bug fix for [event emission of non exist VAC](https://github.com/kubernetes-csi/external-resizer/issues/427)
 
 #### GA
 
@@ -794,6 +814,11 @@ This feature is implemented only in the API server and KCM and controlled by
 | off | on | external-provisioner/external-resizer should not get any event to create PVC with VolumeAttributesClass/update VolumeAttributesClass, the current resize flow stays the same |
 | on | on | New behavior. |
 
+After promotion to GA, the feature is enabled by default but can still be disabled.
+Disabling is allowed because it could not be enabled by default during beta due to its
+dependency on the off-by-default API group. Immediately enabling it by default with no
+fallback option could be too risky. Please refer to [Beta Feature Gate Promotion Requirements](https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/5241-beta-featuregate-promotion-requirements#proposal).
+
 ## Production Readiness Review Questionnaire
 
 ### Feature Enablement and Rollback
@@ -837,14 +862,33 @@ If the feature is rolled out partially on API servers, there will be no impact o
 be processed as if the feature is disabled, the external-provisioner/external-resizer is not acting on the event created yet - that means nothing happens and PVC
 will not be changed with the iops/throughput until external-provisioner/external-resizer is deployed.
 
-
 ###### What specific metrics should inform a rollback?
 
 A metric `controller_modify_volume_errors_total` will indicate a problem with the feature.
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
-TODO Upgrade and rollback will be tested when the feature gate will change to beta.
+Tested in Beta:
+
+1. Enable both feature flag and beta API in api-server, create PVC with VAC1, and then modify to VAC2
+2. Turn off the feature flag first, and then try to modify PVC back to VAC1, got error:
+
+```
+The PersistentVolumeClaim "test-pvc" is invalid: spec.volumeAttributesClassName: Forbidden: update
+is forbidden when the VolumeAttributesClass feature gate is disabled
+```
+The pod and volume are both up and running.
+
+3. Turn off the beta API, this time ``kubectl get vac`` got error:
+```
+Error from server (NotFound): Unable to list "storage.k8s.io/v1beta1, Resource=volumeattributesclasses":
+the server could not find the requested resource
+```
+
+The pod and volume are both up and running.
+
+4. Turn on both feature flag and beta API in api-server again. ``kubectl get vac`` shows the VACs again. Change PVC back to VAC1, modify is applied.
+
 
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
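Step 2 of the rollback test shows that, with the feature gate off, the API server refuses a change to `spec.volumeAttributesClassName`, while the existing pod and volume keep running. A minimal Go sketch of that rule follows. It assumes the check compares the old and new class names and rejects only an actual change. `validateVACUpdate` and `sameClass` are hypothetical names, and the error text is copied from the output recorded in the test, not from kube-apiserver source.

```go
package main

import "fmt"

// validateVACUpdate is a simplified, hypothetical model of the update check seen
// in the rollback test: with the VolumeAttributesClass feature gate disabled, any
// change to spec.volumeAttributesClassName is rejected. Not kube-apiserver code.
func validateVACUpdate(pvcName string, oldClass, newClass *string, gateEnabled bool) error {
	if gateEnabled || sameClass(oldClass, newClass) {
		return nil // gate on, or no change requested: nothing to reject
	}
	return fmt.Errorf("The PersistentVolumeClaim %q is invalid: spec.volumeAttributesClassName: "+
		"Forbidden: update is forbidden when the VolumeAttributesClass feature gate is disabled", pvcName)
}

// sameClass compares two optional class names; nil means "no class set".
func sameClass(a, b *string) bool {
	if a == nil || b == nil {
		return a == b // equal only if both are nil
	}
	return *a == *b
}
```

Rejecting only the update, rather than touching stored objects, is consistent with the observation that the pod and volume stayed up after the gate was turned off.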
@@ -923,7 +967,7 @@ previous answers based on experience in the field.
 
 ###### Will enabling / using this feature result in any new API calls?
 
-Yes. The VAC protection controller will be expensive because it needs to LIST PVCs/PVs but the call volume should be low.
+Yes. The VAC protection controller will be expensive because it needs to LIST PVCs/PVs but the call rate should be low because it is user triggered by changing VAC in the PVC.
 
 - API call type: PATCH PVC
 - estimated throughput: low, only once for PVCs that have
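The cost noted for the VAC protection controller comes from scanning PVCs and PVs to decide whether a VolumeAttributesClass is still referenced and therefore must not be deleted yet. The Go sketch below assumes that is the controller's job. It also assumes a class counts as in use when it appears in a PVC's spec, its current status, or its pending modify target, or in a PV's spec. The `pvc` and `pv` structs are hypothetical stand-ins for the real API types; the field comments name the API fields they are assumed to mirror.

```go
package main

// Hypothetical, trimmed-down stand-ins for the PVC and PV fields that can
// reference a VolumeAttributesClass (the real API types have many more fields).
type pvc struct {
	SpecClass    *string // assumed: spec.volumeAttributesClassName
	CurrentClass *string // assumed: status.currentVolumeAttributesClassName
	TargetClass  *string // assumed: status.modifyVolumeStatus.targetVolumeAttributesClassName
}

type pv struct {
	SpecClass *string // assumed: spec.volumeAttributesClassName
}

// inUse reports whether any listed PVC or PV still references the named class.
// This linear scan over the full lists is what makes the check expensive.
func inUse(class string, pvcs []pvc, pvs []pv) bool {
	refers := func(p *string) bool { return p != nil && *p == class }
	for _, c := range pvcs {
		if refers(c.SpecClass) || refers(c.CurrentClass) || refers(c.TargetClass) {
			return true
		}
	}
	for _, v := range pvs {
		if refers(v.SpecClass) {
			return true
		}
	}
	return false
}
```

Checking the status fields as well as the spec keeps a class protected while a modification away from it is still in progress, not just while a PVC requests it.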
@@ -954,7 +998,7 @@ Using this feature may result in non-negligible increase of resource usage IF cu
 - external-resizer CPU and memory will see a non-negligible increase if users increased the number of concurrent operations via the `--workers` flag. We follow the strategy of sharing that limit between `ControllerExpandVolume` and `ControllerModifyVolume` RPCs, similar to how external-provisioner functions.
 - The API-Server may see a spike of CPU when processing relevant changes.
 
-Stress tests will determine increase in resource usage at varying amounts of concurrent volume modifications.
+Stress tests will determine increase in resource usage at varying amounts of concurrent volume modifications.
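The resource note says external-resizer shares its `--workers` limit between `ControllerExpandVolume` and `ControllerModifyVolume` RPCs. One way to picture that sharing is a single counting semaphore that both kinds of call must acquire before running. The Go sketch below is hypothetical: `sharedLimiter` and `runAll` are illustrative names, not external-resizer code. It uses the typed `atomic.Int64`, so it needs Go 1.19 or later.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// sharedLimiter models one --workers budget shared by expand and modify calls:
// every call, of either kind, takes a slot from the same buffered channel.
type sharedLimiter struct {
	slots   chan struct{} // capacity equals the number of workers
	active  atomic.Int64  // calls currently holding a slot
	maxSeen atomic.Int64  // highest value active has reached
}

func newSharedLimiter(workers int) *sharedLimiter {
	return &sharedLimiter{slots: make(chan struct{}, workers)}
}

// run blocks until a slot is free, executes op, then gives the slot back.
func (l *sharedLimiter) run(op func()) {
	l.slots <- struct{}{}        // acquire: blocks while all workers are busy
	defer func() { <-l.slots }() // release: runs after the decrement below
	n := l.active.Add(1)
	defer l.active.Add(-1)
	for { // record peak concurrency with a compare-and-swap loop
		m := l.maxSeen.Load()
		if n <= m || l.maxSeen.CompareAndSwap(m, n) {
			break
		}
	}
	op()
}

// runAll launches the given numbers of expand and modify calls concurrently
// through one shared limiter and returns the peak number running at once.
func runAll(workers, expands, modifies int, expand, modify func()) int64 {
	l := newSharedLimiter(workers)
	var wg sync.WaitGroup
	start := func(op func()) {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.run(op)
		}()
	}
	for i := 0; i < expands; i++ {
		start(expand)
	}
	for i := 0; i < modifies; i++ {
		start(modify)
	}
	wg.Wait()
	return l.maxSeen.Load()
}
```

Because both RPC types draw from the same pool, a burst of modifications can delay expansions and vice versa, but their combined concurrency never exceeds `--workers`. Raising the flag raises that ceiling, which matches the note that resizer CPU and memory grow with it.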