Commit 4bfb40a

Merge pull request #5024 from sunnylovestiramisu/updateVAC
KEP-3751: Update release signoff Checklist before GA
2 parents ff01858 + 5f5798f commit 4bfb40a

3 files changed

+77
-31
lines changed

keps/prod-readiness/sig-storage/3751.yaml

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -6,3 +6,5 @@ alpha:
66
approver: "@johnbelamaric"
77
beta:
88
approver: "@johnbelamaric"
9+
stable:
10+
approver: "@johnbelamaric"

keps/sig-storage/3751-volume-attributes-class/README.md

Lines changed: 71 additions & 27 deletions
Original file line number | Diff line number | Diff line change
@@ -82,15 +82,15 @@
8282
Items marked with (R) are required *prior to targeting to a milestone / release*.
8383

8484
- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
85-
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
85+
- [X] (R) KEP approvers have approved the KEP status as `implementable`
8686
- [X] (R) Design details are appropriately documented
87-
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
88-
- [ ] e2e Tests for all Beta API Operations (endpoints)
89-
- [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
90-
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
91-
- [ ] (R) Graduation criteria is in place
87+
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
88+
- [X] e2e Tests for all Beta API Operations (endpoints) - [dashboard](https://testgrid.k8s.io/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-kind-beta-features&width=90&include-filter-by-regex=VolumeAttributesClass)
89+
- [X] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
90+
- [X] (R) Minimum Two Week Window for GA e2e tests to prove flake free
91+
- [X] (R) Graduation criteria is in place
9292
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
93-
- [ ] (R) Production readiness review completed
93+
- [X] (R) Production readiness review completed
9494
- [ ] (R) Production readiness review approved
9595
- [X] "Implementation History" section is up-to-date for milestone
9696
- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
@@ -240,9 +240,9 @@ The CSI create request will be extended to add mutable parameters. A new Control
240240

241241
#### Default VolumeAttributesClass
242242

243-
A default VolumeAttributesClass can be specified for the Kubernetes cluster. This default VolumeAttributesClass is then used to dynamically provision storage for PersistentVolumeClaims that do not require any specific VolumeAttributesClass. A cluster admin can use annotation to manage default VolumeAttributesClass. The default VolumeAttributesClass has an annotation volumeattributesclass.kubernetes.io/is-default-class set to true. Any other value or absence of the annotation is interpreted as false.
243+
For GA, the VolumeAttributesClass feature does not support a default VolumeAttributesClass. This is because there is already a natural default for VolumeAttributesClass: no VolumeAttributesClass associated with the PersistentVolumeClaim. Furthermore, with a default, there would be added overhead for cluster operators in making sure a cluster's default StorageClass and default VolumeAttributesClass are compatible.
244244

245-
Note: For Kubernetes versions ≤ v1.31, the VolumeAttributesClass feature does not support a default VolumeAttributesClass. This is because there is already a natural default for VolumeAttributesClass: no VolumeAttributesClass associated with the PersistentVolumeClaim. Furthermore, with a default, there would be added overhead for cluster operators in making sure a cluster's default StorageClass and default VolumeAttributesClass are compatible. Use-cases and support for Default VolumeAttributesClass will be re-evaluated during this feature's beta in Kubernetes v1.31.
245+
In a future design, a default VolumeAttributesClass could be specified for the Kubernetes cluster. This default VolumeAttributesClass would then be used to dynamically provision storage for PersistentVolumeClaims that do not require any specific VolumeAttributesClass.
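
The GA behavior described above (no cluster-wide default, so a PVC with no class simply has none) can be illustrated with a minimal sketch. It assumes the PVC is available as a plain dictionary; the helper name `requested_vac` and the sample objects are hypothetical, and only the field `spec.volumeAttributesClassName` comes from this KEP.

```python
from typing import Optional


def requested_vac(pvc: dict) -> Optional[str]:
    """Return the VolumeAttributesClass a PVC names, or None if it names none.

    There is deliberately no fallback to a cluster default here: in this
    sketch, "no class on the PVC" is itself the default.
    """
    return pvc.get("spec", {}).get("volumeAttributesClassName")


# Hypothetical sample objects, for illustration only.
pvc_plain = {"metadata": {"name": "data"}, "spec": {"storageClassName": "fast"}}
pvc_tuned = {
    "metadata": {"name": "db"},
    "spec": {"storageClassName": "fast", "volumeAttributesClassName": "gold"},
}

print(requested_vac(pvc_plain))
print(requested_vac(pvc_tuned))
```

Because nothing is substituted when the field is absent, an operator does not have to keep a default StorageClass and a default VolumeAttributesClass compatible with each other.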
246246

247247
#### Pre-provisioned Volume
248248

@@ -695,10 +695,10 @@ VolumeAttributesClass parameters can be considered as best-effort parameters, th
695695

696696
* Basic unit tests for performance and quota system.
697697
* API conformance tests
698-
* E2E tests with happy tests in the [K8s storage framework](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/storage/testsuites) for different drivers testing
699-
* E2E tests using mock driver to cause failure on create, update and recovering cases
700-
* [K8s storage framework](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/storage/testsuites)
701-
* [csi-tes](https://github.com/kubernetes-csi/csi-test)
698+
* E2E tests:
699+
* https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/volumeattributesclass.go
700+
* https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/testsuites/volume_modify.go
701+
* Stress tests: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/testsuites/volume_modify_stress.go
702702
* Test coverage of quota usage with ResourceQuota and LimitRange
703703
* Measure latency impact to CreateVolume during beta and provide feedback to operators
704704
* Upgrade and rollback test when the feature gate changes to beta
@@ -718,12 +718,7 @@ For Alpha, describe what tests will be added to ensure proper quality of the enh
718718
For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
719719
https://storage.googleapis.com/k8s-triage/index.html
720720
-->
721-
722-
- The behavior with feature gate and API turned on/off and mix match
723-
- The happy path with creating and modifying volume successfully with VolumeAttributesClass
724-
- [E2E CSI Test PR](https://github.com/kubernetes/kubernetes/pull/124151/)
725-
- [k8s-triage](https://storage.googleapis.com/k8s-triage/index.html?sig=storage&test=%5C%5BFeature%3AVolumeAttributesClass%5C%5D)
726-
- [Testgrid](https://testgrid.k8s.io/sig-storage-kubernetes#kind-alpha-features&include-filter-by-regex=%5BFeature%3AVolumeAttributesClass%5D&include-filter-by-regex=%5BFeature%3AVolumeAttributesClass%5D&include-filter-by-regex=%5C%5BFeature%3AVolumeAttributesClass%5C%5D)
721+
N/A. Please see the e2e tests section below.
727722

728723
##### e2e tests
729724

@@ -737,6 +732,7 @@ https://storage.googleapis.com/k8s-triage/index.html
737732
We expect no non-infra related flakes in the last month as a GA graduation criteria.
738733
-->
739734

735+
- The behavior with feature gate and API turned on/off and mix match
740736
- Create VolumeAttributesClass successfully
741737
- Delete VolumeAttributesClass with finalizer fails
742738
- Delete VolumeAttributesClass without finalizer succeeds
@@ -745,12 +741,28 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
745741
- Given a driver that does not support ControllerModifyVolume, the CSI volume should not be modified.
746742
- If ControllerModifyVolume fails, PVC should have appropriate events.
747743

748-
[API Conformance Test PR](https://github.com/kubernetes/kubernetes/pull/121849)
744+
- [API Conformance Test PR](https://github.com/kubernetes/kubernetes/pull/121849)
745+
- [E2E CSI Test PR](https://github.com/kubernetes/kubernetes/pull/124151/)
746+
- [Stress Test PR](https://github.com/kubernetes/kubernetes/pull/129918)
747+
- [k8s-triage](https://storage.googleapis.com/k8s-triage/index.html?sig=storage&test=%5C%5BFeature%3AVolumeAttributesClass%5C%5D)
748+
- [Testgrid](https://testgrid.k8s.io/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-kind-beta-features&width=90&include-filter-by-regex=VolumeAttributesClass)
749749

750750
##### Stress tests
751751

752-
- VAC protection controller with large lists of PVCs (2000)
753-
- Creating a large amount of PVCs (2000) using the same VolumeAttributesClass
752+
- VAC protection controller with large lists of PVCs (500)
753+
- Creating a large amount of PVCs (500) using the same VolumeAttributesClass
754+
755+
Stress test by EBS CSI Driver:
756+
757+
Scale test: concurrently modified 500 volumes via VAC. Patched 5 PVCs per second with a new VAC and waited for all volumes to be modified.
758+
759+
Tested against resizer built from kubernetes-csi/external-resizer#487 and EBS CSI Driver v1.44 on EKS 1.33. Used aws-ebs-csi-driver/hack/ebs-scale-test modification test.
760+
761+
Resizer CPU peaked at 0.33 cores and memory at 43 MB.
762+
763+
Additional metrics are in this gist: https://gist.github.com/AndrewSirenko/24ab0e9b3e66d279b3406e4e26264835
764+
765+
- [Dashboard](https://testgrid.k8s.io/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-storage-kind-alpha-beta-features-slow&width=90)
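
The pacing used in the EBS CSI Driver stress test (500 volumes, 5 PVC patches per second) implies a fixed submission window. The sketch below only works out that arithmetic; it is not the `ebs-scale-test` tool, and the function name `patch_schedule` is hypothetical.

```python
from typing import List


def patch_schedule(total_pvcs: int, patches_per_second: int) -> List[int]:
    """Return, per PVC index, the second (offset from start) it is patched in."""
    if total_pvcs < 0 or patches_per_second <= 0:
        raise ValueError("total_pvcs must be >= 0 and patches_per_second > 0")
    # Integer division groups the PVCs into one batch per second.
    return [i // patches_per_second for i in range(total_pvcs)]


schedule = patch_schedule(total_pvcs=500, patches_per_second=5)
# The last batch goes out in second 99, so submission spans 100 seconds.
submit_window_seconds = schedule[-1] + 1
print(submit_window_seconds)
```

This covers only the time needed to submit the patches; the time for every volume to finish modifying depends on the driver and backend and is not modeled here.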
754766

755767
### Graduation Criteria
756768

@@ -767,7 +779,15 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
767779

768780
- Beta in 1.31: Since this feature is an extension of the external-resizer/external-provisioner usage flow, we are going to move this to beta with enhanced e2e and test coverage. Test cases are covered in sessions above: ``e2e tests``, ``Integration tests`` etc. Controllers will handle VolumeAttributesClass feature gates being on by default, but beta API itself being disabled on cluster by default.
769781
- Involve 3 different CSI drivers to participate in testing
770-
- Stress test before GA
782+
- Rollback and stress test before GA
783+
- All functionality completed
784+
- All security enforcement completed
785+
- All monitoring requirements completed
786+
- All testing requirements completed
787+
- All known pre-release issues and gaps resolved
788+
- Resource quota with scope implementation soaked in 1.33 release
789+
- Added rollback support based on feedback
790+
- Bug fix for [event emission for a nonexistent VAC](https://github.com/kubernetes-csi/external-resizer/issues/427)
771791

772792
#### GA
773793

@@ -794,6 +814,11 @@ This feature is implemented only in the API server and KCM and controlled by
794814
| off | on | external-provisioner/external-resizer should not get any event to create PVC with VolumeAttributesClass/update VolumeAttributesClass, the current resize flow stays the same |
795815
| on | on | New behavior. |
796816

817+
After promotion to GA, the feature is enabled by default but can still be disabled.
818+
Disabling is allowed because it could not be enabled by default during beta due to its
819+
dependency on the off-by-default API group. Immediately enabling it by default with no
820+
fallback option could be too risky. Please refer to [Beta Feature Gate Promotion Requirements](https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/5241-beta-featuregate-promotion-requirements#proposal).
821+
797822
## Production Readiness Review Questionnaire
798823

799824
### Feature Enablement and Rollback
@@ -837,14 +862,33 @@ If the feature is rolled out partially on API servers, there will be no impact o
837862
be processed as if the feature is disabled, the external-provisioner/external-resizer is not acting on the event created yet - that means nothing happens and PVC
838863
will not be changed with the iops/throughput until external-provisioner/external-resizer is deployed.
839864

840-
841865
###### What specific metrics should inform a rollback?
842866

843867
A metric `controller_modify_volume_errors_total` will indicate a problem with the feature.
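
One way to act on this metric is to compare its total between two scrapes and treat any increase as a signal to investigate. The sketch below assumes the metric is exposed in the Prometheus text format without timestamps; the `driver` label and the sample values are made up for illustration and are not taken from this KEP.

```python
def counter_total(exposition: str, metric: str) -> float:
    """Sum every sample of `metric` in a Prometheus text-format scrape."""
    total = 0.0
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # Assumes "name{labels} value" with no trailing timestamp.
        name_and_labels, _, value = line.rpartition(" ")
        if name_and_labels.split("{", 1)[0] == metric:
            total += float(value)
    return total


METRIC = "controller_modify_volume_errors_total"

# Hypothetical scrapes; the label name and numbers are illustrative.
before = 'controller_modify_volume_errors_total{driver="example.csi.test"} 3\n'
after = (
    "# TYPE controller_modify_volume_errors_total counter\n"
    'controller_modify_volume_errors_total{driver="example.csi.test"} 3\n'
    'controller_modify_volume_errors_total{driver="other.csi.test"} 1\n'
    "some_other_metric 7\n"
)

errors_increased = counter_total(after, METRIC) > counter_total(before, METRIC)
print(errors_increased)
```

What size of increase should actually trigger a rollback is an operator decision that this sketch does not encode.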
844868

845869
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
846870

847-
TODO Upgrade and rollback will be tested when the feature gate will change to beta.
871+
Tested in Beta:
872+
873+
1. Enable both feature flag and beta API in api-server, create PVC with VAC1, and then modify to VAC2
874+
2. Turn off the feature flag first, then try to modify the PVC back to VAC1; this returns an error:
875+
876+
```
877+
The PersistentVolumeClaim "test-pvc" is invalid: spec.volumeAttributesClassName: Forbidden: update
878+
is forbidden when the VolumeAttributesClass feature gate is disabled
879+
```
880+
The pod and volume are both up and running.
881+
882+
3. Turn off the beta API; ``kubectl get vac`` now returns an error:
883+
```
884+
Error from server (NotFound): Unable to list "storage.k8s.io/v1beta1, Resource=volumeattributesclasses":
885+
the server could not find the requested resource
886+
```
887+
888+
The pod and volume are both up and running.
889+
890+
4. Turn on both the feature flag and the beta API in api-server again. ``kubectl get vac`` shows the VACs again. Changing the PVC back to VAC1 succeeds and the modification is applied.
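
The four steps above can be replayed against a toy model to make the expected outcome of each step explicit. This is a simplified simulation, not API server code: `ClusterState`, `update_pvc_vac`, and `list_vacs` are hypothetical names, and only the two error messages are taken from the test notes.

```python
from dataclasses import dataclass

FORBIDDEN = (
    "spec.volumeAttributesClassName: Forbidden: update is forbidden "
    "when the VolumeAttributesClass feature gate is disabled"
)
NOT_FOUND = "the server could not find the requested resource"


@dataclass
class ClusterState:
    feature_gate: bool  # VolumeAttributesClass feature gate
    beta_api: bool      # storage.k8s.io/v1beta1 served or not


def update_pvc_vac(state: ClusterState, pvc: dict, new_vac: str) -> str:
    """Change the PVC's class only while the feature gate is on."""
    if not state.feature_gate:
        return FORBIDDEN  # the PVC is left untouched
    pvc["spec"]["volumeAttributesClassName"] = new_vac
    return "ok"


def list_vacs(state: ClusterState, vacs: set):
    """Mimic `kubectl get vac`: the listing fails while the API is off."""
    return sorted(vacs) if state.beta_api else NOT_FOUND


vacs = {"VAC1", "VAC2"}
pvc = {"spec": {"volumeAttributesClassName": "VAC1"}}
state = ClusterState(feature_gate=True, beta_api=True)
results = []

results.append(update_pvc_vac(state, pvc, "VAC2"))   # step 1: modify works
state.feature_gate = False
results.append(update_pvc_vac(state, pvc, "VAC1"))   # step 2: rejected
vac_after_step2 = pvc["spec"]["volumeAttributesClassName"]
state.beta_api = False
results.append(list_vacs(state, vacs))               # step 3: API not found
state.feature_gate = state.beta_api = True
results.append(list_vacs(state, vacs))               # step 4: VACs visible
results.append(update_pvc_vac(state, pvc, "VAC1"))   # step 4: modify works
print(results)
```

The model keeps the stored objects across the disable and re-enable steps, which matches the observation that the pod and volume stay up and the VACs reappear once the API is turned back on.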
891+
848892

849893
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
850894

@@ -923,7 +967,7 @@ previous answers based on experience in the field.
923967

924968
###### Will enabling / using this feature result in any new API calls?
925969

926-
Yes. The VAC protection controller will be expensive because it needs to LIST PVCs/PVs but the call volume should be low.
970+
Yes. The VAC protection controller will be expensive because it needs to LIST PVCs/PVs, but the call rate should be low because it is triggered only when a user changes the VAC on a PVC.
927971

928972
- API call type: PATCH PVC
929973
- estimated throughput: low, only once for PVCs that have
@@ -954,7 +998,7 @@ Using this feature may result in non-negligible increase of resource usage IF cu
954998
- external-resizer CPU and memory will see a non-negligible increase if users increased the number of concurrent operations via the `--workers` flag. We follow the strategy of sharing that limit between `ControllerExpandVolume` and `ControllerModifyVolume` RPCs, similar to how external-provisioner functions.
955999
- The API-Server may see a spike of CPU when processing relevant changes.
9561000

957-
Stress tests will determine increase in resource usage at varying amounts of concurrent volume modifications.
1001+
Stress tests will determine increase in resource usage at varying amounts of concurrent volume modifications.
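
The shared `--workers` limit mentioned above means expand and modify operations compete for the same pool rather than each getting their own. The sketch below illustrates that idea with a single thread pool; it is not how external-resizer is implemented, and `run_shared_pool` plus the simulated RPC delay are assumptions made for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_shared_pool(workers: int, operations: List[str]) -> Tuple[int, List[str]]:
    """Run all operations on one pool and report the peak concurrency seen."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def call(op: str) -> str:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for the RPC round trip
        with lock:
            active -= 1
        return op

    # One executor for both RPC types, so the limit is shared between them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(call, operations))
    return peak, done


ops = ["ControllerExpandVolume"] * 6 + ["ControllerModifyVolume"] * 6
peak, done = run_shared_pool(workers=4, operations=ops)
print(peak, len(done))
```

Raising the worker count raises the ceiling for both RPC types at once, which is why resource usage is expected to grow with `--workers` regardless of which operation dominates.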
9581002

9591003
### Troubleshooting
9601004

keps/sig-storage/3751-volume-attributes-class/kep.yaml

Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,4 @@
1-
title: Kubernetes Volume Provisioned IO
1+
title: Kubernetes VolumeAttributesClass and ModifyVolume
22
kep-number: 3751
33
authors:
44
- "@mattcarry"
@@ -18,18 +18,18 @@ see-also:
1818
replaces:
1919

2020
# The target maturity stage in the current dev cycle for this KEP.
21-
stage: beta
21+
stage: stable
2222

2323
# The most recent milestone for which work toward delivery of this KEP has been
2424
# done. This can be the current (upcoming) milestone, if it is being actively
2525
# worked on.
26-
latest-milestone: "1.31"
26+
latest-milestone: "1.34"
2727

2828
# The milestone at which this feature was, or is targeted to be, at each stage.
2929
milestone:
3030
alpha: "v1.29"
3131
beta: "v1.31"
32-
stable:
32+
stable: "v1.34"
3333

3434
# The following PRR answers are required at alpha release
3535
# List the feature gate name and the components for which it must be enabled
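
The three metadata changes in this file (`stage`, `latest-milestone`, and `milestone.stable`) move together, so a small consistency check can catch a half-updated file. The rule encoded below, that the milestone for the current stage must be set and must match `latest-milestone`, is an assumption inferred from this diff rather than a documented requirement, and `check_kep_metadata` is a hypothetical helper.

```python
from typing import List


def check_kep_metadata(kep: dict) -> List[str]:
    """Return a list of problems found in stage/milestone fields."""
    problems = []
    stage = kep.get("stage")
    latest = str(kep.get("latest-milestone", ""))
    target = kep.get("milestone", {}).get(stage)
    if not target:
        problems.append(f"milestone.{stage} is not set for stage {stage!r}")
    elif target.lstrip("v") != latest.lstrip("v"):
        problems.append(
            f"milestone.{stage} ({target}) differs from latest-milestone ({latest})"
        )
    return problems


# Values as they appear after this commit.
kep_after = {
    "stage": "stable",
    "latest-milestone": "1.34",
    "milestone": {"alpha": "v1.29", "beta": "v1.31", "stable": "v1.34"},
}
# Hypothetical half-updated file: stage bumped, stable milestone left empty.
kep_partial = {
    "stage": "stable",
    "latest-milestone": "1.34",
    "milestone": {"alpha": "v1.29", "beta": "v1.31", "stable": None},
}

print(check_kep_metadata(kep_after))
print(check_kep_metadata(kep_partial))
```

The dictionaries stand in for the parsed YAML so the sketch needs no third-party parser.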
