Commit 9d6fb25

Merge pull request kubernetes#3126 from Jiawei0227/migration
KEP-625: Update CSI Migration to GA
2 parents 2259010 + 0943f40 commit 9d6fb25

4 files changed (+1041, -26 lines)

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 625
 beta:
   approver: "@wojtek-t"
+stable:
+  approver: "@wojtek-t"

keps/sig-storage/625-csi-migration/README.md

Lines changed: 47 additions & 21 deletions
@@ -93,7 +93,7 @@ internal APIs.
 ## Proposal

 ### Implementation Details/Notes/Constraints
-The detailed design was originally implemented as a [design proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-migration.md)
+The detailed design was originally implemented as a [design proposal](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/625-csi-migration/csi-migration-design.md)

 ### Risks and Mitigations

@@ -149,7 +149,7 @@ know what configuration it’s running in and validate the expected result.
 Configurations to test:

 | ADC | Kubelet | Expected Result |
-|-------------------|----------------------------------------------------|--------------------------------------------------------------------------|
+| ----------------- | -------------------------------------------------- | ------------------------------------------------------------------------ |
 | ADC Migration On | Kubelet Migration On | Fully migrated - result should be same as “Migration Shim Testing” above |
 | ADC Migration On | Kubelet Migration Off (or Kubelet version too low) | No calls made to driver. All operations serviced by in-tree plugin |
 | ADC Migration Off | Kubelet Migration On | Not supported config - Undefined behavior |
@@ -196,7 +196,7 @@ you need any help or guidance.
 - [x] Feature gate (also fill in values in `kep.yaml`)
 - Feature gate name: CSIMigration, CSIMigration{vendor}, InTreePlugin{vendor}Unregister
 - Components depending on the feature gate: kubelet, kube-controller-manager, kube-scheduler
-- Please refer to this design doc on the [Step to enable the feature](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-migration.md#upgradedowngrade-migrateunmigrate-scenarios)
+- Please refer to this design doc on the [Step to enable the feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/625-csi-migration/csi-migration-design.md#upgradedowngrade-migrateunmigrate-scenarios)

 * **Does enabling the feature change any default behavior?**

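The three gate names listed in this hunk have to be set consistently on kubelet, kube-controller-manager and kube-scheduler. As a hypothetical illustration, the Python sketch below builds one `--feature-gates` value per dependent component. The helper names are made up, `GCE` is only an example vendor string, and the exact per-vendor gate spellings should be checked against the Kubernetes feature-gate reference rather than taken from this sketch.

```python
# Hypothetical helpers: gate names follow the patterns listed in this KEP
# (CSIMigration, CSIMigration{vendor}, InTreePlugin{vendor}Unregister).

COMPONENTS = ("kubelet", "kube-controller-manager", "kube-scheduler")


def migration_feature_gates(vendor: str, unregister: bool = False) -> str:
    """Return a comma-separated --feature-gates value for one in-tree plugin vendor."""
    gates = {
        "CSIMigration": True,
        f"CSIMigration{vendor}": True,
        # Unregistering the in-tree plugin is optional and can be rolled back.
        f"InTreePlugin{vendor}Unregister": unregister,
    }
    return ",".join(f"{name}={str(enabled).lower()}" for name, enabled in gates.items())


def component_flags(vendor: str, unregister: bool = False) -> dict:
    """Use the same gate values on every component that depends on them."""
    value = migration_feature_gates(vendor, unregister)
    return {component: f"--feature-gates={value}" for component in COMPONENTS}


if __name__ == "__main__":
    for component, flag in component_flags("GCE").items():
        print(component, flag)
```

Deriving every component's flag from a single value is a deliberate choice: the kube-controller-manager/kubelet table in this KEP shows that mismatched gate values lead to partially migrated or broken states.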
@@ -219,17 +219,17 @@ you need any help or guidance.
219219
for it should be enabled align with kube-controller-manager otherwise the volume topology && volume limit function could
220220
be impacted.
221221

222-
| Kube-Controller-Manager| Kubelet | Expected Behavior Change |
223-
|------------------------|----------------------------------------------------|--------------------------------------------------------------------------|
224-
| `CSIMigration{vendor}` On | `CSIMigration{vendor}` On | Fully migrated. All operations serviced by CSI plugin. From user perspective, nothing changed. |
225-
| `CSIMigration{vendor}` On | `CSIMigration{vendor}` Off | `InTreePlugin{vendor}Unregister` enabled on Kubelet: Broken state, Provision/Delete/Attach/Detach by CSI, Mount/Unmount not function. `InTreePlugin{vendor}Unregister` enabled on KCM: Provision/Deletion/Attach/Detach by CSI, Mount/Unmount by in-tree. `InTreePlugin{vendor}Unregister` disabled at all: Provision/Deletion by CSI, other operations by In-tree.|
226-
| `CSIMigration{vendor}` Off | `CSIMigration{vendor}` On | Broken state. Operations like volume provision will still work. But operations like volume Attach/Mount will be broken |
227-
| `CSIMigration{vendor}` Off | `CSIMigration{vendor}` Off | No behavior change |
222+
| Kube-Controller-Manager | Kubelet | Expected Behavior Change |
223+
| -------------------------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
224+
| `CSIMigration{vendor}` On | `CSIMigration{vendor}` On | Fully migrated. All operations serviced by CSI plugin. From user perspective, nothing changed. |
225+
| `CSIMigration{vendor}` On | `CSIMigration{vendor}` Off | `InTreePlugin{vendor}Unregister` enabled on Kubelet: Broken state, Provision/Delete/Attach/Detach by CSI, Mount/Unmount not function. `InTreePlugin{vendor}Unregister` enabled on KCM: Provision/Deletion/Attach/Detach by CSI, Mount/Unmount by in-tree. `InTreePlugin{vendor}Unregister` disabled at all: Provision/Deletion by CSI, other operations by In-tree. |
226+
| `CSIMigration{vendor}` Off | `CSIMigration{vendor}` On | Broken state. Operations like volume provision will still work. But operations like volume Attach/Mount will be broken |
227+
| `CSIMigration{vendor}` Off | `CSIMigration{vendor}` Off | No behavior change |
228228

229229
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
230230
the enablement)?**
231231
- Yes - can be disabled by disabling feature flags.
232-
Please refer to the [upgrade/downgrade](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-migration.md#upgradedowngrade-migrateunmigrate-scenarios) sections on how to downgrade the cluster to roll back the enablement.
232+
Please refer to the [upgrade/downgrade](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/625-csi-migration/csi-migration-design.md#upgradedowngrade-migrateunmigrate-scenarios) sections on how to downgrade the cluster to roll back the enablement.
233233

234234
- For `InTreePlugin{vendor}Unregister`, yes we can disable the feature gate once we enabled. This will register the corresponding
235235
in-tree storage plugin into the supported list and user will be able to use it to do all storage related operations again.
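The kube-controller-manager/kubelet table above can be read as a lookup from gate settings to the component that serves each volume operation. The following minimal Python sketch encodes it; the function and constant names are hypothetical. Two points go beyond the table and are assumptions: in the "KCM Off, kubelet On" row, provisioning is attributed to the in-tree plugin, and detach/unmount are treated as broken alongside the attach/mount operations the table names.

```python
CSI, IN_TREE, BROKEN = "csi", "in-tree", "broken"
CONTROL_OPS = ("provision", "delete", "attach", "detach")
NODE_OPS = ("mount", "unmount")


def expected_handlers(kcm_migration: bool, kubelet_migration: bool,
                      unregister_on_kcm: bool = False,
                      unregister_on_kubelet: bool = False) -> dict:
    """Map each volume operation to who serves it, following the skew table."""
    if kcm_migration and kubelet_migration:
        # Fully migrated: everything goes through the CSI driver.
        return {op: CSI for op in CONTROL_OPS + NODE_OPS}
    if not kcm_migration and not kubelet_migration:
        # No behavior change: everything stays in-tree.
        return {op: IN_TREE for op in CONTROL_OPS + NODE_OPS}
    if kubelet_migration:
        # KCM off, kubelet on: broken state. Provisioning still works (assumed
        # in-tree); attach/mount break, and detach/unmount are assumed to as well.
        handlers = {"provision": IN_TREE, "delete": IN_TREE}
        handlers.update({op: BROKEN for op in ("attach", "detach") + NODE_OPS})
        return handlers
    # KCM on, kubelet off: depends on where InTreePlugin{vendor}Unregister is on.
    if unregister_on_kubelet:
        # Kubelet has neither the in-tree plugin nor migration: mounts fail.
        handlers = {op: CSI for op in CONTROL_OPS}
        handlers.update({op: BROKEN for op in NODE_OPS})
    elif unregister_on_kcm:
        handlers = {op: CSI for op in CONTROL_OPS}
        handlers.update({op: IN_TREE for op in NODE_OPS})
    else:
        handlers = {"provision": CSI, "delete": CSI}
        handlers.update({op: IN_TREE for op in ("attach", "detach") + NODE_OPS})
    return handlers


def is_broken(handlers: dict) -> bool:
    """True if any operation has no working handler."""
    return BROKEN in handlers.values()
```

For example, `expected_handlers(True, False, unregister_on_kcm=True)` routes attach to CSI and mount to the in-tree plugin, matching the second row of the table.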
@@ -243,9 +243,14 @@ like Provision/Deletion/Attach/Detach/Mount/Unmount will not be available if CSI

 * **Are there any tests for feature enablement/disablement?**
 We have CSI Migration e2e test for each plugin that are implemented and maintained by each driver maintainer.
-Specifically, for each in-tree plugin corresponding CSI drivers, it will have
+Specifically, for each in-tree plugin corresponding CSI drivers, it has
 - Full k8s storage e2e tests
-- Migration enabled functional e2e tests.
+- Migration enabled functional e2e tests. For example:
+  - GCE PD [migration testgrid](https://testgrid.k8s.io/provider-gcp-compute-persistent-disk-csi-driver#Migration%20Kubernetes%20Master%20Driver%20Stable).
+  - AWS EBS [migration testgrid](https://k8s-testgrid.appspot.com/provider-aws-ebs-csi-driver#ci-migration-test)
+  - Azuredisk [migration testgrid](https://testgrid.k8s.io/provider-azure-azuredisk-csi-driver#pr-azuredisk-csi-driver-e2e-migration).
+  - Azurefile has [migration testgrid](https://testgrid.k8s.io/provider-azure-azurefile-csi-driver#pr-azurefile-csi-driver-e2e-migration).
+  - Openstack has CSI migration tests for GCE/AWS/Azure/Cinder at [testgrid](https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-broken#Summary). And an upgrade test will be added soon in the future.
 - Upgrade/downgrade/version skew tests that test the transition from feature turning on to off.

 For core K8s, we have unit tests including but not limited to:
@@ -279,9 +284,9 @@ Specifically, for each in-tree plugin corresponding CSI drivers, it will have

 * **What specific metrics should inform a rollback?**
 We have metrics on the CSI sidecar side called `csi_operation_duration_seconds` and core k8s metrics on both kube-controller-manager and kubelet side called `storage_operation_duration_seconds`.
-Both of them will have a `migrated` field to indicate whether this operation is a migrated PV operation.
-- For `csi_operation_duration_seconds`, we will have a `grpc_status` field
-- For `storage_operation_duration_seconds`, we will have a `status` field
+Both of them have a `migrated` field to indicate whether this operation is a migrated PV operation.
+- For `csi_operation_duration_seconds`, we have a `grpc_status` field
+- For `storage_operation_duration_seconds`, we have a `status` field

 If the error ratio of these two metrics has an unusual spike or is keeping at a relatively higher level compared to in-tree model, it means something went wrong and we need a rollback.

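One way to turn the rollback guidance in this hunk into a check is to compare the error ratio of migrated operations with that of non-migrated (in-tree) ones. This minimal Python sketch assumes the `_count` series of `storage_operation_duration_seconds` has already been scraped into `(labels, count)` pairs, treats any `status` other than `success` as an error, and uses an arbitrary 5-percentage-point threshold; none of these choices come from the KEP, and the helper names are hypothetical. It only detects a sustained higher level, not a sudden spike over time.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Tuple


@dataclass
class OpCounts:
    """Total and failed operation counts for one value of the `migrated` label."""
    total: int = 0
    errors: int = 0

    @property
    def error_ratio(self) -> float:
        return self.errors / self.total if self.total else 0.0


def split_counts(samples: Iterable[Tuple[Mapping[str, str], int]]) -> Tuple[OpCounts, OpCounts]:
    """Aggregate (labels, count) samples into (migrated, in-tree) counters.

    Assumption: any `status` value other than "success" counts as an error.
    """
    migrated, in_tree = OpCounts(), OpCounts()
    for labels, count in samples:
        bucket = migrated if labels.get("migrated") == "true" else in_tree
        bucket.total += count
        if labels.get("status") != "success":
            bucket.errors += count
    return migrated, in_tree


def should_consider_rollback(migrated: OpCounts, in_tree: OpCounts,
                             max_increase: float = 0.05) -> bool:
    """True when migrated operations fail noticeably more often than in-tree ones.

    `max_increase` is a hypothetical threshold on the absolute difference
    between the two error ratios.
    """
    return migrated.error_ratio - in_tree.error_ratio > max_increase
```

Comparing against the in-tree baseline rather than a fixed error budget mirrors the KEP's wording ("compared to in-tree model"), so background failure rates that affect both paths do not trigger a rollback on their own.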
@@ -302,7 +307,7 @@ In addition, some CSI drivers are not able to maintain 100% backwards compatibil
 ### Monitoring Requirements

 * **How can an operator determine if the feature is in use by workloads?**
-We will have metrics `csi_sidecar_duration_seconds` on the CSI sidecars and `storage_operation_duration_seconds` on the kube-controller-manager and kubelet side to indicate whether this operation is a migrated operation or not. These metrics will have a `migrated` field to indicate if this is a migrated operation.
+We have metrics `csi_sidecar_duration_seconds` on the CSI sidecars and `storage_operation_duration_seconds` on the kube-controller-manager and kubelet side to indicate whether this operation is a migrated operation or not. These metrics have a `migrated` field to indicate if this is a migrated operation.

 * **What are the SLIs (Service Level Indicators) an operator can use to determine
 the health of the service?**
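To check whether migrated operations are happening at all, an operator can look for samples carrying `migrated="true"`. The Python sketch below assumes Prometheus text exposition input (for example, the body scraped from a component's metrics endpoint) and uses a deliberately simplified regex parser that does not handle escaped quotes in label values; the helper names are hypothetical.

```python
import re

# Simplified matchers for lines like: name{label="value",...} 42
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def migrated_operation_count(exposition: str,
                             metric: str = "storage_operation_duration_seconds_count") -> float:
    """Sum the values of `metric` samples whose `migrated` label is "true"."""
    total = 0.0
    for line in exposition.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("migrated") == "true":
            total += float(match.group("value"))
    return total


def migration_in_use(exposition: str) -> bool:
    """True if at least one migrated operation has been recorded."""
    return migrated_operation_count(exposition) > 0
```

The `metric` parameter allows the same check against the sidecar metric named in this hunk; in production, querying the same label through an existing Prometheus server is usually simpler than parsing scrapes by hand.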
@@ -319,6 +324,7 @@ the health of the service?**
 * **Are there any missing metrics that would be useful to have to improve observability
 of this feature?**
 Node side CSI operation metrics. It will be implemented in the GA phase.
+GA Update: It has been implemented in [Kubernetes#PR#98979](https://github.com/kubernetes/kubernetes/pull/98979).

 ### Dependencies

@@ -415,17 +421,37 @@ Major milestones in the life cycle of a KEP should be tracked in `Implementation

 Major milestones for each in-tree plugin CSI migration:

+- 1.24
+  - AWS EBS CSI migration to GA
+  - Azuredisk CSI migration to GA
+  - GCE PD CSI migration to GA
+  - OpenStack Cinder CSI migration to GA
+  - Azurefile CSI migration to Beta, on by default
+  - vSphere CSI migration to Beta, on by default
+  - Cephfs CSI migration to Alpha
+  - Ceph RBD CSI migration to Beta, off by default
+  - Portworx CSI migration to Beta, off by default
+- 1.23
+  - AWS EBS CSI migration to Beta, on by default
+  - Azuredisk CSI migration to Beta, on by default
+  - GCE PD CSI migration to Beta, on by default
+  - Portworx CSI migration to Alpha
+  - Ceph RBD CSI migration to Alpha
 - 1.21
-  - Azurefile CSI migration to Beta
+  - Azurefile CSI migration to Beta, off by default
+  - OpenStack Cinder CSI migration to Beta, on by default
 - 1.19
-  - vSphere CSI migration to Beta
-  - Azuredisk CSI migration to Beta
+  - vSphere CSI migration to Beta, off by default
+  - Azuredisk CSI migration to Beta, off by default
+- 1.18
+  - vSphere CSI migration to Alpha
 - 1.17
-  - GCE PD CSI migration to Beta
-  - AWS EBS CSI migration to Beta
+  - GCE PD CSI migration to Beta, off by default
+  - AWS EBS CSI migration to Beta, off by default
 - 1.15
   - Azuredisk CSI migration to Alpha
   - Azurefile CSI migration to Alpha
 - 1.14
   - GCE PD CSI migration to Alpha
   - AWS EBS CSI migration to Alpha
+  - OpenStack Cinder CSI migration to Alpha