keps/sig-storage/625-csi-migration/README.md
Lines changed: 47 additions & 21 deletions
@@ -93,7 +93,7 @@ internal APIs.
93
93
## Proposal
94
94
95
95
### Implementation Details/Notes/Constraints
96
-
The detailed design was originally implemented as a [design proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-migration.md)
96
+
The detailed design was originally implemented as a [design proposal](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/625-csi-migration/csi-migration-design.md)
97
97
98
98
### Risks and Mitigations
99
99
@@ -149,7 +149,7 @@ know what configuration it’s running in and validate the expected result.
- Components depending on the feature gate: kubelet, kube-controller-manager, kube-scheduler
199
-
- Please refer to this design doc on the [Step to enable the feature](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-migration.md#upgradedowngrade-migrateunmigrate-scenarios)
199
+
- Please refer to this design doc on the [Step to enable the feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/625-csi-migration/csi-migration-design.md#upgradedowngrade-migrateunmigrate-scenarios)
200
200
201
201
* **Does enabling the feature change any default behavior?**
202
202
@@ -219,17 +219,17 @@ you need any help or guidance.
219
219
for it should be enabled align with kube-controller-manager otherwise the volume topology && volume limit function could
|`CSIMigration{vendor}` On |`CSIMigration{vendor}` On | Fully migrated. All operations serviced by CSI plugin. From user perspective, nothing changed.|
225
-
|`CSIMigration{vendor}` On |`CSIMigration{vendor}` Off |`InTreePlugin{vendor}Unregister` enabled on Kubelet: Broken state, Provision/Delete/Attach/Detach by CSI, Mount/Unmount not function. `InTreePlugin{vendor}Unregister` enabled on KCM: Provision/Deletion/Attach/Detach by CSI, Mount/Unmount by in-tree. `InTreePlugin{vendor}Unregister` disabled at all: Provision/Deletion by CSI, other operations by In-tree.|
226
-
|`CSIMigration{vendor}` Off |`CSIMigration{vendor}` On | Broken state. Operations like volume provision will still work. But operations like volume Attach/Mount will be broken |
227
-
|`CSIMigration{vendor}` Off |`CSIMigration{vendor}` Off | No behavior change|
|`CSIMigration{vendor}` On |`CSIMigration{vendor}` On | Fully migrated. All operations serviced by CSI plugin. From user perspective, nothing changed.|
225
+
|`CSIMigration{vendor}` On |`CSIMigration{vendor}` Off |`InTreePlugin{vendor}Unregister` enabled on Kubelet: Broken state. Provision/Delete/Attach/Detach by CSI, Mount/Unmount do not function. `InTreePlugin{vendor}Unregister` enabled on KCM: Provision/Delete/Attach/Detach by CSI, Mount/Unmount by in-tree. `InTreePlugin{vendor}Unregister` not enabled on either component: Provision/Delete by CSI, other operations by in-tree.|
226
+
|`CSIMigration{vendor}` Off |`CSIMigration{vendor}` On | Broken state. Operations like volume provision will still work. But operations like volume Attach/Mount will be broken|
227
+
|`CSIMigration{vendor}` Off |`CSIMigration{vendor}` Off | No behavior change|
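
The table rows above can be read as a small lookup. The following is a minimal sketch, not part of the KEP or of Kubernetes: the function name `migration_state` and its parameters are hypothetical, and it assumes the first table column is the kube-controller-manager setting and the second is the kubelet setting, since the table header is not shown in this hunk.

```python
from typing import Optional


def migration_state(kcm_on: bool, kubelet_on: bool,
                    unregister_on: Optional[str] = None) -> str:
    """Summarize the table row for one gate combination.

    Assumption: kcm_on is the first column and kubelet_on the second.
    unregister_on says where InTreePlugin{vendor}Unregister is enabled
    ("kubelet", "kcm", or None); it only matters for the On/Off row.
    """
    if kcm_on and kubelet_on:
        return "fully migrated: all operations serviced by CSI"
    if kcm_on and not kubelet_on:
        if unregister_on == "kubelet":
            return ("broken: Provision/Delete/Attach/Detach by CSI, "
                    "Mount/Unmount do not function")
        if unregister_on == "kcm":
            return ("Provision/Delete/Attach/Detach by CSI, "
                    "Mount/Unmount by in-tree")
        return "Provision/Delete by CSI, other operations by in-tree"
    if not kcm_on and kubelet_on:
        return "broken: provisioning works, Attach/Mount broken"
    return "no behavior change"
```

Calling `migration_state(True, False, "kubelet")` returns the broken-state summary from the second row, which is the combination the table warns about most strongly.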
228
228
229
229
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
230
230
the enablement)?**
231
231
- Yes - can be disabled by disabling feature flags.
232
-
Please refer to the [upgrade/downgrade](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-migration.md#upgradedowngrade-migrateunmigrate-scenarios) sections on how to downgrade the cluster to roll back the enablement.
232
+
Please refer to the [upgrade/downgrade](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/625-csi-migration/csi-migration-design.md#upgradedowngrade-migrateunmigrate-scenarios) sections on how to downgrade the cluster to roll back the enablement.
233
233
234
234
- For `InTreePlugin{vendor}Unregister`, yes, the feature gate can be disabled after it has been enabled. This will register the corresponding
235
235
in-tree storage plugin into the supported list and user will be able to use it to do all storage related operations again.
@@ -243,9 +243,14 @@ like Provision/Deletion/Attach/Detach/Mount/Unmount will not be available if CSI
243
243
244
244
* **Are there any tests for feature enablement/disablement?**
245
245
We have CSI Migration e2e tests for each plugin, implemented and maintained by each driver maintainer.
246
-
Specifically, for each in-tree plugin corresponding CSI drivers, it will have
246
+
Specifically, for each in-tree plugin's corresponding CSI driver, it has
247
247
- Full k8s storage e2e tests
248
-
- Migration enabled functional e2e tests.
248
+
- Migration enabled functional e2e tests. For example:
- Azurefile has [migration testgrid](https://testgrid.k8s.io/provider-azure-azurefile-csi-driver#pr-azurefile-csi-driver-e2e-migration).
253
+
- Openstack has CSI migration tests for GCE/AWS/Azure/Cinder at [testgrid](https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-broken#Summary). An upgrade test will be added in the future.
249
254
- Upgrade/downgrade/version skew tests that test the transition from feature turning on to off.
250
255
251
256
For core K8s, we have unit tests including but not limited to:
@@ -279,9 +284,9 @@ Specifically, for each in-tree plugin corresponding CSI drivers, it will have
279
284
280
285
* **What specific metrics should inform a rollback?**
281
286
We have metrics on the CSI sidecar side called `csi_operation_duration_seconds` and core k8s metrics on both kube-controller-manager and kubelet side called `storage_operation_duration_seconds`.
282
-
Both of them will have a `migrated` field to indicate whether this operation is a migrated PV operation.
283
-
- For `csi_operation_duration_seconds`, we will have a `grpc_status` field
284
-
- For `storage_operation_duration_seconds`, we will have a `status` field
287
+
Both of them have a `migrated` field to indicate whether this operation is a migrated PV operation.
288
+
- For `csi_operation_duration_seconds`, we have a `grpc_status` field
289
+
- For `storage_operation_duration_seconds`, we have a `status` field
285
290
286
291
If the error ratio of these two metrics shows an unusual spike or stays at a relatively higher level compared to the in-tree model, it means something went wrong and we need a rollback.
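
As an illustration of the error-ratio check described above, here is a minimal sketch that works on already-scraped samples. It is not an official tool: the function name `error_ratio`, the `(labels, count)` sample shape, and the label values such as `"success"` are assumptions made for the example; only the metric label names `migrated`, `status`, and `grpc_status` come from the text.

```python
from typing import Iterable, Mapping, Tuple

# Hypothetical sample shape: (label set, operation count) per series.
Sample = Tuple[Mapping[str, str], float]


def error_ratio(samples: Iterable[Sample], status_label: str,
                ok_value: str, migrated: str = "true") -> float:
    """Return failed/total operation counts for one value of `migrated`.

    status_label is "status" for storage_operation_duration_seconds or
    "grpc_status" for csi_operation_duration_seconds. ok_value is the
    label value treated as success (an assumption of this sketch).
    """
    total = 0.0
    failed = 0.0
    for labels, count in samples:
        # Only look at series for the requested migrated/in-tree split.
        if labels.get("migrated") != migrated:
            continue
        total += count
        if labels.get(status_label) != ok_value:
            failed += count
    # With no matching operations there is nothing to compare.
    return failed / total if total else 0.0
```

Computing the ratio once with `migrated="true"` and once with `migrated="false"` gives the two numbers to compare: a migrated ratio that stays well above the in-tree one is the signal the text describes. How large a gap justifies a rollback is left to the operator.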
287
292
@@ -302,7 +307,7 @@ In addition, some CSI drivers are not able to maintain 100% backwards compatibil
302
307
### Monitoring Requirements
303
308
304
309
* **How can an operator determine if the feature is in use by workloads?**
305
-
We will have metrics `csi_sidecar_duration_seconds` on the CSI sidecars and `storage_operation_duration_seconds` on the kube-controller-manager and kubelet side to indicate whether this operation is a migrated operation or not. These metrics will have a `migrated` field to indicate if this is a migrated operation.
310
+
We have metrics `csi_sidecar_duration_seconds` on the CSI sidecars and `storage_operation_duration_seconds` on the kube-controller-manager and kubelet side. Both have a `migrated` field that indicates whether an operation is a migrated operation.
306
311
307
312
* **What are the SLIs (Service Level Indicators) an operator can use to determine
308
313
the health of the service?**
@@ -319,6 +324,7 @@ the health of the service?**
319
324
* **Are there any missing metrics that would be useful to have to improve observability
320
325
of this feature?**
321
326
Node side CSI operation metrics. It will be implemented in the GA phase.
327
+
GA Update: It has been implemented in [Kubernetes#PR#98979](https://github.com/kubernetes/kubernetes/pull/98979).
322
328
323
329
### Dependencies
324
330
@@ -415,17 +421,37 @@ Major milestones in the life cycle of a KEP should be tracked in `Implementation
415
421
416
422
Major milestones for each in-tree plugin CSI migration:
417
423
424
+
- 1.24
425
+
- AWS EBS CSI migration to GA
426
+
- Azuredisk CSI migration to GA
427
+
- GCE PD CSI migration to GA
428
+
- OpenStack Cinder CSI migration to GA
429
+
- Azurefile CSI migration to Beta, on by default
430
+
- vSphere CSI migration to Beta, on by default
431
+
- Cephfs CSI migration to Alpha
432
+
- Ceph RBD CSI migration to Beta, off by default
433
+
- Portworx CSI migration to Beta, off by default
434
+
- 1.23
435
+
- AWS EBS CSI migration to Beta, on by default
436
+
- Azuredisk CSI migration to Beta, on by default
437
+
- GCE PD CSI migration to Beta, on by default
438
+
- Portworx CSI migration to Alpha
439
+
- Ceph RBD CSI migration to Alpha
418
440
- 1.21
419
-
- Azurefile CSI migration to Beta
441
+
- Azurefile CSI migration to Beta, off by default
442
+
- OpenStack Cinder CSI migration to Beta, on by default