- Components depending on the feature gate: kubelet, kube-controller-manager, kube-scheduler
199
199
- Please refer to this design doc on the [Step to enable the feature](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-migration.md#upgradedowngrade-migrateunmigrate-scenarios)
200
200
201
201
* **Does enabling the feature change any default behavior?**
202
-
Yes and No. If only `CSIMigration` feature flag is enabled, nothing will change on the cluster behavior. However, if `CSIMigration` && `CSIMigration{cloud-provider}` are both enabled, the behavior will change. The in-tree volume plugin that the cloud-provider use will be redirect to use the corresponding CSI driver. But from a user perspective, nothing will be noticed.
202
+
203
+
Yes and No.
204
+
- If only the `CSIMigration` feature flag is enabled, cluster behavior does not change. `CSIMigration`
205
+
is an umbrella feature gate that controls the vendor-agnostic controllers. Unless this feature gate is on,
206
+
the entire CSI Migration feature is disabled.
207
+
- If only the `CSIMigration{vendor}` feature flag is enabled, cluster behavior does not change.
208
+
This feature gate controls the vendor-specific logic.
209
+
- Both `CSIMigration` and `CSIMigration{vendor}` need to be enabled on Kubernetes Components,
210
+
including kube-scheduler, kube-controller-manager (KCM), and kubelet, for CSI Migration to take effect.
211
+
- `InTreePlugin{vendor}Unregister` is a standalone feature gate that can be enabled and disabled
212
+
independently of CSI Migration. As the name suggests, when it is enabled, the component will not
213
+
register the specific in-tree storage plugin in the supported list. If the cluster operator enables only this flag,
214
+
they will get an error on the PVC saying that the plugin cannot be found when the plugin is used. The cluster operator
215
+
may want to enable this regardless of CSI Migration if they do not want to support the legacy in-tree APIs and
216
+
only support CSI going forward.
217
+
- The table below assumes `CSIMigration` is enabled whenever `CSIMigration{vendor}` is on, since if not, there will
218
+
be no effect on behavior. The table does not take into account the feature gates on kube-scheduler; the feature gates
219
+
for it should be aligned with those on kube-controller-manager, otherwise the volume topology and volume limit functions could
|`CSIMigration{vendor}` On |`CSIMigration{vendor}` On | Fully migrated. All operations serviced by CSI plugin. From the user's perspective, nothing changes. |
225
+
|`CSIMigration{vendor}` On |`CSIMigration{vendor}` Off | `InTreePlugin{vendor}Unregister` enabled on Kubelet: broken state; Provision/Delete/Attach/Detach by CSI, Mount/Unmount do not function. `InTreePlugin{vendor}Unregister` enabled on KCM: Provision/Delete/Attach/Detach by CSI, Mount/Unmount by in-tree. `InTreePlugin{vendor}Unregister` not enabled anywhere: Provision/Delete by CSI, other operations by in-tree. |
226
+
|`CSIMigration{vendor}` Off |`CSIMigration{vendor}` On | Broken state. Operations such as volume provisioning still work, but operations such as volume Attach/Mount are broken. |
227
+
|`CSIMigration{vendor}` Off |`CSIMigration{vendor}` Off | No behavior change |
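
The gate combinations in the table rows above can be sketched in Go. This is a minimal illustration, not code from Kubernetes: the `gates` type and the `effective` and `describe` functions are hypothetical names. It also assumes that the first table column refers to kube-controller-manager and the second to kubelet; the table header is not visible in this hunk, so that reading is inferred from the row descriptions only.

```go
package main

// gates holds the two migration feature gates as seen by one component.
// Hypothetical type for illustration; Kubernetes does not define it.
type gates struct {
	CSIMigration       bool // umbrella gate
	CSIMigrationVendor bool // stands in for CSIMigration{vendor}
}

// effective reports whether migration applies on this component.
// Either gate on its own changes nothing, so both must be on.
func (g gates) effective() bool {
	return g.CSIMigration && g.CSIMigrationVendor
}

// describe maps a (KCM, kubelet) pair to a short label for the table row.
// Assumption: column 1 of the table is KCM and column 2 is kubelet.
func describe(kcm, kubelet gates) string {
	switch {
	case kcm.effective() && kubelet.effective():
		// All operations are serviced by the CSI plugin.
		return "fully migrated"
	case kcm.effective():
		// Provision/Delete go to CSI; the remaining operations depend
		// on where InTreePlugin{vendor}Unregister is enabled.
		return "mixed"
	case kubelet.effective():
		// Provisioning still works, Attach/Mount are broken.
		return "broken"
	default:
		return "no behavior change"
	}
}
```

For example, `describe` returns `"no behavior change"` when only `CSIMigrationVendor` is set on both components, because `effective` is false without the umbrella gate.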
203
228
204
229
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
205
230
the enablement)?**
206
-
Yes - can be disabled by disabling feature flags.
231
+
- Yes - can be disabled by disabling feature flags.
207
232
Please refer to the [upgrade/downgrade](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-migration.md#upgradedowngrade-migrateunmigrate-scenarios) sections on how to downgrade the cluster to roll back the enablement.
208
233
234
+
- For `InTreePlugin{vendor}Unregister`, yes, the feature gate can be disabled after it has been enabled. This will register the corresponding
235
+
in-tree storage plugin in the supported list, and users will again be able to use it for all storage-related operations.
236
+
209
237
* **What happens if we reenable the feature if it was previously rolled back?**
210
-
The CSI migration feature will start to work again. The out-of-tree CSI driver will start to work instead of in-tree plugin again.
238
+
- The CSI migration feature will start to work again. The out-of-tree CSI driver will again be used instead of the in-tree plugin.
239
+
- For `InTreePlugin{vendor}Unregister`, the plugin is not supported while the feature is enabled. When the feature is reenabled, it will
240
+
again unregister the storage plugin when the component restarts, and the specific storage plugin will become unavailable again
241
+
to the end user. Running workloads that already use the in-tree plugin will not be impacted. But new operations
242
+
like Provision/Deletion/Attach/Detach/Mount/Unmount will not be available if CSI migration for the specific plugin is not enabled.
211
243
212
244
* **Are there any tests for feature enablement/disablement?**
213
245
We have CSI Migration e2e tests for each plugin, implemented and maintained by each driver maintainer.
@@ -229,11 +261,21 @@ Specifically, for each in-tree plugin corresponding CSI drivers, it will have
229
261
### Rollout, Upgrade and Rollback Planning
230
262
231
263
* **How can a rollout fail? Can it impact already running workloads?**
232
-
- The rollout can fail if the ordering of `CSIMigration{cloud-provider}` flag was wrongly enabled on kubelet and kube-controller-manager. Specifically, if on the node side kubelet enables the flag and control-plane side the flag is not enabled, then the volume will not be able to be mounted successfully.
233
-
- For workloads that running on nodes have not enable CSI migration, those pods will not be impacted.
234
-
- For any pod that is being deleted by node drain before turning on migration and created on new node that has CSI migration turned on, the volume mount will fail and pod will not come up correctly.
235
-
- Additionally, CSI Migration has a strong dependency on CSI drivers. So if the in-tree corresponding CSI driver is not properly installed, any volume related operation could fail.
236
-
- If feature parity is not guaranteed or if any bug exists in the CSI driver/csi-translation-lib, the rollout could fail because pod using the PV could fail to execute provision/delete/attach/detach/mount/unmount/resize operations depend on the bug itself.
264
+
For `CSIMigration` and `CSIMigration{vendor}`
265
+
- The rollout can fail if the `CSIMigration{vendor}` flag is enabled on kubelet and kube-controller-manager in the wrong order. Specifically, if kubelet enables the flag on the node side while the flag is not enabled on the control-plane side, the volume cannot be mounted successfully.
266
+
- Workloads running on nodes that have not enabled CSI migration will not be impacted.
267
+
- For any pod that is deleted by a node drain before migration is turned on and then created on a new node that has CSI migration turned on, the volume mount will fail and the pod will not come up correctly.
268
+
- Additionally, CSI Migration has a strong dependency on CSI drivers. If the CSI driver corresponding to the in-tree plugin is not properly installed, any volume-related operation could fail.
269
+
- If feature parity is not guaranteed or if any bug exists in the CSI driver/csi-translation-lib, the rollout could fail because pods using the PV could fail to execute provision/delete/attach/detach/mount/unmount/resize operations, depending on the bug itself.
270
+
271
+
For `InTreePlugin{vendor}Unregister`
272
+
- Rollout of the feature gate will not fail. The components (kube-controller-manager, kubelet) will be able to start
273
+
and run without failures.
274
+
- However, it can impact running workloads when the feature is enabled on clusters that still have running workloads using the
275
+
specific in-tree storage plugin: further operations related to that volume (unmount/detach/delete) will all fail when CSI migration for that
276
+
plugin is not enabled. This is expected, and users should not turn on this feature gate without CSI migration while there are still workloads using the
277
+
corresponding in-tree storage plugin.
278
+
- There will be no impact when the feature is disabled at cluster runtime with or without workloads.
237
279
238
280
* **What specific metrics should inform a rollback?**
239
281
We have metrics on the CSI sidecar side called `csi_operation_duration_seconds` and core k8s metrics on both kube-controller-manager and kubelet side called `storage_operation_duration_seconds`.
@@ -250,7 +292,7 @@ For GA, we require such test exists in each driver's test CI.
250
292
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
251
293
fields of API types, flags, etc.?**
252
294
There will be no API removal in CSI migration itself. However, once CSI migration is completely finished, we plan to remove all in-tree plugins.
253
-
So we will have in-tree plugin deprecated when CSIMigration{cloud-provider} goes to beta. And code removal will be required eventually.
295
+
So the in-tree plugin will be deprecated when `CSIMigration{vendor}` goes to beta, and code removal will be required eventually.
254
296
In addition, some CSI drivers are not able to maintain 100% backwards compatibility, so those drivers need to deprecate certain behaviors.