Commit 14d67b1

Merge pull request kubernetes#2966 from Jiawei0227/migration
KEP-625: Update feature gate for CSI Migration kep
2 parents 54e8bed + 3ba198d commit 14d67b1

2 files changed

+58
-11
lines changed

keps/sig-storage/625-csi-migration/README.md

Lines changed: 52 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -194,20 +194,52 @@ you need any help or guidance.
194194

195195
* **How can this feature be enabled / disabled in a live cluster?**
196196
- [x] Feature gate (also fill in values in `kep.yaml`)
197-
- Feature gate name: CSIMigration, CSIMigration{cloud-provider}
197+
- Feature gate name: CSIMigration, CSIMigration{vendor}, InTreePlugin{vendor}Unregister
198198
- Components depending on the feature gate: kubelet, kube-controller-manager, kube-scheduler
199199
- Please refer to this design doc on the [Step to enable the feature](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-migration.md#upgradedowngrade-migrateunmigrate-scenarios)
200200

201201
* **Does enabling the feature change any default behavior?**
202-
Yes and No. If only `CSIMigration` feature flag is enabled, nothing will change on the cluster behavior. However, if `CSIMigration` && `CSIMigration{cloud-provider}` are both enabled, the behavior will change. The in-tree volume plugin that the cloud-provider use will be redirect to use the corresponding CSI driver. But from a user perspective, nothing will be noticed.
202+
203+
Yes and No.
204+
- If only `CSIMigration` feature flag is enabled, nothing will change on the cluster behavior. `CSIMigration`
205+
is a big umbrella feature gate. It takes control of vendor-agnostic controllers. Without this feature gate on,
206+
the entire CSI Migration feature is disabled.
207+
- If only `CSIMigration{vendor}` feature flag is enabled, nothing will change on the cluster behavior.
208+
This feature gate controls the vendor-specific logic.
209+
- Both `CSIMigration` and `CSIMigration{vendor}` need to be enabled on Kubernetes Components,
210+
including scheduler, KCM, Kubelet, for CSI Migration to take effect.
211+
- `InTreePlugin{vendor}Unregister` is a standalone feature gate that can be enabled and disabled
212+
even outside the scope of CSI Migration. As the name suggests, when enabled, the component will not
213+
register the specific in-tree storage plugin to the supported list. If the cluster operator only enables this flag,
214+
they will get an error from PVC saying it cannot find the plugin when the plugin is used. The cluster operator
215+
may want to enable this regardless of CSI Migration if they do not want to support the legacy in-tree APIs and
216+
only support CSI going forward.
217+
- The table below assumes `CSIMigration` is enabled whenever `CSIMigration{vendor}` is on, since if not, there will
218+
be no effect to the behaviors. The table does not take into account feature gates on kube-scheduler, the feature gates
219+
for it should be kept aligned with kube-controller-manager; otherwise the volume topology and volume limit functions could
220+
be impacted.
221+
222+
| Kube-Controller-Manager| Kubelet | Expected Behavior Change |
223+
|------------------------|----------------------------------------------------|--------------------------------------------------------------------------|
224+
| `CSIMigration{vendor}` On | `CSIMigration{vendor}` On | Fully migrated. All operations serviced by CSI plugin. From user perspective, nothing changed. |
225+
| `CSIMigration{vendor}` On | `CSIMigration{vendor}` Off | With `InTreePlugin{vendor}Unregister` enabled on Kubelet: broken state; Provision/Delete/Attach/Detach by CSI, Mount/Unmount do not function. With `InTreePlugin{vendor}Unregister` enabled on KCM: Provision/Delete/Attach/Detach by CSI, Mount/Unmount by in-tree. With `InTreePlugin{vendor}Unregister` disabled everywhere: Provision/Delete by CSI, other operations by in-tree. |
226+
| `CSIMigration{vendor}` Off | `CSIMigration{vendor}` On | Broken state. Operations like volume provision will still work. But operations like volume Attach/Mount will be broken |
227+
| `CSIMigration{vendor}` Off | `CSIMigration{vendor}` Off | No behavior change |
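
The table above can be read as a small decision function. The following is a minimal Python sketch of that reading, not code from Kubernetes: the function name `expected_behavior` and its boolean parameters are hypothetical, and the returned strings paraphrase the table rows. Like the table, it assumes `CSIMigration` is on wherever `CSIMigration{vendor}` is on. Checking the Kubelet-side `InTreePlugin{vendor}Unregister` case before the KCM-side case is an assumption, since the table does not say what happens when both are set.

```python
def expected_behavior(kcm_migration: bool, kubelet_migration: bool,
                      unregister_on_kcm: bool = False,
                      unregister_on_kubelet: bool = False) -> str:
    """Paraphrase of the KCM/Kubelet table for CSIMigration{vendor}.

    The unregister flags only matter for the second table row, because
    that is the only row where the table mentions them.
    """
    if kcm_migration and kubelet_migration:
        return "fully migrated: all operations serviced by CSI"
    if kcm_migration:
        # KCM on, Kubelet off: outcome depends on InTreePlugin{vendor}Unregister.
        if unregister_on_kubelet:  # assumption: Kubelet case takes precedence
            return ("broken: provision/delete/attach/detach by CSI, "
                    "mount/unmount do not function")
        if unregister_on_kcm:
            return ("provision/delete/attach/detach by CSI, "
                    "mount/unmount by in-tree")
        return "provision/delete by CSI, other operations by in-tree"
    if kubelet_migration:
        # KCM off, Kubelet on.
        return "broken: provisioning still works, attach/mount broken"
    return "no behavior change"


if __name__ == "__main__":
    for kcm in (True, False):
        for kubelet in (True, False):
            print(kcm, kubelet, "->", expected_behavior(kcm, kubelet))
```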
203228

204229
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
205230
the enablement)?**
206-
Yes - can be disabled by disabling feature flags.
231+
- Yes - can be disabled by disabling feature flags.
207232
Please refer to the [upgrade/downgrade](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-migration.md#upgradedowngrade-migrateunmigrate-scenarios) sections on how to downgrade the cluster to roll back the enablement.
208233

234+
- For `InTreePlugin{vendor}Unregister`, yes, the feature gate can be disabled after it has been enabled. Disabling it registers the corresponding
235+
in-tree storage plugin into the supported list and user will be able to use it to do all storage related operations again.
236+
209237
* **What happens if we reenable the feature if it was previously rolled back?**
210-
The CSI migration feature will start to work again. The out-of-tree CSI driver will start to work instead of in-tree plugin again.
238+
- The CSI migration feature will start to work again. The out-of-tree CSI driver will start to work instead of in-tree plugin again.
239+
- For `InTreePlugin{vendor}Unregister`, while the feature gate is enabled the plugin is not supported. When we reenable it, it will
240+
again unregister the storage plugin at the component restart time and then the specific storage plugin will become unavailable again
241+
to the end user. For any workload that is already using the in-tree plugin, running workloads will not be impacted. But new operations
242+
like Provision/Deletion/Attach/Detach/Mount/Unmount will not be available if CSI migration for the specific plugin is not enabled.
211243

212244
* **Are there any tests for feature enablement/disablement?**
213245
We have CSI Migration e2e tests for each plugin, implemented and maintained by each driver's maintainers.
@@ -229,11 +261,21 @@ Specifically, for each in-tree plugin corresponding CSI drivers, it will have
229261
### Rollout, Upgrade and Rollback Planning
230262

231263
* **How can a rollout fail? Can it impact already running workloads?**
232-
- The rollout can fail if the ordering of `CSIMigration{cloud-provider}` flag was wrongly enabled on kubelet and kube-controller-manager. Specifically, if on the node side kubelet enables the flag and control-plane side the flag is not enabled, then the volume will not be able to be mounted successfully.
233-
- For workloads that running on nodes have not enable CSI migration, those pods will not be impacted.
234-
- For any pod that is being deleted by node drain before turning on migration and created on new node that has CSI migration turned on, the volume mount will fail and pod will not come up correctly.
235-
- Additionally, CSI Migration has a strong dependency on CSI drivers. So if the in-tree corresponding CSI driver is not properly installed, any volume related operation could fail.
236-
- If feature parity is not guaranteed or if any bug exists in the CSI driver/csi-translation-lib, the rollout could fail because pod using the PV could fail to execute provision/delete/attach/detach/mount/unmount/resize operations depend on the bug itself.
264+
For `CSIMigration` and `CSIMigration{vendor}`
265+
- The rollout can fail if the `CSIMigration{vendor}` flag is enabled in the wrong order on kubelet and kube-controller-manager. Specifically, if the kubelet on the node side enables the flag while the control-plane side does not, volumes cannot be mounted successfully.
266+
- Workloads running on nodes that have not enabled CSI migration will not be impacted.
267+
- For any pod that is being deleted by node drain before turning on migration and created on new node that has CSI migration turned on, the volume mount will fail and pod will not come up correctly.
268+
- Additionally, CSI Migration has a strong dependency on CSI drivers. So if the in-tree corresponding CSI driver is not properly installed, any volume related operation could fail.
269+
- If feature parity is not guaranteed or if any bug exists in the CSI driver/csi-translation-lib, the rollout could fail because pods using the PV could fail to execute provision/delete/attach/detach/mount/unmount/resize operations, depending on the bug.
270+
271+
For `InTreePlugin{vendor}Unregister`
272+
- Rollout of the feature gate will not fail. The components (kube-controller-manager, kubelet) will be able to start
273+
and run without failures.
274+
- However, it can impact running workloads: when the feature is enabled on a cluster that still has running workloads using the
275+
specific in-tree storage plugin, further operations on those volumes (unmount/detach/delete) will all fail if CSI migration for that
276+
plugin is not enabled. This is expected; users should not turn on this feature gate without CSI migration while workloads are still using the
277+
corresponding in-tree storage plugin.
278+
- There will be no impact when the feature is disabled at cluster runtime with or without workloads.
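
The failure modes listed above could be checked before a rollout. The following is a minimal Python sketch under stated assumptions: `RolloutState`, its fields, and `rollout_risks` are hypothetical names, not part of Kubernetes, and "CSI migration enabled for the plugin" is interpreted here as the vendor gate being on for both kube-controller-manager and kubelet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RolloutState:
    """Hypothetical snapshot of the gates and cluster facts named in the text."""
    kcm_vendor_migration: bool       # CSIMigration{vendor} on kube-controller-manager
    kubelet_vendor_migration: bool   # CSIMigration{vendor} on kubelet
    csi_driver_installed: bool       # corresponding CSI driver is deployed
    intree_unregister: bool          # InTreePlugin{vendor}Unregister
    intree_volumes_in_use: bool      # workloads still use the in-tree plugin


def rollout_risks(state: RolloutState) -> List[str]:
    """Return the risks from the rollout section that apply to this state."""
    risks = []
    if state.kubelet_vendor_migration and not state.kcm_vendor_migration:
        risks.append("kubelet enabled before control plane: volume mounts will fail")
    any_migration = state.kcm_vendor_migration or state.kubelet_vendor_migration
    if any_migration and not state.csi_driver_installed:
        risks.append("CSI driver not installed: volume operations could fail")
    # Assumption: migration counts as enabled only when both components have it on.
    migration_on = state.kcm_vendor_migration and state.kubelet_vendor_migration
    if state.intree_unregister and not migration_on and state.intree_volumes_in_use:
        risks.append("in-tree plugin unregistered without migration: "
                     "unmount/detach/delete will fail")
    return risks


if __name__ == "__main__":
    state = RolloutState(False, True, True, False, True)
    for risk in rollout_risks(state):
        print("RISK:", risk)
```

An empty list means none of the documented failure modes apply; it does not rule out driver bugs or feature-parity gaps, which the text lists as a separate risk.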
237279

238280
* **What specific metrics should inform a rollback?**
239281
We have metrics on the CSI sidecar side called `csi_operation_duration_seconds` and core k8s metrics on both kube-controller-manager and kubelet side called `storage_operation_duration_seconds`.
@@ -250,7 +292,7 @@ For GA, we require such test exists in each driver's test CI.
250292
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
251293
fields of API types, flags, etc.?**
252294
There will be no API removal in CSI migration itself, but once CSI migration is finished, we plan to remove all in-tree plugins.
253-
So we will have in-tree plugin deprecated when CSIMigration{cloud-provider} goes to beta. And code removal will be required eventually.
295+
So in-tree plugins will be deprecated when `CSIMigration{vendor}` goes to beta, and code removal will eventually be required.
254296
In addition, some CSI drivers are not able to maintain 100% backwards compatibility, so those drivers need to deprecate certain behaviors.
255297
- vSphere [kubernetes#98546](https://github.com/kubernetes/kubernetes/pull/98546).
256298
- Azure drivers links TBD.

keps/sig-storage/625-csi-migration/kep.yaml

Lines changed: 6 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -44,7 +44,12 @@ feature-gates:
4444
- kube-controller-manager
4545
- kubelet
4646
- kube-scheduler
47-
- name: CSIMigration{cloud-provider}
47+
- name: CSIMigration{vendor}
48+
components:
49+
- kube-controller-manager
50+
- kubelet
51+
- kube-scheduler
52+
- name: InTreePlugin{vendor}Unregister
4853
components:
4954
- kube-controller-manager
5055
- kubelet
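
To illustrate how the gate-to-component mapping in `kep.yaml` translates into component configuration, here is a minimal Python sketch that builds a `--feature-gates` value per component. The helper `feature_gate_flags` and the `GATE_COMPONENTS` table are hypothetical; `GCE` is used only as an example value for `{vendor}`.

```python
# Gate templates and the components listed for each in kep.yaml after this change.
GATE_COMPONENTS = {
    "CSIMigration": ["kube-controller-manager", "kubelet", "kube-scheduler"],
    "CSIMigration{vendor}": ["kube-controller-manager", "kubelet", "kube-scheduler"],
    "InTreePlugin{vendor}Unregister": ["kube-controller-manager", "kubelet"],
}


def feature_gate_flags(vendor: str, unregister: bool = False) -> dict:
    """Return a {component: '--feature-gates=...'} mapping for one vendor.

    The unregister gate is included only when requested, since the KEP
    describes it as a standalone gate independent of CSI migration.
    """
    gates_by_component = {}
    for template, components in GATE_COMPONENTS.items():
        if template.startswith("InTreePlugin") and not unregister:
            continue
        gate = template.replace("{vendor}", vendor)
        for component in components:
            gates_by_component.setdefault(component, []).append(f"{gate}=true")
    return {component: "--feature-gates=" + ",".join(gates)
            for component, gates in gates_by_component.items()}


if __name__ == "__main__":
    for component, flag in feature_gate_flags("GCE", unregister=True).items():
        print(component, flag)
```

Note that kube-scheduler never receives the unregister gate in this sketch, matching the component list in the YAML.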
