- [Technical Leads are members of the Kubernetes Organization](#technical-leads-are-members-of-the-kubernetes-organization)
- [Subproject Leads](#subproject-leads)

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements](https://github.com/kubernetes/enhancements/issues/667).
- [x] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [x] (R) Graduation criteria is in place
- [x] (R) Production readiness review completed
- [x] Production readiness review approved
- [x] "Implementation History" section is up-to-date for milestone
- [x] User-facing documentation has been created in [kubernetes-sigs/cloud-provider-azure](https://kubernetes-sigs.github.io/cloud-provider-azure/)
- [x] Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)

## Summary

Build support for the out-of-tree Azure cloud provider. This involves a well-tested version of the cloud-controller-manager that has feature parity with the kube-controller-manager.

## Motivation

- The core of the Azure cloud provider would be moved to [kubernetes-sigs/cloud-provider-azure](https://github.com/kubernetes-sigs/cloud-provider-azure).
- The storage drivers would be moved to [kubernetes-sigs/azuredisk-csi-driver](https://github.com/kubernetes-sigs/azuredisk-csi-driver) and [kubernetes-sigs/azurefile-csi-driver](https://github.com/kubernetes-sigs/azurefile-csi-driver).
- The credential provider is tracked by the out-of-tree credential provider [KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-cloud-provider/20191004-out-of-tree-credential-providers.md) and won't block the progress of this feature.

### Risks and Mitigation

- Azure cloud controller manager is moving to GA
  - Feature compatible with KCM
  - Conformance tests pass and are published to [testgrid](https://testgrid.k8s.io/provider-azure-cloud-provider-azure)
- CSI drivers for AzureDisk/AzureFile are moving to GA
  - Feature compatible with KCM
  - Features implemented from CSI API SPEC
  - Conformance tests pass and are published to [testgrid](https://testgrid.k8s.io/provider-azure-azuredisk-csi-driver)
- Azure credential provider is still supported in Kubelet
  - Feature compatible with KCM
  - Features implemented from CSI API SPEC
  - Conformance tests pass and are published to [testgrid](https://testgrid.k8s.io/provider-azure-cloud-provider-azure)

#### Alpha -> Beta Graduation

- E2E tests have been added in [testgrid](https://testgrid.k8s.io/provider-azure-cloud-provider-azure)
- The same set of tests pass with the out-of-tree projects
- All the features from in-tree implementations are still supported

#### Beta -> GA Graduation

- Code changes are decoupled from the in-tree cloud provider (e.g. it shouldn't vendor in-tree implementations directly)
- E2E tests run stably (e.g. no flaky tests)
- Upgrade tests and scalability tests have passed

### Upgrade / Downgrade Strategy

- The version matrix for Azure cloud controller manager would be documented on [kubernetes/cloud-provider-azure](https://github.com/kubernetes/cloud-provider-azure/blob/master/README.md#current-status).
- The version matrix for CSI drivers would be documented on [kubernetes-sigs/azuredisk-csi-driver](https://github.com/kubernetes-sigs/azuredisk-csi-driver#container-images--csi-compatibility) and [kubernetes-sigs/azurefile-csi-driver](https://github.com/kubernetes-sigs/azurefile-csi-driver#container-images--csi-compatibility).

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

_This section must be completed when targeting alpha to a release._

* **How can this feature be enabled / disabled in a live cluster?**
  - [x] Feature gate (also fill in values in `kep.yaml`)
    - Feature gate name: `CSIMigrationAzureDisk` and `CSIMigrationAzureFile`
    - Components depending on the feature gate: kube-controller-manager and kubelet
  - [x] Other
    - Describe the mechanism: deploy cloud-controller-manager, cloud-node-manager and the CSI drivers in the cluster.
    - Will enabling / disabling the feature require downtime of the control
      plane? `--cloud-provider=external` should be set for kube-controller-manager.
    - Will enabling / disabling the feature require downtime or reprovisioning
      of a node? `--cloud-provider=external` should be set for kubelet.

* **Does enabling the feature change any default behavior?**

The default behavior remains the same as before.

* **Can the feature be disabled once it has been enabled (i.e. can we roll back
  the enablement)?**

Yes. Deleting the cloud-controller-manager and cloud-node-manager and then changing the `--cloud-provider`
option back to `azure` still works. The CSI drivers should be kept so that CSI-provisioned PVCs continue to work.

* **What happens if we reenable the feature if it was previously rolled back?**

It would still work as expected.

* **Are there any tests for feature enablement/disablement?**

E2E tests have already been added and results are published on testgrid.

### Rollout, Upgrade and Rollback Planning

_This section must be completed when targeting beta graduation to a release._

* **How can a rollout fail? Can it impact already running workloads?**

Incorrect component configurations may cause the rollout to fail. Already running workloads won't be impacted.

* **What specific metrics should inform a rollback?**

Failing to create a Service of type LoadBalancer or an AzureDisk PVC indicates that the rollout needs to be rolled back.

* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**

Manually changing the `--cloud-provider` option has been verified. For upgrade->downgrade,
volumes provisioned by the CSI drivers continue to be managed by the CSI drivers; they
cannot be migrated to the in-tree drivers.

* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
  fields of API types, flags, etc.?**

In-tree AzureDisk/AzureFile drivers would be migrated to CSI drivers automatically.

### Monitoring Requirements

_This section must be completed when targeting beta graduation to a release._

* **How can an operator determine if the feature is in use by workloads?**

Operation-specific metrics (e.g. LoadBalancer creation and route table update) have been added.

* **What are the SLIs (Service Level Indicators) an operator can use to determine
  the health of the service?**
  - [x] Metrics
    - Metric names:
      - `cloudprovider_azure_op_duration_seconds`
      - `cloudprovider_azure_api_request_errors`
      - `cloudprovider_azure_api_request_throttled_count`
      - `cloudprovider_azure_op_duration_seconds_bucket`
    - Components exposing the metric: cloud-controller-manager and CSI drivers

* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**

- 99.5% of read and write ARM requests in the last 5 minutes were successful
- LoadBalancer service requests in the last 5 minutes are served in 60 seconds @99th percentile
- Routes for new nodes in the last 5 minutes are served in 90 seconds @99th percentile
- Disk PVC attach requests in the last 5 minutes are served in 60 seconds @99th percentile

### Dependencies

_This section must be completed when targeting beta graduation to a release._

* **Does this feature depend on any specific services running in the cluster?**

CSI drivers for AzureDisk/AzureFile are required for the out-of-tree cloud provider,
and their plans have already been covered in the designs above.

### Scalability

_For alpha, this section is encouraged: reviewers should consider these questions
and attempt to answer them._

_For beta, this section is required: reviewers must answer these questions._

_For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field._

* **Will enabling / using this feature result in any new API calls?**

Yes, since the CSI drivers for AzureDisk/AzureFile would be introduced.

* **Will enabling / using this feature result in introducing new API types?**

Yes, the CSI drivers for AzureDisk/AzureFile would be introduced.

### Troubleshooting

The Troubleshooting section currently serves the `Playbook` role. We may consider
splitting it into a dedicated `Playbook` document (potentially with some monitoring
details). For now, we leave it here.

_This section must be completed when targeting beta graduation to a release._

* **How does this feature react if the API server and/or etcd is unavailable?**

* **What steps should be taken if SLOs are not being met to determine the problem?**

Check the debug-level logs of cloud-provider-azure, since the detailed steps are logged at debug level.

## Implementation History

See [kubernetes/cloud-provider-azure#pulls](https://github.com/kubernetes/cloud-provider-azure/pulls?utf8=%E2%9C%93&q=+is%3Apr+), [kubernetes-sigs/azuredisk-csi-driver#pulls](https://github.com/kubernetes-sigs/azuredisk-csi-driver/pulls?utf8=%E2%9C%93&q=is%3Apr++) and [kubernetes-sigs/azurefile-csi-driver#pulls](https://github.com/kubernetes-sigs/azurefile-csi-driver/pulls?utf8=%E2%9C%93&q=is%3Apr++).