Commit a042b02

Merge pull request kubernetes#2443 from andrewsykim/cloud-provider-feature-gates
KEP-2395 Introduce feature gates DisableCloudProviders and DisableKubeletCloudCredentialProvider
2 parents 6659091 + 49e40d6 commit a042b02

File tree

3 files changed

+199
-1
lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
kep-number: 2395
2+
alpha:
3+
  approver: "@wojtek-t"

keps/sig-cloud-provider/2395-removing-in-tree-cloud-providers/README.md

Lines changed: 174 additions & 1 deletion
@@ -13,8 +13,16 @@
1313
- [Phase 1 - Moving Cloud Provider Code to Staging](#phase-1---moving-cloud-provider-code-to-staging)
1414
- [Phase 2 - Building CCM from Provider Repos](#phase-2---building-ccm-from-provider-repos)
1515
- [Phase 3 - Migrating Provider Code to Provider Repos](#phase-3---migrating-provider-code-to-provider-repos)
16+
- [Phase 4 - Disabling In-Tree Providers](#phase-4---disabling-in-tree-providers)
1617
- [Staging Directory](#staging-directory)
1718
- [Cloud Provider Instances](#cloud-provider-instances)
19+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
20+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
21+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
22+
- [Monitoring Requirements](#monitoring-requirements)
23+
- [Dependencies](#dependencies)
24+
- [Scalability](#scalability)
25+
- [Troubleshooting](#troubleshooting)
1826
- [Alternatives](#alternatives)
1927
- [Staging Alternatives](#staging-alternatives)
2028
- [Git Filter-Branch](#git-filter-branch)
@@ -99,10 +107,27 @@ The kube-controller-manager will still import the cloud provider implementations
99107

100108
#### Phase 3 - Migrating Provider Code to Provider Repos
101109

102-
In Phase 3, all code in `k8s.io/kubernetes/staging/src/k8s.io/legacy-cloud-providers/<provider>` will be removed and development of each cloud provider should be done in their respective external repos. It's important that by this phase, both in-tree and out-of-tree cloud providers are tested and production ready. Ideally most Kubernetes clusters in production should be using the out-of-tree provider before in-tree support is removed. A plan to migrate existing clusters from using the `kube-controller-manager` to the `cloud-controller-manager` is currently being developed. More details soon.
110+
In Phase 3, feature development is no longer accepted in `k8s.io/kubernetes/staging/src/k8s.io/legacy-cloud-providers/<provider>`, and each cloud provider should be developed in its respective external repo. Only bug and security fixes are accepted in-tree during this phase. It's important that by this phase, both in-tree and out-of-tree cloud providers are tested and production ready. Ideally, most Kubernetes clusters in production should be using the out-of-tree provider before in-tree support is removed. A plan to migrate existing clusters from using the `kube-controller-manager` to the `cloud-controller-manager` is currently being developed. More details soon.
103111

104112
External cloud providers can optionally still import providers from `k8s.io/legacy-cloud-providers` but no core components in `k8s.io/kubernetes` will import the legacy provider and the respective staging directory will be removed along with all its dependencies.
105113

114+
#### Phase 4 - Disabling In-Tree Providers
115+
116+
In Phase 4, two feature gates will be introduced to gradually disable and remove in-tree cloud providers:
117+
1. `DisableCloudProviders` - this feature gate will disable any functionality in kube-apiserver, kube-controller-manager and kubelet related to the `--cloud-provider` component flag.
118+
2. `DisableKubeletCloudCredentialProvider` - this feature gate will disable in-tree functionality in the kubelet to authenticate to the AWS, Azure and GCP container registries for image pull credentials.
119+
120+
Both feature gates only affect functionality tied to the `--cloud-provider` flag; in particular, in-tree volume plugins are not covered. Users should refer to the CSI migration efforts for those.
121+
122+
For alpha, the feature gates will be used for testing purposes. When enabled, tests will ensure that clusters with in-tree cloud providers disabled behave as expected. Alpha is targeted for v1.21, and the gates will be
123+
disabled by default.
124+
125+
For beta, the feature gates will be on by default, meaning core components will disallow use of in-tree cloud providers. This will act as a warning for users to migrate to external components. Users may
126+
choose to continue using the in-tree provider by explicitly disabling the feature gates. Beta is targeted for v1.23 or v1.24.
127+
128+
For GA, the feature gates will be enabled by default and locked. At this point users MUST migrate to external components, and use of the in-tree cloud providers will be disallowed. One release after GA,
129+
the in-tree cloud providers can be safely removed. GA is targeted for v1.25 or v1.26.
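
The graduation plan above can be sketched as a small model. This is an illustrative sketch only, not the Kubernetes feature-gate implementation: `GateStage`, `STAGES`, `effective_value` and `in_tree_provider_allowed` are hypothetical names, and rejecting an override of a locked gate is an assumption about how "locked" is enforced. The only inputs taken from the text are the per-stage defaults (alpha off, beta on, GA on and locked).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateStage:
    """Default value and lock state of a feature gate at one maturity stage."""
    default: bool
    locked: bool  # assumption: a locked gate rejects any non-default override


# Per-stage defaults as described for Phase 4.
STAGES = {
    "alpha": GateStage(default=False, locked=False),
    "beta": GateStage(default=True, locked=False),
    "ga": GateStage(default=True, locked=True),
}


def effective_value(stage, override=None):
    """Return the gate value a component would use, given an optional user override."""
    spec = STAGES[stage]
    if override is None:
        return spec.default
    if spec.locked and override != spec.default:
        raise ValueError(f"gate is locked to {spec.default} at stage {stage!r}")
    return override


def in_tree_provider_allowed(stage, override=None):
    """In-tree providers remain usable only while the Disable* gate evaluates to False."""
    return not effective_value(stage, override)
```

Under this model, a beta cluster can keep the in-tree provider only by passing an explicit `override=False`, while the same override at GA is rejected.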
130+
106131
### Staging Directory
107132

108133
There are several sections of code which need to be shared between the K8s/K8s repo and the K8s/Cloud-provider repos.
@@ -169,6 +194,154 @@ import (
169194
)
170195
```
171196

197+
## Production Readiness Review Questionnaire
198+
199+
### Feature Enablement and Rollback
200+
201+
_This section must be completed when targeting alpha to a release._
202+
203+
* **How can this feature be enabled / disabled in a live cluster?**
204+
  - [X] Feature gate (also fill in values in `kep.yaml`)
205+
    - Feature gate name: DisableCloudProviders
206+
    - Components depending on the feature gate: kubelet, kube-apiserver, kube-controller-manager
207+
  - [X] Feature gate (also fill in values in `kep.yaml`)
208+
    - Feature gate name: DisableKubeletCloudCredentialProvider
209+
    - Components depending on the feature gate: kubelet
210+
211+
* **Does enabling the feature change any default behavior?**
212+
Yes. Enabling these feature gates disables all capabilities that core components otherwise provide when `--cloud-provider` is set.
213+
Users need to ensure they have migrated to out-of-tree components prior to enabling this feature gate.
214+
If appropriate extensions (CCM, credential provider, apiserver-network-proxy, etc) are in use, cloud provider capabilities
215+
should remain the same at the very least.
216+
217+
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
218+
the enablement)?**
219+
Yes, the feature can be disabled once it is enabled. If disabled, users must ensure
220+
that the CCM is no longer running in the cluster. Credential provider plugins and the
221+
apiserver network proxy do not have to be stopped on rollback.
222+
223+
* **What happens if we reenable the feature if it was previously rolled back?**
224+
225+
All capabilities from in-tree cloud providers will be re-disabled.
226+
227+
* **Are there any tests for feature enablement/disablement?**
228+
Adequate unit tests, component integration tests and e2e tests will be added for this feature before
229+
it goes to beta and is on by default.
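
The enablement and rollback answers in this section can be read as a simple pre-flight check. The sketch below is a hypothetical helper, not part of any Kubernetes tooling: the extension labels (`"ccm"`, `"credential-provider"`, `"apiserver-network-proxy"`) and both function names are assumptions made for illustration. It encodes only what the answers state: extensions a cluster relies on should be deployed out-of-tree before the gates are enabled, and on rollback only the CCM has to be stopped.

```python
def enable_blockers(needed, deployed):
    """Return the extensions the cluster relies on that are not yet deployed out-of-tree.

    An empty list means no known blocker to enabling the gates.
    """
    return sorted(set(needed) - set(deployed))


def rollback_blockers(deployed):
    """Return the extensions that must be stopped before disabling the gates again.

    Per the rollback answer, only the CCM must stop; credential provider
    plugins and the apiserver network proxy may keep running.
    """
    return ["ccm"] if "ccm" in deployed else []
```

For example, a cluster that relies on the CCM and registry credentials but has only deployed the CCM would get `["credential-provider"]` back from `enable_blockers`.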
230+
231+
### Rollout, Upgrade and Rollback Planning
232+
233+
_This section must be completed when targeting beta graduation to a release._
234+
235+
* **How can a rollout fail? Can it impact already running workloads?**
236+
237+
TBD for beta.
238+
239+
* **What specific metrics should inform a rollback?**
240+
241+
TBD for beta.
242+
243+
* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
244+
245+
TBD for beta.
246+
247+
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
248+
fields of API types, flags, etc.?**
249+
250+
TBD for beta.
251+
252+
### Monitoring Requirements
253+
254+
_This section must be completed when targeting beta graduation to a release._
255+
256+
* **How can an operator determine if the feature is in use by workloads?**
257+
258+
TBD for beta.
259+
260+
* **What are the SLIs (Service Level Indicators) an operator can use to determine
261+
the health of the service?**
262+
263+
TBD for beta.
264+
265+
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
266+
267+
TBD for beta.
268+
269+
* **Are there any missing metrics that would be useful to have to improve observability
270+
of this feature?**
271+
272+
TBD for beta.
273+
274+
### Dependencies
275+
276+
_This section must be completed when targeting beta graduation to a release._
277+
278+
* **Does this feature depend on any specific services running in the cluster?**
279+
280+
TBD for beta.
281+
282+
283+
### Scalability
284+
285+
_For alpha, this section is encouraged: reviewers should consider these questions
286+
and attempt to answer them._
287+
288+
_For beta, this section is required: reviewers must answer these questions._
289+
290+
_For GA, this section is required: approvers should be able to confirm the
291+
previous answers based on experience in the field._
292+
293+
* **Will enabling / using this feature result in any new API calls?**
294+
295+
No, if anything it will result in reduced API calls in core components.
296+
297+
* **Will enabling / using this feature result in introducing new API types?**
298+
299+
No.
300+
301+
* **Will enabling / using this feature result in any new calls to the cloud
302+
provider?**
303+
304+
No, it will actually remove calls to the cloud provider in all core components.
305+
306+
* **Will enabling / using this feature result in increasing size or count of
307+
the existing API objects?**
308+
309+
No.
310+
311+
* **Will enabling / using this feature result in increasing time taken by any
312+
operations covered by [existing SLIs/SLOs]?**
313+
314+
No.
315+
316+
* **Will enabling / using this feature result in non-negligible increase of
317+
resource usage (CPU, RAM, disk, IO, ...) in any components?**
318+
319+
No. In fact, it should reduce resource usage.
320+
321+
### Troubleshooting
322+
323+
The Troubleshooting section currently serves the `Playbook` role. We may consider
324+
splitting it into a dedicated `Playbook` document (potentially with some monitoring
325+
details). For now, we leave it here.
326+
327+
_This section must be completed when targeting beta graduation to a release._
328+
329+
* **How does this feature react if the API server and/or etcd is unavailable?**
330+
331+
TBD for beta.
332+
333+
* **What are other known failure modes?**
334+
335+
TBD for beta.
336+
337+
* **What steps should be taken if SLOs are not being met to determine the problem?**
338+
339+
TBD for beta.
340+
341+
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
342+
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
343+
344+
172345
## Alternatives
173346

174347
### Staging Alternatives

keps/sig-cloud-provider/2395-removing-in-tree-cloud-providers/kep.yaml

Lines changed: 22 additions & 0 deletions
@@ -20,7 +20,29 @@ reviewers:
2020
approvers:
2121
- "@thockin"
2222
- "@liggit"
23+
prr-approvers:
24+
- "@wojtek-t"
2325
editor: TBD
2426
creation-date: 2018-12-18
2527
last-updated: 2019-04-11
2628
status: implementable
29+
30+
# The most recent milestone for which work toward delivery of this KEP has been
31+
# done. This can be the current (upcoming) milestone, if it is being actively
32+
# worked on.
33+
latest-milestone: "v1.21"
34+
35+
stage: alpha
36+
37+
# The following PRR answers are required at alpha release
38+
# List the feature gate name and the components for which it must be enabled
39+
feature-gates:
40+
  - name: DisableCloudProviders
41+
    components:
42+
      - kubelet
43+
      - kube-apiserver
44+
      - kube-controller-manager
45+
  - name: DisableKubeletCloudCredentialProviders
46+
    components:
47+
      - kubelet
48+
disable-supported: true
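
As a rough illustration of how the `feature-gates` entries above map onto component configuration, the following sketch builds one `--feature-gates` argument per component. The `FEATURE_GATES` dictionary and the `feature_gate_flags` helper are hypothetical; the gate names and component lists are copied from the `kep.yaml` hunk above, and `--feature-gates` takes comma-separated `Name=bool` pairs.

```python
# Gate-to-component mapping copied from the kep.yaml entries in this diff.
FEATURE_GATES = {
    "DisableCloudProviders": ["kubelet", "kube-apiserver", "kube-controller-manager"],
    "DisableKubeletCloudCredentialProviders": ["kubelet"],
}


def feature_gate_flags(gates, enabled):
    """Build one --feature-gates argument per component for the given gates."""
    value = "true" if enabled else "false"
    per_component = {}
    for gate, components in gates.items():
        for component in components:
            per_component.setdefault(component, []).append(f"{gate}={value}")
    # Sort the pairs so the generated flag is stable across runs.
    return {
        component: "--feature-gates=" + ",".join(sorted(pairs))
        for component, pairs in per_component.items()
    }
```

Calling `feature_gate_flags(FEATURE_GATES, True)` yields a single flag per component; the kubelet entry carries both gates, while kube-apiserver and kube-controller-manager carry only `DisableCloudProviders=true`.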
