Commit d4b3f4b

KEP-2395: add prod readiness review for alpha
Signed-off-by: Andrew Sy Kim <[email protected]>
1 parent b864a3e commit d4b3f4b

2 files changed: 152 additions, 1 deletion
  • keps
    • prod-readiness/sig-cloud-provider
    • sig-cloud-provider/2395-removing-in-tree-cloud-providers

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
kep-number: 2395
alpha:
  approver: "@wojtek-t"

keps/sig-cloud-provider/2395-removing-in-tree-cloud-providers/README.md

Lines changed: 149 additions & 1 deletion
@@ -109,7 +109,7 @@ In Phase 4, two feature gates will be introduced to gradually disable and remove
1. `DisableCloudProviders` - this feature gate will disable any functionality in kube-apiserver, kube-controller-manager and kubelet related to the `--cloud-provider` component flag.
2. `DisableKubeletCloudCredentialProvider` - this feature gate will disable in-tree functionality in the kubelet to authenticate to the AWS, Azure and GCP container registries for image pull credentials.

Removed: Both of these features gates does NOT include any functionality tied to the --cloud-provider flag, specifically in-tree volume plugins are not covered. Users should refer to CSI migration efforts for these.
Added: Both of these feature gates only impact functionality tied to the `--cloud-provider` flag; in particular, in-tree volume plugins are not covered. Users should refer to the CSI migration efforts for these.
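As a rough illustration of that scope, the check below sketches how a core component might treat its `--cloud-provider` value once `DisableCloudProviders` is enabled. It is a minimal sketch, not the Kubernetes implementation: `validateCloudProvider` is a hypothetical helper, the list of in-tree provider names is an assumption, and in-tree volume plugins are deliberately not modeled because the gate does not cover them.

```go
package main

import (
	"fmt"
	"strings"
)

// inTreeProviders is an ASSUMED, illustrative list of provider names that
// were built into the core components; it is not taken from this KEP.
var inTreeProviders = map[string]bool{
	"aws": true, "azure": true, "gce": true, "openstack": true, "vsphere": true,
}

// validateCloudProvider is a hypothetical startup check for a core component.
// With DisableCloudProviders enabled, only an unset --cloud-provider or
// "external" is accepted. With the gate disabled, nothing is checked here.
func validateCloudProvider(flagValue string, disableCloudProviders bool) error {
	name := strings.TrimSpace(flagValue)
	if !disableCloudProviders || name == "" || name == "external" {
		return nil
	}
	if inTreeProviders[name] {
		return fmt.Errorf("in-tree cloud provider %q is disabled by DisableCloudProviders; set --cloud-provider=external and run a CCM", name)
	}
	return fmt.Errorf("unknown cloud provider %q", name)
}

func main() {
	for _, v := range []string{"", "external", "aws"} {
		fmt.Printf("--cloud-provider=%q: %v\n", v, validateCloudProvider(v, true))
	}
}
```

Rejecting in-tree names at startup (rather than silently ignoring them) is one design choice among several; it makes a missed migration step visible immediately.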

For alpha, the feature gates will be used for testing purposes. When enabled, tests will ensure that clusters with in-tree cloud providers disabled behave as expected. This is targeted for v1.21 and will be
disabled by default.
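Because both gates are alpha and disabled by default, they only take effect when an operator opts in through each component's `--feature-gates` flag, for example `--feature-gates=DisableCloudProviders=true`. The Go sketch below shows that opt-in logic under simplifying assumptions: `parseFeatureGates` and `knownGates` are hypothetical names, and this hand-written parser is only a stand-in for the shared feature gate machinery the real components use.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// knownGates holds the defaults for the two gates described in this KEP.
// Both are alpha, so both default to false (disabled).
var knownGates = map[string]bool{
	"DisableCloudProviders":                 false,
	"DisableKubeletCloudCredentialProvider": false,
}

// parseFeatureGates is a simplified, hypothetical parser for a value such as
// "Gate1=true,Gate2=false". It starts from the defaults above and rejects
// unknown gate names, malformed pairs and non-boolean values.
func parseFeatureGates(value string) (map[string]bool, error) {
	gates := make(map[string]bool, len(knownGates))
	for name, defaultValue := range knownGates {
		gates[name] = defaultValue
	}
	if strings.TrimSpace(value) == "" {
		return gates, nil // nothing set: every gate keeps its default
	}
	for _, pair := range strings.Split(value, ",") {
		kv := strings.SplitN(strings.TrimSpace(pair), "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed feature gate %q, want Name=bool", pair)
		}
		name := strings.TrimSpace(kv[0])
		if _, ok := knownGates[name]; !ok {
			return nil, fmt.Errorf("unrecognized feature gate %q", name)
		}
		enabled, err := strconv.ParseBool(strings.TrimSpace(kv[1]))
		if err != nil {
			return nil, fmt.Errorf("invalid value for %s: %v", name, err)
		}
		gates[name] = enabled
	}
	return gates, nil
}

func main() {
	gates, err := parseFeatureGates("DisableCloudProviders=true")
	if err != nil {
		panic(err)
	}
	// Print in a stable order, since Go map iteration order is random.
	names := make([]string, 0, len(gates))
	for n := range gates {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%s=%t\n", n, gates[n])
	}
}
```

Note that `DisableCloudProviders` has to be set consistently on every component that depends on it (kubelet, kube-apiserver and kube-controller-manager), while `DisableKubeletCloudCredentialProvider` only matters to the kubelet.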
@@ -186,6 +186,154 @@ import (
)
```

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

_This section must be completed when targeting alpha to a release._

* **How can this feature be enabled / disabled in a live cluster?**
  - [X] Feature gate (also fill in values in `kep.yaml`)
    - Feature gate name: `DisableCloudProviders`
    - Components depending on the feature gate: kubelet, kube-apiserver, kube-controller-manager
  - [X] Feature gate (also fill in values in `kep.yaml`)
    - Feature gate name: `DisableKubeletCloudCredentialProvider`
    - Components depending on the feature gate: kubelet

* **Does enabling the feature change any default behavior?**
  Yes. Enabling this feature disables all capabilities that core components enable when `--cloud-provider` is set.
  Users need to ensure they have migrated to out-of-tree components before enabling this feature gate.
  If the appropriate extensions (CCM, credential provider plugins, apiserver-network-proxy, etc.) are in use, cloud provider
  capabilities should, at a minimum, remain the same.

* **Can the feature be disabled once it has been enabled (i.e. can we roll back
  the enablement)?**
  Yes, the feature can be disabled after it has been enabled. When it is disabled, users must ensure
  that the CCM is no longer running in the cluster. Credential provider plugins and the
  apiserver network proxy do not have to be stopped on rollback.

* **What happens if we reenable the feature if it was previously rolled back?**

  All capabilities from in-tree cloud providers will be disabled again.

* **Are there any tests for feature enablement/disablement?**
  Adequate unit tests, component integration tests and e2e tests will be added for this feature before
  it goes beta and is enabled by default.
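The enable, roll back and re-enable sequence answered above can be modeled as a small state machine. This is a hypothetical sketch: `clusterState`, `setGate` and `inTreeCloudActive` are illustrative names, and treating a running CCM as evidence of a completed migration is a simplifying assumption, not a rule stated by this KEP.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterState is a hypothetical, simplified model of the settings discussed
// in this section; it is not a Kubernetes API type.
type clusterState struct {
	disableCloudProviders bool // DisableCloudProviders gate on the core components
	ccmRunning            bool // an out-of-tree cloud-controller-manager is deployed
}

// inTreeCloudActive reports whether in-tree cloud provider code would be used
// by a component started with the given --cloud-provider value.
func inTreeCloudActive(s clusterState, cloudProvider string) bool {
	return !s.disableCloudProviders && cloudProvider != "" && cloudProvider != "external"
}

// setGate toggles the gate after checking the preconditions described above.
// Using "CCM running" as proof of migration is a simplifying assumption.
func setGate(s clusterState, enabled bool) (clusterState, error) {
	switch {
	case enabled && !s.ccmRunning:
		return s, errors.New("enable blocked: migrate to an out-of-tree CCM first")
	case !enabled && s.ccmRunning:
		return s, errors.New("rollback blocked: stop the CCM before disabling the gate")
	}
	s.disableCloudProviders = enabled
	return s, nil
}

func main() {
	s := clusterState{ccmRunning: true}

	s, err := setGate(s, true) // enable after migrating
	fmt.Println("enable:", err, "in-tree active:", inTreeCloudActive(s, "aws"))

	s.ccmRunning = false // the CCM must be gone before rolling back
	s, err = setGate(s, false)
	fmt.Println("rollback:", err, "in-tree active:", inTreeCloudActive(s, "aws"))

	s.ccmRunning = true // redeploy the CCM, then re-enable
	s, err = setGate(s, true)
	fmt.Println("re-enable:", err, "in-tree active:", inTreeCloudActive(s, "aws"))
}
```

Credential provider plugins and the apiserver network proxy are intentionally absent from the model, since the rollback answer states they do not need to be stopped.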

### Rollout, Upgrade and Rollback Planning

_This section must be completed when targeting beta graduation to a release._

* **How can a rollout fail? Can it impact already running workloads?**

  TBD for beta.

* **What specific metrics should inform a rollback?**

  TBD for beta.

* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**

  TBD for beta.

* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
  fields of API types, flags, etc.?**

  TBD for beta.

### Monitoring Requirements

_This section must be completed when targeting beta graduation to a release._

* **How can an operator determine if the feature is in use by workloads?**

  TBD for beta.

* **What are the SLIs (Service Level Indicators) an operator can use to determine
  the health of the service?**

  TBD for beta.

* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**

  TBD for beta.

* **Are there any missing metrics that would be useful to have to improve observability
  of this feature?**

  TBD for beta.

### Dependencies

_This section must be completed when targeting beta graduation to a release._

* **Does this feature depend on any specific services running in the cluster?**

  TBD for beta.

### Scalability

_For alpha, this section is encouraged: reviewers should consider these questions
and attempt to answer them._

_For beta, this section is required: reviewers must answer these questions._

_For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field._

* **Will enabling / using this feature result in any new API calls?**

  No. If anything, it will reduce the API calls made by core components.

* **Will enabling / using this feature result in introducing new API types?**

  No.

* **Will enabling / using this feature result in any new calls to the cloud
  provider?**

  No. It removes calls to the cloud provider from all core components.

* **Will enabling / using this feature result in increasing size or count of
  the existing API objects?**

  No.

* **Will enabling / using this feature result in increasing time taken by any
  operations covered by [existing SLIs/SLOs]?**

  No.

* **Will enabling / using this feature result in non-negligible increase of
  resource usage (CPU, RAM, disk, IO, ...) in any components?**

  No. In fact, it should reduce resource usage.

### Troubleshooting

The Troubleshooting section currently serves the `Playbook` role. We may consider
splitting it into a dedicated `Playbook` document (potentially with some monitoring
details). For now, we leave it here.

_This section must be completed when targeting beta graduation to a release._

* **How does this feature react if the API server and/or etcd is unavailable?**

  TBD for beta.

* **What are other known failure modes?**

  TBD for beta.

* **What steps should be taken if SLOs are not being met to determine the problem?**

  TBD for beta.

[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos

## Alternatives

### Staging Alternatives
