Commit 8f19839

Merge pull request kubernetes#2066 from andrewsykim/kep-1864
KEP-1864: add prod readiness review and beta milestone
2 parents 2a33899 + 96999b0 commit 8f19839

3 files changed: +199 -10 lines changed
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+kep-number: 1864
+beta:
+  approver: "@johnbelamaric"

keps/sig-network/1864-disable-lb-node-ports/README.md

Lines changed: 189 additions & 8 deletions
@@ -12,9 +12,17 @@
 - [Test Plan](#test-plan)
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha](#alpha)
+- [Beta](#beta)
 - [GA](#ga)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
 - [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+- [Monitoring Requirements](#monitoring-requirements)
+- [Dependencies](#dependencies)
+- [Scalability](#scalability)
+- [Troubleshooting](#troubleshooting)
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
@@ -27,11 +35,11 @@
 Items marked with (R) are required *prior to targeting to a milestone / release*.

 - [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
-- [ ] (R) Graduation criteria is in place
-- [ ] (R) Production readiness review completed
+- [X] (R) KEP approvers have approved the KEP status as `implementable`
+- [X] (R) Design details are appropriately documented
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
+- [X] (R) Graduation criteria is in place
+- [X] (R) Production readiness review completed
 - [ ] Production readiness review approved
 - [ ] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
@@ -115,12 +123,19 @@ E2E tests:

 ### Alpha

-Adds new field `allocateLoadBalancerNodePorts` to Service but not implemented, this allows for rollback.
+* Adds new field `allocateLoadBalancerNodePorts` to Service, but the field is dropped unless an existing Service has the field set already.
+* Only allow the field `allocateLoadBalancerNodePorts` to be set when the feature gate is on.
+* There are sufficient unit tests exercising API strategy with the feature gate enabled / disabled.

-### GA
+### Beta
+
+* E2E tests checking that node ports do not get allocated when `service.spec.allocateLoadBalancerNodePorts=false`.
+* Feature gate is on by default.

-Feature is enabled when field is set.
+### GA

+* Feature gate is on by default and locked.
+* To safely handle rollback, there has been at least 1 release prior where apiserver understands the new field (covered in alpha).

 ### Upgrade / Downgrade Strategy

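The alpha criteria in this hunk describe how the API strategy treats the new field: it is dropped when the feature gate is off, unless the stored Service already has it set. A minimal sketch of that rule follows, assuming Services are plain dictionaries; the function names `field_in_use` and `drop_disabled_fields` are hypothetical and are not taken from the Kubernetes source.

```python
import copy
from typing import Optional

FIELD = "allocateLoadBalancerNodePorts"


def field_in_use(service: Optional[dict]) -> bool:
    """Report whether a Service dict already has the field set in its spec."""
    return service is not None and FIELD in service.get("spec", {})


def drop_disabled_fields(new: dict, old: Optional[dict], gate_enabled: bool) -> dict:
    """Return a copy of `new` with the gated field dropped when appropriate.

    Hypothetical helper: the field is kept when the gate is on, or when the
    stored object (`old`) already uses it; otherwise it is removed.
    """
    result = copy.deepcopy(new)
    if not gate_enabled and not field_in_use(old):
        result.get("spec", {}).pop(FIELD, None)
    return result
```

Under these assumptions, a create (where `old` is `None`) with the gate off loses the field, while an update to a Service that already stores the field keeps whatever value the update carries. That matches the rollback answer in the questionnaire, which says updates to existing fields are allowed.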
@@ -136,6 +151,172 @@ re-enabling node port should not cause any traffic disruptions.
 Version skew from the control plane to kube-proxy should be trivial since kube-proxy's behavior is driven by the `nodePort` field
 and not the `allocateLoadBalancerNodePorts` field.

+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+_This section must be completed when targeting alpha to a release._
+
+* **How can this feature be enabled / disabled in a live cluster?**
+- [X] Feature gate (also fill in values in `kep.yaml`)
+- Feature gate name: ServiceLBNodePortControl
+- Components depending on the feature gate: kube-apiserver
+- [ ] Other
+- Describe the mechanism:
+- Will enabling / disabling the feature require downtime of the control
+plane?
+- Will enabling / disabling the feature require downtime or reprovisioning
+of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
+
+* **Does enabling the feature change any default behavior?**
+
+No, enabling the feature gate but not setting `spec.allocateLoadBalancerNodePorts` will not
+change any default behaviors in Service.
+
+* **Can the feature be disabled once it has been enabled (i.e. can we roll back
+the enablement)?**
+
+Yes, if the feature gate is disabled, new Services cannot use the new field, but existing Services
+already using the field will continue to have it set. Updates to existing fields are allowed.
+
+* **What happens if we reenable the feature if it was previously rolled back?**
+
+The existing value for `spec.allocateLoadBalancerNodePorts` will remain intact since API strategy
+will not drop fields if existing resources have it set.
+
+* **Are there any tests for feature enablement/disablement?**
+
+Yes, there will be unit tests for the Service API strategy which exercises the behavior
+with the feature gate enabled and disabled.
+
+### Rollout, Upgrade and Rollback Planning
+
+_This section must be completed when targeting beta graduation to a release._
+
+* **How can a rollout fail? Can it impact already running workloads?**
+
+* By default this should not impact any existing Services since we are not changing any default behaviors.
+* Enabling this feature on new clusters can impact workloads if load balancers depend on node ports without users
+being aware.
+
+* **What specific metrics should inform a rollback?**
+
+Metrics for node port counts will vary for Service LoadBalancers that set `spec.allocateLoadBalancerNodePorts=false`.
+If load balancers are misbehaving at the same time the node port allocation metric is decreasing, the user may want to
+consider rolling back this feature.
+
+* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
+
+No, upgrade->downgrade->upgrade has not been tested yet. Like any new API field, on downgrade
+any existing Services using the field will continue to have the field set. For these Services,
+they will not have node ports allocated. New Services cannot use the new field unless the feature
+gate is enabled in the old version when the feature was alpha.
+
+Manual validation of this behavior should be done prior to promoting this feature to beta.
+
+* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
+fields of API types, flags, etc.?**
+
+No.
+
+### Monitoring Requirements
+
+_This section must be completed when targeting beta graduation to a release._
+
+* **How can an operator determine if the feature is in use by workloads?**
+
+Service should have `spec.allocateLoadBalancerNodePorts=false` and Service LoadBalancers will not have node ports allocated.
+
+* **What are the SLIs (Service Level Indicators) an operator can use to determine
+the health of the service?**
+
+N/A
+
+* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
+
+N/A
+
+* **Are there any missing metrics that would be useful to have to improve observability
+of this feature?**
+
+N/A
+
+### Dependencies
+
+_This section must be completed when targeting beta graduation to a release._
+
+* **Does this feature depend on any specific services running in the cluster?**
+
+This feature is dependent on the Service LoadBalancer implementation of a cluster. This feature
+should only be used if the load balancer implementation does not need node ports for the load balancer
+data path.
+
+
+### Scalability
+
+_For alpha, this section is encouraged: reviewers should consider these questions
+and attempt to answer them._
+
+_For beta, this section is required: reviewers must answer these questions._
+
+_For GA, this section is required: approvers should be able to confirm the
+previous answers based on experience in the field._
+
+* **Will enabling / using this feature result in any new API calls?**
+Describe them, providing:
+
+No, enabling this feature should actually reduce the number of operations, since
+the feature is to disable an existing behavior with node ports.
+
+* **Will enabling / using this feature result in introducing new API types?**
+
+No
+
+* **Will enabling / using this feature result in any new calls to the cloud
+provider?**
+
+No
+
+* **Will enabling / using this feature result in increasing size or count of
+the existing API objects?**
+
+No
+
+* **Will enabling / using this feature result in increasing time taken by any
+operations covered by [existing SLIs/SLOs]?**
+
+No
+
+* **Will enabling / using this feature result in non-negligible increase of
+resource usage (CPU, RAM, disk, IO, ...) in any components?**
+
+No
+
+### Troubleshooting
+
+The Troubleshooting section currently serves the `Playbook` role. We may consider
+splitting it into a dedicated `Playbook` document (potentially with some monitoring
+details). For now, we leave it here.
+
+_This section must be completed when targeting beta graduation to a release._
+
+* **How does this feature react if the API server and/or etcd is unavailable?**
+
+Not any different from when node ports are used for load balancers.
+
+* **What are other known failure modes?**
+
+If `service.spec.allocateLoadBalancerNodePorts=false` but the load balancer implementation does depend on node ports.
+
+* **What steps should be taken if SLOs are not being met to determine the problem?**
+
+In a scenario where a user sets `service.spec.allocateLoadBalancerNodePorts=false` but the load balancer does require node ports,
+the user can re-enable node ports for a Service by setting `service.spec.allocateLoadBalancerNodePorts` back to `true`.
+This will trigger node port allocation from kube-apiserver.
+
+[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
+[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
+
 ## Implementation History

 - 2020-06-17: KEP is proposed as implementable
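The monitoring and troubleshooting answers in this hunk come down to two operator tasks: finding Services that set `spec.allocateLoadBalancerNodePorts=false`, and setting the field back to `true` when a load balancer turns out to need node ports. The sketch below illustrates both. It assumes the input is a Service list already parsed from JSON (shaped like the output of `kubectl get services --all-namespaces -o json`); the helper names are hypothetical.

```python
import json

FIELD = "allocateLoadBalancerNodePorts"


def services_without_node_ports(service_list: dict) -> list:
    """Return "namespace/name" for LoadBalancer Services that set the field to false."""
    matches = []
    for svc in service_list.get("items", []):
        spec = svc.get("spec", {})
        # Only an explicit false counts; an unset field keeps the default behavior.
        if spec.get("type") == "LoadBalancer" and spec.get(FIELD) is False:
            meta = svc.get("metadata", {})
            matches.append(f"{meta.get('namespace', 'default')}/{meta.get('name', '')}")
    return matches


def reenable_patch() -> str:
    """Build a JSON merge patch body that sets the field back to true."""
    return json.dumps({"spec": {FIELD: True}})
```

The string returned by `reenable_patch()` could, for example, be passed to `kubectl patch service <name> --type merge -p '<body>'`. As the troubleshooting answer states, setting the field back to `true` triggers node port allocation from kube-apiserver.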

keps/sig-network/1864-disable-lb-node-ports/kep.yaml

Lines changed: 7 additions & 2 deletions
@@ -17,16 +17,21 @@ prr-approvers:
   - "@johnbelamaric"

 # The target maturity stage in the current dev cycle for this KEP.
-stage: stable
+stage: beta

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.20"
+latest-milestone: "v1.21"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.20"
   beta: "v1.21"
   stable: "v1.22"

+feature-gates:
+  - name: ServiceLBNodePortControl
+    components:
+      - kube-apiserver
+disable-supported: true
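The `feature-gates` entry above names `ServiceLBNodePortControl` as a gate on kube-apiserver that supports being disabled. As a rough illustration of toggling it, the sketch below renders a mapping of gate names to booleans in the `--feature-gates=Name=true|false` form that Kubernetes components accept. The helper name `feature_gates_flag` is hypothetical, and how the flag reaches kube-apiserver depends on how the cluster is deployed.

```python
def feature_gates_flag(gates: dict) -> str:
    """Render a dict of gate name -> bool as a --feature-gates command-line flag."""
    # Gates are sorted by name so the rendered flag is deterministic.
    pairs = ",".join(f"{name}={str(on).lower()}" for name, on in sorted(gates.items()))
    return f"--feature-gates={pairs}"
```

For instance, `feature_gates_flag({"ServiceLBNodePortControl": False})` yields the flag value that would turn the gate off for a rollback.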
