
Commit 5740d57

Merge pull request kubernetes#3168 from andrewsykim/kep-1672
KEP-1672: updates for v1.24
2 parents 5320deb + b95984c commit 5740d57

3 files changed: +45 -14 lines changed

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 1672
 alpha:
   approver: "@wojtek-t"
+beta:
+  approver: "@wojtek-t"

keps/sig-network/1672-tracking-terminating-endpoints/README.md

Lines changed: 40 additions & 12 deletions
@@ -16,6 +16,7 @@
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha](#alpha)
 - [Beta](#beta)
+- [GA](#ga)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
 - [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -32,13 +33,13 @@
 ## Release Signoff Checklist
 
 - [X] Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] KEP approvers have approved the KEP status as `implementable`
-- [ ] Design details are appropriately documented
-- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
-- [ ] Graduation criteria is in place
-- [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [X] KEP approvers have approved the KEP status as `implementable`
+- [X] Design details are appropriately documented
+- [X] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
+- [X] Graduation criteria is in place
+- [X] "Implementation History" section is up-to-date for milestone
+- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [X] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 [kubernetes.io]: https://kubernetes.io/
 [kubernetes/enhancements]: https://git.k8s.io/enhancements
@@ -164,6 +165,13 @@ E2E tests:
 * `EndpointSliceTerminatingCondition` is enabled by default.
 * Consensus on scalability implications resulting from additional EndpointSlice writes with approval from sig-scalability.
 
+#### GA
+
+* E2E tests validating that terminating pods are properly reflected in EndpointSlice API.
+* Ensure there are no performance/scalability regressions when enabling additional endpointslice writes for terminating endpoints.
+* This will be validated by running the existing scalability test suites where pods handle SIGTERM from kubelet before terminating.
+* All necessary metrics are in place to provide adequate observability and monitoring for this feature.
+
 ### Upgrade / Downgrade Strategy
 
 Since this is an addition to the EndpointSlice API, the upgrade/downgrade strategy will follow that
@@ -200,6 +208,10 @@ EndpointSlice will continue to have the `terminating` and `serving` condition se
 
 Yes, there will be strategy API unit tests validating if the new API field is allowed based on the feature gate.
 
+These tests can be found here:
+- https://github.com/kubernetes/kubernetes/blob/master/test/integration/endpointslice/endpointsliceterminating_test.go#L44
+- https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/discovery/endpointslice/strategy_test.go#L42-L137
+
 ### Rollout, Upgrade and Rollback Planning
 
 ###### How can a rollout fail? Can it impact already running workloads?
@@ -209,7 +221,15 @@ It is assumed that almost all consumers of EndpointSlice check the `ready` condi
 
 ###### What specific metrics should inform a rollback?
 
-Application-level traffic indicating packet-loss or error rates.
+EndpointSlice controller supports the following metrics that would be relevant for this feature:
+- endpoint_slice_controller_endpoints_added_per_sync
+- endpoint_slice_controller_endpoints_removed_per_sync
+- endpoint_slice_controller_changes
+- endpoint_slice_controller_endpointslices_changed_per_sync
+- endpoint_slice_controller_syncs
+
+The following metrics can be used to see if the introduction of this change resulted in a significantly
+large number of traffic to the apiserver.
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
@@ -228,21 +248,29 @@ on how the new conditions are being used.
 
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
-Metrics will be added for total endpoints with the `serving` and `terminating` condition set.
+The existing SLI can be used to determine the health of this feature:
+
+```
+Latency of programming in-cluster load balancing mechanism (e.g. iptables), measured from when service spec or list of its Ready pods change to when it is reflected in load balancing mechanism, measured as 99th percentile over last 5 minutes aggregated across all programmers
+```
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
 
-N/A
+It's hard to gauge an exact number here, because the existing SLI does not have a target SLO yet.
+However, we should assume that the addition of the `serving` and `terminating` conditions do not
+significantly impact the latency of kube-proxy syncing load balancer rules.
 
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
-N/A
+Adapting the existing endpoint slice controller metrics to also include endpoint conditions
+as a label could be useful since a user can distinguish if the endpoint churn is happening due
+to the addition of terminating endpoints or for another reason.
 
 ### Dependencies
 
 ###### Does this feature depend on any specific services running in the cluster?
 
-N/A
+None aside from the existing core Kubernetes components, specifically kube-apiserver and kube-controller-manager.
 
 ### Scalability
 
keps/sig-network/1672-tracking-terminating-endpoints/kep.yaml

Lines changed: 3 additions & 2 deletions
@@ -20,16 +20,17 @@ see-also:
 replaces: []
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.22"
+latest-milestone: "v1.24"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.20"
+  beta: "v1.22"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
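
Reviewer note: the README changes in this commit rely on the `ready`, `serving` and `terminating` endpoint conditions. As a rough illustration of how a consumer might combine them, here is a minimal Go sketch. It is not code from this commit or from Kubernetes: `EndpointConditions` is a simplified local stand-in type, `Classify`, `boolOr` and `ptr` are hypothetical helpers, and the nil-handling defaults (unset `ready` treated as true, unset `serving` falling back to `ready`, unset `terminating` treated as false) are stated assumptions about how a consumer would interpret missing values.

```go
package main

import "fmt"

// EndpointConditions is a simplified, hypothetical stand-in for the
// conditions carried by an EndpointSlice endpoint. A nil pointer means
// the condition was not set.
type EndpointConditions struct {
	Ready       *bool
	Serving     *bool
	Terminating *bool
}

// boolOr returns the pointed-to value, or def when p is nil.
func boolOr(p *bool, def bool) bool {
	if p == nil {
		return def
	}
	return *p
}

// Classify maps the three conditions to a coarse state string.
func Classify(c EndpointConditions) string {
	ready := boolOr(c.Ready, true)              // assumption: unset ready counts as true
	serving := boolOr(c.Serving, ready)         // assumption: unset serving falls back to ready
	terminating := boolOr(c.Terminating, false) // assumption: unset terminating counts as false

	switch {
	case terminating && serving:
		return "terminating-serving"
	case terminating:
		return "terminating-not-serving"
	case ready:
		return "ready"
	default:
		return "not-ready"
	}
}

// ptr is a small helper for building *bool literals.
func ptr(b bool) *bool { return &b }

func main() {
	// A pod that received SIGTERM but still passes its readiness probe.
	draining := EndpointConditions{Ready: ptr(false), Serving: ptr(true), Terminating: ptr(true)}
	fmt.Println(Classify(draining)) // prints "terminating-serving"

	// No conditions set at all.
	fmt.Println(Classify(EndpointConditions{})) // prints "ready"
}
```

A consumer that only checks `ready` would see the draining endpoint as unavailable, which is the behaviour the README assumes for existing consumers.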
