Commit 00b3eaf

KEP-1669: promote ProxyTerminatingEndpoints to Beta (kubernetes#3505)
* KEP-1669: promote ProxyTerminatingEndpoints to Beta
  Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-1669: add note that upgrade/downgrade testing should be done before promotion to Beta
  Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-1669: add more details about metric sync_proxy_rules_no_local_endpoints_total and fix typo
  Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-1669: use the new test plan format, including links to existing tests
  Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-1669: answer PRR question 'What steps should be taken if SLOs are not being met to determine the problem?'
  Signed-off-by: Andrew Sy Kim <[email protected]>

Signed-off-by: Andrew Sy Kim <[email protected]>
1 parent e0f2894 commit 00b3eaf

3 files changed, +48 -19 lines changed

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 1669
 alpha:
   approver: "@wojtek-t"
+beta:
+  approver: "@wojtek-t"

keps/sig-network/1669-proxy-terminating-endpoints/README.md

Lines changed: 43 additions & 17 deletions
@@ -19,8 +19,10 @@
 - [Additions to EndpointSlice](#additions-to-endpointslice)
 - [kube-proxy](#kube-proxy)
 - [Test Plan](#test-plan)
-- [Unit Tests](#unit-tests)
-- [E2E Tests](#e2e-tests)
+- [Prerequisite testing updates](#prerequisite-testing-updates)
+- [Unit tests](#unit-tests)
+- [Integration tests](#integration-tests)
+- [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha](#alpha)
 - [Beta](#beta)
@@ -156,21 +158,38 @@ In addition, kube-proxy's node port health check should fail if there are only `

 ### Test Plan

-#### Unit Tests
+[X] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.

-kube-proxy unit tests:
+##### Prerequisite testing updates

-* Unit tests will validate the correct behavior when there are only local terminating endpoints.
-* Unit tests will validate the changein behavior against the matrix of possible Service configurations using both internalTrafficPolicy and externalTrafficPolicy.
-* Existing unit tests will validate that terminating endpoints are only used when there are no ready endpoints, otherwise ready && !terminating endpoints are used.
-* Unit tests will validate health check node port succeeds only when there are ready && !terminating endpoints.
+##### Unit tests

-#### E2E Tests
+- `pkg/proxy`: `07/2021` - Validating behavior in iptables and ipvs proxier. Also tests feature gate enablement.
+- `pkg/proxy`: `03/2022` - All tests updated to cover all traffic policies (not just Local)
+
+Links to added tests:
+- https://github.com/kubernetes/kubernetes/blob/d436f5d0b7eb87f78eb31c12466e2591c24eef59/pkg/proxy/iptables/proxier_test.go#L5373
+- https://github.com/kubernetes/kubernetes/blob/d436f5d0b7eb87f78eb31c12466e2591c24eef59/pkg/proxy/iptables/proxier_test.go#L6158
+- https://github.com/kubernetes/kubernetes/blob/6e9845f766e4d34620835aaa1e5f864211471a50/pkg/proxy/ipvs/proxier_test.go#L4964
+- https://github.com/kubernetes/kubernetes/blob/6e9845f766e4d34620835aaa1e5f864211471a50/pkg/proxy/ipvs/proxier_test.go#L5316
+- https://github.com/kubernetes/kubernetes/blob/f2e5c16545027fbe04cc33d4ef59cd01de6b9967/pkg/proxy/topology_test.go#L48
+
+##### Integration tests
+
+N/A
+
+##### e2e tests

 E2E tests will be added to validate that no traffic is dropped during a rolling update for a Service. E2E tests should cover all permutations of externalTrafficPolicy
 and internalTrafficPolicy.

-All existing E2E tests for Services should continue to pass.
+- E2E test validating health check node port behavior: https://github.com/kubernetes/kubernetes/blob/4bc1398c0834a63370952702eef24d5e74c736f6/test/e2e/network/service.go#L2790
+- E2E test validating fallback behavior for terminating endpoints when `externalTrafficPolicy: Cluster`: https://github.com/kubernetes/kubernetes/blob/4bc1398c0834a63370952702eef24d5e74c736f6/test/e2e/network/service.go#L3060
+- E2E test validating fallback behavior for terminating endpoints when `externalTrafficPolicy: Local`: https://github.com/kubernetes/kubernetes/blob/4bc1398c0834a63370952702eef24d5e74c736f6/test/e2e/network/service.go#L3145
+- E2E test validating fallback behavior for terminating endpoints when `internalTrafficPolicy: Cluster`: https://github.com/kubernetes/kubernetes/blob/4bc1398c0834a63370952702eef24d5e74c736f6/test/e2e/network/service.go#L2889
+- E2E test validating fallback behavior for terminating endpoints when `internalTrafficPolicy: Local`: https://github.com/kubernetes/kubernetes/blob/4bc1398c0834a63370952702eef24d5e74c736f6/test/e2e/network/service.go#L2972

 ### Graduation Criteria

@@ -179,12 +198,13 @@ All existing E2E tests for Services should continue to pass.
 * kube-proxy internally tracks the `terminating` and `serving` condition from EndpointSlice
 * kube-proxy falls back to terminating endpoints if and only if they are the only available endpoints.
 * feature is only enabled if the feature gate `ProxyTerminatingEndpoints` is on.
-* unit tests in kube-proxy.
+* unit tests in kube-proxy (see [Test Plan](#test-plan) section)

 #### Beta

-* E2E tests are in place, exercising all permutations of internalTrafficPolicy and externalTrafficPolicy.
+* E2E tests are in place, exercising all permutations of internalTrafficPolicy and externalTrafficPolicy (see [Test Plan](#test-plan) section)
 * Metrics to publish how many Services/Endpoints are routing traffic to terminating endpoints.
+* Manual or automated rollback testing (see [Test Plan](#test-plan) section)

 ### Upgrade / Downgrade Strategy

@@ -246,13 +266,14 @@ When the rollout happens, workloads may unexpectedly receive traffic when termin

 ###### What specific metrics should inform a rollback?

-There will be metrics added to publish how many Services/Endpoints are routing to terminating pods. It may be expected that clusters
-route to many terminating pods at once, especially during rolling updates, but users can correlate this metric with other factors to
-gauge if a rollback is necessary.
+`sync_proxy_rules_no_local_endpoints_total` can be used to inform rollback in scenarios where Services are dropping traffic to local endpoints.
+If this metric increases dramatically (especially when there are no rollouts happening), it could mean there is a programming error in kube-proxy.
+In general, we expect this metric to decrease during rollouts when this feature is enabled since nodes that only have terminating endpoints should
+no longer be included in this metric.

 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

-Upgrade->downgrade->upgrade path has not been tested yet. We may want to require this for beta or GA.
+Upgrade->downgrade->upgrade testing (manual or automated) will be required for Beta. If tested manually, the steps will be documented in this KEP.

 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

@@ -269,7 +290,7 @@ regardless of their termination state. If this is undesired, workloads should be
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

 - [X] Metrics
-- Metric name: TBD
+- Metric name: `sync_proxy_rules_no_local_endpoints_total`
 - [Optional] Aggregation method:
 - Components exposing the metric:
 - kube-proxy
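
One way to watch this SLI is to compare two scrapes of kube-proxy's metrics output. The sketch below sums every sample of a named metric from Prometheus text-format output and reports the change between scrapes. The sample text, its `traffic_policy` label, the helper names, and the assumption that samples carry no trailing timestamp are all illustrative. The name actually exposed by kube-proxy may carry a component prefix, so it is passed in as a parameter rather than hard-coded into the helpers.

```python
def sum_metric(exposition_text: str, name: str) -> float:
    """Sum all samples of `name` in Prometheus text-format output.

    Assumes each sample line is `<series> <value>` with no timestamp.
    """
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/metadata lines
        series, _, value = line.rpartition(" ")
        # Drop the label set, if any, to get the bare metric name.
        if series.split("{", 1)[0] == name:
            total += float(value)
    return total


def metric_delta(before: str, after: str, name: str) -> float:
    """Change in the summed metric between two scrapes."""
    return sum_metric(after, name) - sum_metric(before, name)


# Metric name as written in the KEP; hypothetical sample scrapes below.
METRIC = "sync_proxy_rules_no_local_endpoints_total"

BEFORE = """\
# illustrative sample, not real kube-proxy output
sync_proxy_rules_no_local_endpoints_total{traffic_policy="internal"} 0
sync_proxy_rules_no_local_endpoints_total{traffic_policy="external"} 1
"""

AFTER = """\
# illustrative sample, not real kube-proxy output
sync_proxy_rules_no_local_endpoints_total{traffic_policy="internal"} 2
sync_proxy_rules_no_local_endpoints_total{traffic_policy="external"} 4
"""

print(metric_delta(BEFORE, AFTER, METRIC))  # 5.0
```

A sustained rise outside of rollouts is the signal the KEP's rollback answer points to; how large a rise counts as "dramatic" is left to the operator.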
@@ -360,10 +381,15 @@ For each of them, fill in the following information by copying the below templat

 ###### What steps should be taken if SLOs are not being met to determine the problem?

+It is highly recommended that all workloads have a readiness probe configured and handle termination signals from the kubelet appropriately.
+As a last resort, the EndpointSlice resource and proxy rules from kube-proxy can be examined to determine why traffic may not be routing correctly
+to terminating endpoints on a specific node.
+
 ## Implementation History

 - [x] 2020-04-23: KEP accepted as implementable for v1.19
 - [x] 2021-01-21: KEP scope expanded to include both internal and external traffic.
+- [x] 1.24: implementation updated to handle all types of traffic policies.

 ## Drawbacks

keps/sig-network/1669-proxy-terminating-endpoints/kep.yaml

Lines changed: 3 additions & 2 deletions
@@ -19,16 +19,17 @@ see-also:
   - https://github.com/kubernetes/kubernetes/issues/85643

 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.24"
+latest-milestone: "v1.26"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.22"
+  beta: "v1.26"

 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
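
For orientation, the endpoint-selection behavior that this commit promotes to Beta can be sketched in Python. The README diff above states that kube-proxy falls back to terminating endpoints if and only if they are the only available endpoints, and that the node port health check succeeds only when there are ready, non-terminating endpoints. This is a minimal illustration, not kube-proxy's Go implementation: the `Endpoint` class and both function names are hypothetical, and requiring `serving` for fallback endpoints is an assumption based on the `serving` condition the KEP says kube-proxy tracks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for the EndpointSlice conditions kube-proxy tracks."""
    ip: str
    ready: bool
    serving: bool
    terminating: bool


def select_endpoints(endpoints: List[Endpoint]) -> List[Endpoint]:
    """Prefer ready, non-terminating endpoints; otherwise fall back."""
    ready = [ep for ep in endpoints if ep.ready and not ep.terminating]
    if ready:
        return ready
    # Fallback only when no ready endpoint exists.
    # Assumption: a terminating endpoint must still be serving to be used.
    return [ep for ep in endpoints if ep.serving and ep.terminating]


def health_check_passes(local_endpoints: List[Endpoint]) -> bool:
    """Node port health check: true only with a ready, non-terminating endpoint."""
    return any(ep.ready and not ep.terminating for ep in local_endpoints)


if __name__ == "__main__":
    healthy = Endpoint("10.0.0.1", ready=True, serving=True, terminating=False)
    draining = Endpoint("10.0.0.2", ready=False, serving=True, terminating=True)

    # With a ready endpoint present, the terminating one is ignored.
    print([ep.ip for ep in select_endpoints([healthy, draining])])
    # With only a terminating endpoint, traffic falls back to it,
    # while the health check still fails so load balancers drain the node.
    print([ep.ip for ep in select_endpoints([draining])])
    print(health_check_passes([draining]))
```

Keeping the health check stricter than endpoint selection is what lets an external load balancer stop sending new connections to a node while traffic that still arrives there is served instead of dropped.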
