Commit 49c9788

KEP 1669: Graceful Termination for Local External Traffic Policy
Signed-off-by: Andrew Sy Kim <[email protected]>
1 parent b6ea9d6 commit 49c9788

File tree

2 files changed

+173
-0
lines changed
  • keps/sig-network/1669-graceful-termination-local-external-traffic-policy

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
1+
# KEP-1669: Graceful Termination for Local External Traffic Policy
2+
3+
<!-- toc -->
4+
- [Release Signoff Checklist](#release-signoff-checklist)
5+
- [Summary](#summary)
6+
- [Motivation](#motivation)
7+
- [Goals](#goals)
8+
- [Non-Goals](#non-goals)
9+
- [Proposal](#proposal)
10+
- [User Stories (optional)](#user-stories-optional)
11+
- [Story 1](#story-1)
12+
- [Risks and Mitigations](#risks-and-mitigations)
13+
- [Design Details](#design-details)
14+
- [Additions to EndpointSlice](#additions-to-endpointslice)
15+
- [kube-proxy](#kube-proxy)
16+
- [Test Plan](#test-plan)
17+
- [Unit Tests](#unit-tests)
18+
- [E2E Tests](#e2e-tests)
19+
- [Graduation Criteria](#graduation-criteria)
20+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
21+
- [Version Skew Strategy](#version-skew-strategy)
22+
- [Implementation History](#implementation-history)
23+
- [Drawbacks](#drawbacks)
24+
- [Alternatives](#alternatives)
25+
<!-- /toc -->
26+
27+
## Release Signoff Checklist
28+
29+
- [X] Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
30+
- [ ] KEP approvers have approved the KEP status as `implementable`
31+
- [ ] Design details are appropriately documented
32+
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
33+
- [ ] Graduation criteria is in place
34+
- [ ] "Implementation History" section is up-to-date for milestone
35+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
36+
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
37+
38+
[kubernetes.io]: https://kubernetes.io/
39+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
40+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
41+
[kubernetes/website]: https://git.k8s.io/website
42+
43+
## Summary
44+
45+
Services with externalTrafficPolicy=Local lack the ability to gracefully handle traffic from a loadbalancer when the number of endpoints on a node goes from N to 0.
46+
Since terminating pods are never considered "ready" in Endpoints/EndpointSlice, a node with only terminating endpoints would drop traffic even though
47+
it may still be part of a loadbalancer's node pool. Even with loadbalancer health checks, there is usually a delay between when the health check
48+
fails and when a node is completely decommissioned. This KEP proposes changes to gracefully handle traffic to a node that has only terminating endpoints
49+
for a Service with externalTrafficPolicy=Local.
50+
51+
## Motivation
52+
53+
### Goals
54+
55+
* enable zero downtime rolling updates for Services with ExternalTrafficPolicy=Local via nodeports/loadbalancerIPs/externalIPs.
56+
57+
### Non-Goals
58+
59+
* changing the behavior of terminating pods/endpoints outside the scope of Services with ExternalTrafficPolicy=Local via a nodeport/loadbalancerIPs/externalIPs.
60+
61+
## Proposal
62+
63+
This KEP proposes that if all endpoints for a given Service (with externalTrafficPolicy=Local) within the bounds of a node are terminating (i.e., pod.DeletionTimestamp != nil),
64+
then all external traffic on this node should be sent to **ready** and **not ready** terminating endpoints, preferring the former if there are any. This ensures that traffic
65+
is not dropped between the time a node fails its health check (has 0 endpoints) and when a node is decommissioned from the loadbalancer's node pool.
66+
67+
The proposed changes in this KEP depend on KEP-1672 and the EndpointSlice API.
68+
69+
### User Stories (optional)
70+
71+
#### Story 1
72+
73+
As a user I would like to do a rolling update of a Deployment fronted by a Service Type=LoadBalancer with ExternalTrafficPolicy=Local.
74+
If a node that has only 1 pod of said deployment goes into the `Terminating` state, all traffic to that node is dropped until either a new pod
75+
comes up or my cloud provider removes the node from the loadbalancer's node pool. Ideally the terminating pod should gracefully handle traffic to this node
76+
until one of these conditions is satisfied.
77+
78+
### Risks and Mitigations
79+
80+
There are scalability implications to tracking termination state in EndpointSlice. For now we are assuming that the performance trade-offs are worthwhile but
81+
future testing may change this decision. See KEP 1672 for more details.
82+
83+
## Design Details
84+
85+
### Additions to EndpointSlice
86+
87+
This work depends on the `Terminating` condition existing on the EndpointSlice API (see KEP 1672) in order to check the termination state of an endpoint.
88+
89+
### kube-proxy
90+
91+
Updates to kube-proxy when watching EndpointSlice:
92+
* update kube-proxy endpoints info to track terminating endpoints based on endpoint.conditions.terminating in EndpointSlice.
93+
* update kube-proxy endpoints info to track endpoint readiness based on endpoint.conditions.ready in EndpointSlice.
94+
* if externalTrafficPolicy=Local, record all local endpoints that are ready && terminating and all local endpoints that are !ready && terminating. When there are no local ready endpoints, fall back in the following order of preference:
95+
  * local ready & terminating endpoints
96+
  * local not ready & terminating endpoints
97+
  * blackhole traffic
98+
* for all other traffic (i.e. externalTrafficPolicy=Cluster), preserve existing behavior where traffic is only sent to ready && !terminating endpoints.
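The fallback order above can be sketched roughly as follows. This is only an illustration under stated assumptions, not kube-proxy's actual code: the `Endpoint` struct, its fields, and `selectLocalEndpoints` are hypothetical names invented for this example, and the sketch covers only the externalTrafficPolicy=Local path.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified view of the per-endpoint state
// kube-proxy would track. It is not a real kube-proxy type.
type Endpoint struct {
	IP          string
	IsLocal     bool // endpoint runs on this node
	Ready       bool // assumed to mirror the ready condition
	Terminating bool // assumed to mirror the terminating condition
}

// selectLocalEndpoints returns the endpoints that should receive external
// traffic on this node for a Service with externalTrafficPolicy=Local.
// An empty result means traffic is blackholed.
func selectLocalEndpoints(eps []Endpoint) []Endpoint {
	var ready, readyTerminating, notReadyTerminating []Endpoint
	for _, ep := range eps {
		if !ep.IsLocal {
			continue // only local endpoints are considered for this policy
		}
		switch {
		case ep.Ready && !ep.Terminating:
			ready = append(ready, ep)
		case ep.Ready && ep.Terminating:
			readyTerminating = append(readyTerminating, ep)
		case !ep.Ready && ep.Terminating:
			notReadyTerminating = append(notReadyTerminating, ep)
		}
	}
	switch {
	case len(ready) > 0:
		return ready // existing behavior: ready && !terminating
	case len(readyTerminating) > 0:
		return readyTerminating // first fallback
	default:
		return notReadyTerminating // second fallback; may be empty (blackhole)
	}
}

func main() {
	eps := []Endpoint{
		{IP: "10.0.0.1", IsLocal: true, Ready: false, Terminating: true},
		{IP: "10.0.0.2", IsLocal: true, Ready: true, Terminating: true},
		{IP: "10.0.0.3", IsLocal: false, Ready: true, Terminating: false},
	}
	for _, ep := range selectLocalEndpoints(eps) {
		fmt.Println(ep.IP) // prints "10.0.0.2"
	}
}
```

In this example the only ready, non-terminating endpoint is on another node, so the local ready and terminating endpoint is preferred over the local not-ready one.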
99+
100+
In addition, kube-proxy's node port health check should fail if there are only `Terminating` endpoints, regardless of their readiness, in order to:
101+
* remove the node from a loadbalancer's node pool as quickly as possible
102+
* gracefully handle any new connections that arrive before the loadbalancer is able to remove the node
103+
* allow existing connections to gracefully terminate
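The health check rule described above could look something like the sketch below. It assumes a hypothetical `localEndpoint` type and `healthCheckPasses` helper, neither of which comes from kube-proxy. The point it illustrates is that the health check and the traffic path deliberately disagree: the check fails as soon as only terminating endpoints remain, while traffic may still be forwarded to them.

```go
package main

import "fmt"

// localEndpoint is a hypothetical, simplified record for an endpoint
// that runs on this node.
type localEndpoint struct {
	Ready       bool
	Terminating bool
}

// healthCheckPasses reports whether the node port health check should
// succeed: only when at least one local endpoint is ready and not terminating.
func healthCheckPasses(local []localEndpoint) bool {
	for _, ep := range local {
		if ep.Ready && !ep.Terminating {
			return true
		}
	}
	return false
}

func main() {
	onlyTerminating := []localEndpoint{{Ready: true, Terminating: true}}
	mixed := []localEndpoint{
		{Ready: true, Terminating: true},
		{Ready: true, Terminating: false},
	}
	// prints "false true"
	fmt.Println(healthCheckPasses(onlyTerminating), healthCheckPasses(mixed))
}
```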
104+
105+
### Test Plan
106+
107+
#### Unit Tests
108+
109+
kube-proxy unit tests:
110+
111+
* Unit tests will validate the correct behavior when there are only local terminating endpoints.
112+
* Unit tests will validate the new change in behavior only applies for Services with ExternalTrafficPolicy=Local via nodeports/loadbalancerIPs/externalIPs.
113+
* Existing unit tests will validate that terminating endpoints are only used when there are no ready endpoints AND externalTrafficPolicy=Local, otherwise ready && !terminating endpoints are used.
114+
* Unit tests will validate health check node port succeeds only when there are ready && !terminating endpoints.
115+
116+
#### E2E Tests
117+
118+
E2E tests will be added to validate that no traffic is dropped during a rolling update for a Service with ExternalTrafficPolicy=Local.
119+
This test may be marked "Flaky" as the behavior is also largely dependent on the cloud provider's loadbalancer.
120+
121+
All existing E2E tests for Services should continue to pass.
122+
123+
### Graduation Criteria
124+
125+
The graduation criteria of this KEP will largely depend on the graduation status of the EndpointSlice API. Once the `terminating` field is added to EndpointSlice API,
126+
this change in behavior will take effect as soon as kube-proxy consumes EndpointSlice.
127+
128+
### Upgrade / Downgrade Strategy
129+
130+
Behavioral changes to terminating endpoints will apply once kube-proxy is upgraded to v1.19 and the `EndpointSlice`/`EndpointSliceProxying` feature gates are enabled.
131+
On downgrade, the worst-case scenario is that kube-proxy falls back to the existing behavior. See [Version Skew Strategy](#version-skew-strategy) below.
132+
133+
### Version Skew Strategy
134+
135+
The worst-case version skew scenario is that kube-proxy falls back to the existing behavior today, where traffic does not fall back to terminating endpoints.
136+
This would either happen if a version of the control plane was not aware of the additions to EndpointSlice or if the version of kube-proxy did not know to consume the additions to EndpointSlice.
137+
138+
There's not much risk involved as the worst-case scenario is falling back to existing behavior.
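One way to picture the skew case where the control plane does not populate the new field: if the terminating condition is surfaced as an optional boolean, an unset value can simply be read as "not terminating", which yields the existing behavior. The sketch below assumes that shape. The `endpointConditions` struct and `isTerminating` helper are hypothetical names for illustration only.

```go
package main

import "fmt"

// endpointConditions is a hypothetical stand-in for per-endpoint conditions
// where each condition is optional (nil when the writer did not set it).
type endpointConditions struct {
	Ready       *bool
	Terminating *bool
}

// isTerminating treats an unset terminating condition as false, so data
// written by an older control plane results in today's behavior.
func isTerminating(c endpointConditions) bool {
	return c.Terminating != nil && *c.Terminating
}

func main() {
	t := true
	fromOldControlPlane := endpointConditions{Ready: &t} // Terminating unset
	fromNewControlPlane := endpointConditions{Ready: &t, Terminating: &t}
	// prints "false true"
	fmt.Println(isTerminating(fromOldControlPlane), isTerminating(fromNewControlPlane))
}
```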
139+
140+
## Implementation History
141+
142+
- [x] 2020-04-23: KEP accepted as implementable for v1.19
143+
144+
## Drawbacks
145+
146+
* scalability: this KEP (and KEP 1672) would add more writes per endpoint to EndpointSlice as each terminating endpoint adds at least 1 and at
147+
most 2 additional writes: 1 write for marking an endpoint as "terminating" and another if an endpoint changes its readiness during termination.
148+
* complexity: an additional corner case is added to kube-proxy, adding to its complexity.
149+
150+
## Alternatives
151+
152+
Some users work around this issue today by adding a preStop hook that sleeps for some duration. Though this may work in some scenarios, better handling from kube-proxy
153+
would eliminate the need for this workaround altogether.
154+
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
---
2+
title: Graceful Termination for Local External Traffic Policy
3+
authors:
4+
- "@andrewsykim"
5+
owning-sig: sig-network
6+
participating-sigs:
7+
- sig-scalability
8+
reviewers:
9+
- "@thockin"
10+
- "@wojtek-t"
11+
- "@smarterclayton"
12+
approvers:
13+
- "@thockin"
14+
creation-date: 2020-04-07
15+
last-updated: 2020-04-07
16+
status: implementable
17+
see-also:
18+
- "/keps/sig-network/1672-tracking-terminating-endpoints/README.md"
19+
- https://github.com/kubernetes/kubernetes/issues/85643
