Commit d1e0d0e

KEP 1672: Track Terminating Endpoints in EndpointSlice API
Signed-off-by: Andrew Sy Kim <[email protected]>
1 parent 49c9788 commit d1e0d0e

2 files changed

+172
-0
lines changed
Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
1+
# KEP-1672: Tracking Terminating Endpoints in the EndpointSlice API
2+
3+
<!-- toc -->
4+
- [Release Signoff Checklist](#release-signoff-checklist)
5+
- [Summary](#summary)
6+
- [Motivation](#motivation)
7+
- [Goals](#goals)
8+
- [Non-Goals](#non-goals)
9+
- [Proposal](#proposal)
10+
- [User Stories (optional)](#user-stories-optional)
11+
- [Story 1](#story-1)
12+
- [Notes/Constraints/Caveats (optional)](#notesconstraintscaveats-optional)
13+
- [Risks and Mitigations](#risks-and-mitigations)
14+
- [Design Details](#design-details)
15+
- [Test Plan](#test-plan)
16+
- [Graduation Criteria](#graduation-criteria)
17+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
18+
- [Version Skew Strategy](#version-skew-strategy)
19+
- [Implementation History](#implementation-history)
20+
- [Drawbacks](#drawbacks)
21+
<!-- /toc -->
22+
23+
## Release Signoff Checklist
24+
25+
- [X] Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
26+
- [ ] KEP approvers have approved the KEP status as `implementable`
27+
- [ ] Design details are appropriately documented
28+
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
29+
- [ ] Graduation criteria is in place
30+
- [ ] "Implementation History" section is up-to-date for milestone
31+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
32+
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
33+
34+
[kubernetes.io]: https://kubernetes.io/
35+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
36+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
37+
[kubernetes/website]: https://git.k8s.io/website
38+
39+
## Summary
40+
41+
Today, terminating endpoints are considered "not ready" regardless of their actual readiness.
42+
Before any work is done in improving how terminating endpoints are handled, there must be a way
43+
to track whether an endpoint is terminating without having to watch the associated pods. This
44+
KEP proposes a means to track the terminating state of an endpoint via the EndpointSlice API.
45+
This would enable consumers of the API to make smarter decisions when it comes to handling
46+
terminating endpoints (see KEP-1669 as an example).
47+
48+
## Motivation
49+
50+
### Goals
51+
52+
* Provide a mechanism to track whether an endpoint is terminating by only watching the EndpointSlice API.
53+
54+
### Non-Goals
55+
56+
* Consumption of the new API field is out of scope for this KEP but future KEPs will leverage
57+
the work done here to improve graceful termination of pods in certain scenarios (see issue [85643](https://github.com/kubernetes/kubernetes/issues/85643)).
58+
59+
## Proposal
60+
61+
This KEP proposes to keep "terminating" pods in the set of endpoints in EndpointSlice with
62+
additions to the API to indicate whether a given endpoint is terminating or not. If consumers
63+
of the API (e.g. kube-proxy) are required to treat terminating endpoints differently, they
64+
may do so by checking this condition.
65+
66+
The criteria for a ready endpoint (pod phase + readiness probe) will not change based on the
67+
terminating state of pods, but consumers of the API may choose to prefer endpoints that are both ready and not terminating.
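
As an illustration of the preference described above, here is a minimal sketch of how a consumer might pick endpoints. The `Endpoint` type and the `preferEndpoints` function are hypothetical names introduced for this example, and the fallback to ready-but-terminating endpoints is an assumption; this KEP does not prescribe consumer behavior.

```go
package main

import "fmt"

// Endpoint is a simplified, hypothetical stand-in for an EndpointSlice
// endpoint with its conditions already resolved to plain booleans.
type Endpoint struct {
	Address     string
	Ready       bool
	Terminating bool
}

// preferEndpoints returns the endpoints that are ready and not terminating.
// If there are none, it falls back to all ready endpoints, including
// terminating ones. The fallback is an assumption made for this sketch.
func preferEndpoints(eps []Endpoint) []Endpoint {
	var preferred, ready []Endpoint
	for _, ep := range eps {
		if !ep.Ready {
			continue
		}
		ready = append(ready, ep)
		if !ep.Terminating {
			preferred = append(preferred, ep)
		}
	}
	if len(preferred) > 0 {
		return preferred
	}
	return ready
}

func main() {
	eps := []Endpoint{
		{Address: "10.0.0.1", Ready: true, Terminating: false},
		{Address: "10.0.0.2", Ready: true, Terminating: true},
		{Address: "10.0.0.3", Ready: false, Terminating: true},
	}
	for _, ep := range preferEndpoints(eps) {
		fmt.Println(ep.Address)
	}
}
```

Because readiness is evaluated independently of termination, the sketch never has to infer one condition from the other.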
68+
69+
### User Stories (optional)
70+
71+
#### Story 1
72+
73+
A consumer of the EndpointSlice API (e.g. kube-proxy) may want to know which endpoints are
74+
terminating without having to watch Pods directly for scalability reasons.
75+
76+
One example would be the IPVS proxier which should set the weight of an endpoint to 0
77+
during termination and finally remove the real server when the endpoint is removed.
78+
Without knowing when a pod is done terminating, the IPVS proxy makes a best-effort guess
79+
at when the pod is terminated by looking at the connection tracking table.
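
The IPVS behavior in this story could look roughly like the sketch below. All names (`Endpoint`, `ipvsWeight`, `staleServers`) are hypothetical, the default weight of 1 is an assumption, and no real IPVS or kube-proxy API is used.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified endpoint as a proxier might see it.
type Endpoint struct {
	Address     string
	Terminating bool
}

// ipvsWeight returns 0 for a terminating endpoint so that no new
// connections are scheduled to it, and 1 otherwise (assumed default).
func ipvsWeight(ep Endpoint) int {
	if ep.Terminating {
		return 0
	}
	return 1
}

// staleServers returns the real servers that are still programmed but no
// longer appear in the endpoint list, meaning they can be removed.
func staleServers(current []string, eps []Endpoint) []string {
	live := make(map[string]bool, len(eps))
	for _, ep := range eps {
		live[ep.Address] = true
	}
	var stale []string
	for _, addr := range current {
		if !live[addr] {
			stale = append(stale, addr)
		}
	}
	return stale
}

func main() {
	eps := []Endpoint{
		{Address: "10.0.0.1"},
		{Address: "10.0.0.2", Terminating: true},
	}
	for _, ep := range eps {
		fmt.Printf("%s weight=%d\n", ep.Address, ipvsWeight(ep))
	}
	fmt.Println("remove:", staleServers([]string{"10.0.0.1", "10.0.0.2", "10.0.0.9"}, eps))
}
```

With the terminating state available in the API, removal can be driven by the endpoint disappearing from the slice rather than by inspecting connection tracking.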
80+
81+
### Notes/Constraints/Caveats (optional)
82+
83+
### Risks and Mitigations
84+
85+
Tracking the terminating state of endpoints poses some scalability concerns as each
86+
terminating endpoint adds additional writes to the API. Today, a terminating pod
87+
results in 1 write in Endpoints (removing the endpoint). With the proposed changes,
88+
each terminating endpoint could result in at least 2 writes (ready -> terminating -> removed)
89+
and possibly more depending on how many times readiness changes during termination.
90+
91+
## Design Details
92+
93+
To track whether an endpoint is terminating, a `terminating` field would be added as part of
94+
the `EndpointConditions` type in the EndpointSlice API.
95+
96+
```go
// EndpointConditions represents the current condition of an endpoint.
type EndpointConditions struct {
	// ready indicates that this endpoint is prepared to receive traffic,
	// according to whatever system is managing the endpoint. A nil value
	// indicates an unknown state. In most cases consumers should interpret this
	// unknown state as ready.
	// +optional
	Ready *bool `json:"ready,omitempty" protobuf:"bytes,1,name=ready"`

	// terminating indicates if this endpoint is terminating. Consumers should assume a
	// nil value indicates the endpoint is not terminating.
	// +optional
	Terminating *bool `json:"terminating,omitempty" protobuf:"bytes,2,name=terminating"`
}
```
112+
113+
NOTE: A nil value for `Terminating` indicates that the endpoint is not terminating.
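
Since both fields are pointers, a consumer has to decide how to read a nil value. The sketch below applies the defaults stated in the field comments (nil `Ready` is treated as ready, nil `Terminating` as not terminating). The helper names `isReady`, `isTerminating`, and `boolPtr` are hypothetical, and the struct is repeated without serialization tags only so the example compiles on its own.

```go
package main

import "fmt"

// EndpointConditions repeats the proposed struct without its tags.
type EndpointConditions struct {
	Ready       *bool
	Terminating *bool
}

// isReady treats a nil Ready as ready, per the field comment.
func isReady(c EndpointConditions) bool {
	return c.Ready == nil || *c.Ready
}

// isTerminating treats a nil Terminating as not terminating.
func isTerminating(c EndpointConditions) bool {
	return c.Terminating != nil && *c.Terminating
}

// boolPtr is a small helper for building pointer fields in examples.
func boolPtr(b bool) *bool { return &b }

func main() {
	unset := EndpointConditions{}
	draining := EndpointConditions{Ready: boolPtr(true), Terminating: boolPtr(true)}

	fmt.Println(isReady(unset), isTerminating(unset))
	fmt.Println(isReady(draining), isTerminating(draining))
}
```

Objects written before the field existed carry no `terminating` value, so reading nil as "not terminating" keeps them meaningful to newer consumers.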
114+
115+
Updates to endpointslice controller:
116+
* include pods with a deletion timestamp in endpointslice
117+
* any pod with a deletion timestamp will have condition.terminating = true
118+
* allow endpoint ready condition to change during termination
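
The controller changes listed above could be sketched as follows. `Pod`, `conditionsFor`, and `newTerminatingPod` are hypothetical simplifications rather than the real `core/v1` types or controller code; the `Ready` field on `Pod` stands in for the existing pod phase and readiness probe evaluation.

```go
package main

import (
	"fmt"
	"time"
)

// Pod is a hypothetical, pared-down view of a pod.
type Pod struct {
	Name              string
	DeletionTimestamp *time.Time // non-nil once deletion has been requested
	Ready             bool       // stands in for pod phase + readiness probe
}

// EndpointConditions repeats the proposed struct without its tags.
type EndpointConditions struct {
	Ready       *bool
	Terminating *bool
}

// conditionsFor derives endpoint conditions from a pod. Terminating depends
// only on the deletion timestamp; Ready is computed independently, so it
// may still change while the pod is terminating.
func conditionsFor(p Pod) EndpointConditions {
	ready := p.Ready
	terminating := p.DeletionTimestamp != nil
	return EndpointConditions{Ready: &ready, Terminating: &terminating}
}

// newTerminatingPod builds a pod whose deletion has been requested.
func newTerminatingPod(name string, ready bool) Pod {
	now := time.Now()
	return Pod{Name: name, DeletionTimestamp: &now, Ready: ready}
}

func main() {
	pods := []Pod{
		{Name: "web-0", Ready: true},
		newTerminatingPod("web-1", true),
		newTerminatingPod("web-2", false),
	}
	// Pods with a deletion timestamp stay in the list instead of being dropped.
	for _, p := range pods {
		c := conditionsFor(p)
		fmt.Printf("%s ready=%t terminating=%t\n", p.Name, *c.Ready, *c.Terminating)
	}
}
```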
119+
120+
### Test Plan
121+
122+
endpointslice controller unit tests:
123+
* Unit tests will validate that pods with a deletion timestamp are included with condition.terminating = true
124+
* Unit tests will validate that the ready condition can change for terminating endpoints
125+
126+
There will be no e2e tests since consumption of this new API is out-of-scope for this KEP.
127+
Any future KEP that consumes this API should have e2e tests to ensure behavior for terminating
128+
endpoints is correct.
129+
130+
### Graduation Criteria
131+
132+
Since this is an addition to the EndpointSlice API, graduation will follow the graduation
133+
timeline for the [EndpointSlice API work](/keps/sig-network/20190603-endpointslices/README.md).
134+
135+
### Upgrade / Downgrade Strategy
136+
137+
Since this is an addition to the EndpointSlice API, the upgrade/downgrade strategy will follow that
138+
of the [EndpointSlice API work](/keps/sig-network/20190603-endpointslices/README.md).
139+
140+
### Version Skew Strategy
141+
142+
Since this is an addition to the EndpointSlice API, the version skew strategy will follow that
143+
of the [EndpointSlice API work](/keps/sig-network/20190603-endpointslices/README.md).
144+
145+
## Implementation History
146+
147+
- [x] 2020-04-23: KEP accepted as implementable for v1.19
148+
149+
## Drawbacks
150+
151+
There are some scalability drawbacks, as tracking terminating endpoints requires at least 1 additional write per endpoint.
152+
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
title: Tracking Terminating Endpoints in EndpointSlice
2+
kep-number: 1672
3+
authors:
4+
- "@andrewsykim"
5+
owning-sig: sig-network
6+
participating-sigs:
7+
- sig-scalability
8+
status: implementable
9+
creation-date: 2020-04-07
10+
reviewers:
11+
- "@thockin"
12+
- "@robscott"
13+
- "@freehan"
14+
- "@smarterclayton"
15+
- "@wojtek-t"
16+
approvers:
17+
- "@thockin"
18+
see-also:
19+
- /kep/sig-network/20190603-EndpointSlice-API.md
20+
replaces: []