Commit 33d7836

Merge pull request kubernetes#2031 from robscott/endpointslice-subsetting
KEP 2030: EndpointSlice Subsetting and Selection
2 parents c336b02 + 73ea6b2 commit 33d7836

2 files changed: +309 -0 lines changed
Lines changed: 268 additions & 0 deletions
@@ -0,0 +1,268 @@
# KEP-2030: EndpointSlice Subsetting and Selection

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [API Consumption](#api-consumption)
    - [Select EndpointSlices Without Labels](#select-endpointslices-without-labels)
    - [Select EndpointSlices With Matching Zone](#select-endpointslices-with-matching-zone)
    - [Select EndpointSlices With Matching Region](#select-endpointslices-with-matching-region)
    - [Kube-Proxy](#kube-proxy)
  - [Controller Implementation](#controller-implementation)
  - [Backwards Compatibility](#backwards-compatibility)
- [Test Plan](#test-plan)
  - [Unit Tests](#unit-tests)
- [Graduation Criteria](#graduation-criteria)
  - [Alpha Release](#alpha-release)
  - [Alpha -> Beta Graduation](#alpha---beta-graduation)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

Segment EndpointSlices into logical chunks so that API consumers like
kube-proxy can watch only a subset of all EndpointSlices. This provides a
natural starting point for topology-aware routing and dramatically increases
scalability.

## Motivation

There has been some discussion that the Alpha Service Topology API requires too
much of the user. Users generally want the same behavior for all services where
topology is concerned: avoid cross-zone traffic when in-zone endpoints are
available and have enough capacity. This problem is exacerbated by the
introduction of Multi-Cluster services, where cross-region services become a
reality and some regions may be more desirable than others as failover
locations.

As clusters and services grow, we're also seeing that the current proxy
implementations are stretched to their limits because every node independently
tracks all endpoints for the entire cluster.

The best user experience would likely be for the platform, through a first-class
controller and/or a provider-specific implementation, to intelligently
prioritize endpoints based on topology, capacity, and any other useful cost
metrics, either to aid in traffic shaping or simply to reduce the amount of
global resource tracking required on every node. However, we currently have no
way to target endpoints to specific subsets of nodes.

### Goals

- Provide the building blocks to allow EndpointSlices to target specific subsets
  of nodes.
- EndpointSlice subsetting will be fully backwards compatible for older
  consumers of the EndpointSlice API.
- Design is flexible enough for multiple implementations and experimentation.
- Minimal duplication of data.
- Room for future enhancements, for example weighted endpoints or Slices.

### Non-Goals

- Define how subsetting should be used.
- Design the controller responsible for subsetting endpoints.
- An API for telling controllers how a service should be subsetted.

Many of these are being tackled by the follow-up [KEP
#2004](https://github.com/kubernetes/enhancements/issues/2004).

## Proposal

Two new topology-based labels will be introduced for EndpointSlices to support
subsetting:

```
endpointslice.kubernetes.io/for-zone
endpointslice.kubernetes.io/for-region
```

In the future this pattern may be expanded to include other concepts or
topologies. A simple pattern like this will allow EndpointSlices to be delivered
to consumers in a specific zone or region.

### Risks and Mitigations

This approach does not allow a single EndpointSlice to target multiple zones or
regions. Any approach that enabled that would be significantly more complicated.
The initial proposal in [KEP
#2004](https://github.com/kubernetes/enhancements/issues/2004) suggests that
this won't be an issue.

## Design Details

### API Consumption

EndpointSlice API consumers can use the following selectors to consume
EndpointSlices:

#### Select EndpointSlices Without Labels
This is required to support producers of EndpointSlices that don't set these
labels, including older versions of the EndpointSlice controller.

```
matchExpressions:
  - {key: endpointslice.kubernetes.io/for-zone, operator: DoesNotExist}
  - {key: endpointslice.kubernetes.io/for-region, operator: DoesNotExist}
```

#### Select EndpointSlices With Matching Zone
```
matchLabels:
  endpointslice.kubernetes.io/for-zone: example-zone
```

#### Select EndpointSlices With Matching Region
```
matchLabels:
  endpointslice.kubernetes.io/for-region: example-region
```
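
For consumers that pass selectors as strings on list and watch requests, the
three selectors above could be written in the standard Kubernetes label selector
string syntax, where `!key` means the label must not exist and a comma means
AND. The following is a minimal sketch only; the function name
`subset_selectors` is hypothetical and not part of this KEP.

```python
FOR_ZONE = "endpointslice.kubernetes.io/for-zone"
FOR_REGION = "endpointslice.kubernetes.io/for-region"


def subset_selectors(zone, region):
    """Return one label selector string per selector described in this KEP,
    for a consumer running in the given zone and region."""
    return [
        # EndpointSlices without either label (delivered everywhere).
        f"!{FOR_ZONE},!{FOR_REGION}",
        # EndpointSlices delivered to the consumer's zone.
        f"{FOR_ZONE}={zone}",
        # EndpointSlices delivered to the consumer's region.
        f"{FOR_REGION}={region}",
    ]
```

Requirements on different keys within one label selector are always ANDed, so
this sketch assumes a consumer would issue one list/watch per selector and merge
the results.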

#### Kube-Proxy
When the `EndpointSliceSubsetting` feature gate is set to true on kube-proxy,
it will use these selectors to filter EndpointSlices.
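
The combined effect of the three selectors can be sketched as a single
predicate. This is an illustrative model only, not kube-proxy code: the function
name `should_consume` and the `subsetting_enabled` parameter (standing in for
the `EndpointSliceSubsetting` feature gate) are assumptions made for the
example.

```python
FOR_ZONE = "endpointslice.kubernetes.io/for-zone"
FOR_REGION = "endpointslice.kubernetes.io/for-region"


def should_consume(slice_labels, node_zone, node_region, subsetting_enabled=True):
    """Return True if a consumer in node_zone/node_region would consume an
    EndpointSlice carrying slice_labels (a dict of label key to value)."""
    if not subsetting_enabled:
        # Feature disabled: every EndpointSlice is consumed, as today.
        return True
    has_zone = FOR_ZONE in slice_labels
    has_region = FOR_REGION in slice_labels
    if not has_zone and not has_region:
        # Selector 1: neither label is present.
        return True
    if has_zone and slice_labels[FOR_ZONE] == node_zone:
        # Selector 2: for-zone matches the consumer's zone.
        return True
    if has_region and slice_labels[FOR_REGION] == node_region:
        # Selector 3: for-region matches the consumer's region.
        return True
    return False
```

Because the result is the union of three independent selectors, a slice that
carried both labels would be consumed if either one matched.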

### Controller Implementation
Although a controller implementation is out of scope for this KEP, it is worth
discussing what that might look like. For reference, [KEP
#2004](https://github.com/kubernetes/enhancements/issues/2004) discusses how this
could be implemented for the EndpointSlice controller. That proposal involves
three potential approaches: Original, PreferZone, and RequireZone.

None of the proposed approaches would involve data duplication. Each
Pod/endpoint would continue to live in a single EndpointSlice. A Service might
still end up with more EndpointSlices because zone-specific slices pack
endpoints less efficiently. The table below shows the number of EndpointSlices
that would result from each approach, based on the number of endpoints a Service
has in a 3 zone cluster. In each case, the number in parentheses is the number
of endpoints in each slice.

| # endpoints | Original # slices | PreferZone # slices | RequireZone # slices |
|---|---|---|---|
| 6 | 1 (6) | 1 (6) | 3 (2) |
| 90 | 1 (90) | 3 (30) | 3 (30) |
| 270 | 3 (90) | 3 (90) | 3 (90) |

The RequireZone approach requires at least one EndpointSlice per zone per
Service. The PreferZone approach has the same requirement once the minimum
threshold has been met. Before that threshold, a single shared EndpointSlice (no
additional labels) is used. There's some padding involved here to make sure
we're not flapping back and forth between these states.

With this approach, EndpointSlices can be delivered everywhere (no additional
labels), or to a zone (for-zone), or to a region (for-region). None of the
proposed approaches involve a single Service having separate sets of
EndpointSlices for each use case. As defined by [KEP
1659](https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/1659-standard-topology-labels),
"region" and "zone" are strictly hierarchical ("zones" are subsets of "regions")
and zone names are unique across regions.
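
The slice counts in the table above can be reproduced with a small model. This
is a sketch under stated assumptions, not the controller's algorithm: it assumes
at most 100 endpoints per slice, endpoints spread evenly over three zones, and a
PreferZone threshold of 30 endpoints. The threshold value and all function names
are hypothetical, and the padding that prevents flapping is not modeled.

```python
import math

# Assumption: upper bound on endpoints per EndpointSlice.
MAX_ENDPOINTS_PER_SLICE = 100


def original_slices(per_zone):
    """Slices needed when endpoints from all zones are packed together."""
    total = sum(per_zone.values())
    return math.ceil(total / MAX_ENDPOINTS_PER_SLICE)


def require_zone_slices(per_zone):
    """Slices needed when every zone gets its own slices (at least one each)."""
    return sum(
        max(1, math.ceil(n / MAX_ENDPOINTS_PER_SLICE)) for n in per_zone.values()
    )


def prefer_zone_slices(per_zone, threshold):
    """Shared packing below the (hypothetical) threshold, per-zone at or above it."""
    if sum(per_zone.values()) < threshold:
        return original_slices(per_zone)
    return require_zone_slices(per_zone)


def even_split(total, zones=("a", "b", "c")):
    """Spread endpoints evenly across zones (total must divide evenly)."""
    return {zone: total // len(zones) for zone in zones}
```

With these assumptions, `even_split(90)` yields one slice for Original and three
for both PreferZone and RequireZone, matching the second row of the table.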

### Backwards Compatibility
We don't need to create EndpointSlices without labels for backwards
compatibility; we just need to ensure that consumers always support consuming
EndpointSlices without these labels. Even if we updated the EndpointSlice
controller to consistently label EndpointSlices with `for-zone` or
`for-region`, we couldn't guarantee that other producers would.

There's nothing in any current consumer implementation that would break if
additional labels like `for-zone` or `for-region` were added to EndpointSlices.
All that consumers need to care about, for this or the original approach, is the
`kubernetes.io/service-name` label. If they want to support subsetting, they can
update their selectors as described in this KEP, but subsetting won't actually
break any existing functionality.

## Test Plan
This KEP is quite small in scope. The only new functionality being added will be
an adjustment to the EndpointSlices kube-proxy consumes when a feature gate is
enabled. We will need to add more test coverage for when this feature is enabled
or disabled.

### Unit Tests
* Ensure kube-proxy will continue to consume all EndpointSlices when this
  feature is disabled.
* Ensure EndpointSlices delivered to a specific zone will be consumed by
  kube-proxy running in the same zone when this feature is enabled.
* Ensure EndpointSlices delivered to a specific zone will not be consumed by
  kube-proxy running in a different zone when this feature is enabled.
* Ensure EndpointSlices delivered to a specific region will be consumed by
  kube-proxy running in the same region when this feature is enabled.
* Ensure EndpointSlices delivered to a specific region will not be consumed by
  kube-proxy running in a different region when this feature is enabled.

## Graduation Criteria

### Alpha Release

- Proposed labels are added as well known labels in Discovery API types.
- Implement new selectors in kube-proxy.
- Implement test plan.

### Alpha -> Beta Graduation

- EndpointSlice controller supports publishing EndpointSlices in subsets. (See
  [KEP 2004](https://github.com/kubernetes/enhancements/issues/2004) for more
  info).

## Upgrade / Downgrade Strategy

This functionality will be guarded by the `EndpointSliceSubsetting` feature gate
on kube-proxy. This will be fully backwards compatible and will only make a
difference in a cluster if EndpointSlices are being published with the labels
described in this KEP.

## Version Skew Strategy

This is designed with backwards compatibility in mind. Enabling this feature
does not rely on any other feature being enabled in any other release. [KEP
#2004](https://github.com/kubernetes/enhancements/issues/2004) will depend on
this KEP, though.

## Implementation History

September 2020: Initial Proposal Submitted

## Drawbacks

Although this feature is optional, it adds complexity to the consumption of
EndpointSlices for anyone who wants to support it.

## Alternatives

An alternative would be to use an approach that would allow delivery to multiple
zones. With labels, this would require including the zone name in the label key:

```
endpointslice.kubernetes.io/for-zone-a
endpointslice.kubernetes.io/for-zone-b
```

Unfortunately it would be much more difficult to build backwards compatible
selectors to consume these labels.
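
To illustrate the difficulty: label selectors match on exact keys, so with the
zone name embedded in the key, selecting EndpointSlices that carry none of these
labels would need a "does not exist" term for every zone name in use, and the
consumer would have to know that list in advance. A hypothetical sketch (the
function name is illustrative only):

```python
ZONE_KEY_PREFIX = "endpointslice.kubernetes.io/for-zone-"


def unlabeled_selector(all_zones):
    """Build a selector string matching slices that carry none of the
    per-zone label keys. Every known zone name must be enumerated."""
    return ",".join(f"!{ZONE_KEY_PREFIX}{zone}" for zone in sorted(all_zones))
```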
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
title: EndpointSlice Subsetting and Selection
kep-number: 2030
authors:
  - "@robscott"
  - "@jeremyot"
owning-sig: sig-network
participating-sigs:
  - sig-multicluster
  - sig-scalability
status: implementable
creation-date: 2020-09-29
reviewers:
  - "@andrewsykim"
  - "@bowei"
  - "@danwinship"
  - "@thockin"
  - "@wojtek-t"
approvers:
  - "@thockin"

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.20"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.20"
  beta: "v1.21"
  stable: "v1.23"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
  - name: EndpointSliceSubsetting
    components:
      - kube-proxy
disable-supported: true
