# KEP-2030: EndpointSlice Subsetting and Selection

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [API Consumption](#api-consumption)
    - [Select EndpointSlices Without Labels](#select-endpointslices-without-labels)
    - [Select EndpointSlices With Matching Zone](#select-endpointslices-with-matching-zone)
    - [Select EndpointSlices With Matching Region](#select-endpointslices-with-matching-region)
    - [Kube-Proxy](#kube-proxy)
  - [Controller Implementation](#controller-implementation)
  - [Backwards Compatibility](#backwards-compatibility)
- [Test Plan](#test-plan)
  - [Unit Tests](#unit-tests)
- [Graduation Criteria](#graduation-criteria)
  - [Alpha Release](#alpha-release)
  - [Alpha -> Beta Graduation](#alpha---beta-graduation)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

Segment EndpointSlices into logical chunks, enabling API consumers like
kube-proxy to watch only a subset of all EndpointSlices. This provides a natural
starting point for topology-aware routing and can dramatically improve
scalability by reducing the number of EndpointSlices each consumer has to track.

## Motivation

There has been some discussion that the Alpha Service Topology API requires the
user to do too much. Users generally want the same behavior for all services
where topology is concerned: avoid cross-zone traffic when in-zone endpoints are
available and have enough capacity. This problem is exacerbated by the
introduction of Multi-Cluster services, where cross-region services become a
reality and some regions may be more desirable than others as failover
locations.

As clusters and services grow, we're also seeing current proxy implementations
stretched to their limits, because every node independently tracks all
endpoints for the entire cluster.

The best user experience would likely come from having the platform (through a
first-class controller and/or a provider-specific implementation) intelligently
prioritize endpoints based on topology, capacity, and any other useful cost
metrics, either to aid in traffic shaping or simply to reduce the amount of
global resource tracking required by every node. However, we are currently
missing a way to target endpoints to specific subsets of nodes.

### Goals

- Provide the building blocks to allow EndpointSlices to target specific subsets
  of nodes.
- EndpointSlice subsetting will be fully backwards compatible for older
  consumers of the EndpointSlice API.
- The design is flexible enough to allow multiple implementations and
  experimentation.
- Minimal duplication of data.
- Room for future enhancements, for example weighted endpoints or weighted
  EndpointSlices.

### Non-Goals

- Defining how subsetting should be used.
- Designing the controller responsible for subsetting endpoints.
- Providing an API for telling controllers how a service should be subsetted.

Many of these are being tackled by the follow-up [KEP
#2004](https://github.com/kubernetes/enhancements/issues/2004).

## Proposal

Two new topology-based labels will be introduced for EndpointSlices to support
subsetting:

```
endpointslice.kubernetes.io/for-zone
endpointslice.kubernetes.io/for-region
```

In the future, this pattern may be expanded to include other concepts or
topologies. A simple pattern like this allows EndpointSlices to be delivered to
consumers in a specific zone or region.

### Risks and Mitigations

This approach does not allow a single EndpointSlice to target multiple zones or
regions. Any approach that enabled that would be significantly more complicated.
The initial proposal in [KEP
#2004](https://github.com/kubernetes/enhancements/issues/2004) suggests that
this won't be an issue.

## Design Details

### API Consumption

EndpointSlice API consumers should consume every EndpointSlice that matches any
of the following selectors. Because the requirements inside a single label
selector are ANDed together, the three selectors cannot be merged into one; each
has to be evaluated separately and the results combined.

#### Select EndpointSlices Without Labels

This is required to support producers of EndpointSlices that don't set these
labels, including older versions of the EndpointSlice controller.

```
matchExpressions:
  - {key: endpointslice.kubernetes.io/for-zone, operator: DoesNotExist}
  - {key: endpointslice.kubernetes.io/for-region, operator: DoesNotExist}
```

#### Select EndpointSlices With Matching Zone

```
matchLabels:
  endpointslice.kubernetes.io/for-zone: example-zone
```

#### Select EndpointSlices With Matching Region

```
matchLabels:
  endpointslice.kubernetes.io/for-region: example-region
```
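
As a rough illustration of how the three selectors combine, the following Go
sketch evaluates them in memory against an EndpointSlice's labels. It assumes
the consumer already knows the zone and region of its own node; `shouldConsume`
and its parameters are hypothetical and not part of any Kubernetes API. A slice
is consumed if it matches at least one selector.

```go
package main

// Label keys proposed by this KEP.
const (
	forZoneLabel   = "endpointslice.kubernetes.io/for-zone"
	forRegionLabel = "endpointslice.kubernetes.io/for-region"
)

// shouldConsume (hypothetical) reports whether an EndpointSlice with the given
// labels matches any of the three selectors for a consumer running in
// nodeZone/nodeRegion.
func shouldConsume(sliceLabels map[string]string, nodeZone, nodeRegion string) bool {
	zone, hasZone := sliceLabels[forZoneLabel]
	region, hasRegion := sliceLabels[forRegionLabel]

	// Selector 1: DoesNotExist on both subsetting keys.
	if !hasZone && !hasRegion {
		return true
	}
	// Selector 2: matchLabels on for-zone.
	if hasZone && zone == nodeZone {
		return true
	}
	// Selector 3: matchLabels on for-region.
	if hasRegion && region == nodeRegion {
		return true
	}
	return false
}
```

Filtering client-side like this still requires receiving every EndpointSlice;
the scalability benefit described in this KEP only materializes when the
selectors are applied by the API server.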

#### Kube-Proxy

When the `EndpointSliceSubsetting` feature gate is enabled on kube-proxy, it
will use these selectors, filled in with the zone and region of the node it is
running on, to filter the EndpointSlices it consumes.
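
A minimal sketch of how a consumer such as kube-proxy might turn these selectors
into the string form accepted by the `labelSelector` list/watch parameter. The
`gateEnabled` flag stands in for the `EndpointSliceSubsetting` feature gate, and
the node's zone and region are assumed to come from the standard
`topology.kubernetes.io/zone` and `topology.kubernetes.io/region` Node labels.
`subsetSelectors` is hypothetical; only the `!key` (does not exist) and
`key=value` (equality) forms are standard label selector syntax.

```go
package main

import "fmt"

// Label keys proposed by this KEP.
const (
	forZoneLabel   = "endpointslice.kubernetes.io/for-zone"
	forRegionLabel = "endpointslice.kubernetes.io/for-region"
)

// subsetSelectors (hypothetical) returns the label selector strings a consumer
// would list/watch EndpointSlices with.
func subsetSelectors(gateEnabled bool, nodeZone, nodeRegion string) []string {
	if !gateEnabled {
		// An empty selector matches every EndpointSlice (no subsetting).
		return []string{""}
	}
	// Requirements inside one selector are ANDed, so "no labels OR my zone OR
	// my region" needs one selector per alternative.
	selectors := []string{fmt.Sprintf("!%s,!%s", forZoneLabel, forRegionLabel)}
	// Skip topology selectors the node cannot fill in (e.g. an unlabeled node).
	if nodeZone != "" {
		selectors = append(selectors, fmt.Sprintf("%s=%s", forZoneLabel, nodeZone))
	}
	if nodeRegion != "" {
		selectors = append(selectors, fmt.Sprintf("%s=%s", forRegionLabel, nodeRegion))
	}
	return selectors
}
```

Each selector would back its own watch, and the consumer would treat the union
of the results as its view of EndpointSlices. If a producer ever set both
labels on one slice, it could appear in two result sets, so an implementation
would likely de-duplicate by object name.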

### Controller Implementation

Although a controller implementation is out of scope for this KEP, it is worth
discussing what one might look like. For reference, [KEP
#2004](https://github.com/kubernetes/enhancements/issues/2004) discusses how this
could be implemented for the EndpointSlice controller. That proposal involves
three potential approaches: Original, PreferZone, and RequireZone.

None of the proposed approaches would involve data duplication: each
Pod/endpoint would continue to live in a single EndpointSlice. The only reason a
Service might end up with more EndpointSlices is less efficient packing. The
table below shows the number of EndpointSlices each approach would produce,
based on the number of endpoints a Service has in a three-zone cluster. In each
case, the number in parentheses is how many endpoints would exist in each slice.

| # endpoints | Original # slices | PreferZone # slices | RequireZone # slices |
|---|---|---|---|
| 6 | 1 (6) | 1 (6) | 3 (2) |
| 90 | 1 (90) | 3 (30) | 3 (30) |
| 270 | 3 (90) | 3 (90) | 3 (90) |
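
The table can be reproduced with simple arithmetic, assuming endpoints are
spread evenly across the three zones, each EndpointSlice holds at most 100
endpoints (the default of kube-controller-manager's `--max-endpoints-per-slice`),
and endpoints are balanced evenly across a group's slices. The functions below
are hypothetical illustrations, not the EndpointSlice controller's packing
logic; for PreferZone, the caller picks the shared or per-zone plan depending on
whether the threshold has been met.

```go
package main

// slicePlan describes how a Service's endpoints would be packed.
type slicePlan struct {
	Slices   int // total number of EndpointSlices
	PerSlice int // endpoints in each slice, assuming an even spread
}

// ceilDiv returns ceil(a / b) for non-negative a and positive b.
func ceilDiv(a, b int) int {
	return (a + b - 1) / b
}

// atLeastOne keeps a group of slices from being empty.
func atLeastOne(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

// sharedPlan packs all endpoints into unlabeled slices shared by every zone
// (the Original approach, and PreferZone below its threshold).
func sharedPlan(endpoints, maxPerSlice int) slicePlan {
	n := atLeastOne(ceilDiv(endpoints, maxPerSlice))
	return slicePlan{Slices: n, PerSlice: endpoints / n}
}

// perZonePlan gives each zone its own for-zone slices (RequireZone, and
// PreferZone above its threshold). Integer division assumes endpoints are
// spread evenly across zones.
func perZonePlan(endpoints, zones, maxPerSlice int) slicePlan {
	perZone := endpoints / zones
	n := atLeastOne(ceilDiv(perZone, maxPerSlice))
	return slicePlan{Slices: zones * n, PerSlice: perZone / n}
}
```

For example, `sharedPlan(270, 100)` yields 3 slices of 90 endpoints (the
Original column), while `perZonePlan(6, 3, 100)` yields 3 slices of 2 endpoints
(the RequireZone column).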

The RequireZone approach requires at least one EndpointSlice per zone per
Service. The PreferZone approach has the same requirement once a minimum
endpoint threshold has been met; below that threshold, a single shared
EndpointSlice (no additional labels) is used. Some padding around the threshold
keeps the controller from flapping back and forth between these two states.
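
One way to implement that padding is hysteresis: switch to per-zone slices only
once the endpoint count rises above `threshold + padding`, and switch back to a
shared slice only once it falls below `threshold - padding`. The sketch below
assumes that design; the function name and the threshold and padding values are
hypothetical, since KEP #2004 rather than this KEP defines the actual behavior.

```go
package main

// usePerZoneSlices (hypothetical) decides whether PreferZone should publish
// per-zone EndpointSlices (labeled with for-zone) or a single shared,
// unlabeled set. currentlyPerZone is the mode chosen in the previous sync.
func usePerZoneSlices(endpoints, threshold, padding int, currentlyPerZone bool) bool {
	if currentlyPerZone {
		// Stay per-zone until the count drops clearly below the threshold.
		return endpoints >= threshold-padding
	}
	// Switch to per-zone only once the count is clearly above the threshold.
	return endpoints > threshold+padding
}
```

With a threshold of 30 and padding of 5, a Service growing from 6 to 90
endpoints moves from one shared slice to per-zone slices, while a count
oscillating between 27 and 33 never triggers a switch in either direction.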

With this approach, EndpointSlices can be delivered everywhere (no additional
labels), to a zone (`for-zone`), or to a region (`for-region`). None of the
proposed approaches involve a single Service having separate sets of
EndpointSlices for each use case. As defined by [KEP
1659](https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/1659-standard-topology-labels),
"region" and "zone" are strictly hierarchical ("zones" are subsets of
"regions"), and zone names are unique across regions.

### Backwards Compatibility

We don't need to create EndpointSlices without labels for backwards
compatibility; we just need to ensure that consumers always support consuming
EndpointSlices without these labels. Even if we updated the EndpointSlice
controller to consistently label EndpointSlices with `for-zone` or `for-region`,
we couldn't guarantee that other producers would do the same.

Nothing in any current consumer implementation would break if additional labels
like `for-zone` or `for-region` were added to EndpointSlices. For existing
consumers, the only label that matters under either this approach or the
original one is `kubernetes.io/service-name`. Consumers that want to support
subsetting can update their selectors as described in this KEP, but subsetting
won't break any existing functionality.

## Test Plan

This KEP is quite small in scope. The only new functionality is an adjustment
to which EndpointSlices kube-proxy consumes when a feature gate is enabled. We
will need to add test coverage for both the enabled and disabled states of this
feature.

### Unit Tests

* Ensure kube-proxy will continue to consume all EndpointSlices when this
  feature is disabled.
* Ensure EndpointSlices delivered to a specific zone will be consumed by
  kube-proxy running in the same zone when this feature is enabled.
* Ensure EndpointSlices delivered to a specific zone will not be consumed by
  kube-proxy running in a different zone when this feature is enabled.
* Ensure EndpointSlices delivered to a specific region will be consumed by
  kube-proxy running in the same region when this feature is enabled.
* Ensure EndpointSlices delivered to a specific region will not be consumed by
  kube-proxy running in a different region when this feature is enabled.

## Graduation Criteria

### Alpha Release

- Proposed labels are added as well-known labels in the Discovery API types.
- Implement new selectors in kube-proxy.
- Implement test plan.

### Alpha -> Beta Graduation

- EndpointSlice controller supports publishing EndpointSlices in subsets. (See
  [KEP 2004](https://github.com/kubernetes/enhancements/issues/2004) for more
  info.)

## Upgrade / Downgrade Strategy

This functionality will be guarded by the `EndpointSliceSubsetting` feature gate
on kube-proxy. It is fully backwards compatible and will only make a difference
in a cluster if EndpointSlices are being published with the labels described in
this KEP.

## Version Skew Strategy

This is designed with backwards compatibility in mind. Enabling this feature
does not depend on any other feature being enabled in any release. However,
[KEP #2004](https://github.com/kubernetes/enhancements/issues/2004) will depend
on this KEP.

## Implementation History

September 2020: Initial Proposal Submitted

## Drawbacks

Although optional, this feature adds complexity to the consumption of
EndpointSlices for anyone who wants to support it.

## Alternatives

An alternative would be an approach that allows delivery to multiple zones.
With labels, this would require including the zone name in the label key:

```
endpointslice.kubernetes.io/for-zone-a
endpointslice.kubernetes.io/for-zone-b
```

Unfortunately, it would be much more difficult to build backwards-compatible
selectors to consume these labels. Label selectors match exact keys, so
selecting EndpointSlices without any subsetting label would require a
`DoesNotExist` requirement for every possible zone key, including zones added
later.
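
As a hypothetical illustration, the helper below builds the "no subsetting
labels" selector under this alternative for a given list of zones. Unlike the
fixed two-term selector in the proposal, it grows with the zone list, and an
EndpointSlice targeting a zone missing from that list would wrongly match it.

```go
package main

import "strings"

// unlabeledSelectorForAlternative (hypothetical) builds the selector for
// EndpointSlices carrying none of the per-zone keys. Every known zone adds a
// "does not exist" term; label selectors have no prefix or wildcard matching.
func unlabeledSelectorForAlternative(zones []string) string {
	terms := make([]string, 0, len(zones))
	for _, z := range zones {
		terms = append(terms, "!endpointslice.kubernetes.io/for-zone-"+z)
	}
	return strings.Join(terms, ",")
}
```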