Commit ce7e330

Merge pull request #36673 from sftim/20220825_favor_endpointslices
Favor EndpointSlice over Endpoints
2 parents 4a1fa16 + 1eef742 commit ce7e330

12 files changed, 192 additions and 106 deletions

content/en/docs/concepts/architecture/cloud-controller.md

Lines changed: 1 addition & 1 deletion
@@ -107,7 +107,7 @@ routes appropriately. It requires Get access to Node objects.

 ### Service controller {#authorization-service-controller}

-The service controller listens to Service object Create, Update and Delete events and then configures Endpoints for those Services appropriately.
+The service controller listens to Service object Create, Update and Delete events and then configures Endpoints for those Services appropriately (for EndpointSlices, the kube-controller-manager manages these on demand).

 To access Services, it requires List, and Watch access. To update Services, it requires Patch and Update access.

content/en/docs/concepts/overview/components.md

Lines changed: 2 additions & 2 deletions
@@ -53,8 +53,8 @@ Some types of these controllers are:
 * Node controller: Responsible for noticing and responding when nodes go down.
 * Job controller: Watches for Job objects that represent one-off tasks, then creates
   Pods to run those tasks to completion.
-* Endpoints controller: Populates the Endpoints object (that is, joins Services & Pods).
-* Service Account & Token controllers: Create default accounts and API access tokens for new namespaces.
+* EndpointSlice controller: Populates EndpointSlice objects (to provide a link between Services and Pods).
+* ServiceAccount controller: Creates default ServiceAccounts for new namespaces.

 ### cloud-controller-manager

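To show what the EndpointSlice controller entry above means in practice, here is a minimal Python sketch. It is not the controller's real code. The `Pod` class and `build_slices` are hypothetical, and each Pod is assumed to have exactly one IP. The sketch selects Pods whose labels match a Service's equality-based selector, groups their IPs by address family (each EndpointSlice holds a single address type), and starts a new slice only when the current one already holds the default maximum of 100 endpoints. The real controller also tracks ports, sets owner references, and reuses and repacks existing slices.

```python
import ipaddress
from dataclasses import dataclass, field

# Default per-slice threshold described in the Service docs; the real
# kube-controller-manager makes this configurable (up to 1000).
MAX_ENDPOINTS_PER_SLICE = 100


@dataclass
class Pod:
    """Hypothetical, trimmed-down Pod: one IP, a label map, a readiness flag."""
    name: str
    ip: str
    labels: dict = field(default_factory=dict)
    ready: bool = True


def selector_matches(selector: dict, labels: dict) -> bool:
    """Equality-based selector: every key/value pair must appear in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def build_slices(service_name: str, selector: dict, pods: list,
                 max_per_slice: int = MAX_ENDPOINTS_PER_SLICE) -> list:
    """Hypothetical helper: EndpointSlice-shaped dicts for Pods matching `selector`."""
    by_family = {"IPv4": [], "IPv6": []}
    for pod in pods:
        if not selector_matches(selector, pod.labels):
            continue
        family = "IPv4" if ipaddress.ip_address(pod.ip).version == 4 else "IPv6"
        by_family[family].append({
            "addresses": [pod.ip],
            "conditions": {"ready": pod.ready},
            "targetRef": {"kind": "Pod", "name": pod.name},
        })

    slices = []
    for family, endpoints in by_family.items():
        # Chunk each family separately; a family with no endpoints gets no slice.
        for start in range(0, len(endpoints), max_per_slice):
            slices.append({
                "apiVersion": "discovery.k8s.io/v1",
                "kind": "EndpointSlice",
                "metadata": {"labels": {"kubernetes.io/service-name": service_name}},
                "addressType": family,
                "endpoints": endpoints[start:start + max_per_slice],
            })
    return slices
```

For example, 250 matching IPv4 Pods and one matching IPv6 Pod produce IPv4 slices holding 100, 100 and 50 endpoints, plus one IPv6 slice.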
content/en/docs/concepts/services-networking/connect-applications-service.md

Lines changed: 11 additions & 7 deletions
@@ -91,10 +91,14 @@ my-nginx ClusterIP 10.0.162.149 <none> 80/TCP 21s
 ```

 As mentioned previously, a Service is backed by a group of Pods. These Pods are
-exposed through `endpoints`. The Service's selector will be evaluated continuously
-and the results will be POSTed to an Endpoints object also named `my-nginx`.
-When a Pod dies, it is automatically removed from the endpoints, and new Pods
-matching the Service's selector will automatically get added to the endpoints.
+exposed through
+{{<glossary_tooltip term_id="endpoint-slice" text="EndpointSlices">}}.
+The Service's selector will be evaluated continuously and the results will be POSTed
+to an EndpointSlice that is connected to the Service using a
+{{< glossary_tooltip text="label" term_id="label" >}}.
+When a Pod dies, it is automatically removed from the EndpointSlices that contain it
+as an endpoint. New Pods that match the Service's selector will automatically get added
+to an EndpointSlice for that Service.
 Check the endpoints, and note that the IPs are the same as the Pods created in
 the first step:

@@ -115,11 +119,11 @@ Session Affinity: None
 Events: <none>
 ```
 ```shell
-kubectl get ep my-nginx
+kubectl get endpointslices -l kubernetes.io/service-name=my-nginx
 ```
 ```
-NAME       ENDPOINTS                     AGE
-my-nginx   10.244.2.5:80,10.244.3.4:80   1m
+NAME             ADDRESSTYPE   PORTS   ENDPOINTS               AGE
+my-nginx-7vzhx   IPv4          80      10.244.2.5,10.244.3.4   21s
 ```

 You should now be able to curl the nginx Service on `<CLUSTER-IP>:<PORT>` from
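
The `kubectl get endpointslices -l kubernetes.io/service-name=my-nginx` command above finds a Service's slices by label. The Python sketch below mirrors that lookup over plain dicts. It then collects each address only once, because an endpoint can briefly appear in more than one slice. The helper functions are hypothetical. kube-proxy's `EndpointSliceCache` is the reference implementation for deduplication, and this sketch is not that code.

```python
SERVICE_NAME_LABEL = "kubernetes.io/service-name"


def slices_for_service(slices: list, service_name: str) -> list:
    """Hypothetical client-side version of `-l kubernetes.io/service-name=<name>`."""
    return [s for s in slices
            if s.get("metadata", {}).get("labels", {}).get(SERVICE_NAME_LABEL) == service_name]


def unique_addresses(slices: list, address_type: str = "IPv4") -> list:
    """Hypothetical helper: every address of one type, once, in first-seen order."""
    seen = set()
    result = []
    for s in slices:
        if s.get("addressType") != address_type:
            continue
        for endpoint in s.get("endpoints", []):
            for address in endpoint.get("addresses", []):
                if address not in seen:  # skip duplicates from overlapping slices
                    seen.add(address)
                    result.append(address)
    return result
```

For example, if two slices labeled for `my-nginx` both list `10.244.3.4` (the second slice name being hypothetical), `unique_addresses` returns that address once.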

content/en/docs/concepts/services-networking/dns-pod-service.md

Lines changed: 2 additions & 2 deletions
@@ -186,8 +186,8 @@ the same namespace, the Pod will see its own FQDN as
 A or AAAA record at that name, pointing to the Pod's IP. Both Pods "`busybox1`" and
 "`busybox2`" can have their distinct A or AAAA records.

-The Endpoints object can specify the `hostname` for any endpoint addresses,
-along with its IP.
+An {{<glossary_tooltip term_id="endpoint-slice" text="EndpointSlice">}} can specify
+the DNS hostname for any endpoint addresses, along with its IP.

 {{< note >}}
 Because A or AAAA records are not created for Pod names, `hostname` is required for the Pod's A or AAAA
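
An endpoint entry in an EndpointSlice can carry a `hostname`. The Python sketch below derives the per-endpoint DNS name from that field and picks an A or AAAA record type from the address family. It makes several assumptions: a headless Service named `busybox-subdomain` in namespace `my-namespace`, and a cluster domain of `cluster.local`. The `hostname_records` helper is hypothetical and ignores readiness.

```python
import ipaddress

CLUSTER_DOMAIN = "cluster.local"  # assumption: a commonly used default cluster domain


def hostname_records(service: str, namespace: str, endpoints: list,
                     cluster_domain: str = CLUSTER_DOMAIN) -> list:
    """Hypothetical helper: (fqdn, record_type, ip) for endpoints that set a hostname."""
    records = []
    for endpoint in endpoints:
        hostname = endpoint.get("hostname")
        if not hostname:
            continue  # without a hostname there is no per-endpoint name to publish
        fqdn = f"{hostname}.{service}.{namespace}.svc.{cluster_domain}"
        for address in endpoint.get("addresses", []):
            record_type = "A" if ipaddress.ip_address(address).version == 4 else "AAAA"
            records.append((fqdn, record_type, address))
    return records
```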

content/en/docs/concepts/services-networking/endpoint-slices.md

Lines changed: 46 additions & 26 deletions
@@ -23,24 +23,7 @@ Endpoints.

 <!-- body -->

-## Motivation
-
-The Endpoints API has provided a simple and straightforward way of
-tracking network endpoints in Kubernetes. Unfortunately as Kubernetes clusters
-and {{< glossary_tooltip text="Services" term_id="service" >}} have grown to handle and
-send more traffic to more backend Pods, limitations of that original API became
-more visible.
-Most notably, those included challenges with scaling to larger numbers of
-network endpoints.
-
-Since all network endpoints for a Service were stored in a single Endpoints
-resource, those resources could get quite large. That affected the performance
-of Kubernetes components (notably the master control plane) and resulted in
-significant amounts of network traffic and processing when Endpoints changed.
-EndpointSlices help you mitigate those issues as well as provide an extensible
-platform for additional features such as topological routing.
-
-## EndpointSlice resources {#endpointslice-resource}
+## EndpointSlice API {#endpointslice-resource}

 In Kubernetes, an EndpointSlice contains references to a set of network
 endpoints. The control plane automatically creates EndpointSlices
@@ -52,7 +35,7 @@ Service name.
 The name of a EndpointSlice object must be a valid
 [DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).

-As an example, here's a sample EndpointSlice resource for the `example`
+As an example, here's a sample EndpointSlice object that's owned by the `example`
 Kubernetes Service.

 ```yaml
@@ -85,8 +68,7 @@ flag, up to a maximum of 1000.

 EndpointSlices can act as the source of truth for
 {{< glossary_tooltip term_id="kube-proxy" text="kube-proxy" >}} when it comes to
-how to route internal traffic. When enabled, they should provide a performance
-improvement for services with large numbers of endpoints.
+how to route internal traffic.

 ### Address types

@@ -96,6 +78,10 @@ EndpointSlices support three address types:
 * IPv6
 * FQDN (Fully Qualified Domain Name)

+Each `EndpointSlice` object represents a specific IP address type. If you have
+a Service that is available via IPv4 and IPv6, there will be at least two
+`EndpointSlice` objects (one for IPv4, and one for IPv6).
+
 ### Conditions

 The EndpointSlice API stores conditions about endpoints that may be useful for consumers.
@@ -245,11 +231,45 @@ getting replaced.

 Due to the nature of EndpointSlice changes, endpoints may be represented in more
 than one EndpointSlice at the same time. This naturally occurs as changes to
-different EndpointSlice objects can arrive at the Kubernetes client watch/cache
-at different times. Implementations using EndpointSlice must be able to have the
-endpoint appear in more than one slice. A reference implementation of how to
-perform endpoint deduplication can be found in the `EndpointSliceCache`
-implementation in `kube-proxy`.
+different EndpointSlice objects can arrive at the Kubernetes client watch / cache
+at different times.
+
+{{< note >}}
+Clients of the EndpointSlice API must be able to handle the situation where
+a particular endpoint address appears in more than one slice.
+
+You can find a reference implementation for how to perform this endpoint deduplication
+as part of the `EndpointSliceCache` code within `kube-proxy`.
+{{< /note >}}
+
+## Comparison with Endpoints {#motivation}
+
+The original Endpoints API provided a simple and straightforward way of
+tracking network endpoints in Kubernetes. As Kubernetes clusters
+and {{< glossary_tooltip text="Services" term_id="service" >}} grew to handle
+more traffic and to send more traffic to more backend Pods, the
+limitations of that original API became more visible.
+Most notably, those included challenges with scaling to larger numbers of
+network endpoints.
+
+Since all network endpoints for a Service were stored in a single Endpoints
+object, those Endpoints objects could get quite large. For Services that stayed
+stable (the same set of endpoints over a long period of time) the impact was
+less noticeable; even then, some use cases of Kubernetes weren't well served.
+
+When a Service had a lot of backend endpoints and the workload was either
+scaling frequently, or rolling out new changes frequently, each update to
+the single Endpoints object for that Service meant a lot of traffic between
+Kubernetes cluster components (within the control plane, and also between
+nodes and the API server). This extra traffic also had a cost in terms of
+CPU use.
+
+With EndpointSlices, adding or removing a single Pod triggers the same _number_
+of updates to clients that are watching for changes, but the size of those
+update messages is much smaller at large scale.
+
+EndpointSlices also enabled innovation around new features such as dual-stack
+networking and topology-aware routing.

 ## {{% heading "whatsnext" %}}

content/en/docs/concepts/services-networking/service.md

Lines changed: 105 additions & 51 deletions
@@ -63,7 +63,8 @@ The Service abstraction enables this decoupling.

 If you're able to use Kubernetes APIs for service discovery in your application,
 you can query the {{< glossary_tooltip text="API server" term_id="kube-apiserver" >}}
-for Endpoints, that get updated whenever the set of Pods in a Service changes.
+for matching EndpointSlices. Kubernetes updates the EndpointSlices for a Service
+whenever the set of Pods in a Service changes.

 For non-native applications, Kubernetes offers ways to place a network port or load
 balancer in between your application and the backend Pods.
@@ -161,8 +162,12 @@ Each port definition can have the same `protocol`, or a different one.
 ### Services without selectors

 Services most commonly abstract access to Kubernetes Pods thanks to the selector,
-but when used with a corresponding Endpoints object and without a selector, the Service can abstract other kinds of backends,
-including ones that run outside the cluster. For example:
+but when used with a corresponding set of
+{{<glossary_tooltip term_id="endpoint-slice" text="EndpointSlices">}}
+objects and without a selector, the Service can abstract other kinds of backends,
+including ones that run outside the cluster.
+
+For example:

 * You want to have an external database cluster in production, but in your
   test environment you use your own databases.
@@ -186,73 +191,119 @@ spec:
       targetPort: 9376
 ```

-Because this Service has no selector, the corresponding Endpoints object is not
-created automatically. You can manually map the Service to the network address and port
-where it's running, by adding an Endpoints object manually:
+Because this Service has no selector, the corresponding EndpointSlice (and
+legacy Endpoints) objects are not created automatically. You can manually map the Service
+to the network address and port where it's running, by adding an EndpointSlice
+object manually. For example:

 ```yaml
-apiVersion: v1
-kind: Endpoints
+apiVersion: discovery.k8s.io/v1
+kind: EndpointSlice
 metadata:
-  # the name here should match the name of the Service
-  name: my-service
-subsets:
+  name: my-service-1 # by convention, use the name of the Service
+                     # as a prefix for the name of the EndpointSlice
+  labels:
+    # You should set the "kubernetes.io/service-name" label.
+    # Set its value to match the name of the Service
+    kubernetes.io/service-name: my-service
+addressType: IPv4
+ports:
+  - name: '' # empty because port 9376 is not assigned as a well-known
+             # port (by IANA)
+    appProtocol: http
+    protocol: TCP
+    port: 9376
+endpoints:
   - addresses:
-      - ip: 192.0.2.42
-    ports:
-      - port: 9376
+      - "10.4.5.6" # the IP addresses in this list can appear in any order
+      - "10.1.2.3"
 ```

-The name of the Endpoints object must be a valid
-[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
+#### Custom EndpointSlices

-When you create an [Endpoints](/docs/reference/kubernetes-api/service-resources/endpoints-v1/)
-object for a Service, you set the name of the new object to be the same as that
-of the Service.
+When you create an [EndpointSlice](#endpointslices) object for a Service, you can
+use any name for the EndpointSlice. Each EndpointSlice in a namespace must have a
+unique name. You link an EndpointSlice to a Service by setting the
+`kubernetes.io/service-name` {{< glossary_tooltip text="label" term_id="label" >}}
+on that EndpointSlice.

 {{< note >}}
 The endpoint IPs _must not_ be: loopback (127.0.0.0/8 for IPv4, ::1/128 for IPv6), or
 link-local (169.254.0.0/16 and 224.0.0.0/24 for IPv4, fe80::/64 for IPv6).

-Endpoint IP addresses cannot be the cluster IPs of other Kubernetes Services,
+The endpoint IP addresses cannot be the cluster IPs of other Kubernetes Services,
 because {{< glossary_tooltip term_id="kube-proxy" >}} doesn't support virtual IPs
 as a destination.
 {{< /note >}}

-Accessing a Service without a selector works the same as if it had a selector.
-In the example above, traffic is routed to the single endpoint defined in
-the YAML: `192.0.2.42:9376` (TCP).
+For an EndpointSlice that you create yourself, or in your own code,
+you should also pick a value to use for the [`endpointslice.kubernetes.io/managed-by`](/docs/reference/labels-annotations-taints/#endpointslicekubernetesiomanaged-by) label.
+If you create your own controller code to manage EndpointSlices, consider using a
+value similar to `"my-domain.example/name-of-controller"`. If you are using a third
+party tool, use the name of the tool in all-lowercase and change spaces and other
+punctuation to dashes (`-`).
+If people are directly using a tool such as `kubectl` to manage EndpointSlices,
+use a name that describes this manual management, such as `"staff"` or
+`"cluster-admins"`. You should
+avoid using the reserved value `"controller"`, which identifies EndpointSlices
+managed by Kubernetes' own control plane.

-{{< note >}}
-The Kubernetes API server does not allow proxying to endpoints that are not mapped to
-pods. Actions such as `kubectl proxy <service-name>` where the service has no
-selector will fail due to this constraint. This prevents the Kubernetes API server
-from being used as a proxy to endpoints the caller may not be authorized to access.
-{{< /note >}}
+#### Accessing a Service without a selector {#service-no-selector-access}
+
+Accessing a Service without a selector works the same as if it had a selector.
+In the [example](#services-without-selectors) for a Service without a selector, traffic is routed to one of the two endpoints defined in
+the EndpointSlice manifest: a TCP connection to 10.1.2.3 or 10.4.5.6, on port 9376.

 An ExternalName Service is a special case of Service that does not have
 selectors and uses DNS names instead. For more information, see the
 [ExternalName](#externalname) section later in this document.

-### Over Capacity Endpoints
-If an Endpoints resource has more than 1000 endpoints then a Kubernetes v1.22 (or later)
-cluster annotates that Endpoints with `endpoints.kubernetes.io/over-capacity: truncated`.
-This annotation indicates that the affected Endpoints object is over capacity and that
-the endpoints controller has truncated the number of endpoints to 1000.
-
 ### EndpointSlices

 {{< feature-state for_k8s_version="v1.21" state="stable" >}}

-EndpointSlices are an API resource that can provide a more scalable alternative
-to Endpoints. Although conceptually quite similar to Endpoints, EndpointSlices
-allow for distributing network endpoints across multiple resources. By default,
-an EndpointSlice is considered "full" once it reaches 100 endpoints, at which
-point additional EndpointSlices will be created to store any additional
-endpoints.
+[EndpointSlices](/docs/concepts/services-networking/endpoint-slices/) are objects that
+represent a subset (a _slice_) of the backing network endpoints for a Service.
+
+Your Kubernetes cluster tracks how many endpoints each EndpointSlice represents.
+If there are so many endpoints for a Service that a threshold is reached, then
+Kubernetes adds another empty EndpointSlice and stores new endpoint information
+there.
+By default, Kubernetes makes a new EndpointSlice once the existing EndpointSlices
+all contain at least 100 endpoints. Kubernetes does not make the new EndpointSlice
+until an extra endpoint needs to be added.
+
+See [EndpointSlices](/docs/concepts/services-networking/endpoint-slices/) for more
+information about this API.
+
+### Endpoints
+
+In the Kubernetes API, an
+[Endpoints](/docs/reference/kubernetes-api/service-resources/endpoints-v1/)
+(the resource kind is plural) defines a list of network endpoints, typically
+referenced by a Service to define which Pods the traffic can be sent to.
+
+The EndpointSlice API is the recommended replacement for Endpoints.
+
+#### Over-capacity endpoints
+
+Kubernetes limits the number of endpoints that can fit in a single Endpoints
+object. When there are over 1000 backing endpoints for a Service, Kubernetes
+truncates the data in the Endpoints object. Because a Service can be linked
+with more than one EndpointSlice, the 1000-endpoint limit only
+affects the legacy Endpoints API.
+
+In that case, Kubernetes selects at most 1000 possible backend endpoints to store
+into the Endpoints object, and sets an
+{{< glossary_tooltip text="annotation" term_id="annotation" >}} on the
+Endpoints:
+[`endpoints.kubernetes.io/over-capacity: truncated`](/docs/reference/labels-annotations-taints/#endpoints-kubernetes-io-over-capacity).
+The control plane also removes that annotation if the number of backend Pods drops below 1000.
+
+Traffic is still sent to backends, but any load balancing mechanism that relies on the
+legacy Endpoints API only sends traffic to at most 1000 of the available backing endpoints.

-EndpointSlices provide additional attributes and functionality which is
-described in detail in [EndpointSlices](/docs/concepts/services-networking/endpoint-slices/).
+The same API limit means that you cannot manually update an Endpoints object to have more than 1000 endpoints.

 ### Application protocol

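You can check the custom-EndpointSlice rules and the over-capacity behavior from the hunk above in code before applying a hand-written manifest. The Python sketch below is illustrative only. `custom_endpoint_slice`, `check_endpoint_ip` and `legacy_endpoints` are hypothetical helpers, not part of any Kubernetes client library. The sketch does the following:

* Sets the `kubernetes.io/service-name` and `endpointslice.kubernetes.io/managed-by` labels.
* Rejects the loopback and link-local ranges listed in the note.
* Refuses the reserved `"controller"` value.
* Gives each IP its own endpoint entry. The `discovery.k8s.io/v1` API reference describes the addresses within one endpoint as interchangeable, so clients may use only the first one.

`legacy_endpoints` models the 1000-address cap and the `endpoints.kubernetes.io/over-capacity: truncated` annotation. It does not model which 1000 addresses the control plane actually keeps.

```python
import ipaddress

# Ranges that the note above rules out as endpoint IPs.
DISALLOWED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in (
    "127.0.0.0/8", "::1/128",                       # loopback
    "169.254.0.0/16", "224.0.0.0/24", "fe80::/64",  # link-local
)]
LEGACY_ENDPOINTS_LIMIT = 1000


def check_endpoint_ip(ip: str) -> None:
    """Hypothetical check; raises ValueError for a disallowed or invalid IP."""
    addr = ipaddress.ip_address(ip)
    for net in DISALLOWED_NETWORKS:
        if addr in net:  # membership is False when the IP versions differ
            raise ValueError(f"{ip} is in {net} and cannot be an endpoint IP")


def custom_endpoint_slice(name, service_name, managed_by, port, ips,
                          address_type="IPv4"):
    """Hypothetical builder for a hand-managed EndpointSlice manifest (as a dict)."""
    if managed_by == "controller":
        raise ValueError('"controller" is reserved for the Kubernetes control plane')
    for ip in ips:
        check_endpoint_ip(ip)
    return {
        "apiVersion": "discovery.k8s.io/v1",
        "kind": "EndpointSlice",
        "metadata": {
            "name": name,
            "labels": {
                "kubernetes.io/service-name": service_name,
                "endpointslice.kubernetes.io/managed-by": managed_by,
            },
        },
        "addressType": address_type,
        "ports": [{"name": "", "protocol": "TCP", "port": port}],
        # One endpoint per IP, so that every address is treated as a backend.
        "endpoints": [{"addresses": [ip]} for ip in ips],
    }


def legacy_endpoints(service_name, ips):
    """Hypothetical model of the 1000-address cap on a legacy Endpoints object."""
    kept = ips[:LEGACY_ENDPOINTS_LIMIT]  # simplification: keep the first 1000
    annotations = {}
    if len(ips) > LEGACY_ENDPOINTS_LIMIT:
        annotations["endpoints.kubernetes.io/over-capacity"] = "truncated"
    return {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {"name": service_name, "annotations": annotations},
        "subsets": [{"addresses": [{"ip": ip} for ip in kept]}],
    }
```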
@@ -573,19 +624,22 @@ selectors defined:

 ### With selectors

-For headless Services that define selectors, the endpoints controller creates
-`Endpoints` records in the API, and modifies the DNS configuration to return
-A records (IP addresses) that point directly to the `Pods` backing the `Service`.
+For headless Services that define selectors, the Kubernetes control plane creates
+EndpointSlice objects in the Kubernetes API, and modifies the DNS configuration to return
+A or AAAA records (IPv4 or IPv6 addresses) that point directly to the Pods backing
+the Service.

 ### Without selectors

-For headless Services that do not define selectors, the endpoints controller does
-not create `Endpoints` records. However, the DNS system looks for and configures
+For headless Services that do not define selectors, the control plane does
+not create EndpointSlice objects. However, the DNS system looks for and configures
 either:

-* CNAME records for [`ExternalName`](#externalname)-type Services.
-* A records for any `Endpoints` that share a name with the Service, for all
-  other types.
+* DNS CNAME records for [`type: ExternalName`](#externalname) Services.
+* DNS A / AAAA records for all IP addresses of the Service's ready endpoints,
+  for all Service types other than `ExternalName`.
+  * For IPv4 endpoints, the DNS system creates A records.
+  * For IPv6 endpoints, the DNS system creates AAAA records.

 ## Publishing Services (ServiceTypes) {#publishing-services-service-types}

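For headless Services, the record type follows each slice's `addressType`. The minimal sketch below assumes EndpointSlice-shaped dicts and uses a hypothetical `headless_records` helper. It is not CoreDNS's implementation. The sketch returns A records for IPv4 slices and AAAA records for IPv6 slices. It skips endpoints whose `ready` condition is explicitly false; the API reference says consumers should generally treat an unset `ready` as ready. It also deduplicates addresses that appear in more than one slice.

```python
RECORD_TYPE = {"IPv4": "A", "IPv6": "AAAA"}


def headless_records(slices: list) -> list:
    """Hypothetical helper: sorted, deduplicated (record_type, ip) pairs."""
    records = set()
    for s in slices:
        record_type = RECORD_TYPE.get(s.get("addressType"))
        if record_type is None:
            continue  # e.g. FQDN slices yield no A/AAAA records in this sketch
        for endpoint in s.get("endpoints", []):
            # Skip only an explicit ready=False; an unset condition counts as ready.
            if endpoint.get("conditions", {}).get("ready") is False:
                continue
            for address in endpoint.get("addresses", []):
                records.add((record_type, address))
    return sorted(records)
```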
content/en/docs/concepts/services-networking/topology-aware-hints.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ description: >-

 _Topology Aware Hints_ enable topology aware routing by including suggestions
 for how clients should consume endpoints. This approach adds metadata to enable
-consumers of EndpointSlice and / or Endpoints objects, so that traffic to
+consumers of EndpointSlice (or Endpoints) objects, so that traffic to
 those network endpoints can be routed closer to where it originated.

 For example, you can route traffic within a locality to reduce
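
To illustrate how a consumer might act on these hints, here is a simplified Python sketch of zone-based filtering over an endpoint's `hints.forZones` list. It only approximates two of the kube-proxy safeguards described on that page and is not kube-proxy code. If any endpoint lacks a zone hint, or if no endpoint is hinted for the local zone, the sketch falls back to every endpoint. `hinted_zones` and `endpoints_for_zone` are hypothetical.

```python
def hinted_zones(endpoint: dict) -> set:
    """Hypothetical helper: zones named in `hints.forZones` (empty if unhinted)."""
    return {zone["name"] for zone in endpoint.get("hints", {}).get("forZones", [])}


def endpoints_for_zone(endpoints: list, local_zone: str) -> list:
    """Hypothetical, simplified hint-based filtering with fallbacks."""
    # Safeguard: any endpoint without a hint means hints are not fully in use.
    if any(not hinted_zones(e) for e in endpoints):
        return list(endpoints)
    local = [e for e in endpoints if local_zone in hinted_zones(e)]
    # Safeguard: no endpoint is hinted for this zone, so use every zone.
    return local if local else list(endpoints)
```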
