---
layout: blog
title: "Kubernetes v1.26: Advancements in Kubernetes Traffic Engineering"
date: 2022-12-30
slug: advancements-in-kubernetes-traffic-engineering
---

**Authors:** Andrew Sy Kim (Google)

Kubernetes v1.26 includes significant advancements in network traffic engineering with the graduation of
two features (Service internal traffic policy support, and EndpointSlice terminating conditions) to GA,
and a third feature (Proxy terminating endpoints) to beta. The combination of these enhancements aims
to address shortcomings in traffic engineering that people face today, and unlock new capabilities for the future.

## Traffic Loss from Load Balancers During Rolling Updates

Prior to Kubernetes v1.26, clusters could experience [loss of traffic](https://github.com/kubernetes/kubernetes/issues/85643)
from Service load balancers during rolling updates when setting the `externalTrafficPolicy` field to `Local`.
There are a lot of moving parts at play here, so a quick overview of how Kubernetes manages load balancers might help!

In Kubernetes, you can create a Service with `type: LoadBalancer` to expose an application externally with a load balancer.
The load balancer implementation varies between clusters and platforms, but the Service provides a generic abstraction
representing the load balancer that is consistent across all Kubernetes installations.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app.kubernetes.io/name: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
  type: LoadBalancer
```

Under the hood, Kubernetes allocates a NodePort for the Service, which is then used by kube-proxy to provide a
network data path from the NodePort to the Pod. A controller will then add all available Nodes in the cluster
to the load balancer’s backend pool, using the designated NodePort for the Service as the backend target port.

{{< figure src="traffic-engineering-service-load-balancer.png" caption="Figure 1: Overview of Service load balancers" >}}
45+
46+
Oftentimes it is beneficial to set `externalTrafficPolicy: Local` for Services, to avoid extra hops between
47+
Nodes that are not running healthy Pods backing that Service. When using `externalTrafficPolicy: Local`,
48+
an additional NodePort is allocated for health checking purposes, such that Nodes that do not contain healthy
49+
Pods are excluded from the backend pool for a load balancer.
50+
51+
{{< figure src="traffic-engineering-lb-healthy.png" caption="Figure 2: Load balancer traffic to a healthy Node, when externalTrafficPolicy is Local" >}}
One such scenario where traffic can be lost is when a Node loses all Pods for a Service,
but the external load balancer has not probed the health check NodePort yet. The likelihood of this situation
is largely dependent on the health checking interval configured on the load balancer. The larger the interval,
the more likely this will happen, since the load balancer will continue to send traffic to a Node
even after kube-proxy has removed forwarding rules for that Service. This also occurs when Pods start terminating
during rolling updates. Since Kubernetes does not consider terminating Pods as “Ready”, traffic can be lost
when there are only terminating Pods on any given Node during a rolling update.

{{< figure src="traffic-engineering-lb-without-proxy-terminating-endpoints.png" caption="Figure 3: Load balancer traffic to terminating endpoints, when externalTrafficPolicy is Local" >}}

Starting in Kubernetes v1.26, kube-proxy enables the `ProxyTerminatingEndpoints` feature by default, which
adds automatic failover and routing to terminating endpoints in scenarios where the traffic would otherwise
be dropped. More specifically, when there is a rolling update and a Node only contains terminating Pods,
kube-proxy will route traffic to the terminating Pods based on their readiness. In addition, kube-proxy will
actively fail the health check NodePort if there are only terminating Pods available. By doing so,
kube-proxy alerts the external load balancer that new connections should not be sent to that Node, while requests
on existing connections are still handled gracefully.

{{< figure src="traffic-engineering-lb-with-proxy-terminating-endpoints.png" caption="Figure 4: Load Balancer traffic to terminating endpoints with ProxyTerminatingEndpoints enabled, when externalTrafficPolicy is Local" >}}
### EndpointSlice Conditions

In order to support this new capability in kube-proxy, the EndpointSlice API introduced new conditions for endpoints:
`serving` and `terminating`.

{{< figure src="endpointslice-overview.png" caption="Figure 5: Overview of EndpointSlice conditions" >}}

The `serving` condition is semantically identical to `ready`, except that it can be `true` or `false`
while a Pod is terminating, unlike `ready`, which is always `false` for terminating Pods for compatibility reasons.
The `terminating` condition is `true` for Pods undergoing termination (non-empty `deletionTimestamp`), and `false` otherwise.

The addition of these two conditions enables consumers of this API to track Pod states that previously could not be represented.
For example, we can now track "ready" and "not ready" Pods that are also terminating.

{{< figure src="endpointslice-with-terminating-pod.png" caption="Figure 6: EndpointSlice conditions with a terminating Pod" >}}
Consumers of the EndpointSlice API, such as kube-proxy and ingress controllers, can now use these conditions to coordinate connection draining
events, by continuing to forward traffic for existing connections but rerouting new connections to other non-terminating endpoints.

## Optimizing Internal Node-Local Traffic

Similar to how Services can set `externalTrafficPolicy: Local` to avoid extra hops for externally sourced traffic, Kubernetes
now supports `internalTrafficPolicy: Local`, to enable the same optimization for traffic originating within the cluster, specifically
for traffic using the Service Cluster IP as the destination address. This feature graduated to Beta in Kubernetes v1.24 and is graduating to GA in v1.26.

Services default the `internalTrafficPolicy` field to `Cluster`, where traffic is randomly distributed to all endpoints.

{{< figure src="service-internal-traffic-policy-cluster.png" caption="Figure 7: Service routing when internalTrafficPolicy is Cluster" >}}

When `internalTrafficPolicy` is set to `Local`, kube-proxy will forward internal traffic for a Service only if there is an available endpoint
that is local to the same Node.

{{< figure src="service-internal-traffic-policy-local.png" caption="Figure 8: Service routing when internalTrafficPolicy is Local" >}}
{{< caution >}}
When using `internalTrafficPolicy: Local`, traffic will be dropped by kube-proxy when no local endpoints are available.
{{< /caution >}}

## Getting Involved

If you're interested in future discussions on Kubernetes traffic engineering, you can get involved with SIG Network in the following ways:
* Slack: [#sig-network](https://kubernetes.slack.com/messages/sig-network)
* [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-network)
* [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fnetwork)
* [Biweekly meetings](https://github.com/kubernetes/community/tree/master/sig-network#meetings)