|
| 1 | +--- |
| 2 | +reviewers: |
| 3 | +- robscott |
| 4 | +title: Topology Aware Hints |
| 5 | +content_type: concept |
| 6 | +weight: 10 |
| 7 | +--- |
| 8 | + |
| 9 | + |
| 10 | +<!-- overview --> |
| 11 | + |
| 12 | +{{< feature-state for_k8s_version="v1.21" state="alpha" >}} |
| 13 | + |
| 14 | +_Topology Aware Hints_ enable topology aware routing by including suggestions |
| 15 | +for how clients should consume endpoints. This approach adds metadata that lets |
| 16 | +consumers of Endpoint(Slice) route traffic closer to where it originated. |
| 17 | +For example, users can route traffic within a locality to reduce |
| 18 | +costs and improve performance. |
| 19 | + |
| 20 | +<!-- body --> |
| 21 | + |
| 22 | +## Motivation |
| 23 | + |
| 24 | +Kubernetes clusters are increasingly deployed in multi-zone environments. |
| 25 | +_Topology Aware Hints_ provides a mechanism to help keep traffic within the zone |
| 26 | +it originated from. This concept is commonly referred to as "Topology Aware |
| 27 | +Routing". When calculating the endpoints for a Service, the EndpointSlice |
| 28 | +controller considers the topology (region and zone) of each endpoint and |
| 29 | +populates the hints field to allocate it to a zone. These hints are then |
| 30 | +consumed by components like kube-proxy as they configure how requests are |
| 31 | +routed. |
| 32 | + |
| 33 | +## Using Topology Aware Hints |
| 34 | + |
| 35 | +You can enable Topology Aware Hints for a Service by setting the |
| 36 | +`service.kubernetes.io/topology-aware-hints` annotation to `auto`. This tells |
| 37 | +the EndpointSlice controller to set topology hints if it is deemed safe. |
| 38 | +Importantly, this does not guarantee that hints will always be set. |
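| | + |
| | +As a minimal sketch, a Service that opts in might look like the following. Only the annotation key and its `auto` value come from the text above; the name `example-svc`, the `app: example` selector, and the port are placeholders chosen for illustration. |
| | + |
| | +```yaml |
| | +apiVersion: v1 |
| | +kind: Service |
| | +metadata: |
| | +  name: example-svc  # placeholder name |
| | +  annotations: |
| | +    # Asks the EndpointSlice controller to set hints when it deems that safe |
| | +    service.kubernetes.io/topology-aware-hints: auto |
| | +spec: |
| | +  selector: |
| | +    app: example  # assumed Pod label, not taken from this page |
| | +  ports: |
| | +    - name: http |
| | +      protocol: TCP |
| | +      port: 80 |
| | +``` |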
| 39 | + |
| 40 | +## How it Works |
| 41 | + |
| 42 | +The functionality enabling this feature is split into two components: the |
| 43 | +EndpointSlice controller and kube-proxy. This section provides a high-level overview of |
| 44 | +how each component implements this feature. |
| 45 | + |
| 46 | +### EndpointSlice controller |
| 47 | + |
| 48 | +The EndpointSlice controller is responsible for setting hints on EndpointSlices |
| 49 | +when this feature is enabled. The controller allocates a proportional amount of |
| 50 | +endpoints to each zone. This proportion is based on the |
| 51 | +[allocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable) |
| 52 | +CPU cores for nodes running in that zone. For example, if one zone had 2 CPU |
| 53 | +cores and another zone only had 1 CPU core, the controller would allocate twice |
| 54 | +as many endpoints to the zone with 2 CPU cores. |
| 55 | + |
| 56 | +The following example shows what an EndpointSlice looks like when hints have |
| 57 | +been populated: |
| 58 | + |
| 59 | +```yaml |
| 60 | +apiVersion: discovery.k8s.io/v1 |
| 61 | +kind: EndpointSlice |
| 62 | +metadata: |
| 63 | +  name: example-hints |
| 64 | +  labels: |
| 65 | +    kubernetes.io/service-name: example-svc |
| 66 | +addressType: IPv4 |
| 67 | +ports: |
| 68 | +  - name: http |
| 69 | +    protocol: TCP |
| 70 | +    port: 80 |
| 71 | +endpoints: |
| 72 | +  - addresses: |
| 73 | +      - "10.1.2.3" |
| 74 | +    conditions: |
| 75 | +      ready: true |
| 76 | +    hostname: pod-1 |
| 77 | +    zone: zone-a |
| 78 | +    hints: |
| 79 | +      forZones: |
| 80 | +        - name: "zone-a" |
| 81 | +``` |
| 82 | + |
| 83 | +### Kube-Proxy |
| 84 | + |
| 85 | +Kube-Proxy filters the endpoints it routes to based on the hints set by the |
| 86 | +EndpointSlice controller. In most cases, this means that kube-proxy will route |
| 87 | +to endpoints in the same zone. Sometimes the controller allocates endpoints from |
| 88 | +a different zone to ensure more even distribution of endpoints between zones. |
| 89 | +This would result in some traffic being routed to other zones. |
| 90 | + |
| 91 | +## Safeguards |
| 92 | + |
| 93 | +The Kubernetes control plane and the kube-proxy on each node apply some |
| 94 | +safeguard rules before using Topology Aware Hints. If these don't check out, |
| 95 | +kube-proxy selects endpoints from anywhere in your cluster, regardless of the |
| 96 | +zone. |
| 97 | + |
| 98 | +1. **Insufficient number of endpoints:** If there are fewer endpoints than zones |
| 99 | + in a cluster, the controller will not assign any hints. |
| 100 | + |
| 101 | +2. **Impossible to achieve balanced allocation:** In some cases, it will be |
| 102 | + impossible to achieve a balanced allocation of endpoints among zones. For |
| 103 | + example, if zone-a is twice as large as zone-b, but there are only 2 |
| 104 | + endpoints, an endpoint allocated to zone-a may receive twice as much traffic |
| 105 | + as the endpoint in zone-b. The controller will not assign hints if it can't get this "expected |
| 106 | + overload" value below an acceptable threshold for each zone. Importantly, this |
| 107 | + is not based on real-time feedback. It is still possible for individual |
| 108 | + endpoints to become overloaded. |
| 109 | + |
| 110 | +3. **One or more Nodes has insufficient information:** If any node does not have |
| 111 | + a `topology.kubernetes.io/zone` label or is not reporting a value for |
| 112 | + allocatable CPU, the control plane does not set any topology-aware endpoint |
| 113 | + hints and so kube-proxy does not filter endpoints by zone. |
| 114 | + |
| 115 | +4. **One or more endpoints does not have a zone hint:** When this happens, |
| 116 | + kube-proxy assumes that a transition from or to Topology Aware Hints is |
| 117 | + underway. Filtering endpoints for a Service in this state would be dangerous, |
| 118 | + so kube-proxy falls back to using all endpoints. |
| 119 | + |
| 120 | +5. **A zone is not represented in hints:** If kube-proxy is unable to find at |
| 121 | + least one endpoint with a hint targeting the zone it is running in, it will |
| 122 | + fall back to using endpoints from all zones. This is most likely to happen as |
| 123 | + a new zone is being added to a cluster. |
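| | + |
| | +The third safeguard depends on two pieces of per-node data: the zone label and allocatable CPU. As an illustration only, a trimmed Node that carries both might look like the sketch below; the node name and the values are hypothetical, and a real Node object contains many more fields. |
| | + |
| | +```yaml |
| | +apiVersion: v1 |
| | +kind: Node |
| | +metadata: |
| | +  name: node-1  # hypothetical node name |
| | +  labels: |
| | +    topology.kubernetes.io/zone: zone-a  # zone label the safeguard checks for |
| | +status: |
| | +  allocatable: |
| | +    cpu: "4"  # illustrative value; reported by the node, not set by hand |
| | +``` |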
| 124 | + |
| 125 | +## Constraints |
| 126 | + |
| 127 | +* Topology Aware Hints are not used when either `externalTrafficPolicy` or |
| 128 | + `internalTrafficPolicy` is set to `Local` on a Service. It is possible to use |
| 129 | + both features in the same cluster on different Services, just not on the same |
| 130 | + Service. |
| 131 | + |
| 132 | +* This approach will not work well for Services that have a large proportion of |
| 133 | + traffic originating from a subset of zones. Instead, this assumes that incoming |
| 134 | + traffic will be roughly proportional to the capacity of the Nodes in each |
| 135 | + zone. |
| 136 | + |
| 137 | +* The EndpointSlice controller ignores unready nodes as it calculates the |
| 138 | + proportions of each zone. This could have unintended consequences if a large |
| 139 | + portion of nodes are unready. |
| 140 | + |
| 141 | +* The EndpointSlice controller does not take into account {{< glossary_tooltip |
| 142 | + text="tolerations" term_id="toleration" >}} when calculating the |
| 143 | + proportions of each zone. If the Pods backing a Service are limited to a |
| 144 | + subset of Nodes in the cluster, this will not be taken into account. |
| 145 | + |
| 146 | +* This may not work well with autoscaling. For example, if a lot of traffic is |
| 147 | + originating from a single zone, only the endpoints allocated to that zone will |
| 148 | + be handling that traffic. That could result in {{< glossary_tooltip |
| 149 | + text="Horizontal Pod Autoscaler" term_id="horizontal-pod-autoscaler" >}} |
| 150 | + either not picking up on this event, or newly added pods starting in a |
| 151 | + different zone. |
| 152 | + |
| 153 | +## {{% heading "whatsnext" %}} |
| 154 | + |
| 155 | +* Read about [enabling Topology Aware Hints](/docs/tasks/administer-cluster/enabling-topology-aware-hints) |
| 156 | +* Read [Connecting Applications with Services](/docs/concepts/services-networking/connect-applications-service/) |