
Commit e1bdbca

Merge pull request kubernetes#4003 from aojea/proxy_prefer_zone
KEP-2433: add new heuristic to topology routing
2 parents bdccc9c + 15fb5f8 commit e1bdbca

2 files changed: +64 -39 lines changed

keps/sig-network/2433-topology-aware-hints/README.md

Lines changed: 63 additions & 38 deletions
Original file line number | Diff line number | Diff line change
@@ -16,13 +16,16 @@
1616
- [Kube-Proxy](#kube-proxy)
1717
- [EndpointSlice Controller](#endpointslice-controller)
1818
- [Heuristics](#heuristics)
19-
- [Proportional CPU Heuristic](#proportional-cpu-heuristic)
20-
- [Assumptions](#assumptions)
21-
- [Identifying Zones](#identifying-zones)
19+
- [Identifying Zones](#identifying-zones)
2220
- [Excluding Control Plane Nodes](#excluding-control-plane-nodes)
23-
- [Example](#example)
2421
- [Overload](#overload)
2522
- [Handling Node Updates](#handling-node-updates)
23+
- [Proportional CPU Heuristic](#proportional-cpu-heuristic)
24+
- [Assumptions](#assumptions)
25+
- [Example](#example)
26+
- [PreferZone Heuristic](#preferzone-heuristic)
27+
- [Assumptions](#assumptions-1)
28+
- [Example](#example-1)
2629
- [Additional Heuristics](#additional-heuristics)
2730
- [Future Expansion](#future-expansion)
2831
- [Test Plan](#test-plan)
@@ -295,7 +298,7 @@ implemented directly by kube-proxy.
295298
### EndpointSlice Controller
296299

297300
When the `TopologyAwareHints` feature gate is enabled and the annotation is set
298-
to `Auto` or `ProportionalByCore` for a Service, the EndpointSlice controller
301+
to `Auto` or `ProportionalZoneCPU` for a Service, the EndpointSlice controller
299302
will add hints to EndpointSlices. These hints will indicate where an endpoint
300303
should be consumed by proxy implementations to enable topology aware routing.
301304

@@ -306,27 +309,15 @@ This KEP starts with the following heuristics:
306309
| Heuristic Name | Description |
307310
|-|-|
308311
| Auto | EndpointSlice controller and/or underlying dataplane can choose the heuristic used. |
309-
| ProportionalByCore | Endpoints will be allocated to each zone proportionally, based on the allocatable Node CPU cores in each zone. |
312+
| ProportionalZoneCPU | Endpoints will be allocated to each zone proportionally, based on the allocatable Node CPU cores in each zone. |
313+
| PreferZone | Hints are always populated to represent the zone the endpoint is in. |
310314

311315
In the future, additional heuristics may be added. Until that point, "Auto" will
312316
be the only configurable value. In most clusters, that will translate to
313-
`ProportionalByCore` unless the underlying dataplane has a better approach
317+
`ProportionalZoneCPU` unless the underlying dataplane has a better approach
314318
available.
315319

316-
### Proportional CPU Heuristic
317-
#### Assumptions
318-
319-
- Incoming traffic is proportional to the number of allocatable CPU cores in a
320-
zone. Although this is an imperfect metric, it is the best available way of
321-
predicting how much traffic will be received in a zone. If we are unable to
322-
derive the number of allocatable cores in a zone we will fall back to the
323-
number of nodes in that zone.
324-
- Service capacity is proportional to the number of endpoints in a zone. This
325-
assumes that each endpoint has equivalent capacity. Although this is not
326-
always true, it usually is. We can explore ways to deal with variable capacity
327-
endpoints in the future.
328-
329-
#### Identifying Zones
320+
### Identifying Zones
330321

331322
The EndpointSlice controller reads the standard `topology.kubernetes.io/zone`
332323
label on Nodes to determine which zone a Pod is running in. Kube-Proxy would be
@@ -340,23 +331,6 @@ calculating allocatable cores in a zone:
340331
* `node-role.kubernetes.io/control-plane`
341332
* `node-role.kubernetes.io/master`
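
The exclusion rule above can be sketched as follows. This is a minimal illustration, not the EndpointSlice controller's implementation: the node dictionaries, the `allocatable_cpu` key, and the function name are hypothetical, and skipping nodes without a zone label is an assumption made for this sketch.

```python
# Labels that mark control plane nodes, as listed in the KEP.
CONTROL_PLANE_LABELS = (
    "node-role.kubernetes.io/control-plane",
    "node-role.kubernetes.io/master",
)
ZONE_LABEL = "topology.kubernetes.io/zone"


def allocatable_cores_by_zone(nodes):
    """Sum allocatable CPU cores per zone, ignoring control plane nodes.

    `nodes` is a hypothetical list of dicts with "labels" and
    "allocatable_cpu" keys; it is not a real Kubernetes client type.
    """
    totals = {}
    for node in nodes:
        labels = node["labels"]
        # Skip any node carrying either control plane label.
        if any(label in labels for label in CONTROL_PLANE_LABELS):
            continue
        zone = labels.get(ZONE_LABEL)
        if zone is None:
            # Assumption for this sketch: nodes without a zone are skipped.
            continue
        totals[zone] = totals.get(zone, 0) + node["allocatable_cpu"]
    return totals


nodes = [
    {"labels": {ZONE_LABEL: "zone-a"}, "allocatable_cpu": 8},
    {"labels": {ZONE_LABEL: "zone-a"}, "allocatable_cpu": 12},
    {"labels": {ZONE_LABEL: "zone-b"}, "allocatable_cpu": 16},
    {
        "labels": {
            ZONE_LABEL: "zone-b",
            "node-role.kubernetes.io/control-plane": "",
        },
        "allocatable_cpu": 4,
    },
]
cores = allocatable_cores_by_zone(nodes)
print(cores)
```

Only the presence of a control plane label is checked, not its value, since these role labels are commonly set with an empty value.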
342333

343-
#### Example
344-
345-
zone-a: 20 CPU cores
346-
zone-b: 16 CPU cores
347-
zone-c: 14 CPU cores
348-
349-
In this scenario, the following proportion of endpoints would be allocated for
350-
each Service:
351-
352-
zone-a: 40%
353-
zone-b: 32%
354-
zone-c: 28%
355-
356-
When allocating endpoints to meet this distribution, keeping endpoints in the
357-
same zone will be prioritized. When same-zone endpoints are exhausted, endpoints
358-
will be taken from zones that have excess capacity.
359-
360334
#### Overload
361335

362336
Overload is a key concept for this proposal. This occurs when there are less
@@ -393,6 +367,57 @@ of the following scenarios:
393367
2. A new Node results in a Service that is able to achieve an endpoint
394368
distribution below 20% for the first time.
395369

370+
### Proportional CPU Heuristic
371+
372+
#### Assumptions
373+
374+
- Incoming traffic is proportional to the number of allocatable CPU cores in a
375+
zone. Although this is an imperfect metric, it is the best available way of
376+
predicting how much traffic will be received in a zone. If we are unable to
377+
derive the number of allocatable cores in a zone we will fall back to the
378+
number of nodes in that zone.
379+
- Service capacity is proportional to the number of endpoints in a zone. This
380+
assumes that each endpoint has equivalent capacity. Although this is not
381+
always true, it usually is. We can explore ways to deal with variable capacity
382+
endpoints in the future.
383+
#### Example
384+
385+
zone-a: 20 CPU cores
386+
zone-b: 16 CPU cores
387+
zone-c: 14 CPU cores
388+
389+
In this scenario, the following proportion of endpoints would be allocated for
390+
each Service:
391+
392+
zone-a: 40%
393+
zone-b: 32%
394+
zone-c: 28%
395+
396+
When allocating endpoints to meet this distribution, keeping endpoints in the
397+
same zone will be prioritized. When same-zone endpoints are exhausted, endpoints
398+
will be taken from zones that have excess capacity.
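
The proportions in the example above follow directly from each zone's share of the total allocatable cores. A minimal sketch of that arithmetic, assuming a plain mapping of zone name to core count (the function name is hypothetical and this is not the controller's code):

```python
def expected_endpoint_share(zone_cpu_cores):
    """Return each zone's expected fraction of endpoints.

    The share of a zone is its allocatable CPU cores divided by the
    total allocatable CPU cores across all zones.
    """
    total = sum(zone_cpu_cores.values())
    if total <= 0:
        raise ValueError("total allocatable CPU cores must be positive")
    return {zone: cores / total for zone, cores in zone_cpu_cores.items()}


shares = expected_endpoint_share({"zone-a": 20, "zone-b": 16, "zone-c": 14})
for zone, share in sorted(shares.items()):
    print(f"{zone}: {share:.0%}")
```

With 50 cores in total, this reproduces the 40% / 32% / 28% split shown in the example. It does not model the later step of assigning individual endpoints to zones.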
399+
400+
### PreferZone Heuristic
401+
402+
#### Assumptions
403+
404+
- Endpoints are distributed per zone proportionally to the expected traffic capacity.
405+
406+
This heuristic routes traffic only to endpoints in the originating zone, with no overflow to other zones.
407+
Dataplanes will fall back to cluster-wide routing if there are no endpoints with hints for the
408+
zone the dataplane is running in.
409+
There is a risk of blackholing traffic or of traffic imbalance if the endpoint distribution is incorrect.
410+
411+
#### Example
412+
413+
zone-a: 2 endpoints
414+
zone-b: 0 endpoints
415+
zone-c: 3 endpoints
416+
417+
In this scenario, traffic generated in zone-a or zone-c will be routed only to the endpoints existing
418+
in their corresponding zone. Traffic from zone-b, since it does not have any endpoints, will fall back to
419+
cluster-wide routing and will be routed to endpoints in zone-a and zone-c.
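
The behavior in this example can be sketched in a few lines. This is an illustration under stated assumptions, not kube-proxy's implementation: the endpoint dictionaries, the `for_zones` key, and both function names are hypothetical.

```python
def preferzone_hints(endpoints):
    """Attach a hint to each endpoint naming the zone the endpoint is in.

    `endpoints` is a hypothetical list of (name, zone) tuples.
    """
    return [
        {"name": name, "zone": zone, "for_zones": [zone]}
        for name, zone in endpoints
    ]


def endpoints_for_proxy(hinted_endpoints, proxy_zone):
    """Select the endpoints a dataplane in `proxy_zone` would route to.

    Endpoints hinted for the proxy's zone are used exclusively. If none
    exist, fall back to cluster-wide routing over every endpoint.
    """
    local = [e["name"] for e in hinted_endpoints if proxy_zone in e["for_zones"]]
    if local:
        return local
    return [e["name"] for e in hinted_endpoints]


hinted = preferzone_hints([
    ("a-1", "zone-a"), ("a-2", "zone-a"),
    ("c-1", "zone-c"), ("c-2", "zone-c"), ("c-3", "zone-c"),
])
for zone in ("zone-a", "zone-b", "zone-c"):
    print(zone, endpoints_for_proxy(hinted, zone))
```

A proxy in zone-a or zone-c sees only its own zone's endpoints, while a proxy in zone-b, which has none, sees all five. The sketch also makes the imbalance risk visible: zone-a's traffic is spread over two endpoints and zone-c's over three, regardless of how much traffic each zone generates.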
420+
396421
### Additional Heuristics
397422
To enable additional heuristics to be added in the future, we will:
398423

keps/sig-network/2433-topology-aware-hints/kep.yaml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -34,7 +34,7 @@ latest-milestone: "v1.27"
3434
milestone:
3535
alpha: "v1.21"
3636
beta: "v1.23"
37-
stable: "v1.28"
37+
stable: "v1.29"
3838

3939
# The following PRR answers are required at alpha release
4040
# List the feature gate name and the components for which it must be enabled
