
Commit 537f858

Merge pull request #5151 from gauravkghildiyal/kep-2433-to-stable
KEP-2433: Graduate Topology Aware Hints with only the Hints field to GA
2 parents 598575c + 50533ae commit 537f858


3 files changed (+193 / -45 lines)


keps/prod-readiness/sig-network/2433.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -3,3 +3,5 @@ alpha:
   approver: "@wojtek-t"
 beta:
   approver: "@wojtek-t"
+stable:
+  approver: "@wojtek-t"
```

keps/sig-network/2433-topology-aware-hints/README.md

Lines changed: 184 additions & 42 deletions
```diff
@@ -1,18 +1,19 @@
 # KEP: Topology Aware Hints
 <!-- toc -->
 - [Release Signoff Checklist](#release-signoff-checklist)
+- [IMPORTANT: Scope Reduction (Feb 2025)](#important-scope-reduction-feb-2025)
 - [Summary](#summary)
 - [Motivation](#motivation)
 - [Goals](#goals)
 - [Non-Goals](#non-goals)
 - [Proposal](#proposal)
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
+- [API](#api)
+- [Future API Expansion](#future-api-expansion)
 - [Configuration](#configuration)
 - [Interoperability](#interoperability)
 - [Feature Gate](#feature-gate)
-- [API](#api)
-- [Future API Expansion](#future-api-expansion)
 - [Kube-Proxy](#kube-proxy)
 - [EndpointSlice Controller](#endpointslice-controller)
 - [Heuristics](#heuristics)
```
```diff
@@ -65,15 +66,42 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [x] (R) Graduation criteria is in place
 - [x] (R) Production readiness review completed
 - [x] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [X] "Implementation History" section is up-to-date for milestone
+- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [X] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

 [kubernetes.io]: https://kubernetes.io/
 [kubernetes/enhancements]: https://git.k8s.io/enhancements
 [kubernetes/kubernetes]: https://git.k8s.io/kubernetes
 [kubernetes/website]: https://git.k8s.io/website

+## IMPORTANT: Scope Reduction (Feb 2025)
+
+This KEP's GA scope has been significantly reduced. While originally the KEP
+proposed both the `hints` field in `EndpointSlice` *and* a topology-aware
+routing implementation using the Service annotation
+`service.kubernetes.io/topology-mode=Auto`, *only the `hints` field is being
+graduated to GA*. The topology-aware routing aspects, including the
+`service.kubernetes.io/topology-mode` annotation and associated heuristics, are
+not part of this GA release.
+
+The following sections of this KEP are provided for historical context and to
+explain the rationale behind the `hints` field. The reason the entire KEP has
+not been updated is to maintain this valuable context. While other sections of
+this KEP remain, they have not been updated to fully reflect this scope
+reduction and should be considered in that light. Much of the content, including
+aspects of the Production Readiness Review, remains applicable as significant
+portions of the original implementation are still in use and will graduate to GA
+separately (through other KEPs, with their own Production Readiness Review),
+even though only the API change (the `hints` field itself) is graduating through
+this KEP.
+
+For current active plans on topology-aware routing solutions, please refer to the
+following KEPs:
+
+* https://kep.k8s.io/4444
+* https://kep.k8s.io/3015
+
 ## Summary

 Kubernetes clusters are increasingly deployed in multi-zone environments but
```
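Because only the `hints` field graduates here, an operator may want to list the Services that still set the `service.kubernetes.io/topology-mode` annotation (or the older `service.kubernetes.io/topology-aware-hints` one), since that behavior is outside this KEP's GA scope. The sketch below is an illustration and is not part of the KEP. `services_using_topology_annotation` is a hypothetical helper that filters tab-separated rows of `namespace/name`, `topology-mode` value, and `topology-aware-hints` value. The `kubectl` JSONPath query in the comment is one assumed way to produce those rows and has not been tested against a live cluster.

```bash
#!/bin/sh
# Hypothetical helper (not from the KEP): reads tab-separated rows of
#   namespace/name <TAB> topology-mode value <TAB> topology-aware-hints value
# on stdin and prints the Services that set either annotation.
#
# One assumed way to produce such rows (kubectl JSONPath prints missing
# annotations as empty strings by default):
#   kubectl get services -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.metadata.annotations.service\.kubernetes\.io/topology-mode}{"\t"}{.metadata.annotations.service\.kubernetes\.io/topology-aware-hints}{"\n"}{end}' \
#     | services_using_topology_annotation
services_using_topology_annotation() {
  # Print the first column when either annotation column is non-empty.
  awk -F '\t' '$2 != "" || $3 != "" { print $1 }'
}
```

A Service with both columns empty is skipped. The GA `hints` field on its EndpointSlices is unaffected by this filter.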
```diff
@@ -132,9 +160,10 @@ for most use cases.
 - Ensuring that Pods are distributed evenly across zones.

 ## Proposal
+
 This KEP describes two related concepts:

-1. A way to express the heuristic you'd like to use for Topology Aware Routing.
+1. (Not graduating to GA; see [scope reduction](#important-scope-reduction-feb-2025)) A way to express the heuristic you'd like to use for Topology Aware Routing.
 2. A new Hints field in EndpointSlices that can be used to enable certain
 topology heuristics.

```
```diff
@@ -194,33 +223,6 @@ with a new Service annotation.

 ## Design Details

-### Configuration
-
-A new `service.kubernetes.io/topology-mode` annotation can be used to enable or
-disable Topology Aware Routing heuristics for a Service.
-
-The previous `service.kubernetes.io/topology-aware-hints` annotation will
-continue to be supported as a means of configuring this feature for both "Auto"
-and "Disabled" values. New values will only be supported by the new annotation.
-
-### Interoperability
-
-Topology hints will be ignored if the TopologyKeys field has at least one entry.
-This field is deprecated and will be removed soon.
-
-Both ExternalTrafficPolicy and InternalTrafficPolicy will be given precedence
-over topology aware routing. For example, if `ExternalTrafficPolicy=Local` and
-topology was enabled, external traffic would be routed using the
-ExternalTrafficPolicy configuration while internal traffic would be routed with
-topology.
-
-### Feature Gate
-
-This functionality will be guarded by the `TopologyAwareHints` feature gate.
-This gate also interacts with 2 other feature gates:
-- It is dependent on the `ServiceTrafficPolicy` feature gate.
-- It is not compatible with the deprecated `ServiceTopology` feature gate.
-
 ### API

 A new `EndpointHints` struct would be added to the `EndpointSlice.Endpoint`
```
```diff
@@ -271,6 +273,44 @@ Additionally we could easily expand this API to include support for region
 hints. Although it is unclear if either expansion will be necessary, the API is
 designed in a way to make expansions straightforward.

+```
+
++---------------------------------- IMPORTANT -------------------------------------+
+|                                                                                  |
+| NOTE: The remaining design proposals described in this KEP will not graduate to  |
+| GA. For more information, see the scope reduction details at the beginning of    |
+| the KEP.                                                                         |
+|                                                                                  |
++----------------------------------------------------------------------------------+
+
+```
+### Configuration
+
+A new `service.kubernetes.io/topology-mode` annotation can be used to enable or
+disable Topology Aware Routing heuristics for a Service.
+
+The previous `service.kubernetes.io/topology-aware-hints` annotation will
+continue to be supported as a means of configuring this feature for both "Auto"
+and "Disabled" values. New values will only be supported by the new annotation.
+
+### Interoperability
+
+Topology hints will be ignored if the TopologyKeys field has at least one entry.
+This field is deprecated and will be removed soon.
+
+Both ExternalTrafficPolicy and InternalTrafficPolicy will be given precedence
+over topology aware routing. For example, if `ExternalTrafficPolicy=Local` and
+topology was enabled, external traffic would be routed using the
+ExternalTrafficPolicy configuration while internal traffic would be routed with
+topology.
+
+### Feature Gate
+
+This functionality will be guarded by the `TopologyAwareHints` feature gate.
+This gate also interacts with 2 other feature gates:
+- It is dependent on the `ServiceTrafficPolicy` feature gate.
+- It is not compatible with the deprecated `ServiceTopology` feature gate.
+
 ### Kube-Proxy

 When the `TopologyAwareHints` feature gate is enabled, Kube-Proxy will be
```
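The Configuration and Interoperability rules above can be read as a small decision procedure. The following is a minimal sketch under stated assumptions, not kube-proxy's implementation. `effective_routing` and its output labels are hypothetical. The sketch makes three assumptions that the KEP text does not spell out:

- "Precedence" for a traffic policy means that the policy is set to `Local`.
- The new annotation wins when both annotations are set.
- The `Local` check runs before the TopologyKeys check.

```bash
#!/bin/sh
# Hypothetical sketch of the routing precedence described in the KEP's
# Configuration and Interoperability sections (not kube-proxy code).
#
# Usage: effective_routing DIRECTION POLICY MODE LEGACY_MODE TOPOLOGY_KEYS
#   DIRECTION      external | internal
#   POLICY         ExternalTrafficPolicy (external) or InternalTrafficPolicy
#                  (internal): Cluster | Local
#   MODE           service.kubernetes.io/topology-mode value ("" if unset)
#   LEGACY_MODE    service.kubernetes.io/topology-aware-hints value ("" if unset)
#   TOPOLOGY_KEYS  number of entries in the deprecated TopologyKeys field
effective_routing() {
  direction=$1 policy=$2 mode=$3 legacy_mode=$4 topology_keys=$5

  # Assumption: a Local traffic policy takes precedence over topology.
  if [ "$policy" = "Local" ]; then
    echo "${direction}-traffic-policy"
    return 0
  fi

  # Per the KEP: hints are ignored when TopologyKeys has at least one entry.
  if [ "$topology_keys" -gt 0 ]; then
    echo "topology-keys"
    return 0
  fi

  # Assumption: the legacy annotation (which only supports "Auto" and
  # "Disabled") is consulted only when the new annotation is unset.
  [ -n "$mode" ] || mode=$legacy_mode

  if [ "$mode" = "Auto" ]; then
    echo "topology-hints"
  else
    echo "cluster-wide"
  fi
}
```

For example, `effective_routing external Local Auto '' 0` prints `external-traffic-policy`, while `effective_routing internal Cluster Auto '' 0` prints `topology-hints`. Together these mirror the KEP's `ExternalTrafficPolicy=Local` example, where external traffic follows the policy and internal traffic follows topology.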
```diff
@@ -590,13 +630,19 @@ completeness.
 - Tests expanded to include e2e coverage described above.

 **GA:**
-- Feedback from real world usage shows that feature is working as intended
-- Events are triggered on each Service to provide users with clear information
-on when the feature transitioned between enabled and disabled states.
+- Feedback from real world usage shows that the feature is working as intended (i.e., the `hints` field is functioning correctly).
 - Test coverage in EndpointSlice strategy to ensure that the Hints field is
 dropped when the feature gate is not enabled.
 - Test coverage in EndpointSlice controller for the transition from enabled to
 disabled.
+
+**[Deprecated] GA:**
+
+The following points were originally considered for GA but are *not* part of
+this KEP's GA release (see [scope reduction](#important-scope-reduction-feb-2025)):
+
+- Events are triggered on each Service to provide users with clear information
+on when the feature transitioned between enabled and disabled states.
 - Ensure that existing Topology Hints e2e test runs as a presubmit if any code
 changes in kube-proxy or the EndpointSlice controller.
 - Autoscaling and Scheduling SIGs have a plan to provide zone aware autoscaling
```
```diff
@@ -655,8 +701,9 @@ enabled even if the annotation has been set on the Service.
 Tests.)](https://github.com/kubernetes/kubernetes/blob/468ce5918377ab4d4e3180b4fd33fdd2bdb16ec9/pkg/controller/endpointslice/reconciler_test.go#L1641-L1907)
 * Hints field is dropped when feature gate is off. [(Strategy Unit
 Tests.)](https://github.com/kubernetes/kubernetes/blob/468ce5918377ab4d4e3180b4fd33fdd2bdb16ec9/pkg/registry/discovery/endpointslice/strategy_test.go)
-* TODO before GA: Test coverage in EndpointSlice controller for the transition
-from enabled to disabled.
+* Manual testing of feature gate enabling, disabling, upgrades, and rollbacks
+was conducted, as detailed in the "Were upgrade and rollback tested? Was the
+upgrade->downgrade->upgrade path tested?" section.

 ### Rollout, Upgrade and Rollback Planning

```
```diff
@@ -673,10 +720,91 @@ enabled even if the annotation has been set on the Service.
 with before the feature was enabled.

 * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
-Per-Service enablement/disablement is covered in depth and feature gate
-enablement and disablement will be covered before the feature graduates to GA.
-In addition, manual testing covering combinations of
-upgrade->downgrade->upgrade cycles will be completed prior to GA graduation.
+
+The `TopologyAwareHints` feature and the corresponding feature gate have existed
+since k8s v1.21, with the feature being enabled by default since k8s 1.24 (~3
+years ago). That is one useful data point showing that there have not been any
+issues with `TopologyAwareHints` and the upgrade/rollback stories.
+
+In addition, manual testing was performed using the following steps:
+
+1. Create a v1.21.1 Kind cluster with the `TopologyAwareHints` feature gate enabled.
+
+```bash
+kind create cluster --name=topology-hints --config=<(cat <<EOF
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+featureGates:
+  TopologyAwareHints: true
+nodes:
+- role: control-plane
+  image: kindest/node:v1.21.1
+- role: worker
+  image: kindest/node:v1.21.1
+EOF
+)
+```
+
+2. Create an EndpointSlice with the `Hints` field configured:
+
+```bash
+cat <<EOF | kubectl apply -f -
+apiVersion: discovery.k8s.io/v1
+kind: EndpointSlice
+metadata:
+  name: topology-hints
+addressType: IPv4
+ports:
+  - name: http
+    protocol: TCP
+    port: 80
+endpoints:
+  - addresses:
+      - "10.0.0.1"
+    hints:
+      forZones:
+        - name: "zone-a"
+EOF
+```
+
+3. Verify that the EndpointSlice was created successfully and has the `Hints`
+field populated.
+
+```bash
+kubectl get endpointslice topology-hints -o yaml
+```
+
+4. Roll back kube-apiserver to v1.20.0 (which predates the `TopologyAwareHints`
+feature gate).
+
+```bash
+docker exec -it topology-hints-control-plane /bin/bash
+
+# Edit file /etc/kubernetes/manifests/kube-apiserver.yaml, remove feature flag
+# and downgrade image to v1.20.0
+```
+
+5. Verify that the EndpointSlice is still there but no longer has the `Hints` field:
+
+```bash
+kubectl get endpointslice topology-hints -o yaml
+```
+
+6. Upgrade kube-apiserver back to v1.21.1 and re-enable the `TopologyAwareHints` feature gate.
+
+```bash
+docker exec -it topology-hints-control-plane /bin/bash
+
+# Edit file /etc/kubernetes/manifests/kube-apiserver.yaml, add feature flag and
+# upgrade image to v1.21.1
+```
+
+7. Verify that the EndpointSlice has the `Hints` field visible again (since it
+was persisted in etcd).
+
+```bash
+kubectl get endpointslice topology-hints -o yaml
+```

 * **Is the rollout accompanied by any deprecations and/or removals of features,
 APIs, fields of API types, flags, etc.?**
```
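In the manual upgrade/rollback test, steps 4 and 6 edit the kube-apiserver static Pod manifest by hand inside the node container. The helpers below sketch one scripted alternative. They are not the procedure the KEP authors used, and all of the function names are hypothetical. The sketch assumes the manifest contains:

- a `- kube-apiserver` command entry,
- a single `--feature-gates=TopologyAwareHints=true` flag, which is what a kind `featureGates` setting is expected to produce, and
- an `image:` line ending in `kube-apiserver:<tag>`.

```bash
#!/bin/sh
# Hypothetical helpers for steps 4 and 6 of the manual rollback test.
# Each rewrites a kube-apiserver static Pod manifest in place; the kubelet is
# expected to restart the Pod when the manifest changes.

# Drop the TopologyAwareHints feature-gate flag (assumes it is the only gate
# in that flag, e.g. "- --feature-gates=TopologyAwareHints=true").
remove_topology_gate() {
  sed '/--feature-gates=TopologyAwareHints=true/d' "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}

# Re-add the flag directly after the "- kube-apiserver" command entry,
# reusing that entry's indentation (assumes no other --feature-gates flag).
add_topology_gate() {
  awk '{ print }
       /^ *- kube-apiserver$/ {
         sub(/- kube-apiserver$/, "- --feature-gates=TopologyAwareHints=true")
         print
       }' "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}

# Replace the tag on the kube-apiserver image line, e.g. v1.21.1 -> v1.20.0.
set_apiserver_version() {
  sed "s#\(image: .*/kube-apiserver:\).*#\1$2#" "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}

# Possible usage from the host (assumed, untested): copy the manifest out of
# the kind node, edit the copy, and copy it back.
#   docker cp topology-hints-control-plane:/etc/kubernetes/manifests/kube-apiserver.yaml ./kube-apiserver.yaml
#   remove_topology_gate ./kube-apiserver.yaml
#   set_apiserver_version ./kube-apiserver.yaml v1.20.0
#   docker cp ./kube-apiserver.yaml topology-hints-control-plane:/etc/kubernetes/manifests/kube-apiserver.yaml
```

The host-side copy in the usage comment keeps the helpers' temporary `.tmp` file out of the kubelet's static Pod directory.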
```diff
@@ -689,6 +817,14 @@ enabled even if the annotation has been set on the Service.
 If the `endpointslices_changed_per_sync` metric has a non-zero value for the
 `auto` approach, this feature is in use.

+* **How can someone using this feature know that it is working for their
+instance?**
+
+With the new [reduced scope](#important-scope-reduction-feb-2025), the part
+being classified as "having graduated to GA" only involves an API field
+addition. Users can verify its functionality by describing an EndpointSlice
+and checking if the `Hints` field is configured.
+
 * **What are the SLIs (Service Level Indicators) an operator can use to
 determine the health of the service?**
 - [x] Metrics
```
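The new PRR answer says users can confirm the feature works by checking that the `Hints` field is configured on an EndpointSlice. The sketch below automates that check. `check_zone_hints` is a hypothetical helper. It expects one line per endpoint: the first address, a tab, then the space-separated `forZones` names. The `kubectl` JSONPath query in the comment is expected, but not verified, to produce that format for the `topology-hints` EndpointSlice from the manual test.

```bash
#!/bin/sh
# Hypothetical helper: reads "address<TAB>zone names" lines on stdin, prints a
# summary per endpoint, and returns 1 if any endpoint has no zone hints.
#
# Possible input (assumed, untested):
#   kubectl get endpointslice topology-hints -o jsonpath='{range .endpoints[*]}{.addresses[0]}{"\t"}{.hints.forZones[*].name}{"\n"}{end}' \
#     | check_zone_hints
check_zone_hints() {
  awk -F '\t' '
    NF == 0 { next }                                      # skip blank lines
    $2 == "" { print "no hints: " $1; missing = 1; next } # hints absent
    { print $1 " -> " $2 }                                # hints present
    END { exit missing }                                  # 0 if all had hints
  '
}
```

A non-zero exit status makes the check usable in scripts. For example, after step 5 of the rollback test the `Hints` field is expected to be absent, so the check should fail there and pass again after step 7.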
```diff
@@ -753,6 +889,11 @@ enabled even if the annotation has been set on the Service.
 (specifically the EndpointSlice controller). Profiling will be performed to
 ensure that this increase is minimal.

+* **Can enabling / using this feature result in resource exhaustion of some node
+resources (PIDs, sockets, inodes, etc.)?**
+
+No.
+
 ### Troubleshooting

 * **How does this feature react if the API server and/or etcd is unavailable?**
```
```diff
@@ -776,6 +917,7 @@ enabled even if the annotation has been set on the Service.
 - Alpha release: Kubernetes 1.21
 - Beta Release: Kubernetes 1.23[^1]
 - Feature Gate on-by default, feature available by default: 1.24
+- KEP Graduates to GA in 1.33 with [reduced scope](#important-scope-reduction-feb-2025)

 [^1]: This was intended to also flip the feature gate to enabled by default, but
 unfortunately that part was missed in 1.23. See
```

keps/sig-network/2433-topology-aware-hints/kep.yaml

Lines changed: 7 additions & 3 deletions
```diff
@@ -2,12 +2,14 @@ title: Topology Aware Hints
 kep-number: 2433
 authors:
   - "@robscott"
+  - "@gauravkghildiyal"
 owning-sig: sig-network
 status: implementable
 creation-date: 2021-02-04
 reviewers:
   - "@andrewsykim"
   - "@bowei"
+  - "@danwinship"
   - "@dcbw"
   - "@thockin"
 approvers:
```
```diff
@@ -19,22 +21,24 @@ see-also:
   - "github.com/kubernetes/enhancements/blob/master/keps/sig-network/2004-topology-aware-subsetting"
   - "github.com/kubernetes/enhancements/blob/master/keps/sig-network/2030-topology-aware-proxying"
   - "github.com/kubernetes/enhancements/blob/master/keps/sig-network/2086-service-internal-traffic-policy"
+  - "github.com/kubernetes/enhancements/tree/master/keps/sig-network/4444-service-traffic-distribution"
+  - "https://github.com/kubernetes/enhancements/issues/3015"
 replaces:
   - "github.com/kubernetes/enhancements/tree/master/keps/sig-network/536-topology-aware-routing"

 # The target maturity stage in the current dev cycle for this KEP.
-stage: beta
+stage: stable

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.29"
+latest-milestone: "v1.33"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.21"
   beta: "v1.23"
-  stable: "v1.30"
+  stable: "v1.33"

 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
```
