Commit 1aa5bb6

KEP-4444: Graduate Service Traffic Distribution to GA. Clarify the definition of PreferClose.
1 parent e0fc784 commit 1aa5bb6

3 files changed: +104 -41 lines changed

keps/prod-readiness/sig-network/4444.yaml

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -3,3 +3,5 @@ alpha:
33
approver: "@johnbelamaric"
44
beta:
55
approver: "@johnbelamaric"
6+
stable:
7+
approver: "@johnbelamaric"

keps/sig-network/4444-service-traffic-distribution/README.md

Lines changed: 96 additions & 36 deletions
@@ -29,6 +29,8 @@
2929
- [e2e tests](#e2e-tests)
3030
- [Graduation Criteria](#graduation-criteria)
3131
- [Alpha](#alpha)
32+
- [Beta](#beta)
33+
- [GA](#ga)
3234
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
3335
- [Version Skew Strategy](#version-skew-strategy)
3436
- [Possible future expansions](#possible-future-expansions)
@@ -45,6 +47,7 @@
4547
- [Implementation History](#implementation-history)
4648
- [Drawbacks](#drawbacks)
4749
- [Alternatives](#alternatives)
50+
- [An alternative definition of <code>PreferClose</code>](#an-alternative-definition-of-preferclose)
4851
- [Repurpose the existing topology annotation to recognize additional values](#repurpose-the-existing-topology-annotation-to-recognize-additional-values)
4952
- [Reuse the fields internal/externalTrafficPolicy to offer these routing preferences](#reuse-the-fields-internalexternaltrafficpolicy-to-offer-these-routing-preferences)
5053
- [Granular Routing Controls](#granular-routing-controls)
@@ -70,20 +73,20 @@ checklist items _must_ be updated for the enhancement to be released.
7073

7174
Items marked with (R) are required *prior to targeting to a milestone / release*.
7275

73-
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
74-
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
75-
- [ ] (R) Design details are appropriately documented
76-
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
77-
- [ ] e2e Tests for all Beta API Operations (endpoints)
78-
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
79-
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
80-
- [ ] (R) Graduation criteria is in place
81-
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
82-
- [ ] (R) Production readiness review completed
83-
- [ ] (R) Production readiness review approved
84-
- [ ] "Implementation History" section is up-to-date for milestone
85-
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
86-
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
76+
- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
77+
- [X] (R) KEP approvers have approved the KEP status as `implementable`
78+
- [X] (R) Design details are appropriately documented
79+
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
80+
- [X] e2e Tests for all Beta API Operations (endpoints)
81+
- [X] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
82+
- [X] (R) Minimum Two Week Window for GA e2e tests to prove flake free
83+
- [X] (R) Graduation criteria is in place
84+
- [X] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
85+
- [X] (R) Production readiness review completed
86+
- [X] (R) Production readiness review approved
87+
- [X] "Implementation History" section is up-to-date for milestone
88+
- [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
89+
- [X] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
8790

8891
<!--
8992
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
@@ -196,7 +199,7 @@ such a preference in future refinements.
196199
* **Immediate Support for All Possible Heuristics:** The initial implementation
197200
focuses on a core set of heuristics. Addition of new heuristics (like
198201
`Local` for Node local preference) could be explored in future
199-
refinements.
202+
refinements. (See https://kep.k8s.io/3015)
200203

201204
## Proposal
202205

@@ -206,11 +209,12 @@ while making routing decisions. It does not offer strict routing guarantees.
206209

207210
The field will support the following initial values:
208211

212+
* `PreferClose`: Indicates a preference for routing traffic to endpoints in
213+
the same zone as the client.
209214

210-
* `PreferClose`: Indicates a preference for routing traffic to endpoints that
211-
are topologically proximate to the client. The interpretation of
212-
"topologically proximate" may vary across implementations and could encompass
213-
endpoints within the same node, rack, zone, or even region.
215+
(For background on the name `PreferClose` and its definition, see the "[An
216+
alternative definition of
217+
PreferClose](#an-alternative-definition-of-preferclose)" section.)
214218

215219
The absence of a value indicates no specific routing preference. In this case,
216220
the user delegates the routing decision to the implementation, allowing it to
@@ -220,17 +224,6 @@ Implementations SHOULD support the standard values. While some flexibility in
220224
interpretation is permitted, implementations should aim to align their behavior
221225
with the described intent of these preferences as closely as possible.
222226

223-
NOTE: Implementations reserve the right to refine the behavior associated with
224-
any heuristic, including standard heuristics. This means the behavior enabled
225-
by values such as `PreferClose` might evolve over time, and some
226-
evolutions might interpret the heuristic goals slightly differently. For
227-
example, in the case of `PreferClose`, an implementation might initially route
228-
traffic within the zone without considering endpoint overload, while a future
229-
refinement could introduce feedback mechanisms to detect overload and route
230-
traffic outside the zone when necessary, optimizing overall performance. The
231-
decision of what constitutes an "improvement" remains at the discretion of the
232-
implementation.
233-
234227
### User Stories
235228

236229
#### Story 1
@@ -363,8 +356,6 @@ NOTE: The expectation remains that *all* endpoints within an EndpointSlice must
363356
Aware
364357
Hints](https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/2433-topology-aware-hints/README.md#kube-proxy), i.e. _"This is to provide safer transitions between enabled and disabled states. Without this fallback, endpoints could easily get overloaded as hints were being added or removed from some EndpointSlices but had not yet propagated to all of them."_
365358

366-
<<[UNRESOLVED Name for the field is being discussed]>>
367-
368359
### Choice of field name
369360
The name `trafficDistribution` is meant to capture the highly
370361
implementation-specific nature of this field and how it affects the routing of
@@ -387,8 +378,6 @@ traffic
387378
so as not to confuse with the actual process of selecting the complete set of
388379
pods backing a service.
389380

390-
<<[/UNRESOLVED]>>
391-
392381
### Intersection with internal/externalTrafficPolicy
393382

394383
The intersection of the field with `internalTrafficPolicy` and
@@ -462,6 +451,10 @@ The following packages will also see minor changes:
462451
and field `trafficDistribution: PreferClose` are configured, precedence is given to
463452
the annotation.
464453

454+
Link to tests: https://github.com/kubernetes/kubernetes/blob/69ab91a5c59617872c9f48737c64409a9dec2957/test/integration/service/service_test.go#L292-L652
455+
456+
Link to k8s-triage: https://storage.googleapis.com/k8s-triage/index.html?sig=network&test=service_test
457+
465458
##### e2e tests
466459

467460
* Verify that EndpointSlice hints are correctly populated when
@@ -471,12 +464,44 @@ The following packages will also see minor changes:
471464
requests originating from zones with no service pods, requests should not get
472465
blackholed and should rather be forwarded to any service pod from the cluster.
473466

467+
Testgrid: https://testgrid.k8s.io/sig-network-kind#pr-sig-network-kind,%20multizone&include-filter-by-regex=Traffic%20Distribution
468+
474469
### Graduation Criteria
475470

476471
#### Alpha
477472

478-
- Feature implemented behind a feature gate
479-
- Initial e2e tests completed and enabled
473+
- Feature implemented behind a feature gate.
474+
- Initial e2e tests completed and enabled.
475+
476+
#### Beta
477+
478+
- Gather feedback from developers.
479+
- Additional tests are in Testgrid and linked in KEP.
480+
481+
#### GA
482+
483+
- Examples of real-world usage
484+
- Available in [GKE Alpha
485+
Clusters](https://cloud.google.com/kubernetes-engine/docs/concepts/alpha-clusters)
486+
since 1.30
487+
- Available in standard GKE clusters since 1.31:
488+
https://opensource.googleblog.com/2024/08/kubernetes-131-is-now-available-on-gke-one-week-after-open-source-release.html
489+
- Available in Cilium v1.16 onwards:
490+
https://isovalent.com/blog/post/cilium-1-16/#h-service-traffic-distribution
491+
- The feature was made alpha in k8s 1.30 and since its beta (enabled by
492+
default) release in 1.31, the required number of two minor releases have
493+
passed (1.31 and 1.32).
494+
- Based on developer feedback, a separate KEP
495+
([KEP-3015](http://kep.k8s.io/3015)) will introduce a new `PreferSameNode`
496+
option and, to improve clarity, will also introduce `PreferSameZone` as a more
497+
precise alias for the existing `PreferClose` value. Because both
498+
`PreferSameNode` and the renaming to `PreferSameZone` require the standard
499+
Alpha/Beta/GA graduation process (including new feature gates), handling them
500+
in a separate KEP allows the `trafficDistribution` field and its initial
501+
`PreferClose` option to reach GA status independently. This is important for
502+
users who can benefit from the existing functionality now, as waiting for
503+
`PreferSameNode` and `PreferSameZone` to reach GA would delay general
504+
availability of `trafficDistribution` until at least 1.35.
480505

481506
### Upgrade / Downgrade Strategy
482507

@@ -965,6 +990,8 @@ In terms of mitigation, there are several options:
965990
- Changes released in alpha as part of Kubernetes 1.30
966991
- KEP updated to rename field names with the choices made during implementation.
967992
- KEP updated with PRR sections filled, targeting beta release in Kubernetes 1.31
993+
- [Feb 2025] KEP is updated to have a more precise definition of `PreferClose`.
994+
KEP is targeting graduation to GA in 1.33
968995

969996
## Drawbacks
970997

@@ -974,6 +1001,39 @@ Why should this KEP _not_ be implemented?
9741001

9751002
## Alternatives
9761003

1004+
### An alternative definition of `PreferClose`
1005+
1006+
A previous iteration of this KEP defined `PreferClose` as follows:
1007+
1008+
>PreferClose: Indicates a preference for routing traffic to endpoints that are
1009+
>topologically proximate to the client. The interpretation of "topologically
1010+
>proximate" may vary across implementations and could encompass endpoints within
1011+
>the same node, rack, zone, or even region.
1012+
1013+
This open-ended definition aimed to accommodate both simple implementations
1014+
(like kube-proxy, initially interpreting it as "prefer same zone") and more
1015+
sophisticated ones (potentially offering "prefer same node with load-based
1016+
fallback").
1017+
1018+
However, this flexibility also introduced ambiguity. As discussions around
1019+
adding a "prefer-same-node" option in
1020+
[kubernetes/enhancements#4931](https://github.com/kubernetes/enhancements/pull/4931)
1021+
illustrated, the lack of a precise definition for `PreferClose` raised concerns
1022+
about overlapping behaviors and future extensibility. Having something like
1023+
`PreferSameNode` alongside `PreferClose` could lead to confusion about the
1024+
distinction.
1025+
1026+
To address this ambiguity and pave the way for future enhancements like
1027+
`PreferSameNode`, the meaning of `PreferClose` has been clarified and now
1028+
specifically means:
1029+
1030+
> PreferClose: Indicates a preference for routing traffic to endpoints in the
1031+
> same zone as the client.
1032+
1033+
A separate KEP ([KEP-3015](http://kep.k8s.io/3015)) will introduce
1034+
`PreferSameZone` as a more precise name for this functionality (while retaining
1035+
`PreferClose` for backward compatibility).
1036+
9771037
### Repurpose the existing topology annotation to recognize additional values
9781038

9791039
The historical reason for having a topology annotation instead of a field was

keps/sig-network/4444-service-traffic-distribution/kep.yaml

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -20,17 +20,18 @@ see-also:
2020
- "https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2086-service-internal-traffic-policy"
2121

2222
# The target maturity stage in the current dev cycle for this KEP.
23-
stage: beta
23+
stage: stable
2424

2525
# The most recent milestone for which work toward delivery of this KEP has been
2626
# done. This can be the current (upcoming) milestone, if it is being actively
2727
# worked on.
28-
latest-milestone: "v1.31"
28+
latest-milestone: "v1.33"
2929

3030
# The milestone at which this feature was, or is targeted to be, at each stage.
3131
milestone:
3232
alpha: "v1.30"
3333
beta: "v1.31"
34+
stable: "v1.33"
3435

3536
# The following PRR answers are required at alpha release
3637
# List the feature gate name and the components for which it must be enabled
@@ -42,6 +43,6 @@ feature-gates:
4243
- kube-apiserver
4344
disable-supported: true
4445

45-
# The following PRR answers are required at beta release
46-
# metrics:
47-
# - my_feature_metric
46+
metrics:
47+
- endpoint_slice_controller_services_count_by_traffic_distribution
48+
- endpoint_slice_controller_endpointslices_changed_per_sync
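
As a quick illustration of what the clarified definition means for users, a Service that opts into same-zone preference might look like the sketch below. The `trafficDistribution` field and the `PreferClose` value come from the KEP text in this commit; the Service name, selector label, and port numbers are hypothetical placeholders, not taken from the KEP.

```yaml
# Minimal sketch of a Service using the field graduated by this KEP.
# metadata.name, the selector label, and the ports are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: example-svc            # hypothetical name
spec:
  selector:
    app: example               # hypothetical label
  ports:
    - port: 80                 # hypothetical Service port
      targetPort: 8080         # hypothetical container port
  # Expresses a preference (not a strict guarantee) for endpoints
  # in the same zone as the client.
  trafficDistribution: PreferClose
```

With this set, an implementation such as kube-proxy is expected to prefer endpoints in the client's zone. Per the e2e test description in the README diff, requests from a zone with no service pods should still be forwarded to a service pod elsewhere in the cluster rather than being blackholed.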
