
Commit 2b32b0c

Merge pull request kubernetes#1944 from andrewsykim/service-internal-traffic-policy
KEP-2086: Service Internal Traffic Policy
2 parents 56f1ae1 + 6d74634 commit 2b32b0c


2 files changed

+395
-0
lines changed

Lines changed: 354 additions & 0 deletions
@@ -0,0 +1,354 @@
1+
# KEP-2086: Service Internal Traffic Policy
2+
3+
<!-- toc -->
4+
- [Release Signoff Checklist](#release-signoff-checklist)
5+
- [Summary](#summary)
6+
- [Motivation](#motivation)
7+
- [Goals](#goals)
8+
- [Non-Goals](#non-goals)
9+
- [Proposal](#proposal)
10+
- [User Stories (Optional)](#user-stories-optional)
11+
- [Story 1](#story-1)
12+
- [Story 2](#story-2)
13+
- [Risks and Mitigations](#risks-and-mitigations)
14+
- [Design Details](#design-details)
15+
- [Test Plan](#test-plan)
16+
- [Graduation Criteria](#graduation-criteria)
17+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
18+
- [Version Skew Strategy](#version-skew-strategy)
19+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
20+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
21+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
22+
- [Monitoring Requirements](#monitoring-requirements)
23+
- [Dependencies](#dependencies)
24+
- [Scalability](#scalability)
25+
- [Troubleshooting](#troubleshooting)
26+
- [Implementation History](#implementation-history)
27+
- [Drawbacks](#drawbacks)
28+
- [Alternatives](#alternatives)
29+
- [EndpointSlice Subsetting](#endpointslice-subsetting)
30+
- [Bool Field For Node Local](#bool-field-for-node-local)
31+
<!-- /toc -->
32+
33+
## Release Signoff Checklist
34+
35+
Items marked with (R) are required *prior to targeting to a milestone / release*.
36+
37+
- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
38+
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
39+
- [ ] (R) Design details are appropriately documented
40+
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
41+
- [ ] (R) Graduation criteria is in place
42+
- [ ] (R) Production readiness review completed
43+
- [ ] Production readiness review approved
44+
- [ ] "Implementation History" section is up-to-date for milestone
45+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
46+
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
47+
48+
<!--
49+
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
50+
-->
51+
52+
[kubernetes.io]: https://kubernetes.io/
53+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
54+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
55+
[kubernetes/website]: https://git.k8s.io/website
56+
57+
## Summary
58+
59+
Add a new field `spec.internalTrafficPolicy` to Service that allows node-local routing for Service internal traffic.
60+
61+
## Motivation
62+
63+
Internal traffic routed to a Service is not topology aware today. The [Topology Aware Subsetting](/keps/sig-network/2004-topology-aware-subsetting)
64+
KEP addresses topology aware routing for Services by subsetting endpoints to dedicated EndpointSlices.
65+
While this approach works for the standard zone/region topologies, it wouldn't work for node level
66+
topologies since that would require an EndpointSlice per node. In larger clusters this wouldn't scale well.
67+
68+
This KEP proposes a new field in Service to treat node-local topologies as a first class concept in Service similar
69+
to `externalTrafficPolicy`. This addresses the node-local use-case for Service while avoiding EndpointSlice
70+
subsetting per node.
71+
72+
### Goals
73+
74+
* Allow internal Service traffic to be routed to node-local endpoints.
75+
* Default behavior for internal Service traffic should not change.
76+
77+
### Non-Goals
78+
79+
* Topology aware routing for zone/region topologies.
80+
81+
## Proposal
82+
83+
Introduce a new field in Service `spec.internalTrafficPolicy`. The field will have 3 codified values:
84+
1. Cluster (default): route to all cluster-wide endpoints (or use topology aware subsetting if enabled).
85+
2. PreferLocal: route to node-local endpoints if any exist, otherwise fall back to the behavior of Cluster.
86+
3. Local: only route to node-local endpoints, drop otherwise.
87+
88+
A feature gate `ServiceInternalTrafficPolicy` will also be introduced for the alpha stage of this feature.
89+
The `internalTrafficPolicy` field cannot be set on Service during the alpha stage unless the feature gate is enabled.
90+
During the Beta stage, the feature gate will be on by default.
91+
92+
The `internalTrafficPolicy` field will not apply for headless Services or Services of type `ExternalName`.
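
As a rough illustration of the rules above, the following sketch checks a policy value and decides whether the policy applies to a given Service. The helper names `normalizePolicy` and `policyApplies` are hypothetical and not part of this proposal. The sketch assumes that an unset value defaults to `Cluster` and that a headless Service is identified by a cluster IP of `None`.

```go
package main

import "fmt"

// InternalTrafficPolicy is a simplified stand-in for the proposed API type.
type InternalTrafficPolicy string

const (
	PolicyCluster     InternalTrafficPolicy = "Cluster"
	PolicyPreferLocal InternalTrafficPolicy = "PreferLocal"
	PolicyLocal       InternalTrafficPolicy = "Local"
)

// normalizePolicy (hypothetical helper) defaults an unset policy to Cluster
// and rejects anything other than the three codified values.
func normalizePolicy(p InternalTrafficPolicy) (InternalTrafficPolicy, error) {
	switch p {
	case "":
		return PolicyCluster, nil
	case PolicyCluster, PolicyPreferLocal, PolicyLocal:
		return p, nil
	default:
		return "", fmt.Errorf("unsupported internalTrafficPolicy %q", p)
	}
}

// policyApplies (hypothetical helper) reports whether the policy is honored.
// It returns false for ExternalName Services and for headless Services,
// which are assumed here to have a cluster IP of "None".
func policyApplies(serviceType, clusterIP string) bool {
	return serviceType != "ExternalName" && clusterIP != "None"
}

func main() {
	p, err := normalizePolicy("")
	fmt.Println(p, err) // Cluster <nil>

	_, err = normalizePolicy("Zone")
	fmt.Println(err != nil) // true

	fmt.Println(policyApplies("ClusterIP", "None")) // false
}
```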
93+
94+
### User Stories (Optional)
95+
96+
#### Story 1
97+
98+
As an application owner, I would like traffic to cluster DNS servers to always prefer local endpoints to reduce
99+
latency in my application.
100+
101+
#### Story 2
102+
103+
As a platform owner, I want to create a Service that always directs traffic to a logging daemon on the same node.
104+
Traffic should never bounce to a daemon on another node.
105+
106+
### Risks and Mitigations
107+
108+
* When the `Local` policy is set, it is the user's responsibility to ensure node-local endpoints are ready, otherwise traffic will be dropped.
109+
* Using the `Local` or `PreferLocal` policy may result in imbalanced traffic for pods in a Service. It is the user's responsibility to handle this.
110+
111+
## Design Details
112+
113+
Proposed addition to core v1 API:
114+
```go
type ServiceInternalTrafficPolicyType string

const (
	ServiceInternalTrafficPolicyTypeCluster     ServiceInternalTrafficPolicyType = "Cluster"
	ServiceInternalTrafficPolicyTypePreferLocal ServiceInternalTrafficPolicyType = "PreferLocal"
	ServiceInternalTrafficPolicyTypeLocal       ServiceInternalTrafficPolicyType = "Local"
)

// ServiceSpec describes the attributes that a user creates on a service.
type ServiceSpec struct {
	// ... (existing fields elided)

	// internalTrafficPolicy denotes if the internal traffic for a Service should route
	// to cluster-wide endpoints or node-local endpoints. "Cluster" routes internal traffic
	// to a Service to all cluster-wide endpoints. "PreferLocal" will route internal traffic
	// to node-local endpoints if one exists, otherwise it will fall back to the same behavior
	// as "Cluster". "Local" routes traffic to node-local endpoints only; traffic is dropped
	// if no node-local endpoints are ready.
	InternalTrafficPolicy ServiceInternalTrafficPolicyType `json:"internalTrafficPolicy,omitempty"`
}
```
137+
138+
Proposed changes to kube-proxy:
139+
* when `internalTrafficPolicy=Cluster`, default to existing behavior today.
140+
* when `internalTrafficPolicy=PreferLocal`, route to endpoints in the EndpointSlice that match the local node's topology (topology defined by `kubernetes.io/hostname`),
141+
fall back to "Cluster" behavior if there are no local endpoints.
142+
* when `internalTrafficPolicy=Local`, route to endpoints in the EndpointSlice that match the local node's topology, drop traffic if none exist.
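
The three kube-proxy behaviors above can be sketched as a single endpoint filter. This is only a minimal illustration, not the kube-proxy implementation: the `Endpoint` type and the `selectEndpoints` function are hypothetical, and the sketch assumes each endpoint carries the value of its `kubernetes.io/hostname` topology label.

```go
package main

import "fmt"

// InternalTrafficPolicy is a simplified stand-in for the proposed API type.
type InternalTrafficPolicy string

const (
	PolicyCluster     InternalTrafficPolicy = "Cluster"
	PolicyPreferLocal InternalTrafficPolicy = "PreferLocal"
	PolicyLocal       InternalTrafficPolicy = "Local"
)

// Endpoint is a hypothetical, simplified view of an EndpointSlice endpoint.
type Endpoint struct {
	IP       string
	Hostname string // assumed to hold the kubernetes.io/hostname topology value
}

// selectEndpoints (hypothetical) returns the endpoints that traffic
// originating on nodeName may be routed to. An empty result means the
// traffic is dropped.
func selectEndpoints(policy InternalTrafficPolicy, nodeName string, all []Endpoint) []Endpoint {
	// Cluster (or any unset value) keeps the existing behavior.
	if policy != PolicyPreferLocal && policy != PolicyLocal {
		return all
	}

	// Collect endpoints that live on the same node as the client.
	var local []Endpoint
	for _, ep := range all {
		if ep.Hostname == nodeName {
			local = append(local, ep)
		}
	}

	// PreferLocal falls back to Cluster behavior when nothing is local.
	if len(local) == 0 && policy == PolicyPreferLocal {
		return all
	}
	// Local returns only node-local endpoints, possibly none.
	return local
}

func main() {
	eps := []Endpoint{
		{IP: "10.0.1.5", Hostname: "node-a"},
		{IP: "10.0.2.7", Hostname: "node-b"},
	}
	fmt.Println(len(selectEndpoints(PolicyCluster, "node-c", eps)))     // 2
	fmt.Println(len(selectEndpoints(PolicyPreferLocal, "node-a", eps))) // 1
	fmt.Println(len(selectEndpoints(PolicyPreferLocal, "node-c", eps))) // 2
	fmt.Println(len(selectEndpoints(PolicyLocal, "node-c", eps)))       // 0
}
```

Returning an empty slice for `Local` mirrors the "drop traffic if none exist" rule; how the drop is actually programmed into iptables or IPVS is outside the scope of this sketch.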
143+
144+
### Test Plan
145+
146+
Unit tests:
147+
* unit tests validating API strategy/validation for when `internalTrafficPolicy` is set on Service.
148+
* unit tests exercising kube-proxy behavior when `internalTrafficPolicy` is set to all possible values.
149+
150+
E2E test:
151+
* e2e tests validating that the default kube-proxy behavior does not change when `internalTrafficPolicy` defaults to `Cluster`. Existing tests should cover this.
152+
* e2e tests validating that traffic is preferred to local endpoints when `internalTrafficPolicy` is set to `PreferLocal`.
153+
* e2e tests validating that traffic is only sent to node-local endpoints when `internalTrafficPolicy` is set to `Local`.
154+
155+
### Graduation Criteria
156+
157+
Alpha:
158+
* feature gate `ServiceInternalTrafficPolicy` _must_ be enabled for apiserver to accept values for `spec.internalTrafficPolicy`. Otherwise the field is dropped.
159+
* kube-proxy handles traffic routing for 3 initial internal traffic policies `Cluster`, `PreferLocal` and `Local`.
160+
* Unit tests as defined in "Test Plan" section above. E2E tests are nice to have but not required for Alpha.
161+
162+
163+
### Upgrade / Downgrade Strategy
164+
165+
* The `internalTrafficPolicy` field will be off by default during the alpha stage, but the apiserver can still handle any existing Services that already have the field set.
166+
This ensures n-1 apiservers can handle the new field on downgrade.
167+
* On upgrade, if the feature gate is enabled there should be no changes in the behavior since the default value for `internalTrafficPolicy` is `Cluster`.
168+
169+
### Version Skew Strategy
170+
171+
Since this feature will be alpha for at least 1 release, an n-1 kube-proxy should handle enablement of this feature if a new apiserver enabled it.
172+
173+
## Production Readiness Review Questionnaire
174+
175+
### Feature Enablement and Rollback
176+
177+
_This section must be completed when targeting alpha to a release._
178+
179+
* **How can this feature be enabled / disabled in a live cluster?**
180+
- [X] Feature gate (also fill in values in `kep.yaml`)
181+
- Feature gate name: `ServiceInternalTrafficPolicy`
182+
- Components depending on the feature gate: kube-apiserver, kube-proxy
183+
- [ ] Other
184+
- Describe the mechanism:
185+
- Will enabling / disabling the feature require downtime of the control
186+
plane?
187+
- Will enabling / disabling the feature require downtime or reprovisioning
188+
of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
189+
190+
* **Does enabling the feature change any default behavior?**
191+
192+
No, enabling the feature does not change any default behavior since the default value of `internalTrafficPolicy` is `Cluster`.
193+
194+
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
195+
the enablement)?**
196+
197+
Yes, the feature gate can be disabled, but Service resources that have set the new field will persist that field unless it is unset by the user.
198+
199+
* **What happens if we reenable the feature if it was previously rolled back?**
200+
201+
New Services should be able to set the `internalTrafficPolicy` field. Existing Services that have the field set already should not be impacted.
202+
203+
* **Are there any tests for feature enablement/disablement?**
204+
205+
There will be unit tests to verify that apiserver will drop the field when the `ServiceInternalTrafficPolicy` feature gate is disabled.
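
A minimal sketch of the field-dropping behavior such tests would exercise is shown below. The function `dropDisabledInternalTrafficPolicy` and the simplified `ServiceSpec` are hypothetical. The sketch assumes the common apiserver pattern of clearing a gated field on write unless the existing object already uses it, which is one way to keep previously set values intact after the gate is disabled.

```go
package main

import "fmt"

// ServiceSpec is a simplified stand-in holding only the field under discussion.
type ServiceSpec struct {
	InternalTrafficPolicy string // empty means unset
}

// dropDisabledInternalTrafficPolicy (hypothetical) clears the field on the
// incoming spec when the feature gate is off, unless the stored (old) spec
// already has the field set.
func dropDisabledInternalTrafficPolicy(gateEnabled bool, newSpec, oldSpec *ServiceSpec) {
	if gateEnabled {
		return
	}
	// Assumption: a field already in use is preserved on update.
	if oldSpec != nil && oldSpec.InternalTrafficPolicy != "" {
		return
	}
	newSpec.InternalTrafficPolicy = ""
}

func main() {
	// Create with the gate disabled: the field is dropped.
	created := &ServiceSpec{InternalTrafficPolicy: "Local"}
	dropDisabledInternalTrafficPolicy(false, created, nil)
	fmt.Printf("%q\n", created.InternalTrafficPolicy) // ""

	// Update of a Service that already had the field set: the field is kept.
	old := &ServiceSpec{InternalTrafficPolicy: "Local"}
	updated := &ServiceSpec{InternalTrafficPolicy: "PreferLocal"}
	dropDisabledInternalTrafficPolicy(false, updated, old)
	fmt.Printf("%q\n", updated.InternalTrafficPolicy) // "PreferLocal"
}
```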
206+
207+
### Rollout, Upgrade and Rollback Planning
208+
209+
_This section must be completed when targeting beta graduation to a release._
210+
211+
* **How can a rollout fail? Can it impact already running workloads?**
212+
213+
TBD for beta.
214+
215+
* **What specific metrics should inform a rollback?**
216+
217+
TBD for beta.
218+
219+
* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
220+
221+
TBD for beta.
222+
223+
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
224+
fields of API types, flags, etc.?**
225+
226+
TBD for beta.
227+
228+
### Monitoring Requirements
229+
230+
_This section must be completed when targeting beta graduation to a release._
231+
232+
* **How can an operator determine if the feature is in use by workloads?**
233+
234+
TBD for beta.
235+
236+
* **What are the SLIs (Service Level Indicators) an operator can use to determine
237+
the health of the service?**
238+
239+
TBD for beta.
240+
241+
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
242+
243+
TBD for beta.
244+
245+
* **Are there any missing metrics that would be useful to have to improve observability
246+
of this feature?**
247+
248+
TBD for beta.
249+
250+
### Dependencies
251+
252+
_This section must be completed when targeting beta graduation to a release._
253+
254+
* **Does this feature depend on any specific services running in the cluster?**
255+
Think about both cluster-level services (e.g. metrics-server) as well
256+
as node-level agents (e.g. specific version of CRI). Focus on external or
257+
optional services that are needed. For example, if this feature depends on
258+
a cloud provider API, or upon an external software-defined storage or network
259+
control plane.
260+
261+
TBD for beta.
262+
263+
264+
### Scalability
265+
266+
_For alpha, this section is encouraged: reviewers should consider these questions
267+
and attempt to answer them._
268+
269+
_For beta, this section is required: reviewers must answer these questions._
270+
271+
_For GA, this section is required: approvers should be able to confirm the
272+
previous answers based on experience in the field._
273+
274+
* **Will enabling / using this feature result in any new API calls?**
275+
276+
No, since this is a user-defined field in Service. No extra calls will be required
277+
from EndpointSlice as well since topology information is already stored there.
278+
279+
* **Will enabling / using this feature result in introducing new API types?**
280+
281+
No API types are introduced, only a new field in Service.
282+
283+
* **Will enabling / using this feature result in any new calls to the cloud
284+
provider?**
285+
286+
No
287+
288+
* **Will enabling / using this feature result in increasing size or count of
289+
the existing API objects?**
290+
291+
This feature will (negligibly) increase the size of Service by adding a single field.
292+
293+
* **Will enabling / using this feature result in increasing time taken by any
294+
operations covered by [existing SLIs/SLOs]?**
295+
Think about adding additional work or introducing new steps in between
296+
(e.g. need to do X to start a container), etc. Please describe the details.
297+
298+
This feature may slightly increase kube-proxy's sync time for iptables / IPVS rules,
299+
since node topology must be calculated, but this is likely negligible given we
300+
already have many checks like this for `externalTrafficPolicy: Local`.
301+
302+
* **Will enabling / using this feature result in non-negligible increase of
303+
resource usage (CPU, RAM, disk, IO, ...) in any components?**
304+
305+
Any increase in CPU usage by kube-proxy to calculate node-local topology will likely
306+
be offset by the smaller number of iptables rules it needs to sync when using `PreferLocal` or `Local`
307+
internal traffic policies.
308+
309+
### Troubleshooting
310+
311+
The Troubleshooting section currently serves the `Playbook` role. We may consider
312+
splitting it into a dedicated `Playbook` document (potentially with some monitoring
313+
details). For now, we leave it here.
314+
315+
_This section must be completed when targeting beta graduation to a release._
316+
317+
* **How does this feature react if the API server and/or etcd is unavailable?**
318+
319+
TBD for beta.
320+
321+
* **What are other known failure modes?**
322+
323+
TBD for beta.
324+
325+
* **What steps should be taken if SLOs are not being met to determine the problem?**
326+
327+
TBD for beta.
328+
329+
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
330+
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
331+
332+
## Implementation History
333+
334+
2020-10-09: KEP approved as implementable in "alpha" stage.
335+
336+
## Drawbacks
337+
338+
Added complexity in the Service API and in kube-proxy to address node-local routing.
339+
This also pushes some responsibility on application owners to ensure pods are scheduled
340+
to work with node-local routing.
341+
342+
## Alternatives
343+
344+
### EndpointSlice Subsetting
345+
346+
EndpointSlice subsetting per node can address the node-local use-case, but this would not be very scalable
347+
for large clusters since that would require an EndpointSlice resource per node.
348+
349+
### Bool Field For Node Local
350+
351+
Instead of `internalTrafficPolicy` field with codified values, a bool field can be used to enable node-local routing.
352+
While this is simpler, it is not expressive enough for the `PreferLocal` use-case where traffic should ideally go
353+
to a local endpoint, but be routed somewhere else otherwise.
354+
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
title: Service Internal Traffic Policy
2+
kep-number: 2086
3+
authors:
4+
- "@andrewsykim"
5+
owning-sig: sig-network
6+
participating-sigs:
7+
status: implementable
8+
creation-date: 2020-10-07
9+
reviewers:
10+
- "robscott"
11+
- "thockin"
12+
approvers:
13+
- "@thockin"
14+
prr-approvers:
15+
- "@johnbelamaric"
16+
see-also:
17+
- "/keps/sig-network/2004-topology-aware-subsetting"
18+
- "/keps/sig-network/20181024-service-topology.md"
19+
20+
# The target maturity stage in the current dev cycle for this KEP.
21+
stage: alpha
22+
23+
# The most recent milestone for which work toward delivery of this KEP has been
24+
# done. This can be the current (upcoming) milestone, if it is being actively
25+
# worked on.
26+
latest-milestone: "v1.21"
27+
28+
# The milestone at which this feature was, or is targeted to be, at each stage.
29+
milestone:
30+
alpha: "v1.21"
31+
beta: "v1.22"
32+
stable: "v1.23"
33+
34+
# The following PRR answers are required at alpha release
35+
# List the feature gate name and the components for which it must be enabled
36+
feature-gates:
37+
- name: ServiceInternalTrafficPolicy
38+
components:
39+
- kube-apiserver
40+
- kube-proxy
41+
disable-supported: true
