Commit 31521b6

Merge pull request kubernetes#2733 from andrewsykim/kep-2086
KEP-2086: update beta milestone for v1.22
2 parents 780c28f + a9763c3 commit 31521b6

3 files changed: +64 -63 lines changed

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 2086
 alpha:
   approver: "@wojtek-t"
+beta:
+  approver: "@wojtek-t"

keps/sig-network/2086-service-internal-traffic-policy/README.md

Lines changed: 59 additions & 60 deletions
@@ -56,7 +56,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 
 ## Summary
 
-Add a new field `spec.trafficPolicy` to Service that allows node-local and topology-aware routing for Service traffic.
+Add a new field `spec.internalTrafficPolicy` to Service that allows node-local and topology-aware routing for Service traffic.
 
 ## Motivation
 
@@ -76,17 +76,15 @@ for internal Service traffic.
 
 ## Proposal
 
-Introduce a new field in Service `spec.trafficPolicy`. The field will have 4 codified values:
+Introduce a new field in Service `spec.internalTrafficPolicy`. The field will have 2 codified values:
 1. Cluster (default): route to all cluster-wide endpoints (or use topology aware subsetting if enabled).
-2. Topology: route to endpoints using topology-aware routing. See Topology Aware Hints KEP for more details.
-2. PreferLocal: route to node-local endpoints if it exists, otherwise fallback to behavior from Cluster.
-3. Local: only route to node-local endpoints, drop otherwise.
+2. Local: only route to node-local endpoints, drop otherwise.
 
-A feature gate `ServiceTrafficPolicy` will also be introduced for the alpha stage of this feature.
-The `trafficPolicy` field cannot be set on Service during the alpha stage unless the feature gate is enabled.
+A feature gate `ServiceInternalTrafficPolicy` will also be introduced for this feature.
+The `internalTrafficPolicy` field cannot be set on Service during the alpha stage unless the feature gate is enabled.
 During the Beta stage, the feature gate will be on by default.
 
-The `trafficPolicy` field will not apply for headless Services or Services of type `ExternalName`.
+The `internalTrafficPolicy` field will not apply for headless Services or Services of type `ExternalName`.
 
 ### User Stories (Optional)
 
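To make the Proposal hunk above concrete, here is a minimal Go sketch of how the effective policy could be resolved for a Service. It is an illustration under stated assumptions, not the Kubernetes implementation: the `Service` struct, the `InternalTrafficPolicy` type, and the `effectivePolicy` helper are hypothetical, trimmed-down stand-ins. The sketch only encodes what the text states: an unset field means `Cluster`, and the field does not apply to headless Services (`clusterIP: None`) or Services of type `ExternalName`.

```go
package main

import "fmt"

// InternalTrafficPolicy mirrors the two values proposed in the KEP.
// The type and constant names here are hypothetical.
type InternalTrafficPolicy string

const (
	PolicyCluster InternalTrafficPolicy = "Cluster"
	PolicyLocal   InternalTrafficPolicy = "Local"
)

// Service is a hypothetical, trimmed-down stand-in for the real Service type.
type Service struct {
	Type                  string // e.g. "ClusterIP" or "ExternalName"
	ClusterIP             string // "None" marks a headless Service
	InternalTrafficPolicy *InternalTrafficPolicy
}

// effectivePolicy returns the policy that applies to svc, and false when the
// field does not apply at all (headless or ExternalName Services).
func effectivePolicy(svc Service) (InternalTrafficPolicy, bool) {
	if svc.Type == "ExternalName" || svc.ClusterIP == "None" {
		return "", false
	}
	if svc.InternalTrafficPolicy == nil {
		// An unset field falls back to the documented default.
		return PolicyCluster, true
	}
	return *svc.InternalTrafficPolicy, true
}

func main() {
	local := PolicyLocal
	services := []Service{
		{Type: "ClusterIP", ClusterIP: "10.0.0.10"},
		{Type: "ClusterIP", ClusterIP: "10.0.0.11", InternalTrafficPolicy: &local},
		{Type: "ClusterIP", ClusterIP: "None", InternalTrafficPolicy: &local},
	}
	for _, svc := range services {
		policy, applies := effectivePolicy(svc)
		fmt.Printf("clusterIP=%s applies=%t policy=%q\n", svc.ClusterIP, applies, policy)
	}
}
```

Returning a second boolean keeps "field does not apply" distinct from "field defaults to `Cluster`", which the Proposal treats as separate cases.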
@@ -103,7 +101,6 @@ Traffic should never bounce to a daemon on another node.
 ### Risks and Mitigations
 
 * When the `Local` policy is set, it is the user's responsibility to ensure node-local endpoints are ready, otherwise traffic will be dropped.
-* Using the `Local` or `PreferLocal` policy may result in imbalanced traffic for pods in a Service. It is the user's responsibility to handle this.
 
 ## Design Details
 
@@ -113,8 +110,6 @@ type ServiceInternalTrafficPolicyType string
 
 const (
 	ServiceTrafficPolicyTypeCluster ServiceTrafficPolicyType = "Cluster"
-	ServiceTrafficPolicyTypeTopology ServiceTrafficPolicyType = "Topology"
-	ServiceTrafficPolicyTypePreferLocal ServiceTrafficPolicyType = "PreferLocal"
 	ServiceTrafficPolicyTypeLocal ServiceTrafficPolicyType = "Local"
 )
 
@@ -123,51 +118,54 @@ type ServiceSpec struct {
 	...
 	...
 
-	// trafficPolicy denotes if the traffic for a Service should route
-	// to cluster-wide endpoints or node-local endpoints. "Cluster" routes traffic
-	// to a Service to all cluster-wide endpoints. "Topology" routes traffic based on
-	// topology hints. "PreferLocal" will route internal traffic to node-local endpoints
-	// if one exists, otherwise it will fallback to the same behavior as "Cluster".
-	// "Local" routes traffic to node-local endpoints only, traffic is dropped
-	// if no node-local endpoints are ready. When externalTrafficPolicy is "Cluster",
-	// traffic from external sources will be routed based on the trafficPolicy. When
-	// externalTrafficPolicy is "Local", trafficPolicy is ignored for traffic from
-	// external sources.
+	// InternalTrafficPolicy specifies if the cluster internal traffic
+	// should be routed to all endpoints or node-local endpoints only.
+	// "Cluster" routes internal traffic to a Service to all endpoints.
+	// "Local" routes traffic to node-local endpoints only, traffic is
+	// dropped if no node-local endpoints are ready.
+	// The default value is "Cluster".
+	// +featureGate=ServiceInternalTrafficPolicy
 	// +optional
-	// +feature-gate=ServiceTrafficPolicy
-	TrafficPolicy ServiceTrafficPolicyType `json:"trafficPolicy,omitempty"`
+	InternalTrafficPolicy *ServiceInternalTrafficPolicyType `json:"internalTrafficPolicy,omitempty" protobuf:"bytes,22,opt,name=internalTrafficPolicy"`
 }
 ```
 
-This new field will intersect with externalTrafficPolicy in the following ways:
-* if `externalTrafficPolicy=Cluster`, traffic will be routed based on `trafficPolicy` for external sources
-* if `externalTrafficPolicy=Local`, `externalTrafficPolicy` will take precedent over `trafficPolicy`, but only for external sources.
+This field will be independent from externalTrafficPolicy. In other words, internalTrafficPolicy only applies to traffic originating from internal sources.
 
 Proposed changes to kube-proxy:
-* when `trafficPolicy=Cluster`, default to existing behavior today.
-* when `trafficPolicy=Topology`, use topology hints from EndpointSlice API.
-* when `trafficPolicy=PreferLocal`, route to endpoints in EndpointSlice that matches the local node's topology (topology defined by `kubernetes.io/hostname`),
-fall back to "Cluster" behavior if there are no local endpoints.
-* when `trafficPolicy=Local`, route to endpoints in EndpointSlice that maches the local node's topology, drop traffic if none exist.
+* when `internalTrafficPolicy=Cluster`, default to existing behavior today.
+* when `internalTrafficPolicy=Local`, route to endpoints in EndpointSlice that match the local node's topology, drop traffic if none exist.
+
+Overlap with topology-aware routing:
+
+| ExternalTrafficPolicy | InternalTrafficPolicy | Topology | External Result | Internal Result |
+| - | - | - | - | - |
+| - | - | Auto | Topology | Topology |
+| Local | - | Auto | Local | Topology |
+| Local | Local | Auto | Local | Local |
 
 ### Test Plan
 
 Unit tests:
-* unit tests validating API strategy/validation for when `trafficPolicy` is set on Service.
-* unit tests exercising kube-proxy behavior when `trafficPolicy` is set to all possible values.
+* unit tests validating API strategy/validation for when `internalTrafficPolicy` is set on Service.
+* unit tests exercising kube-proxy behavior when `internalTrafficPolicy` is set to all possible values.
 
 E2E test:
-* e2e tests validating default behavior with kube-proxy did not change when `trafficPolicy` defaults to `Cluster`. Existing tests should cover this.
-* e2e tests validating that traffic is preferred to local endpoints when `trafficPolicy` is set to `PreferLocal`.
-* e2e tests validating that traffic is only sent to node-local endpoints when `trafficPolicy` is set to `Local`.
+* e2e tests validating default behavior with kube-proxy did not change when `internalTrafficPolicy` defaults to `Cluster`. Existing tests should cover this.
+* e2e tests validating that traffic is only sent to node-local endpoints when `internalTrafficPolicy` is set to `Local`.
 
 ### Graduation Criteria
 
 Alpha:
-* feature gate `ServiceTrafficPolicy` _must_ be enabled for apiserver to accept values for `spec.trafficPolicy`. Otherwise field is dropped.
-* kube-proxy handles traffic routing for 4 initial internal traffic policies `Cluster`, `Topology`, `PreferLocal` and `Local`.
+* feature gate `ServiceInternalTrafficPolicy` _must_ be enabled for apiserver to accept values for `spec.internalTrafficPolicy`. Otherwise field is dropped.
+* kube-proxy handles traffic routing for 2 initial internal traffic policies `Cluster` and `Local`.
 * Unit tests as defined in "Test Plan" section above. E2E tests are nice to have but not required for Alpha.
 
+Beta:
+* integration tests exercising API behavior for `spec.internalTrafficPolicy` field of Service.
+* e2e tests exercising kube-proxy routing when `internalTrafficPolicy` is `Local`.
+* feature gate `ServiceInternalTrafficPolicy` is enabled by default.
+* consensus on how internalTrafficPolicy overlaps with topology-aware routing.
 
 ### Upgrade / Downgrade Strategy
 
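The kube-proxy bullets and the overlap table in the hunk above can be read as a small selection rule. The Go sketch below is one possible reading, not kube-proxy code: `Endpoint`, `selectInternalEndpoints`, and `internalResult` are hypothetical names, policies are plain strings for brevity, and skipping unready endpoints is an assumption based on the "dropped if no node-local endpoints are ready" wording. The `Cluster` fallback in `internalResult` is also an assumption, since the table only lists rows with Topology set to `Auto`.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified view of an EndpointSlice endpoint.
type Endpoint struct {
	IP       string
	NodeName string
	Ready    bool
}

// selectInternalEndpoints returns the endpoints internal traffic may use on
// the node called nodeName. With "Local", only ready endpoints on that node
// are kept; an empty result means the traffic is dropped.
func selectInternalEndpoints(policy, nodeName string, endpoints []Endpoint) []Endpoint {
	var selected []Endpoint
	for _, ep := range endpoints {
		if !ep.Ready {
			continue // assumption: unready endpoints never receive traffic
		}
		if policy == "Local" && ep.NodeName != nodeName {
			continue
		}
		selected = append(selected, ep)
	}
	return selected
}

// internalResult mirrors the "Internal Result" column of the overlap table:
// Local wins over topology-aware routing when both are set.
func internalResult(internalPolicy string, topologyAuto bool) string {
	switch {
	case internalPolicy == "Local":
		return "Local"
	case topologyAuto:
		return "Topology"
	default:
		return "Cluster" // assumption: this case is not a row in the table
	}
}

func main() {
	endpoints := []Endpoint{
		{IP: "10.1.0.5", NodeName: "node-a", Ready: true},
		{IP: "10.1.1.7", NodeName: "node-b", Ready: true},
		{IP: "10.1.0.9", NodeName: "node-a", Ready: false},
	}
	fmt.Println(len(selectInternalEndpoints("Cluster", "node-a", endpoints)))
	fmt.Println(len(selectInternalEndpoints("Local", "node-a", endpoints)))
	fmt.Println(len(selectInternalEndpoints("Local", "node-c", endpoints)))
	fmt.Println(internalResult("Local", true), internalResult("", true))
}
```

On `node-c` the `Local` selection is empty, which is the drop case the Risks and Mitigations section warns about.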
@@ -187,18 +185,12 @@ _This section must be completed when targeting alpha to a release._
 
 * **How can this feature be enabled / disabled in a live cluster?**
   - [X] Feature gate (also fill in values in `kep.yaml`)
-    - Feature gate name: `ServiceTrafficPolicy`
+    - Feature gate name: `ServiceInternalTrafficPolicy`
     - Components depending on the feature gate: kube-apiserver, kube-proxy
-  - [ ] Other
-    - Describe the mechanism:
-    - Will enabling / disabling the feature require downtime of the control
-      plane?
-    - Will enabling / disabling the feature require downtime or reprovisioning
-      of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
 
 * **Does enabling the feature change any default behavior?**
 
-No, enabling the feature does not change any default behavior since the default value of `trafficPolicy` is `Cluster`.
+No, enabling the feature does not change any default behavior since the default value of `internalTrafficPolicy` is `Cluster`.
 
 * **Can the feature be disabled once it has been enabled (i.e. can we roll back
   the enablement)?**
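
The KEP states that the apiserver drops `spec.internalTrafficPolicy` when the `ServiceInternalTrafficPolicy` feature gate is disabled. A minimal Go sketch of that gating follows, assuming a hypothetical `ServiceSpec` stand-in and a hypothetical `dropDisabledFields` helper. The gate state is passed in as a plain boolean to keep the example self-contained, and only the create path is modeled; how updates to Services that already carry the field are treated is not covered here.

```go
package main

import "fmt"

// ServiceSpec is a hypothetical, trimmed-down stand-in for the real type.
type ServiceSpec struct {
	InternalTrafficPolicy *string
}

// dropDisabledFields clears internalTrafficPolicy when the feature gate is
// off, so a create request cannot persist the field.
func dropDisabledFields(spec *ServiceSpec, gateEnabled bool) {
	if !gateEnabled {
		spec.InternalTrafficPolicy = nil
	}
}

func main() {
	local := "Local"
	for _, gate := range []bool{true, false} {
		spec := ServiceSpec{InternalTrafficPolicy: &local}
		dropDisabledFields(&spec, gate)
		fmt.Printf("gate=%t fieldSet=%t\n", gate, spec.InternalTrafficPolicy != nil)
	}
}
```

A unit test of the kind the KEP mentions for enablement and disablement would assert both branches: the field survives with the gate on and is nil with the gate off.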
@@ -207,54 +199,57 @@ Yes, the feature gate can be disabled, but Service resource that have set the ne
 
 * **What happens if we reenable the feature if it was previously rolled back?**
 
-New Services should be able to set the `trafficPolicy` field. Existing Services that have the field set already should not be impacted.
+New Services should be able to set the `internalTrafficPolicy` field. Existing Services that have the field set will begin to apply the policy again.
 
 * **Are there any tests for feature enablement/disablement?**
 
-There will be unit tests to verify that apiserver will drop the field when the `ServiceTrafficPolicy` feature gate is disabled.
+There will be unit tests to verify that apiserver will drop the field when the `ServiceInternalTrafficPolicy` feature gate is disabled.
 
 ### Rollout, Upgrade and Rollback Planning
 
 _This section must be completed when targeting beta graduation to a release._
 
 * **How can a rollout fail? Can it impact already running workloads?**
 
-TBD for beta.
+Rollout should have minimal impact because the default value of `internalTrafficPolicy` is `Cluster`, which is the default behavior today.
 
 * **What specific metrics should inform a rollback?**
 
-TBD for beta.
+Metrics representing Services being black-holed will be added. This metric can inform rollback.
 
 * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
 
-TBD for beta.
+No, but this will be manually tested prior to beta. Automated testing will be done if the test tooling is available.
 
 * **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
 fields of API types, flags, etc.?**
 
-TBD for beta.
+No.
 
 ### Monitoring Requirements
 
 _This section must be completed when targeting beta graduation to a release._
 
 * **How can an operator determine if the feature is in use by workloads?**
 
-TBD for beta.
+* Check Service to see if `internalTrafficPolicy` is set to `Local`.
+* A per-node "blackhole" metric will be added to kube-proxy which represents Services whose traffic is being intentionally dropped (internalTrafficPolicy=Local and no endpoints).
+
+TODO: add metric name once it's decided
 
 * **What are the SLIs (Service Level Indicators) an operator can use to determine
 the health of the service?**
 
-TBD for beta.
+They can check the "blackhole" metric when internalTrafficPolicy=Local and there are no endpoints.
 
 * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
 
-TBD for beta.
+This will depend on Service topology and whether `internalTrafficPolicy=Local` is being used.
 
 * **Are there any missing metrics that would be useful to have to improve observability
 of this feature?**
 
-TBD for beta.
+A new metric will be added to represent Services that are being "blackholed" (internalTrafficPolicy=Local and no endpoints).
 
 ### Dependencies
 
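The "blackhole" metric described in the monitoring answers above has no name yet (the KEP leaves it as a TODO), so the Go sketch below only shows the condition it would count. `ServiceState` and `countBlackholed` are hypothetical, and the per-node endpoint count is assumed to be already known to the caller.

```go
package main

import "fmt"

// ServiceState is a hypothetical per-node summary of one Service.
type ServiceState struct {
	Name                  string
	InternalTrafficPolicy string
	LocalReadyEndpoints   int // ready endpoints on this node
}

// countBlackholed counts Services whose internal traffic is dropped on this
// node: internalTrafficPolicy=Local and no node-local endpoints.
func countBlackholed(services []ServiceState) int {
	n := 0
	for _, s := range services {
		if s.InternalTrafficPolicy == "Local" && s.LocalReadyEndpoints == 0 {
			n++
		}
	}
	return n
}

func main() {
	services := []ServiceState{
		{Name: "dns-cache", InternalTrafficPolicy: "Local", LocalReadyEndpoints: 1},
		{Name: "log-agent", InternalTrafficPolicy: "Local", LocalReadyEndpoints: 0},
		{Name: "api", InternalTrafficPolicy: "Cluster", LocalReadyEndpoints: 0},
	}
	fmt.Println("blackholed services on this node:", countBlackholed(services))
}
```

A `Cluster` Service with no local endpoints is not counted, because its traffic can still reach endpoints on other nodes.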
@@ -267,7 +262,7 @@ _This section must be completed when targeting beta graduation to a release._
 a cloud provider API, or upon an external software-defined storage or network
 control plane.
 
-TBD for beta.
+No.
 
 
 ### Scalability
@@ -325,22 +320,26 @@ _This section must be completed when targeting beta graduation to a release._
 
 * **How does this feature react if the API server and/or etcd is unavailable?**
 
-TBD for beta.
+Services will not be able to update their internal traffic policy.
 
 * **What are other known failure modes?**
 
-TBD for beta.
+A Service's `internalTrafficPolicy` is set to `Local` but there are no node-local endpoints.
 
 * **What steps should be taken if SLOs are not being met to determine the problem?**
 
-TBD for beta.
+* check Service for internal traffic policy
+* check EndpointSlice to ensure nodeName is set correctly
+* check iptables/ipvs rules on kube-proxy
 
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
 
 ## Implementation History
 
 2020-10-09: KEP approved as implementable in "alpha" stage.
+2021-03-08: alpha implementation merged for v1.21
+2021-05-12: KEP approved as implementable in "beta" stage.
 
 ## Drawbacks
 
keps/sig-network/2086-service-internal-traffic-policy/kep.yaml

Lines changed: 3 additions & 3 deletions
@@ -18,12 +18,12 @@ see-also:
   - "/keps/sig-network/20181024-service-topology.md"
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.21"
+latest-milestone: "v1.22"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
@@ -34,7 +34,7 @@ milestone:
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
 feature-gates:
-  - name: ServiceITrafficPolicy
+  - name: ServiceInternalTrafficPolicy
     components:
       - kube-apiserver
       - kube-proxy
