  - [Test Plan](#test-plan)
  - [Controller Unit Tests](#controller-unit-tests)
  - [Kube-Proxy Unit Tests](#kube-proxy-unit-tests)
+ - [e2e Tests](#e2e-tests)
  - [Observability](#observability)
  - [Graduation Criteria](#graduation-criteria)
  - [Version Skew Strategy](#version-skew-strategy)
@@ -75,15 +76,9 @@ routing at zone level but could be expanded to include region.

  In the short term, this is taking the place of two closely related KEPs that
  were never implemented. These KEPs relate to EndpointSlice subsetting and are
- still relevant, just deferred to a later point in time. For more info on this
- transition refer to the following resources:
-
- - [Doc: Updates to Topology in Kubernetes
-   1.21](https://docs.google.com/document/d/1ZzUoFY1SrdjVefl7gVOJZJLt1I1LHttw8pcX95nlgMY/edit)
- - [KEP 2004: Topology Aware
-   Subsetting](https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/2004-topology-aware-subsetting).
- - [KEP 2030: Topology Aware
-   Proxying](https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/2030-topology-aware-proxying).
+ still relevant, just deferred to a later point in time. This
+ [doc](https://docs.google.com/document/d/1ZzUoFY1SrdjVefl7gVOJZJLt1I1LHttw8pcX95nlgMY/edit)
+ has more info on this transition.

  ## Motivation

@@ -151,8 +146,7 @@ hints help ensure that each zone will have a single endpoint to consume by
  adding a hint to the third endpoint that it should be consumed by "zone-c".

  This functionality will be enabled by a `TopologyAwareHints` feature gate along
- with the `trafficPolicy` field on Service that will be added as part of KEP
- 2086.
+ with a new Service annotation.

  ### Risks and Mitigations

@@ -196,20 +190,21 @@ updated to read the same information to identify which zone it is running in.

  ### Configuration

- The new Service `trafficPolicy` field will be expanded to support a new value:
-
- - `PreferZone`: When there are a sufficient number of endpoints for the Service,
-   the EndpointSlice controller will add topology hints for each endpoint that
-   will ensure a proportional amounts are available to each zone in a cluster.
+ A new `service.kubernetes.io/topology-aware-routing` annotation can be used to
+ enable or disable Topology Aware Routing (and by extension, hints) for a
+ Service. This may be set to "Auto" or "Disabled". Any other value is treated as
+ "Disabled".

- A future KEP will explore changing the default value of this field to
- `PreferZone`.
+ The previous `service.kubernetes.io/topology-aware-hints` annotation will
+ continue to be supported as a means of configuring this feature.
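
As an illustration of the annotation semantics described in this hunk, here is a minimal Go sketch. The helper name `hintsEnabled` is hypothetical, the exact-match comparison against "Auto" is an assumption, and so is the precedence rule (the new annotation wins when both are present); the KEP text quoted here does not specify either point.

```go
package main

import "fmt"

// Annotation keys as named in the KEP text.
const (
	annotationRouting = "service.kubernetes.io/topology-aware-routing"
	annotationHints   = "service.kubernetes.io/topology-aware-hints"
)

// hintsEnabled is a hypothetical helper, not taken from the Kubernetes
// code base. It reports whether a Service's annotations turn the feature on.
func hintsEnabled(annotations map[string]string) bool {
	val, ok := annotations[annotationRouting]
	if !ok {
		// Assumption: the previous annotation is only consulted when the
		// new one is absent.
		val = annotations[annotationHints]
	}
	// "Auto" enables the feature; "Disabled" and any other value
	// (including a missing annotation) are treated as disabled.
	return val == "Auto"
}

func main() {
	fmt.Println(hintsEnabled(map[string]string{annotationRouting: "Auto"}))
	fmt.Println(hintsEnabled(map[string]string{annotationRouting: "something-else"}))
	fmt.Println(hintsEnabled(map[string]string{annotationHints: "Auto"}))
}
```

Treating every unrecognized value as disabled keeps a typo in the annotation from silently enabling zone-based filtering.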

  #### Interoperability

- Validation will ensure that `trafficPolicy` can not be set to `PreferZone` when
- the deprecated `topologyKeys` field is also set. This will be true until the
- `topologyKeys` field is removed in the future.
+ If any of the following are true, topology hints will be ignored:
+
+ - ExternalTrafficPolicy is set to Local
+ - InternalTrafficPolicy is set to Local
+ - TopologyKeys field has at least one entry
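
The three conditions in the added list can be sketched as a single predicate. The `serviceSpec` struct below is a simplified, hypothetical stand-in for the relevant Service fields (it is not the real `v1.ServiceSpec` type), and `hintsIgnored` is an illustrative name.

```go
package main

import "fmt"

// serviceSpec is a hypothetical, trimmed-down view of the Service fields
// that the interoperability rules mention.
type serviceSpec struct {
	ExternalTrafficPolicy string
	InternalTrafficPolicy string
	TopologyKeys          []string
}

// hintsIgnored reports whether topology hints would be ignored for the
// given spec, following the three listed conditions.
func hintsIgnored(spec serviceSpec) bool {
	return spec.ExternalTrafficPolicy == "Local" ||
		spec.InternalTrafficPolicy == "Local" ||
		len(spec.TopologyKeys) > 0
}

func main() {
	fmt.Println(hintsIgnored(serviceSpec{ExternalTrafficPolicy: "Local"}))
	fmt.Println(hintsIgnored(serviceSpec{ExternalTrafficPolicy: "Cluster"}))
}
```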

  #### Feature Gate

@@ -250,7 +245,6 @@ type ForZone struct {
  }
  ```

-
  #### Future API Expansion
  This approach would allow for future API expansion that enabled specifying
  multiple zones per endpoint with weights. That level of complexity may never be
@@ -277,7 +271,7 @@ conditions are true:

  - Kube-Proxy is able to determine the zone it is running within (likely based
    on node labels).
- - The `trafficPolicy` field is set to `PreferZone` for the Service.
+ - The annotation is set to `Auto`.
  - At least one endpoint for the Service has a hint pointing to the zone
    Kube-Proxy is running within.
  - All endpoints for the Service have zone hints.
@@ -293,10 +287,10 @@ had not yet propagated to all of them.

  ### EndpointSlice Controller

- When the `TopologyAwareHints` feature gate is enabled and the `trafficPolicy`
- field is set to `PreferZone` for a Service, the EndpointSlice controller will
- add hints to EndpointSlices. These hints will indicate where an endpoint should
- be consumed by proxy implementations to enable topology aware routing.
+ When the `TopologyAwareHints` feature gate is enabled and the annotation is set
+ to `Auto` for a Service, the EndpointSlice controller will add hints to
+ EndpointSlices. These hints will indicate where an endpoint should be consumed
+ by proxy implementations to enable topology aware routing.

  The EndpointSlice controller will determine how many endpoints should be
  available for each zone based on the proportion of CPU cores in each zone. If
@@ -370,13 +364,10 @@ In the future we may expand this functionality if needed. This could include:
  #### Controller Unit Tests
  | Test Description | Expected Result |
  | :--- | :--- |
- | Feature Gate On, TrafficPolicy == 'PreferZone', 2+ zones | Hints set |
- | Feature Gate On, TrafficPolicy == 'PreferZone', 1 zone | No hints set |
- | Feature Gate On, TrafficPolicy == 'Local', 2+ zones | No hints |
- | Feature Gate On, TrafficPolicy Unset, 2+ zones | No hints |
- | Feature Gate Off, TrafficPolicy == 'PreferZone', 2+ zones | No hints |
- | Feature Gate Off, TrafficPolicy Unset, 2+ zones | No hints |
- | Feature Gate Off, TrafficPolicy Unset, 2+ zones | No hints |
+ | Feature On, 2+ zones | Hints set |
+ | Feature Off, 2+ zones | No hints |
+ | Feature On, 1 zone | No hints set |
+ | Feature On, ExternalTrafficPolicy == 'Local', 2+ zones | No hints |
  | 2 endpoints, 3 zones | No hints |
  | 3 endpoints, 3 zones | Hints set |
  | 4 endpoints, 3 zones | No hints |
@@ -393,10 +384,28 @@ In the future we may expand this functionality if needed. This could include:
  #### Kube-Proxy Unit Tests
  | Test Description | Expected Result |
  | :--- | :--- |
- | Feature Gate On, TrafficPolicy == 'PreferZone', hints matching zone | Endpoints filtered |
- | Feature Gate On, TrafficPolicy == 'Local', hints matching zone | Endpoints not filtered |
- | Feature Gate Off, TrafficPolicy == 'PreferZone', hints matching zone | Endpoints not filtered |
- | Feature Gate On, TrafficPolicy == 'PreferZone', no hints matching zone | Endpoints not filtered |
+ | Feature On, hints matching zone | Endpoints filtered |
+ | Feature On, ExternalTrafficPolicy == 'Local', hints matching zone | Endpoints not filtered |
+ | Feature Off, hints matching zone | Endpoints not filtered |
+ | Feature On, no hints matching zone | Endpoints not filtered |
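
The four added table rows, together with the Kube-Proxy conditions listed earlier in this diff, can be sketched as one filtering function. This is a rough sketch under assumptions, not the kube-proxy implementation: the `endpoint` type and `filterEndpoints` name are hypothetical, and "Feature On" is collapsed into a single boolean that stands for both the feature gate and the annotation being set to `Auto`.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified endpoint carrying zone hints.
type endpoint struct {
	Address  string
	ForZones []string // zones this endpoint is hinted for; empty means no hint
}

// filterEndpoints returns only the endpoints hinted for nodeZone when every
// condition holds; otherwise it returns the full list unchanged.
func filterEndpoints(endpoints []endpoint, nodeZone string, featureOn, externalLocal bool) []endpoint {
	if !featureOn || externalLocal || nodeZone == "" {
		return endpoints
	}
	var filtered []endpoint
	for _, ep := range endpoints {
		if len(ep.ForZones) == 0 {
			// Every endpoint must carry a hint, otherwise fall back.
			return endpoints
		}
		for _, zone := range ep.ForZones {
			if zone == nodeZone {
				filtered = append(filtered, ep)
				break
			}
		}
	}
	if len(filtered) == 0 {
		// No endpoint is hinted for this zone, so do not filter.
		return endpoints
	}
	return filtered
}

func main() {
	eps := []endpoint{
		{Address: "10.0.0.1", ForZones: []string{"zone-a"}},
		{Address: "10.0.0.2", ForZones: []string{"zone-b"}},
	}
	fmt.Println(filterEndpoints(eps, "zone-a", true, false))
}
```

Falling back to the unfiltered list in every failure case means a partially hinted Service is still routed as it would be without the feature.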
+
+ ### e2e Tests
+ This represents the largest and most uncertain part of the testing effort. We
+ need to find a way to run e2e tests on multizone clusters. To limit flakiness,
+ those clusters likely need to have a consistent distribution of nodes across
+ zones. This will enable us to write predictable tests for topology aware
+ routing.
+
+ At a minimum, we likely want the following test:
+
+ - 3 zone cluster, with 1 equivalent node per zone
+ - Deploy a single pod to each node with a daemonset
+ - Create a Service that targets that daemonset
+ - Make requests from each zone and ensure that the request is routed to a pod in
+   the same zone
+
+ We'll likely need more tests to properly vet this feature, but this one should
+ be straightforward to write and unlikely to be flaky.

  ### Observability
  We can reuse some of the metrics of EndpointSlice Controller that we already
@@ -448,7 +457,7 @@ Thus there could be two potential version skew scenarios:
  of the new controller functionality.

  Each scenario described above will end up behaving as if this feature is not
- enabled even if the `trafficPolicy` has been set on Service.
+ enabled even if the annotation has been set on the Service.

  ## Production Readiness Review Questionnaire

@@ -467,7 +476,7 @@ enabled even if the `trafficPolicy` has been set on Service.
  * **Can the feature be disabled once it has been enabled (i.e. can we roll back
  the enablement)?**
  Yes. It can easily be disabled universally by turning off the feature gate or
- setting the `trafficPolicy` field to some other value for a Service.
+ setting the annotation to some other value for a Service.

  * **What happens if we reenable the feature if it was previously rolled back?**
  EndpointSlices hints will be added again resulting in changes to existing