KEP-1669: promote ProxyTerminatingEndpoints to Beta (kubernetes#3505)
* KEP-1669: promote ProxyTerminatingEndpoints to Beta
Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-1669: add note that upgrade/downgrade testing should be done before promotion to Beta
Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-1669: add more details about metric sync_proxy_rules_no_local_endpoints_total and fix typo
Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-1669: use the new test plan format, including links to existing tests
Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-1669: answer PRR question 'What steps should be taken if SLOs are not being met to determine the problem?'
Signed-off-by: Andrew Sy Kim <[email protected]>
Signed-off-by: Andrew Sy Kim <[email protected]>
@@ -156,21 +158,38 @@ In addition, kube-proxy's node port health check should fail if there are only `
 
 ### Test Plan
 
-#### Unit Tests
+[X] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
 
-kube-proxy unit tests:
+##### Prerequisite testing updates
 
-* Unit tests will validate the correct behavior when there are only local terminating endpoints.
-* Unit tests will validate the change in behavior against the matrix of possible Service configurations using both internalTrafficPolicy and externalTrafficPolicy.
-* Existing unit tests will validate that terminating endpoints are only used when there are no ready endpoints, otherwise ready && !terminating endpoints are used.
-* Unit tests will validate health check node port succeeds only when there are ready && !terminating endpoints.
+##### Unit tests
 
-#### E2E Tests
+- `pkg/proxy`: `07/2021` - Validating behavior in iptables and ipvs proxier. Also tests feature gate enablement.
+- `pkg/proxy`: `03/2022` - All tests updated to cover all traffic policies (not just Local)
 E2E tests will be added to validate that no traffic is dropped during a rolling update for a Service. E2E tests should cover all permutations of externalTrafficPolicy
 and internalTrafficPolicy.
 
-All existing E2E tests for Services should continue to pass.
+- E2E test validating health check node port behavior: https://github.com/kubernetes/kubernetes/blob/4bc1398c0834a63370952702eef24d5e74c736f6/test/e2e/network/service.go#L2790
+- E2E test validating fallback behavior for terminating endpoints when `externalTrafficPolicy: Cluster`: https://github.com/kubernetes/kubernetes/blob/4bc1398c0834a63370952702eef24d5e74c736f6/test/e2e/network/service.go#L3060
+- E2E test validating fallback behavior for terminating endpoints when `externalTrafficPolicy: Local`: https://github.com/kubernetes/kubernetes/blob/4bc1398c0834a63370952702eef24d5e74c736f6/test/e2e/network/service.go#L3145
+- E2E test validating fallback behavior for terminating endpoints when `internalTrafficPolicy: Cluster`: https://github.com/kubernetes/kubernetes/blob/4bc1398c0834a63370952702eef24d5e74c736f6/test/e2e/network/service.go#L2889
+- E2E test validating fallback behavior for terminating endpoints when `internalTrafficPolicy: Local`: https://github.com/kubernetes/kubernetes/blob/4bc1398c0834a63370952702eef24d5e74c736f6/test/e2e/network/service.go#L2972
 
 ### Graduation Criteria
 
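The test plan asks for coverage of every permutation of `externalTrafficPolicy` and `internalTrafficPolicy`. As a rough illustration only, the matrix could be enumerated as below. The `policyCase` type and `policyMatrix` helper are hypothetical names introduced for this sketch and do not appear in the Kubernetes tests linked above; only the `Cluster` and `Local` policy values come from the Service API.

```go
package main

import "fmt"

// policyCase is a hypothetical pairing of the two Service traffic policies.
type policyCase struct {
	External string // value for externalTrafficPolicy
	Internal string // value for internalTrafficPolicy
}

// policyMatrix returns every combination of Cluster and Local for the
// two traffic policy fields, in a fixed order.
func policyMatrix() []policyCase {
	policies := []string{"Cluster", "Local"}
	var cases []policyCase
	for _, ext := range policies {
		for _, in := range policies {
			cases = append(cases, policyCase{External: ext, Internal: in})
		}
	}
	return cases
}

func main() {
	// Each printed line is one Service configuration a test would exercise.
	for _, tc := range policyMatrix() {
		fmt.Printf("externalTrafficPolicy=%s internalTrafficPolicy=%s\n", tc.External, tc.Internal)
	}
}
```

A table-driven test could range over `policyMatrix()` and run the same rolling-update scenario once per entry.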
@@ -179,12 +198,13 @@ All existing E2E tests for Services should continue to pass.
 * kube-proxy internally tracks the `terminating` and `serving` condition from EndpointSlice
 * kube-proxy falls back to terminating endpoints if and only if they are the only available endpoints.
 * feature is only enabled if the feature gate `ProxyTerminatingEndpoints` is on.
-* unit tests in kube-proxy.
+* unit tests in kube-proxy (see [Test Plan](#test-plan) section)
 
 #### Beta
 
-* E2E tests are in place, exercising all permutations of internalTrafficPolicy and externalTrafficPolicy.
+* E2E tests are in place, exercising all permutations of internalTrafficPolicy and externalTrafficPolicy (see [Test Plan](#test-plan) section)
 * Metrics to publish how many Services/Endpoints are routing traffic to terminating endpoints.
+* Manual or automated rollback testing (see [Test Plan](#test-plan) section)
 
 ### Upgrade / Downgrade Strategy
 
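The fallback rule in the graduation criteria (use terminating endpoints if and only if they are the only ones available, and only with the feature gate on) can be sketched roughly as follows. This is a simplified illustration, not kube-proxy's implementation: the `Endpoint` struct, `selectEndpoints`, and `healthCheckOK` are hypothetical names, and requiring `Serving` for a fallback endpoint is an assumption of this sketch.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified view of the EndpointSlice
// conditions that kube-proxy tracks; it is not the real kube-proxy type.
type Endpoint struct {
	IP          string
	Ready       bool
	Serving     bool
	Terminating bool
}

// selectEndpoints returns the ready, non-terminating endpoints when any
// exist. Otherwise, and only when the gate is enabled, it falls back to
// endpoints that are terminating but still serving (an assumption here).
func selectEndpoints(eps []Endpoint, gateEnabled bool) []Endpoint {
	var ready, fallback []Endpoint
	for _, ep := range eps {
		switch {
		case ep.Ready && !ep.Terminating:
			ready = append(ready, ep)
		case ep.Serving && ep.Terminating:
			fallback = append(fallback, ep)
		}
	}
	if len(ready) > 0 || !gateEnabled {
		return ready
	}
	return fallback
}

// healthCheckOK reports whether at least one ready, non-terminating
// endpoint exists, which is the condition under which the health check
// node port is expected to succeed.
func healthCheckOK(eps []Endpoint) bool {
	for _, ep := range eps {
		if ep.Ready && !ep.Terminating {
			return true
		}
	}
	return false
}

func main() {
	// A node whose only endpoint is draining but still serving.
	draining := []Endpoint{{IP: "10.0.0.1", Serving: true, Terminating: true}}
	fmt.Println("gate on, endpoints used:", len(selectEndpoints(draining, true)))
	fmt.Println("gate off, endpoints used:", len(selectEndpoints(draining, false)))
	fmt.Println("health check passes:", healthCheckOK(draining))
}
```

Keeping the health check separate from endpoint selection reflects the test plan's distinction: traffic may still reach a terminating endpoint while the node reports unhealthy to the load balancer.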
@@ -246,13 +266,14 @@ When the rollout happens, workloads may unexpectedly receive traffic when termin
 
 ###### What specific metrics should inform a rollback?
 
-There will be metrics added to publish how many Services/Endpoints are routing to terminating pods. It may be expected that clusters
-route to many terminating pods at once, especially during rolling updates, but users can correlate this metric with other factors to
-gauge if a rollback is necessary.
+`sync_proxy_rules_no_local_endpoints_total` can be used to inform rollback in scenarios where Services are dropping traffic to local endpoints.
+If this metric increases dramatically (especially when there are no rollouts happening), it could mean there is a programming error in kube-proxy.
+In general, we expect this metric to decrease during roll outs when this feature is enabled since nodes that only have terminating endpoints should
+no longer be included in this metric.
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
-Upgrade->downgrade->upgrade path has not been tested yet. We may want to require this for beta or GA.
+Upgrade->downgrade->upgrade testing (manual or automated) will be required for Beta. If tested manually, the steps will be documented in this KEP.
 
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
@@ -269,7 +290,7 @@ regardless of their termination state. If this is undesired, workloads should be
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?