@@ -191,18 +191,19 @@ Yes. It is tested by `TestUpdateServiceLoadBalancerStatus` in pkg/registry/core/
191
191
192
192
###### How can a rollout or rollback fail? Can it impact already running workloads?
193
193
194
- A rollout can fail in case the value of ` ipMode ` had been set to "Proxy" on a service
195
- and running workloads consuming this service fails to reach it because of some
196
- extra hop, or some misconfiguration on the LoadBalancer.
194
+ A rollout can fail in case the new API is supported and consumed by CCM,
195
+ but not all nodes get kube-proxy updated, so part of the workloads on a node will
196
+ start sending the traffic to a LoadBalancer, while the others may still have the
197
+ loadbalancer IP configured on a node interface.
197
198
198
- A rollback can fail in case kube-proxy is not able to detect a rollback and re-add
199
- the LoadBalancer address back to the interface.
199
+ In case of a rollback, kube-proxy will also rollback to the default behavior, re-adding
200
+ the LoadBalancer interface. This can fail for workloads that may be already relying
201
+ on the new behavior (eg. sending traffic to the LoadBalancer expecting some additional
202
+ features, like PROXY and TLS Termination as per the Motivations section).
200
203
201
204
###### What specific metrics should inform a rollback?
202
205
203
- Workloads consuming a service configured to use the new ` ipMode ` and that starts
204
- to fail to reach the service, or an increase on the requests time are metrics that
205
- should inform a rollback
206
+ N/A
206
207
207
208
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
208
209
@@ -216,20 +217,19 @@ No.
216
217
217
218
###### How can an operator determine if the feature is in use by workloads?
218
219
219
- As this is a low level operation, to check if it is working an operator should:
220
- * Verify a service of type=LoadBalancer and this feature enabled
221
- * Check and confirm that the IPs set on ` .status.loadBalancer.ingress.ip ` are not set
222
- on any interface
220
+ Checking if the field ` .status.loadBalancer.ingress.ipMode ` is set to ` Proxy `
223
221
224
222
###### How can someone using this feature know that it is working for their instance?
225
223
226
- - [ ] Events
227
- - Event Reason:
228
224
- [X] API .status
229
225
- Condition name:
230
226
- Other field: ` .status.loadBalancer.ingress.ipMode ` not null
231
- - [ ] Other (treat as last resort)
232
- - Details:
227
+ - [X] Other:
228
+ - Details: To detect if the traffic is being directed to the LoadBalancer and not
229
+ directly to another node, the user will need to rely on the LoadBalancer logs,
230
+ and the destination workload logs to check if the traffic is coming from one Pod
231
+ to the other or from the LoadBalancer.
232
+
233
233
234
234
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
235
235
@@ -238,27 +238,26 @@ type LoadBalancer and the feature enabled.
238
238
239
239
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
240
240
241
- - [X] Other (treat as last resort)
242
- - Details: Workload/Application instrumentation containing the error rate and
243
- latency of calls against other services on this cluster
241
+ N/A
244
242
245
243
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
246
244
247
- N/A
245
+ * On kube-proxy, a metric containing the count of IP programming vs service type would be useful
246
+ to determine if the feature is being used, and if there is any drift between nodes
247
+ * TBD: Should this metric be implemented?
248
248
249
249
### Dependencies
250
250
251
251
###### Does this feature depend on any specific services running in the cluster?
252
252
253
253
- cloud controller manager / LoadBalancer controller
254
254
- LoadBalancer controller should set the right .status field for ` ipMode `
255
- - In case of this feature outage, the traffic may still be routed using the ` VIP ` mode
256
- - kube-proxy
257
- - Network interface IP addressprogramming
258
- - In case of this feature outage, network interfaces on the node may still keep
259
- adding the LoadBalancer IP, that may cause wrong traffic routing
260
- - This dependency doesn't happen on clusters that uses CNI that replaces kube-proxy.
261
- The CNIs should implement this feature their own, in this case.
255
+ - In case of this feature outage nothing happens, as LoadBalancers will be
256
+ already out of sync with services in case of CCM being crashed
257
+ - kube-proxy or other Service Proxy that implements this feature
258
+ - Network interface IP address programming
259
+ - In case of this feature outage, the user will get the same result/behavior as
260
+ if the ` ipMode ` field has not been set.
262
261
263
262
### Scalability
264
263
@@ -274,13 +273,11 @@ previous answers based on experience in the field.
274
273
275
274
###### Will enabling / using this feature result in any new API calls?
276
275
277
- - API call type - Patch
278
- - Estimated throughput - 1 per service creation/reconciliation
279
- - originating component - cloud controller manager / LoadBalancer controller
276
+ No.
280
277
281
278
###### Will enabling / using this feature result in introducing new API types?
282
279
283
- No
280
+ No.
284
281
285
282
###### Will enabling / using this feature result in any new calls to the cloud provider?
286
283
0 commit comments