Skip to content

Commit 01919b4

Browse files
committed
Fix review comments
1 parent 430f4a7 commit 01919b4

File tree

1 file changed

+28
-31
lines changed
  • keps/sig-network/1860-kube-proxy-IP-node-binding

1 file changed

+28
-31
lines changed

keps/sig-network/1860-kube-proxy-IP-node-binding/README.md

Lines changed: 28 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -191,18 +191,19 @@ Yes. It is tested by `TestUpdateServiceLoadBalancerStatus` in pkg/registry/core/
191191

192192
###### How can a rollout or rollback fail? Can it impact already running workloads?
193193

194-
A rollout can fail in case the value of `ipMode` had been set to "Proxy" on a service
195-
and running workloads consuming this service fails to reach it because of some
196-
extra hop, or some misconfiguration on the LoadBalancer.
194+
A rollout can fail in case the new API is supported and consumed by CCM,
195+
but not all nodes get kube-proxy updated, so part of the workloads on a node will
196+
start sending the traffic to a LoadBalancer, while the others may still have the
197+
loadbalancer IP configured on a node interface.
197198

198-
A rollback can fail in case kube-proxy is not able to detect a rollback and re-add
199-
the LoadBalancer address back to the interface.
199+
In case of a rollback, kube-proxy will also rollback to the default behavior, re-adding
200+
the LoadBalancer interface. This can fail for workloads that may be already relying
201+
on the new behavior (eg. sending traffic to the LoadBalancer expecting some additional
202+
features, like PROXY and TLS Termination as per the Motivations section).
200203

201204
###### What specific metrics should inform a rollback?
202205

203-
Workloads consuming a service configured to use the new `ipMode` and that starts
204-
to fail to reach the service, or an increase on the requests time are metrics that
205-
should inform a rollback
206+
N/A
206207

207208
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
208209

@@ -216,20 +217,19 @@ No.
216217

217218
###### How can an operator determine if the feature is in use by workloads?
218219

219-
As this is a low level operation, to check if it is working an operator should:
220-
* Verify a service of type=LoadBalancer and this feature enabled
221-
* Check and confirm that the IPs set on `.status.loadBalancer.ingress.ip` are not set
222-
on any interface
220+
Checking if the field `.status.loadBalancer.ingress.ipMode` is set to `Proxy`
223221

224222
###### How can someone using this feature know that it is working for their instance?
225223

226-
- [ ] Events
227-
- Event Reason:
228224
- [X] API .status
229225
- Condition name:
230226
- Other field: `.status.loadBalancer.ingress.ipMode` not null
231-
- [ ] Other (treat as last resort)
232-
- Details:
227+
- [X] Other:
228+
- Details: To detect if the traffic is being directed to the LoadBalancer and not
229+
directly to another node, the user will need to rely on the LoadBalancer logs,
230+
and the destination workload logs to check if the traffic is coming from one Pod
231+
to the other or from the LoadBalancer.
232+
233233

234234
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
235235

@@ -238,27 +238,26 @@ type LoadBalancer and the feature enabled.
238238

239239
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
240240

241-
- [X] Other (treat as last resort)
242-
- Details: Workload/Application instrumentation containing the error rate and
243-
latency of calls against other services on this cluster
241+
N/A
244242

245243
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
246244

247-
N/A
245+
* On kube-proxy, a metric containing the count of IP programming vs service type would be useful
246+
to determine if the feature is being used, and if there is any drift between nodes
247+
* TBD: Should this metric be implemented?
248248

249249
### Dependencies
250250

251251
###### Does this feature depend on any specific services running in the cluster?
252252

253253
- cloud controller manager / LoadBalancer controller
254254
- LoadBalancer controller should set the right .status field for `ipMode`
255-
- In case of this feature outage, the traffic may still be routed using the `VIP` mode
256-
- kube-proxy
257-
- Network interface IP addressprogramming
258-
- In case of this feature outage, network interfaces on the node may still keep
259-
adding the LoadBalancer IP, that may cause wrong traffic routing
260-
- This dependency doesn't happen on clusters that uses CNI that replaces kube-proxy.
261-
The CNIs should implement this feature their own, in this case.
255+
- In case of this feature outage nothing happens, as LoadBalancers will be
256+
already out of sync with services in case of CCM being crashed
257+
- kube-proxy or other Service Proxy that implements this feature
258+
- Network interface IP address programming
259+
- In case of this feature outage, the user will get the same result/behavior as
260+
if the `ipMode` field has not been set.
262261

263262
### Scalability
264263

@@ -274,13 +273,11 @@ previous answers based on experience in the field.
274273

275274
###### Will enabling / using this feature result in any new API calls?
276275

277-
- API call type - Patch
278-
- Estimated throughput - 1 per service creation/reconciliation
279-
- originating component - cloud controller manager / LoadBalancer controller
276+
No.
280277

281278
###### Will enabling / using this feature result in introducing new API types?
282279

283-
No
280+
No.
284281

285282
###### Will enabling / using this feature result in any new calls to the cloud provider?
286283

0 commit comments

Comments
 (0)