@@ -627,74 +627,7 @@ dispatch traffic to. The Kubernetes APIs do not define how health checks have to
implemented for Kubernetes-managed load balancers; instead, it's the cloud providers
(and the people implementing integration code) who decide on the behavior. Load
balancer health checks are extensively used within the context of supporting the
- `externalTrafficPolicy` field for Services. If `Cluster` is specified, all nodes are
- eligible load balancing targets _as long as_ the node is not being deleted and
- kube-proxy is healthy. In this mode, load balancer health checks are configured to
- target the service proxy's readiness port and path. In the case of kube-proxy this
- evaluates to: `${NODE_IP}:10256/healthz`. kube-proxy will return either an HTTP 200
- or 503 code. kube-proxy's load balancer health check endpoint returns 200 if (see
- the example below):
-
- 1. kube-proxy is healthy, meaning:
-    - it's able to make progress programming the network and isn't timing out while
-      doing so (the timeout is defined to be: **2 × `iptables.syncPeriod`**); and
- 2. the node is not being deleted (there is no deletion timestamp set for the Node).
-
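To make this concrete, here is a minimal sketch of a `LoadBalancer` Service using the `Cluster` policy; the name, selector, and port numbers are purely illustrative and not taken from this page:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-service            # hypothetical name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster   # every healthy, non-deleting node is an eligible target
  selector:
    app: example                   # hypothetical selector
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```

For a Service like this, a cloud provider that targets the service proxy would health check each node at `${NODE_IP}:10256/healthz`, as described above.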
- The reason kube-proxy returns 503 and marks the node as not
- eligible while it's being deleted is that kube-proxy supports connection
- draining for terminating nodes. A couple of important things occur from the point
- of view of a Kubernetes-managed load balancer when a node _is being_ / _is_ deleted.
-
- While deleting:
-
- * kube-proxy will start failing its readiness probe, essentially marking the
-   node as not eligible for load balancer traffic. A failing load balancer health
-   check causes load balancers that support connection draining to allow existing
-   connections to terminate and to block new connections from being established.
-
- When deleted:
-
- * The service controller in the Kubernetes cloud controller manager removes the
-   node from the referenced set of eligible targets. Removing any instance from
-   the load balancer's set of backend targets immediately terminates all
-   connections. This is also the reason kube-proxy first fails the health check
-   while the node is deleting.
-
- It's important for Kubernetes vendors to note that if a vendor configures the
- kube-proxy readiness probe as a liveness probe, kube-proxy will restart
- continuously while a node is being deleted, until the node has been fully deleted.
-
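As a rough sketch (not taken from any particular distribution's manifests), the kube-proxy readiness probe discussed above could be wired up in a DaemonSet pod spec along these lines; using the same probe as a `livenessProbe` is what would trigger the restart loop during node deletion:

```yaml
# Hypothetical fragment of a kube-proxy DaemonSet pod spec.
containers:
  - name: kube-proxy
    image: registry.k8s.io/kube-proxy:v1.30.0   # illustrative version tag
    readinessProbe:
      httpGet:
        path: /healthz
        port: 10256        # kube-proxy's load balancer health check port
      periodSeconds: 10
    # Intentionally a readinessProbe, not a livenessProbe: the endpoint returns
    # 503 while the node is being deleted, which would otherwise restart
    # kube-proxy in a loop.
```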
- Users deploying kube-proxy can inspect both the readiness and liveness state by
- evaluating the `proxy_livez_total` and `proxy_healthz_total` metrics. Both
- metrics publish two series, one labeled with the 200 code and one with the 503 code.
-
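As one possible way to collect these series (a minimal sketch, assuming Prometheus and kube-proxy's default `--metrics-bind-address` of `127.0.0.1:10249`; a real cluster typically needs to widen that bind address before the endpoint can be scraped remotely):

```yaml
# Minimal, illustrative Prometheus scrape job for kube-proxy metrics.
scrape_configs:
  - job_name: kube-proxy
    static_configs:
      - targets: ["127.0.0.1:10249"]   # default metrics bind address; adjust per cluster
```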
- For Services with `externalTrafficPolicy: Local`, kube-proxy will return 200 if
- (see the example after this list):
-
- 1. kube-proxy is healthy/ready, and
- 2. there is a local endpoint for the Service on the node in question.
-
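By contrast with the `Cluster` sketch earlier, a Service using the `Local` policy might look as follows (again, name, selector, and ports are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-service-local   # hypothetical name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local  # only nodes with a ready local endpoint return 200
  selector:
    app: example                # hypothetical selector
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```

The manifest differs from the `Cluster` example only in the policy field; the practical difference is which nodes answer the load balancer health check with 200.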
- Node deletion does **not** have an impact on kube-proxy's return
- code as far as load balancer health checks are concerned. The reason for this is
- that draining deleting nodes could end up causing an ingress outage should all of
- the Service's endpoints simultaneously be running on those nodes.
-
- It's important to note that the configuration of load balancer health checks is
- specific to each cloud provider, meaning that different cloud providers configure
- the health check in different ways. The three main cloud providers do so in the
- following ways:
-
- * AWS: when using an ELB, probes the first NodePort defined in the Service spec.
- * Azure: probes all NodePorts defined in the Service spec.
- * GCP: probes port 10256 (kube-proxy's healthz port).
-
- There are drawbacks and benefits to each method, so none can be considered fully
- right, but it is important to mention that connection draining using kube-proxy
- can therefore only occur for cloud providers that configure the health checks to
- target kube-proxy. Also note that configuring health checks to target the
- application might cause ingress downtime should the application experience issues
- that are unrelated to networking problems. The recommendation is therefore that
- cloud providers configure the load balancer health checks to target the service
- proxy's healthz port.
+ `externalTrafficPolicy` field for Services.

#### Load balancers with mixed protocol types
