@@ -619,6 +619,83 @@ You can integrate with [Gateway](https://gateway-api.sigs.k8s.io/) rather than S
can define your own (provider specific) annotations on the Service that specify the equivalent detail.
{{< /note >}}
+ ### Node liveness impact on load balancer traffic
+ 
+ Load balancer health checks are critical to modern applications. They are used to
+ determine which server (virtual machine, or IP address) the load balancer should
+ dispatch traffic to. The Kubernetes APIs do not define how health checks have to be
+ implemented for Kubernetes managed load balancers; instead it's the cloud providers
+ (and the people implementing the integration code) who decide on the behavior. Load
+ balancer health checks are extensively used within the context of supporting the
+ `externalTrafficPolicy` field for Services. If `Cluster` is specified, all nodes are
+ eligible load balancing targets _as long as_ the node is not being deleted and kube-proxy
+ is healthy. In this mode, load balancer health checks are configured to target the
+ service proxy's readiness port and path. In the case of kube-proxy this evaluates
+ to: `${NODE_IP}:10256/healthz`. kube-proxy will return either an HTTP code 200 or 503.
+ kube-proxy's load balancer health check endpoint returns 200 if:
+ 
+ 1. kube-proxy is healthy, meaning:
+    - it's able to progress programming the network and isn't timing out while doing
+      so (the timeout is defined to be **2 × `iptables.syncPeriod`**); and
+ 2. the node is not being deleted (there is no deletion timestamp set for the Node).
+ 
+ The reason why kube-proxy returns 503 and marks the node as not
+ eligible when it's being deleted is that kube-proxy supports connection
+ draining for terminating nodes. A couple of important things occur from the point
+ of view of a Kubernetes-managed load balancer when a node _is being_ / _is_ deleted.
+ 
+ While deleting:
+ 
+ * kube-proxy will start failing its readiness probe and essentially mark the
+   node as not eligible for load balancer traffic. The load balancer health
+   check failing causes load balancers which support connection draining to
+   allow existing connections to terminate, and to block new connections from
+   being established.
+ 
+ When deleted:
+ 
+ * The service controller in the Kubernetes cloud controller manager removes the
+   node from the referenced set of eligible targets. Removing any instance from
+   the load balancer's set of backend targets immediately terminates all
+   connections. This is also the reason kube-proxy first fails the health check
+   while the node is deleting.
+ 
+ It's important for Kubernetes vendors to note that if a vendor configures the
+ kube-proxy readiness probe as a liveness probe, then kube-proxy will restart
+ continuously while a node is being deleted, until the node has been fully deleted.
+ 
+ Users deploying kube-proxy can inspect both the readiness and liveness state by
+ evaluating the metrics `proxy_livez_total` and `proxy_healthz_total`. Both
+ metrics publish two series: one labeled with the 200 HTTP code and one with the 503 one.
+ 
+ For Services with `externalTrafficPolicy: Local`, kube-proxy will return 200 if:
+ 
+ 1. kube-proxy is healthy/ready, and
+ 2. the Service has a local endpoint on the node in question.
+ 
+ Node deletion does **not** have an impact on kube-proxy's return
+ code as far as load balancer health checks are concerned. The reason for this is
+ that deleting nodes could end up causing an ingress outage should all of the
+ endpoints simultaneously be running on those nodes.
+ 
+ It's important to note that the configuration of load balancer health checks is
+ specific to each cloud provider: different cloud providers configure
+ the health check in different ways. The three main cloud providers do so in the
+ following ways:
+ 
+ * AWS: for ELB, probes the first NodePort defined on the Service spec.
+ * Azure: probes all NodePorts defined on the Service spec.
+ * GCP: probes port 10256 (kube-proxy's healthz port).
+ 
+ There are drawbacks and benefits to each method, so none can be considered fully
+ right or wrong; but it is important to mention that connection draining via kube-proxy
+ can therefore only occur for cloud providers which configure the health checks to
+ target kube-proxy. Also note that configuring health checks to target the application
+ might cause ingress downtime should the application experience issues which
+ are unrelated to networking problems. The recommendation is therefore that cloud
+ providers configure the load balancer health checks to target the service
+ proxy's healthz port.
+ 

### Load balancers with mixed protocol types
{{< feature-state feature_gate_name="MixedProtocolLBService" >}}