LoadBalancer Alerting #1833

maaft · 2025-07-15T08:23:03Z

maaft
Jul 15, 2025

Description

This morning, suddenly all targets of my load balancer went unhealthy without me noticing.

https://status.hetzner.com/incident/8becfe86-b077-4983-b7d0-af81b4fc1496

Somehow a restart of all k3s nodes helped and all targets where healty again.

Anyway, it took me a while to find out that the load balancer was the problem.

I already opened this issue here hetznercloud/hcloud-cloud-controller-manager#976

If those metrics are added, maybe kube-hetzner module could also add options to generate & deploy monitoring / alerting manifests to the cluster.

loomsen · 2025-07-15T10:03:32Z

loomsen
Jul 15, 2025

Happened to me as well! Unfortunately, a reboot didn't fix it for me. Still everything down and the LB is not picking up the targets.

0 replies

askanhesse · 2025-07-15T12:21:44Z

askanhesse
Jul 15, 2025

The non working LBs seem to be fixed when hetzner cloud-controller-manager pod gets restarted. Simply deleting it should also work.

0 replies

maaft · 2025-07-15T13:05:03Z

maaft
Jul 15, 2025
Author

@askanhesse thanks for the info! Deletion would give you a new load-balancer IP though, right? Just keep in mind that this is potentially dangerous if your DNS records point directly to your LB.

@loomsen TBH I only suspected that restarting k3s on all nodes solved the issue. Maybe this is not the full solution. What I actually did:

use_control_plane_lb = true
terraform apply

This added a 2nd LB for control-plane API, and restarted all nodes. Afterwards the regular LB was working for me.

For this reason I have two servers in front of my LB with haproxy installed + a small script to reconfigure them with a new target IP. This allows me to move to new clusters or new loadbalancers with minimum downtime. Could also be a nice additione to this project @mysticaltech ;)

0 replies

loomsen · 2025-07-15T13:13:06Z

loomsen
Jul 15, 2025

@maaft sounds reasonable, after todays outage I was also thinking about something similar. But not much you can do, when hetzner has issues on the load balancer infra itself.
It just started working again some half an hour ago or so without restarting anything on my side.
Fortunately I could get my prod projects back to working with restarting the nodes, however, for another dev project, this didn't have any effect up until it recovered itself just now.
I'm using a control plane lb in all projects, too :)

0 replies

askanhesse · 2025-07-15T13:30:24Z

askanhesse
Jul 15, 2025

@askanhesse thanks for the info! Deletion would give you a new load-balancer IP though, right? Just keep in mind that this is potentially dangerous if your DNS records point directly to your LB.

No, this is not the case. It will sync configuration to existing LBs. It is quite the same as restarting the whole cluster, resulting in freshly started pod.

0 replies

mysticaltech · 2025-07-27T16:23:10Z

mysticaltech
Jul 27, 2025
Maintainer

Good ideas folks. Personally, I have started using Cloudflare Tunnels with cloudflared (and Cloudflare Zero Trust) instead of LBs, I find it much easier to work with and more reliable, and also free, unless you get mega worldscale traffic.

0 replies

raakraka · 2025-08-20T15:18:45Z

raakraka
Aug 20, 2025

As an alternative you can also point Cloudflare Loadbalancer directly to your worker's public ip's with a custom ingress controller (to disable deploying a hetzner lb).
This feels like a good way to have HA

0 replies

Uh oh!

LoadBalancer Alerting #1833

Uh oh!

maaft Jul 15, 2025

Description

Replies: 7 comments

Uh oh!

Uh oh!

loomsen Jul 15, 2025

Uh oh!

Uh oh!

askanhesse Jul 15, 2025

Uh oh!

Uh oh!

maaft Jul 15, 2025 Author

Uh oh!

loomsen Jul 15, 2025

Uh oh!

askanhesse Jul 15, 2025

Uh oh!

mysticaltech Jul 27, 2025 Maintainer

Uh oh!

raakraka Aug 20, 2025

maaft
Jul 15, 2025

loomsen
Jul 15, 2025

askanhesse
Jul 15, 2025

maaft
Jul 15, 2025
Author

loomsen
Jul 15, 2025

askanhesse
Jul 15, 2025

mysticaltech
Jul 27, 2025
Maintainer

raakraka
Aug 20, 2025