Issue description
I installed a cluster in the eu-central region, with 3 control-plane nodes spread across HEL, FSN and NBG, as well as 3 agent nodes.
I used the latest release v2.1.6.
After the installation, all nodes were added to the cluster and became healthy, except for the one agent node in NBG. That node stays "unhealthy".
Debugging steps
I started debugging on the node and found a lot of these errors in the journalctl logs:
May 22 20:58:59 k3s-worker-nbg1-wtc k3s[1350]: I0522 20:58:59.658425 1350 status_manager.go:667] "Failed to get status for pod" podUID=44c48f1e-ec34-4c48-b9a8-e108e4efd480 pod="kube-system/cilium-lqhgn" err="Get \"https://127.0.0.1:6444/api/v1/namespaces/kube-system/pods/cilium-lqhgn\": net/http: TLS handshake timeout"
May 22 20:59:00 k3s-worker-nbg1-wtc k3s[1350]: E0522 20:59:00.084778 1350 kubelet.go:2373] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
May 22 20:59:01 k3s-worker-nbg1-wtc k3s[1350]: W0522 20:59:01.179363 1350 reflector.go:424] object-"kube-system"/"hubble-server-certs": failed to list *v1.Secret: Get "https://127.0.0.1:6444/api/v1/namespaces/kube-system/secrets?fieldSelector=metadata.name%3Dhubble-server-certs&resourceVersion=1680277": net/http: TLS handshake timeout
May 22 20:59:01 k3s-worker-nbg1-wtc k3s[1350]: I0522 20:59:01.179528 1350 trace.go:219] Trace[1662194913]: "Reflector ListAndWatch" name:object-"kube-system"/"hubble-server-certs" (22-May-2023 20:58:51.177) (total time: 10002ms):
May 22 20:59:01 k3s-worker-nbg1-wtc k3s[1350]: Trace[1662194913]: ---"Objects listed" error:Get "https://127.0.0.1:6444/api/v1/namespaces/kube-system/secrets?fieldSelector=metadata.name%3Dhubble-server-certs&resourceVersion=1680277": net/http: TLS handshake timeout 10002ms (20:59:01.179)
May 22 20:59:01 k3s-worker-nbg1-wtc k3s[1350]: Trace[1662194913]: [10.002200076s] [10.002200076s] END
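Since all of these requests go to 127.0.0.1:6444, which as far as I understand is the k3s agent's local load balancer in front of the control-plane apiservers, I also checked what is listening there and which upstream servers it knows about. The load-balancer state file path below is what a default k3s agent install uses; it may differ if the data dir was changed:
# What is listening on the local apiserver proxy port?
ss -tlnp | grep 6444
# Which control-plane endpoints does the agent load balancer currently know about?
# (default k3s agent data dir assumed; adjust the path if yours differs)
cat /var/lib/rancher/k3s/agent/etc/k3s-agent-load-balancer.json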
systemctl status k3s-agent shows the service as running, but the recent log lines also contain TLS handshake timeout errors when connecting to https://127.0.0.1:6444/
● k3s-agent.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s-agent.service; enabled; preset: disabled)
Active: active (running) since Fri 2023-05-19 00:55:22 UTC; 3 days ago
Docs: https://k3s.io
Process: 1346 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
Process: 1348 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Process: 1349 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 1350 (k3s-agent)
Tasks: 84
CPU: 3h 32min 3.727s
CGroup: /system.slice/k3s-agent.service
├─ 1350 "/usr/local/bin/k3s agent"
├─ 8161 containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containe>
├─ 8351 /var/lib/rancher/k3s/data/feeeb9b2f9234f89a72104f4e1c25b6a2ffe117ddaadbe6791cf09885153bdc3/bin/containerd-shim-runc-v2 -namespace k8>
├─ 8485 /var/lib/rancher/k3s/data/feeeb9b2f9234f89a72104f4e1c25b6a2ffe117ddaadbe6791cf09885153bdc3/bin/containerd-shim-runc-v2 -namespace k8>
├─13762 /var/lib/rancher/k3s/data/feeeb9b2f9234f89a72104f4e1c25b6a2ffe117ddaadbe6791cf09885153bdc3/bin/containerd-shim-runc-v2 -namespace k8>
└─13949 /var/lib/rancher/k3s/data/feeeb9b2f9234f89a72104f4e1c25b6a2ffe117ddaadbe6791cf09885153bdc3/bin/containerd-shim-runc-v2 -namespace k8>
May 22 21:04:11 k3s-worker-nbg1-wtc k3s[1350]: W0522 21:04:11.356131 1350 reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169>
May 22 21:04:11 k3s-worker-nbg1-wtc k3s[1350]: I0522 21:04:11.356251 1350 trace.go:219] Trace[94676488]: "Reflector ListAndWatch" name:k8s.io/client-g>
May 22 21:04:11 k3s-worker-nbg1-wtc k3s[1350]: Trace[94676488]: ---"Objects listed" error:Get "https://127.0.0.1:6444/api/v1/nodes?fieldSelector=metadata>
May 22 21:04:11 k3s-worker-nbg1-wtc k3s[1350]: Trace[94676488]: [10.001875034s] [10.001875034s] END
May 22 21:04:11 k3s-worker-nbg1-wtc k3s[1350]: E0522 21:04:11.356274 1350 reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169>
May 22 21:04:11 k3s-worker-nbg1-wtc k3s[1350]: E0522 21:04:11.452264 1350 pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready:>
May 22 21:04:11 k3s-worker-nbg1-wtc k3s[1350]: E0522 21:04:11.452747 1350 pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready:>
May 22 21:04:11 k3s-worker-nbg1-wtc k3s[1350]: I0522 21:04:11.713382 1350 status_manager.go:667] "Failed to get status for pod" podUID=364e4d33-8d3d-4>
May 22 21:04:13 k3s-worker-nbg1-wtc k3s[1350]: E0522 21:04:13.452769 1350 pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready:>
May 22 21:04:13 k3s-worker-nbg1-wtc k3s[1350]: E0522 21:04:13.452903 1350 pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready:>
I manually tried to curl the same URL, but that failed as well (connection reset by peer):
k3s-worker-nbg1-wtc:~# curl -vk "https://127.0.0.1:6444/api/v1/namespaces/kube-system/secrets?fieldSelector=metadata.name%3Dhubble-server-certs&resourceVersion=1680277"
*   Trying 127.0.0.1:6444...
* Connected to 127.0.0.1 (127.0.0.1) port 6444 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* Recv failure: Connection reset by peer
* OpenSSL SSL_connect: Connection reset by peer in connection to 127.0.0.1:6444
* Closing connection 0
curl: (35) Recv failure: Connection reset by peer
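To rule out the local proxy, testing a control-plane apiserver directly on port 6443 from this node should show whether the problem is the proxy itself or the network path to the servers; the IP below is just a placeholder for one of the control-plane nodes' private addresses, and an HTTP 401/403 response would be fine since it proves the TLS handshake completes:
# Replace 10.0.0.101 with the private IP of one of the control-plane nodes (placeholder)
curl -vk --connect-timeout 10 https://10.0.0.101:6443/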
So I went on and compared the settings in /etc/rancher/k3s/config.yaml with one of the other nodes, but they are identical (apart from the node's own node-ip).
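For reference, this is roughly how I compared them (the healthy node's hostname below is just an example):
# Pull the config from a healthy agent and diff it against the local one
# (k3s-worker-fsn1-abc is an example hostname for one of the working agents)
ssh k3s-worker-fsn1-abc 'cat /etc/rancher/k3s/config.yaml' > /tmp/config-healthy.yaml
diff /etc/rancher/k3s/config.yaml /tmp/config-healthy.yaml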
I checked the routes and subnets in the Hetzner Cloud console, and everything looks fine there.
I even spun up another node in NBG, but it ended up in the same state.
However, spinning up another node in FSN worked fine.
Questions
Are there any firewalls or OS-side settings that could block this?
Does anyone else have this issue on nodes in NBG?
How can it be that the node registered itself with the server, but is then no longer able to connect to the kube-apiserver on port 6443 or 6444?
Does the subnet range of the nodes, 10.0.0.0/16, conflict with some address space of the CNI or something else (I use Cilium with encryption enabled)?
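Regarding the firewall and CIDR questions, these are the checks I have in mind: the firewall check on the NBG node itself, the kubectl commands from any machine with working cluster access. The cilium-config ConfigMap name assumes a standard Cilium install:
# Any host-level firewall rules that could drop or reject traffic to 6443/6444?
iptables -S | grep -iE 'drop|reject'
# Pod CIDR assigned to each node vs. the 10.0.0.0/16 node network
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# CIDRs Cilium is configured with (assumes the standard cilium-config ConfigMap in kube-system)
kubectl -n kube-system get configmap cilium-config -o yaml | grep -i cidr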