Description
Current Behaviour
The testing-framework failed because some pods in the longhorn-system namespace weren't running.
2025-12-11T11:46:41Z ERR claudie_test.go:138 > Error in test sets test-set1 error="error while performing additional test for manifest 1.yaml from test-set1 : error while checking if all pods from longhorn-system are ready in cluster ts1-ovh-cluster-test-set-no1: pods in longhorn-system took too long to initialize in cluster ts1-ovh-cluster-test-set-no1" module=testing-framework
Specifically, the longhorn-driver-deployer pod was crashlooping because the kubelet on the node running it didn't have a serving certificate.
longhorn-driver-deployer-69c7675c65-q9qq8 0/1 CrashLoopBackOff 33 (102s ago) 124m
Logs from kubelet
pod_workers.go:1324] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"longhorn-driver-deployer\" with CrashLoopBackOff: \"back-off 5m0s restarting f>
1 12:03:38.697235 8513 ???:1] "http: TLS handshake error from 192.168.2.1:52576: no serving certificate available for the kubelet"
As the listing below shows, the kubelet's PKI directory contained only a client certificate.
root@ts1-ovh-cluster-test-set-no1-1a7itlo-ovh-cmpt-nodes-u48hs5d-01:~# ls /var/lib/kubelet/pki/
kubelet-client-2025-12-11-11-19-58.pem kubelet-client-current.pem
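The same check can be scripted on a node. This is only a minimal sketch: the function name `has_kubelet_serving_cert` is made up for illustration, and it assumes the kubelet keeps its rotated serving certificate under the `kubelet-server-current.pem` name in the PKI directory, as seen in the listings in this issue.

```shell
# Hypothetical helper: succeeds if a kubelet serving certificate exists.
# The directory defaults to /var/lib/kubelet/pki (the path used in this issue)
# and can be overridden with the first argument.
has_kubelet_serving_cert() {
  # -e follows symlinks, so a dangling kubelet-server-current.pem counts as missing.
  [ -e "${1:-/var/lib/kubelet/pki}/kubelet-server-current.pem" ]
}
```

Usage on a node could look like `has_kubelet_serving_cert || echo "no serving certificate yet"`.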
Kubernetes is responsible for issuing the serving certificate, and I found many Pending CertificateSigningRequests for the kubernetes.io/kubelet-serving signer. Ultimately, I had to manually approve one of them so that the serving certificate for the kubelet was issued.
$ k certificate approve csr-qdskk
certificatesigningrequest.certificates.k8s.io/csr-qdskk approved
$ k get csr
NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-2bmz7 110m kubernetes.io/kubelet-serving system:node:ovh-cmpt-nodes-u48hs5d-01 <none> Pending
csr-ktgbm 49m kubernetes.io/kubelet-serving system:node:ovh-cmpt-nodes-u48hs5d-01 <none> Pending
csr-qdskk 12m kubernetes.io/kubelet-serving system:node:ovh-cmpt-nodes-u48hs5d-01 <none> Approved,Issued
csr-rwpcl 80m kubernetes.io/kubelet-serving system:node:ovh-cmpt-nodes-u48hs5d-01 <none> Pending
csr-s79qj 95m kubernetes.io/kubelet-serving system:node:ovh-cmpt-nodes-u48hs5d-01 <none> Pending
csr-sqgxw 18m kubernetes.io/kubelet-serving system:node:ovh-cmpt-nodes-u48hs5d-01 <none> Pending
csr-t9k4l 65m kubernetes.io/kubelet-serving system:node:ovh-cmpt-nodes-u48hs5d-01 <none> Pending
csr-x2fgq 34m kubernetes.io/kubelet-serving system:node:ovh-cmpt-nodes-u48hs5d-01 <none> Pending
csr-zm5vk 125m kubernetes.io/kubelet-serving system:node:ovh-cmpt-nodes-u48hs5d-01 <none> Pending
After the approval, the kubelet finally received a serving certificate.
root@ts1-ovh-cluster-test-set-no1-1a7itlo-ovh-cmpt-nodes-u48hs5d-01:~# ls /var/lib/kubelet/pki/
kubelet-client-2025-12-11-11-19-58.pem kubelet-client-current.pem kubelet-server-2025-12-11-13-26-13.pem kubelet-server-current.pem
With that, the deployment of the Longhorn components moved on.
$ k get po --watch
NAME READY STATUS RESTARTS AGE
csi-attacher-68bb9b96db-6qk56 0/1 ContainerCreating 0 2s
csi-attacher-68bb9b96db-jzq4t 0/1 ContainerCreating 0 2s
csi-attacher-68bb9b96db-xls7b 0/1 ContainerCreating 0 2s
csi-provisioner-68d9d8c99-9vxp6 0/1 ContainerCreating 0 2s
csi-provisioner-68d9d8c99-vgbrt 0/1 ContainerCreating 0 2s
csi-provisioner-68d9d8c99-xnf8k 0/1 ContainerCreating 0 2s
csi-resizer-d9b695f4b-cs9sc 0/1 ContainerCreating 0 2s
csi-resizer-d9b695f4b-hnj24 0/1 ContainerCreating 0 2s
csi-resizer-d9b695f4b-pzttc 0/1 ContainerCreating 0 2s
csi-snapshotter-ccd8b586-vtjqp 0/1 ContainerCreating 0 2s
csi-snapshotter-ccd8b586-xms2v 0/1 ContainerCreating 0 2s
csi-snapshotter-ccd8b586-zn4fk 0/1 ContainerCreating 0 2s
engine-image-ei-db6c2b6f-wx4x2 1/1 Running 0 127m
instance-manager-029f27c72f2044dd0b0528c47a0c21eb 1/1 Running 0 126m
longhorn-csi-plugin-shp8c 0/3 ContainerCreating 0 2s
longhorn-driver-deployer-69c7675c65-46x8g 1/1 Running 0 7s
longhorn-manager-fpf8h 2/2 Running 0 127m
longhorn-ui-8d76df79b-48fl5 1/1 Running 0 127m
longhorn-ui-8d76df79b-dxggg 1/1 Running 0 127m
csi-attacher-68bb9b96db-jzq4t 1/1 Running 0 4s
Expected Behaviour
AFAIK, Kubernetes does not auto-approve CSRs for kubelet serving certificates (signer kubernetes.io/kubelet-serving), but preferably the e2e tests should not fail because of this.
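One possible workaround for the e2e pipeline, sketched here under assumptions rather than proposed as the fix, is to approve pending kubelet-serving CSRs before waiting for the longhorn-system pods. The helper name `pending_kubelet_serving_csrs` is hypothetical. It assumes the six-column layout of `kubectl get csr --no-headers` shown in this issue and only selects CSRs from the kubernetes.io/kubelet-serving signer that were requested by a node identity.

```shell
# Hypothetical helper: reads `kubectl get csr --no-headers` output on stdin and
# prints the names of Pending kubelet-serving CSRs requested by a node.
# Assumed column order: NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
pending_kubelet_serving_csrs() {
  awk '$3 == "kubernetes.io/kubelet-serving" \
       && index($4, "system:node:") == 1 \
       && $NF == "Pending" { print $1 }'
}
```

It could be combined with the approve command used above, for example `kubectl get csr --no-headers | pending_kubelet_serving_csrs | xargs -r kubectl certificate approve` (`-r` is the GNU xargs flag that skips the command on empty input). Blindly approving serving CSRs skips any validation of the requested names and addresses, so this is probably only acceptable for throwaway test clusters.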
Steps To Reproduce
Unknown. It happened only once, seemingly at random, while running the e2e pipeline.