
Bug: e2e failure due to the kubelet's missing certificate #1920

@JKBGIT1

Current Behaviour

The testing-framework failed because some pods in the longhorn-system namespace weren't running.

2025-12-11T11:46:41Z ERR claudie_test.go:138 > Error in test sets test-set1  error="error while performing additional test for manifest 1.yaml from test-set1 : error while checking if all pods from longhorn-system are ready in cluster ts1-ovh-cluster-test-set-no1: pods in longhorn-system took too long to initialize in cluster ts1-ovh-cluster-test-set-no1" module=testing-framework

Specifically, the longhorn-driver-deployer was crash-looping because the kubelet on the node running this pod didn't have a server certificate.

longhorn-driver-deployer-69c7675c65-q9qq8           0/1     CrashLoopBackOff   33 (102s ago)   124m

Logs from kubelet

pod_workers.go:1324] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"longhorn-driver-deployer\" with CrashLoopBackOff: \"back-off 5m0s restarting f>
1 12:03:38.697235    8513 ???:1] "http: TLS handshake error from 192.168.2.1:52576: no serving certificate available for the kubelet"

You can see below that there was only a client certificate.

root@ts1-ovh-cluster-test-set-no1-1a7itlo-ovh-cmpt-nodes-u48hs5d-01:~# ls /var/lib/kubelet/pki/
kubelet-client-2025-12-11-11-19-58.pem  kubelet-client-current.pem
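
A quick check for this condition on a node could look like the sketch below. It only tests for the `kubelet-server-current.pem` file in the PKI directory shown above. The function name `has_kubelet_serving_cert` is hypothetical and is not part of Claudie or Kubernetes.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the kubelet PKI directory holds a serving certificate.
# The function name is hypothetical; the default path is the one from the listing above.
has_kubelet_serving_cert() {
  local pki_dir="${1:-/var/lib/kubelet/pki}"
  # The kubelet stores its current serving certificate under this name once one is issued.
  [ -e "${pki_dir}/kubelet-server-current.pem" ]
}
```

Running `has_kubelet_serving_cert || echo "no kubelet serving certificate"` on the affected node would have printed the message before the CSR was approved.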

Kubernetes is responsible for issuing the kubelet's server certificate, and I found a lot of Pending CertificateSigningRequests for it. Ultimately, I had to manually approve one of them so that the server certificate was issued.

$ k certificate approve csr-qdskk
certificatesigningrequest.certificates.k8s.io/csr-qdskk approved
$ k get csr
NAME        AGE    SIGNERNAME                      REQUESTOR                               REQUESTEDDURATION   CONDITION
csr-2bmz7   110m   kubernetes.io/kubelet-serving   system:node:ovh-cmpt-nodes-u48hs5d-01   <none>              Pending
csr-ktgbm   49m    kubernetes.io/kubelet-serving   system:node:ovh-cmpt-nodes-u48hs5d-01   <none>              Pending
csr-qdskk   12m    kubernetes.io/kubelet-serving   system:node:ovh-cmpt-nodes-u48hs5d-01   <none>              Approved,Issued
csr-rwpcl   80m    kubernetes.io/kubelet-serving   system:node:ovh-cmpt-nodes-u48hs5d-01   <none>              Pending
csr-s79qj   95m    kubernetes.io/kubelet-serving   system:node:ovh-cmpt-nodes-u48hs5d-01   <none>              Pending
csr-sqgxw   18m    kubernetes.io/kubelet-serving   system:node:ovh-cmpt-nodes-u48hs5d-01   <none>              Pending
csr-t9k4l   65m    kubernetes.io/kubelet-serving   system:node:ovh-cmpt-nodes-u48hs5d-01   <none>              Pending
csr-x2fgq   34m    kubernetes.io/kubelet-serving   system:node:ovh-cmpt-nodes-u48hs5d-01   <none>              Pending
csr-zm5vk   125m   kubernetes.io/kubelet-serving   system:node:ovh-cmpt-nodes-u48hs5d-01   <none>              Pending

Approving the CSR finally created the server certificate.

root@ts1-ovh-cluster-test-set-no1-1a7itlo-ovh-cmpt-nodes-u48hs5d-01:~# ls /var/lib/kubelet/pki/
kubelet-client-2025-12-11-11-19-58.pem  kubelet-client-current.pem  kubelet-server-2025-12-11-13-26-13.pem  kubelet-server-current.pem

After that, the deployment of the Longhorn apps moved on.

$ k get po --watch
NAME                                                READY   STATUS              RESTARTS   AGE
csi-attacher-68bb9b96db-6qk56                       0/1     ContainerCreating   0          2s
csi-attacher-68bb9b96db-jzq4t                       0/1     ContainerCreating   0          2s
csi-attacher-68bb9b96db-xls7b                       0/1     ContainerCreating   0          2s
csi-provisioner-68d9d8c99-9vxp6                     0/1     ContainerCreating   0          2s
csi-provisioner-68d9d8c99-vgbrt                     0/1     ContainerCreating   0          2s
csi-provisioner-68d9d8c99-xnf8k                     0/1     ContainerCreating   0          2s
csi-resizer-d9b695f4b-cs9sc                         0/1     ContainerCreating   0          2s
csi-resizer-d9b695f4b-hnj24                         0/1     ContainerCreating   0          2s
csi-resizer-d9b695f4b-pzttc                         0/1     ContainerCreating   0          2s
csi-snapshotter-ccd8b586-vtjqp                      0/1     ContainerCreating   0          2s
csi-snapshotter-ccd8b586-xms2v                      0/1     ContainerCreating   0          2s
csi-snapshotter-ccd8b586-zn4fk                      0/1     ContainerCreating   0          2s
engine-image-ei-db6c2b6f-wx4x2                      1/1     Running             0          127m
instance-manager-029f27c72f2044dd0b0528c47a0c21eb   1/1     Running             0          126m
longhorn-csi-plugin-shp8c                           0/3     ContainerCreating   0          2s
longhorn-driver-deployer-69c7675c65-46x8g           1/1     Running             0          7s
longhorn-manager-fpf8h                              2/2     Running             0          127m
longhorn-ui-8d76df79b-48fl5                         1/1     Running             0          127m
longhorn-ui-8d76df79b-dxggg                         1/1     Running             0          127m
csi-attacher-68bb9b96db-jzq4t                       1/1     Running             0          4s

Expected Behaviour

AFAIK, K8s doesn't auto-approve CSRs for the kubelet's server certificate (signer kubernetes.io/kubelet-serving), but preferably the e2e shouldn't fail because of this.
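
One possible workaround for the e2e environment, sketched here as an assumption rather than a proposed fix, is to approve every Pending `kubernetes.io/kubelet-serving` CSR before the pod readiness check. The function names below are hypothetical. The sketch assumes `kubectl` is configured for the test cluster and that `kubectl get csr` prints the same columns as in the output above.

```shell
#!/usr/bin/env bash
# Sketch: read `kubectl get csr` output on stdin and print the names of
# Pending CSRs for the kubelet-serving signer. Function names are hypothetical.
pending_kubelet_serving_csrs() {
  # Column 1 is NAME, column 3 is SIGNERNAME, and the last column is CONDITION.
  awk '$1 != "NAME" && $3 == "kubernetes.io/kubelet-serving" && $NF == "Pending" { print $1 }'
}

# Approve each Pending kubelet-serving CSR, one at a time.
# Assumes kubectl points at the cluster under test.
approve_pending_kubelet_serving_csrs() {
  local name
  kubectl get csr | pending_kubelet_serving_csrs | while read -r name; do
    kubectl certificate approve "$name"
  done
}
```

Blanket approval skips any verification of what the CSR requests, so this is only reasonable for throwaway test clusters.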

Steps To Reproduce

No idea. It happened only once, seemingly at random, when running the e2e pipeline.

Labels: bug
