Description
I have an issue in my cluster (Trident 25.06, ontap-san-economy driver, AWS FSx for ONTAP filesystem)
where volume attaches fail and are retried for a long time before finally succeeding. I cannot find the root cause
of those failures because node_server.go hides the underlying cause of the LUKS open failure:
https://github.com/NetApp/trident/blob/master/frontend/csi/node_server.go#L1857-L1861
The only indication I see is multiple "could not set LUKS volume passphrase" events in the namespace where the attach is happening:
LAST SEEN TYPE REASON OBJECT MESSAGE
32s Warning FailedScheduling pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp 0/10 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/10 nodes are available: 10 Preemption is not helpful for scheduling.
24s Warning FailedScheduling pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp 0/10 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/10 nodes are available: 10 Preemption is not helpful for scheduling.
20s Normal Scheduled pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp Successfully assigned ws-ns-workspaces-ws-hbisv/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp to ip-10-250-208-253.eu-central-1.compute.internal
32s Normal SuccessfulCreate replicaset/workspaces-ws-hbisv-deployment-54486d45dc Created pod: workspaces-ws-hbisv-deployment-54486d45dc-rd9xp
33s Normal ScalingReplicaSet deployment/workspaces-ws-hbisv-deployment Scaled up replica set workspaces-ws-hbisv-deployment-54486d45dc to 1
35s Normal NoPods poddisruptionbudget/ws-hbisv-pdb No matching pods found
0s Normal SuccessfulAttachVolume pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp AttachVolume.Attach succeeded for volume "pp-consume-1dec7b6b5810"
0s Warning FailedMount pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s Warning FailedMount pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s Warning FailedMount pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s Warning FailedMount pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s Warning FailedMount pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
Another indicator is that the tridentctl get volumes command takes more than a minute to return; there are around 3800 tridentvolumes in the cluster.
Eventually (after 5-10 minutes) the attach succeeds, so the passphrase secret and the passphrase itself (which are unchanged and exist at the moment of attach) are correct.
Attaching CSI node server logs:
Please advise on how I can troubleshoot this issue.