
csi/node_server.go hides underlying cause of LUKS open failures/retries #1095

@sergeykuperman


I have an issue in my cluster (Trident 25.06, ontap-san-economy driver, AWS FSx for ONTAP file system)
where volume attach fails and is retried for a long time before finally succeeding. I cannot find the root cause
of those failures because node_server.go hides the underlying cause of the LUKS open failure:
https://github.com/NetApp/trident/blob/master/frontend/csi/node_server.go#L1857-L1861
The only indication I see is multiple "could not set LUKS volume passphrase" events in the namespace where the attach is happening:

LAST SEEN   TYPE      REASON              OBJECT                                                 MESSAGE
32s         Warning   FailedScheduling    pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    0/10 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/10 nodes are available: 10 Preemption is not helpful for scheduling.
24s         Warning   FailedScheduling    pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    0/10 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/10 nodes are available: 10 Preemption is not helpful for scheduling.
20s         Normal    Scheduled           pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    Successfully assigned ws-ns-workspaces-ws-hbisv/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp to ip-10-250-208-253.eu-central-1.compute.internal
32s         Normal    SuccessfulCreate    replicaset/workspaces-ws-hbisv-deployment-54486d45dc   Created pod: workspaces-ws-hbisv-deployment-54486d45dc-rd9xp
33s         Normal    ScalingReplicaSet   deployment/workspaces-ws-hbisv-deployment              Scaled up replica set workspaces-ws-hbisv-deployment-54486d45dc to 1
35s         Normal    NoPods              poddisruptionbudget/ws-hbisv-pdb                       No matching pods found
0s          Normal    SuccessfulAttachVolume   pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    AttachVolume.Attach succeeded for volume "pp-consume-1dec7b6b5810"
0s          Warning   FailedMount              pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s          Warning   FailedMount              pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s          Warning   FailedMount              pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s          Warning   FailedMount              pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase
0s          Warning   FailedMount              pod/workspaces-ws-hbisv-deployment-54486d45dc-rd9xp    MountVolume.MountDevice failed for volume "pp-consume-1dec7b6b5810" : rpc error: code = Internal desc = could not set LUKS volume passphrase

Another indicator is that the tridentctl get volumes command takes more than a minute to return a response. The cluster has around 3800 tridentvolumes.

Eventually (after 5-10 minutes) the attach succeeds, so the passphrase secret and the passphrase itself (which are unchanged and exist at the time of the attach) are correct.

Attaching the CSI node server logs:

trident-node.log

Please advise on how I can troubleshoot this issue.
