Potential race conditions with Replication #517

@knguyen0125

Bug Report

Description

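When a VM is created for replication, there is a brief window in which the VM has been created successfully but is not yet available to query through the API. During this window CreateVolume fails with "failed to get vm config: VM machine not found", and each retry creates another VM (vm-10000, then vm-10001 in the logs below) until one attempt succeeds.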

Suggested Fix

After creating the VM, wait for it (or otherwise ensure it is available) to be queryable via the API before continuing.

Logs

Controller: [kubectl logs -c proxmox-csi-plugin-controller proxmox-csi-plugin-controller-...]

I0106 10:43:01.231634       1 controller.go:123] "CreateVolume: called" args="{\"accessibility_requirements\":{\"preferred\":[{\"segments\":{\"topology.kubernetes.io/region\":\"hanoi\",\"topology.kubernetes.io/zone\":\"beta\"}},{\"segments\":{\"topology
I0106 10:43:01.231781       1 controller.go:145] "CreateVolume: parameters" parameters={"storage":"local-zfs","storageFormat":"","backup":false,"cache":"none","iothread":true,"ssd":true,"diskIOPS":null,"diskMBps":null,"blockSize":null,"inodeSize":null,"
I0106 10:43:02.364553       1 controller.go:320] "CreateVolume: creating volume" cluster="hanoi" zone="beta" volumeID="hanoi/beta/local-zfs/vm-10000-pvc-c230b8c3-0cd5-4b17-8bb2-242ae16261f2" size=1073741824                                               
E0106 10:43:03.961237       1 controller.go:391] "CreateVolume: failed to create replication" err="failed to get vm config: VM machine not found" cluster="hanoi" volumeID="hanoi/beta/local-zfs/vm-10000-pvc-c230b8c3-0cd5-4b17-8bb2-242ae16261f2" vmID=1000
E0106 10:43:03.961267       1 main.go:101] "GRPC error" err="rpc error: code = Internal desc = failed to get vm config: VM machine not found"                                                                                                                
I0106 10:43:04.963476       1 controller.go:123] "CreateVolume: called" args="{\"accessibility_requirements\":{\"preferred\":[{\"segments\":{\"topology.kubernetes.io/region\":\"hanoi\",\"topology.kubernetes.io/zone\":\"beta\"}},{\"segments\":{\"topology
I0106 10:43:04.963556       1 controller.go:145] "CreateVolume: parameters" parameters={"storage":"local-zfs","storageFormat":"","backup":false,"cache":"none","iothread":true,"ssd":true,"diskIOPS":null,"diskMBps":null,"blockSize":null,"inodeSize":null,"
I0106 10:43:06.084006       1 controller.go:320] "CreateVolume: creating volume" cluster="hanoi" zone="beta" volumeID="hanoi/beta/local-zfs/vm-10001-pvc-c230b8c3-0cd5-4b17-8bb2-242ae16261f2" size=1073741824                                               
I0106 10:43:08.987201       1 controller.go:401] "CreateVolume: volume created" cluster="hanoi" volumeID="hanoi//local-zfs/vm-10001-pvc-c230b8c3-0cd5-4b17-8bb2-242ae16261f2" size=1073741824                                                                
I0106 10:43:09.294002       1 controller.go:499] "ControllerPublishVolume: called" args="{\"node_id\":\"disposable-cluster-worker-general-purpose-05\",\"volume_capability\":{\"access_mode\":{\"mode\":\"SINGLE_NODE_MULTI_WRITER\"},\"mount\":{\"fs_type\":
I0106 10:43:09.294111       1 controller.go:540] "ControllerPublishVolume: VM ID not found in NodeID, will lookup by node name" nodeID="disposable-cluster-worker-general-purpose-05"                                                                        
I0106 10:43:09.297529       1 controller.go:1059] "failed to get proxmox VMID from ProviderID" nodeID="disposable-cluster-worker-general-purpose-05" providerID="rke2://disposable-cluster-worker-general-purpose-05"                                        
I0106 10:43:10.210824       1 controller.go:1109] "checkVolume: determined node for volume" cluster="hanoi" volumeID="hanoi//local-zfs/vm-10001-pvc-c230b8c3-0cd5-4b17-8bb2-242ae16261f2" node="beta"                                                        
I0106 10:43:11.725754       1 controller.go:591] "ControllerPublishVolume: volume published" cluster="hanoi" volumeID="hanoi//local-zfs/vm-10001-pvc-c230b8c3-0cd5-4b17-8bb2-242ae16261f2" nodeID="disposable-cluster-worker-general-purpose-05"             

Node: [kubectl logs -c proxmox-csi-plugin-node proxmox-csi-plugin-node-...]

Environment

  • Plugin version: 0.17.1
  • Kubernetes version: [kubectl version --short] v1.33.0+rke2r1
  • CSI capacity: [kubectl get csistoragecapacities -ocustom-columns=CLASS:.storageClassName,AVAIL:.capacity,ZONE:.nodeTopology.matchLabels -A]
CLASS                          AVAIL         ZONE
proxmox-local-zfs              598013448Ki   map[topology.kubernetes.io/region:hanoi topology.kubernetes.io/zone:gamma]
proxmox-local-zfs              602155980Ki   map[topology.kubernetes.io/region:hanoi topology.kubernetes.io/zone:beta]
proxmox-local-zfs-replicated   602155980Ki   map[topology.kubernetes.io/region:hanoi topology.kubernetes.io/zone:beta]
proxmox-local-zfs-replicated   588036548Ki   map[topology.kubernetes.io/region:hanoi topology.kubernetes.io/zone:alpha]
proxmox-local-zfs-replicated   598013448Ki   map[topology.kubernetes.io/region:hanoi topology.kubernetes.io/zone:gamma]
proxmox-local-zfs              588036548Ki   map[topology.kubernetes.io/region:hanoi topology.kubernetes.io/zone:alpha]
  • CSI resource on the node: [kubectl get CSINode <node> -oyaml]
  • Node describe: [kubectl describe node <node>]
  • OS version [cat /etc/os-release]

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
