CIFS NAS Backups failing "randomly" #12122

@Hanarion

problem

It seems that, in my current setup, backups are reported as failed because of a CIFS warning, even though the backup itself completed successfully:

2025-11-24 11:05:32,938 ERROR [o.a.c.b.NASBackupProvider] (API-Job-Executor-11:[ctx-673f730c, job-3216, ctx-2492b04d]) (logid:2a7df0d3) Failed to take backup for VM i-12-606-VM: Job type:         Completed   
Operation:        Backup      
Time elapsed:     29235        ms
File processed:   23.000 GiB
File remaining:   0.000 B
File total:       23.000 GiB

2769033951
2025-11-24 11:05:32,946 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-11:[ctx-673f730c, job-3216]) (logid:2a7df0d3) Complete async job-3216, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed to create VM backup"}

I tried downgrading from SMB 3.1.1 to 3.0, but it didn't fix the issue. Here are my full mount options:
//XXXX.XXXX/xxxxxx on /tmp/csbackup.YDH0M type cifs (rw,relatime,vers=3.0,cache=strict,upcall_target=app,username=xxxxxxx,uid=0,noforceuid,gid=0,noforcegid,addr=78.46.12.119,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,nobrl,reparse=nfs,nativesocket,symlink=native,rsize=4194304,wsize=4194304,bsize=1048576,retrans=1,echo_interval=60,actimeo=1,closetimeo=1)

versions

  • CloudStack 4.22.0.0
[root@compute01 ~]# uname -a
Linux compute01 5.14.0-570.17.1.el9_6.x86_64 #1 SMP PREEMPT_DYNAMIC Fri May 23 22:47:01 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
[root@compute01 ~]# cat /etc/os-release 
NAME="Rocky Linux"
VERSION="9.6 (Blue Onyx)"
[root@compute01 ~]# ls -lah /tmp/csbackup.YDH0M/i-12-606-VM/2025.11.24.11.04.56/
total 1.4G
drwxr-xr-x. 2 root root    0 Nov 24 11:04 .
drwxr-xr-x. 2 root root    0 Nov 24 11:03 ..
-rwxr-xr-x. 1 root root 2.8M Nov 24 11:03 datadisk.96c4484b-2b51-46bd-9476-c7c9e0cedb87.qcow2
-rwxr-xr-x. 1 root root 5.7K Nov 24 11:04 domain-config.xml
-rwxr-xr-x. 1 root root  299 Nov 24 11:05 domblklist.xml
-rwxr-xr-x. 1 root root  166 Nov 24 11:04 domiflist.xml
-rwxr-xr-x. 1 root root  620 Nov 24 11:04 dominfo.xml
-rwxr-xr-x. 1 root root 2.6G Nov 24 11:04 root.98ee2a4e-8c87-4a2e-8b61-aac596fffebd.qcow2

The steps to reproduce the bug

  1. Configure the NAS Backup plugin
  2. Add a CIFS (SMB) backup repository
  3. Run multiple backups (sometimes it fails, sometimes it doesn't)

What to do about it?

It seems that, in my current setup, the CIFS mount doesn't handle sync as expected and stays busy even after syncing:

[root@compute01 ~]# /usr/bin/bash /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/nasbackup.sh -o backup -v i-12-606-VM -t cifs -s '//XXXXXXXX.xxxxxxxx/xxxxxxxx' -m 'vers=3.0,username=xxxxxxx,password=xxxxxxxx' -p 'i-12-606-VM/2025.11.24.12.02.09' -q false -d ''
Job type:         Completed   
Operation:        Backup      
Time elapsed:     27079        ms
File processed:   23.000 GiB
File remaining:   0.000 B
File total:       23.000 GiB

2769033951
umount: /tmp/csbackup.rnf14: target is busy.

We could avoid this issue by waiting until the mount point is no longer busy before unmounting, for example with fuser (more widely available than lsof):

  # Print statistics
  virsh -c qemu:///system domjobinfo "$VM" --completed
  du -sb "$dest" | cut -f1

  elapsed=0
  while fuser -m "$mount_point" >/dev/null 2>&1; do
    if (( elapsed >= 10 )); then
      echo "Timeout for unmounting reached: still busy"
      exit 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done

  umount "$mount_point"
  rmdir "$mount_point"
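
The same wait loop could also be factored into a small helper so that every backup path can reuse it. The following is only a sketch under assumptions: the names `busy_check` and `wait_for_idle_mount` are hypothetical and do not exist in nasbackup.sh, and the 10-second default simply mirrors the value used above. The helper returns a status instead of calling `exit`, so the caller decides whether a busy mount should fail the backup.

```shell
#!/bin/sh
# Sketch only: busy_check and wait_for_idle_mount are illustrative names,
# not functions taken from nasbackup.sh.

# Succeeds (status 0) while some process still uses the filesystem
# mounted at $1. Kept as its own function so it can be swapped out.
busy_check() {
  fuser -m "$1" >/dev/null 2>&1
}

# Poll once per second until the mount point is idle.
# $1 = mount point, $2 = timeout in seconds (assumed default: 10).
# Returns 0 when idle, 1 if the mount is still busy after the timeout.
wait_for_idle_mount() {
  mount_point=$1
  timeout=${2:-10}
  elapsed=0
  while busy_check "$mount_point"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timeout for unmounting reached: $mount_point still busy" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

A caller could then write `wait_for_idle_mount "$mount_point" 10 && umount "$mount_point" && rmdir "$mount_point"`, which removes the directory only after a successful unmount.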

Also, when I checked the script, it seems that backup_stopped_vm does not unmount and remove the mount point. I don't know if that's on purpose.
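
If the missing cleanup is not intentional, one possible way to keep the behaviour consistent is to route every backup function through a shared wrapper that always unmounts afterwards. This is a sketch, not the script's actual structure: `unmount_cmd`, `cleanup_mount` and `run_with_cleanup` are hypothetical names, and how backup_stopped_vm is really invoked inside nasbackup.sh is an assumption here.

```shell
#!/bin/sh
# Sketch only: none of these function names come from nasbackup.sh.

# Thin wrapper around umount so the unmount step can be replaced if needed.
unmount_cmd() {
  umount "$1"
}

# Unmount $1 and remove the directory only if the unmount succeeded,
# so a share that is still mounted is left visibly in place.
cleanup_mount() {
  if unmount_cmd "$1"; then
    rmdir "$1"
  else
    echo "Could not unmount $1; leaving it in place" >&2
    return 1
  fi
}

# $1 = mount point, remaining arguments = the backup step to run.
# The backup step's exit status is preserved; a cleanup failure only
# changes the result when the backup step itself succeeded.
run_with_cleanup() {
  mp=$1
  shift
  "$@"
  status=$?
  if ! cleanup_mount "$mp" && [ "$status" -eq 0 ]; then
    status=1
  fi
  return "$status"
}
```

With this shape, both the running-VM and stopped-VM paths could be called as, for example, `run_with_cleanup "$mount_point" backup_stopped_vm`, assuming that function takes no extra arguments.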
