
Conversation

@jeanvetorello
Contributor

Description

When using Ceph RBD as primary storage on KVM, the agent currently tries to run:

mountpoint -q /mnt/

This fails because RBD pools are not mounted like NFS or other network filesystems.
As a result, snapshot and template operations fail with errors such as:

libvirt failed to mount storage pool at /mnt/

This patch updates LibvirtStorageAdaptor to skip the mountpoint check when the storage pool type is RBD.
This prevents spurious failures while keeping the logic for NFS/NetFS pools unchanged.
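
For reference, here is a minimal, standalone sketch of the guard this description implies. It is an illustration under stated assumptions, not the actual patch: the class name `MountCheckSketch`, the simplified `StoragePoolType` enum, and the use of `IllegalStateException` are stand-ins for the real CloudStack types.

```java
// Illustrative sketch only -- not the actual LibvirtStorageAdaptor change.
// Class, enum, and exception names are simplified stand-ins.
import java.io.IOException;

public class MountCheckSketch {

    // Simplified stand-in for CloudStack's storage pool type enum
    enum StoragePoolType { NetworkFilesystem, Filesystem, RBD }

    /** Runs "mountpoint -q <path>" unless the pool is RBD, which has no local mount point. */
    static void checkStoragePoolMounted(String targetPath, StoragePoolType type)
            throws IOException, InterruptedException {
        if (type == StoragePoolType.RBD) {
            // RBD pools are accessed through librbd/librados; there is nothing to mount locally
            return;
        }
        Process p = new ProcessBuilder("mountpoint", "-q", targetPath).start();
        if (p.waitFor() != 0) {
            throw new IllegalStateException("libvirt failed to mount storage pool at " + targetPath);
        }
    }

    public static void main(String[] args) throws Exception {
        // RBD pool: the check is skipped and no error is raised
        checkStoragePoolMounted("/mnt/", StoragePoolType.RBD);
    }
}
```

A NetworkFilesystem pool would still go through the `mountpoint -q` check, so the existing NFS behaviour is unchanged.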

Types of changes

Bug fix (non-breaking change which fixes an issue)

Bug Severity

Major (affects snapshot/template functionality on Ceph RBD)

How Has This Been Tested?

CloudStack 4.21.0.0 + Ceph RBD backend (client.cloudstack user)

Created multiple VM snapshots → successfully registered in Ceph

Created templates from snapshots → completed without mountpoint errors

Verified no regressions on other pool types (NFS untouched)

Checked agent.log: no further mountpoint failures for RBD pools

Related issues

N/A (first-time fix)

@shwstppr changed the title from git commit -m "KVM: skip mountpoint check for RBD storage pools" to KVM: skip mountpoint check for RBD storage pools (Sep 20, 2025)

codecov bot commented Sep 20, 2025

Codecov Report

❌ Patch coverage is 60.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.39%. Comparing base (f9513b4) to head (945646b).
⚠️ Report is 101 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../hypervisor/kvm/storage/LibvirtStorageAdaptor.java | 60.00% | 4 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11690      +/-   ##
============================================
+ Coverage     17.36%   17.39%   +0.03%     
- Complexity    15234    15283      +49     
============================================
  Files          5886     5889       +3     
  Lines        525680   526194     +514     
  Branches      64159    64243      +84     
============================================
+ Hits          91261    91544     +283     
- Misses       424120   424304     +184     
- Partials      10299    10346      +47     
| Flag | Coverage Δ |
| --- | --- |
| uitests | 3.62% <ø> (-0.01%) ⬇️ |
| unittests | 18.44% <60.00%> (+0.04%) ⬆️ |

Flags with carried forward coverage won't be shown.


@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Contributor

@DaanHoogland left a comment


clgtm

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15124

@DaanHoogland added this to the 4.22.0 milestone (Sep 21, 2025)
@harikrishna-patnala
Contributor

@blueorangutan test

@blueorangutan

@harikrishna-patnala a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-14419)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 51416 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11690-t14419-kvm-ol8.zip
Smoke tests completed. 146 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| ContextSuite context=TestClusterDRS>:setup | Error | 0.00 | test_cluster_drs.py |

Member commented on LibvirtStorageAdaptor.java, at the line:

    private void checkNetfsStoragePoolMounted(String uuid) {

this is called only once by

    if (type == StoragePoolType.NetworkFilesystem) {
        checkNetfsStoragePoolMounted(name);
    }

this PR seems not needed

Member

@jeanvetorello have you faced any issue in your testing? Any logs?

Contributor Author

Thanks for the help. I'm going to open an issue describing the problem in detail, along with the logs, to see if you can help resolve it.


Member

Thanks @jeanvetorello. Did you face issue #11697 when you tested with this patch, or not?

@jeanvetorello
Contributor Author

Thanks for the reviews.
I’ve realized that this PR is not the right fix for the issue.
Skipping the NFS check allowed snapshots to proceed, but it prevents the actual backup to secondary storage, which is required.
Therefore, this is not a valid solution.
The problem is in the way the KVM agent assigns/mounts the pool UUID when handling Ceph + NFS snapshots, and needs to be fixed in another way.
I’ll close this PR and continue investigating the correct approach. Sorry for the noise.

