@sureshanaparti
Contributor

@sureshanaparti sureshanaparti commented Mar 19, 2025

Description

This PR addresses power-on and move-datastore-file issues seen with VMware 80u2 and 80u3, and fixes the tests for restore VM (disk cleanup issue) and SSVM/CPVM (wait time for the agent to disconnect). Details follow:

  • Power On issue
    A lock issue occurs during the PowerOn operation (for a ROOT disk that is cloned from a seeded template).
[Image: PowerOn-LockOnDisk]

https://knowledge.broadcom.com/external/article/316576/troubleshooting-issues-resulting-from-lo.html

The PowerOn operation is now retried when it hits the lock issue, and succeeds after a few attempts.
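
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name, the `LOCK_MARKER` string, the attempt count and the sleep interval are all assumptions, and the real error reported by vCenter may be worded differently.

```java
import java.util.concurrent.Callable;

/** Sketch: retry an operation a bounded number of times when it fails with a lock-related error. */
final class PowerOnRetrySketch {

    /** Hypothetical marker text; the actual lock error message may differ. */
    static final String LOCK_MARKER = "Failed to lock the file";

    /** Assumed heuristic: a failure counts as a lock issue if its message contains the marker. */
    static boolean isLockIssue(Exception e) {
        String msg = e.getMessage();
        return msg != null && msg.contains(LOCK_MARKER);
    }

    /**
     * Calls powerOn up to maxAttempts times, sleeping between attempts.
     * Non-lock failures and the final lock failure are rethrown to the caller.
     * Returns the number of attempts that were needed.
     */
    static int powerOnWithRetry(Callable<Void> powerOn, int maxAttempts, long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                powerOn.call();
                return attempt;
            } catch (Exception e) {
                if (!isLockIssue(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(sleepMillis); // give the lock holder time to release the disk
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated power-on: fails with a lock error twice, then succeeds.
        int attempts = powerOnWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException(LOCK_MARKER + " (simulated)");
            }
            return null;
        }, 5, 10L);
        System.out.println("Powered on after " + attempts + " attempts");
    }
}
```

Bounding the number of attempts keeps a genuinely stuck lock from blocking the operation indefinitely, and rethrowing non-lock failures immediately avoids hiding unrelated errors behind the retry.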

  • Move datastore file issue
    An access issue occurs during the move datastore file operation (when creating a volume from a snapshot).
[Image: CreateVolumeFromSnapshot-MoveDiskFailed]

The move datastore file operation is now retried when it hits the access issue, and succeeds after a few attempts.

  • Restore VM test behavior/issue
    The test checks whether the old root disk exists and tries to clean it up.
[Image: RestoreVM-RootDiskNotFound]

For VMware, however, the old root disk is force expunged irrespective of the expunge parameter of the restoreVirtualMachine API (introduced in PR #8800, 4.19.1.0), so the test does not need to check whether the old root disk exists when expunge is false.

// In case of VMware VM will continue to use the old root disk until expunged, so force expunge old root disk
// For system VM we do not need volume entry in Destroy state
if (vm.getHypervisorType() == HypervisorType.VMware || vm.getType().isUsedBySystem()) {
    logger.info(String.format("Trying to expunge volume [%s] from primary data storage.", volumeToString));
    AsyncCallFuture<VolumeApiResult> future = volService.expungeVolumeAsync(volFactory.getVolume(existingVolume.getId()));
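
The consequence for the restore VM test can be written down as a small decision function. This is only a sketch of the reasoning above, expressed in Java: the class, enum and method names are hypothetical, the actual smoke test is a Marvin (Python) test, and the behaviour assumed for non-VMware hypervisors (old root disk kept when expunge is false) is inferred from the description rather than confirmed here.

```java
/** Sketch: when does the restore VM test still need to look for the old root disk? */
final class RestoreVmDiskCheckSketch {

    /** Simplified, hypothetical hypervisor classification for this illustration. */
    enum Hypervisor { VMWARE, OTHER }

    /** The old root disk is gone if expunge was requested, or always on VMware (forced expunge). */
    static boolean oldRootDiskExpunged(Hypervisor hypervisor, boolean expungeRequested) {
        return expungeRequested || hypervisor == Hypervisor.VMWARE;
    }

    /** The test only needs to find and clean up the old root disk when it was not expunged. */
    static boolean shouldCleanUpOldRootDisk(Hypervisor hypervisor, boolean expungeRequested) {
        return !oldRootDiskExpunged(hypervisor, expungeRequested);
    }

    public static void main(String[] args) {
        for (Hypervisor h : Hypervisor.values()) {
            for (boolean expunge : new boolean[] {false, true}) {
                System.out.println(h + ", expunge=" + expunge
                        + " -> clean up old root disk: " + shouldCleanUpOldRootDisk(h, expunge));
            }
        }
    }
}
```

Under these assumptions, the only combination where the test should look for the old root disk is a non-VMware hypervisor with expunge set to false.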

  • SSVM / CPVM test issue
    After a reboot, the agent stays in the Up state for a while, so the test needs to wait for some time for the agent to Disconnect and come back to Up. Added a wait time after the reboot.
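
The PR adds a wait time after the reboot. As an illustration of the underlying problem (a stale Up state right after reboot), the following sketch polls until the agent has been seen leaving Up and then returning to it. It is a hypothetical alternative to a fixed wait, not the change made in this PR; the class and method names are made up, and the state strings other than "Up" are only examples.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

/** Sketch: wait for an agent to leave the "Up" state and then return to it. */
final class AgentReconnectWaitSketch {

    /**
     * Polls agentState up to maxPolls times, sleeping pollMillis after each poll.
     * Returns true once a non-"Up" state has been observed followed by "Up",
     * and false if that transition is not seen within maxPolls polls.
     */
    static boolean waitForReconnect(Supplier<String> agentState, int maxPolls, long pollMillis)
            throws InterruptedException {
        boolean sawDisconnect = false;
        for (int i = 0; i < maxPolls; i++) {
            String state = agentState.get();
            if (!"Up".equals(state)) {
                sawDisconnect = true;   // the agent has dropped its connection
            } else if (sawDisconnect) {
                return true;            // Up again after the disconnect
            }
            Thread.sleep(pollMillis);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated state sequence: stale "Up" readings, a disconnect, then "Up" again.
        Iterator<String> states =
                Arrays.asList("Up", "Up", "Disconnected", "Connecting", "Up").iterator();
        boolean reconnected = waitForReconnect(states::next, 5, 10L);
        System.out.println("Agent reconnected: " + reconnected);
    }
}
```

An initial "Up" reading is deliberately not treated as success, since it may simply predate the reboot taking effect.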

Other PR references:

VMware version references:

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Tested VM and volume operations manually, and ran the smoke tests.

How did you try to break this feature and the system with this change?

@sureshanaparti sureshanaparti changed the base branch from main to 4.20 March 19, 2025 08:10
@sureshanaparti
Contributor Author

@blueorangutan package

@codecov

codecov bot commented Mar 19, 2025

Codecov Report

Attention: Patch coverage is 0% with 129 lines in your changes missing coverage. Please review.

Project coverage is 16.99%. Comparing base (dd84c74) to head (69d2c10).
Report is 18 commits behind head on 4.20.

Files with missing lines Patch % Lines
...oud/hypervisor/vmware/resource/VmwareResource.java 0.00% 63 Missing ⚠️
...ud/storage/resource/VmwareStorageLayoutHelper.java 0.00% 27 Missing ⚠️
...m/cloud/hypervisor/vmware/mo/VirtualMachineMO.java 0.00% 22 Missing ⚠️
...cloud/storage/resource/VmwareStorageProcessor.java 0.00% 16 Missing ⚠️
.../src/main/java/com/cloud/vm/UserVmManagerImpl.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##              4.20   #10586       +/-   ##
============================================
+ Coverage     4.00%   16.99%   +12.98%     
- Complexity       0    13238    +13238     
============================================
  Files          400     5255     +4855     
  Lines        32555   464813   +432258     
  Branches      5770    54518    +48748     
============================================
+ Hits          1305    78974    +77669     
- Misses       31101   377045   +345944     
- Partials       149     8794     +8645     
Flag Coverage Δ
uitests ?
unittests 16.99% <0.00%> (?)


@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Member

@weizhouapache weizhouapache left a comment

lgtm

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12828

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@rohityadavcloud rohityadavcloud added this to the 4.20.1 milestone Mar 19, 2025
@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12829

@sureshanaparti
Contributor Author

@blueorangutan test ol8 vmware-80u3

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-80u3) has been kicked to run smoke tests

@sureshanaparti
Contributor Author

@blueorangutan test ol8 vmware-80u2

@Pearl1594 Pearl1594 moved this to In Progress in ACS 4.20.1 Mar 19, 2025
@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-80u2) has been kicked to run smoke tests

@sureshanaparti
Contributor Author

@blueorangutan test ol8 vmware-80u1

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-80u1) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-12768)
Environment: vmware-80u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 78805 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10586-t12768-vmware-80u3.zip
Smoke tests completed. 134 look OK, 7 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_deploy_more_vms_than_limit_allows Error 223.40 test_deploy_vms_in_parallel.py
test_03_deploy_and_scale_kubernetes_cluster Failure 102.81 test_kubernetes_clusters.py
test_05_basic_lifecycle_kubernetes_cluster Error 88.31 test_kubernetes_clusters.py
test_06_delete_kubernetes_cluster Failure 255.87 test_kubernetes_clusters.py
test_reboot_router Error 459.86 test_network.py
ContextSuite context=TestSharedNetworkWithConfigDrive>:setup Error 1521.98 test_network.py
test_02_restore_vm_with_disk_offering Error 104.27 test_restore_vm.py
test_03_restore_vm_with_disk_offering_custom_size Error 60.25 test_restore_vm.py
test_01_snapshot_to_volume Error 9.37 test_snapshots.py
test_08_reboot_cpvm Failure 39.37 test_ssvm.py
test_01_volume_usage Error 95.45 test_usage.py

@blueorangutan

[SF] Trillian test result (tid-12771)
Environment: vmware-80u1 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 76196 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10586-t12771-vmware-80u1.zip
Smoke tests completed. 136 look OK, 5 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_09_connectivity_between_network_and_vpc_tier Failure 34.56 test_ipv4_routing.py
test_11_isolated_network_with_dynamic_routed_mode Failure 163.88 test_ipv4_routing.py
test_12_vpc_and_tier_with_dynamic_routed_mode Failure 411.53 test_ipv4_routing.py
test_01_ping_in_vr_success Failure 60.52 test_diagnostics.py
test_04_autoscale_kubernetes_cluster Failure 158.03 test_kubernetes_clusters.py
test_05_basic_lifecycle_kubernetes_cluster Failure 800.65 test_kubernetes_clusters.py
ContextSuite context=TestSharedNetworkWithConfigDrive>:setup Error 1524.38 test_network.py
test_02_restore_vm_with_disk_offering Error 58.20 test_restore_vm.py
test_03_restore_vm_with_disk_offering_custom_size Error 114.73 test_restore_vm.py

@weizhouapache
Member

@blueorangutan package

@blueorangutan

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 12844

@blueorangutan

[SF] Trillian test result (tid-12770)
Environment: vmware-80u2 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 80994 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10586-t12770-vmware-80u2.zip
Smoke tests completed. 134 look OK, 7 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestClusterDRS>:setup Error 0.00 test_cluster_drs.py
test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80 Failure 488.47 test_internal_lb.py
test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80 Error 488.49 test_internal_lb.py
test_03_vpc_internallb_haproxy_stats_on_all_interfaces Error 123.15 test_internal_lb.py
test_03_vpc_internallb_haproxy_stats_on_all_interfaces Error 123.16 test_internal_lb.py
test_04_rvpc_internallb_haproxy_stats_on_all_interfaces Error 331.55 test_internal_lb.py
test_04_rvpc_internallb_haproxy_stats_on_all_interfaces Error 331.58 test_internal_lb.py
test_09_connectivity_between_network_and_vpc_tier Error 149.42 test_ipv4_routing.py
test_08_upgrade_kubernetes_ha_cluster Failure 3692.24 test_kubernetes_clusters.py
ContextSuite context=TestSharedNetworkWithConfigDrive>:setup Error 1524.99 test_network.py
test_02_restore_vm_with_disk_offering Error 60.36 test_restore_vm.py
test_03_restore_vm_with_disk_offering_custom_size Error 102.21 test_restore_vm.py
test_01_volume_usage Error 88.37 test_usage.py

@Pearl1594
Contributor

@blueorangutan test ol8 vmware-80u1

@blueorangutan

[SF] Trillian test result (tid-13243)
Environment: vmware-80u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 87874 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10586-t13243-vmware-80u3.zip
Smoke tests completed. 137 look OK, 4 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestClusterDRS>:setup Error 0.00 test_cluster_drs.py
test_deploy_more_vms_than_limit_allows Error 217.70 test_deploy_vms_in_parallel.py
test_01_deployVMInSharedNetwork Failure 3624.54 test_network.py
ContextSuite context=TestSharedNetworkWithConfigDrive>:teardown Error 3625.85 test_network.py
test_03_restore_vm_with_disk_offering_custom_size Error 73.04 test_restore_vm.py

@sureshanaparti sureshanaparti force-pushed the vmware-80u2-and-80u3-updates branch from f0d113e to 69d2c10 Compare May 14, 2025 10:59
@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13381

@DaanHoogland
Contributor

@blueorangutan test ol8 vmware-80u3

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-80u3) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-13307)
Environment: vmware-80u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 122502 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10586-t13307-vmware-80u3.zip
Smoke tests completed. 137 look OK, 4 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_08_upgrade_kubernetes_ha_cluster Failure 3711.40 test_kubernetes_clusters.py
test_01_deployVMInSharedNetwork Error 414.13 test_network.py
test_03_restore_vm_with_disk_offering_custom_size Error 67.66 test_restore_vm.py
test_01_migrate_vm_strict_tags_success Error 3607.88 test_vm_strict_host_tags.py
test_02_migrate_vm_strict_tags_failure Error 4.12 test_vm_strict_host_tags.py
test_01_restore_vm_strict_tags_success Error 23.90 test_vm_strict_host_tags.py
test_02_restore_vm_strict_tags_failure Error 3604.35 test_vm_strict_host_tags.py
test_01_scale_vm_strict_tags_success Error 24.86 test_vm_strict_host_tags.py
test_02_scale_vm_strict_tags_failure Error 3607.30 test_vm_strict_host_tags.py
test_01_deploy_vm_on_specific_host_without_strict_tags Error 23.42 test_vm_strict_host_tags.py
test_02_deploy_vm_on_any_host_without_strict_tags Error 3605.85 test_vm_strict_host_tags.py
test_03_deploy_vm_on_specific_host_with_strict_tags_success Error 6.14 test_vm_strict_host_tags.py
test_04_deploy_vm_on_any_host_with_strict_tags_success Error 6.03 test_vm_strict_host_tags.py

@sureshanaparti
Contributor Author

@blueorangutan test ol8 vmware-80u2

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-80u2) has been kicked to run smoke tests

@sureshanaparti
Contributor Author

@blueorangutan test ol8 vmware-80u1

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-80u1) has been kicked to run smoke tests

@apache apache deleted a comment from blueorangutan May 16, 2025
Contributor

@vladimirpetrov vladimirpetrov left a comment

LGTM based on manual testing.

@rohityadavcloud rohityadavcloud merged commit 90316b2 into apache:4.20 May 16, 2025
21 of 26 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in ACS 4.20.1 May 16, 2025
@rohityadavcloud rohityadavcloud deleted the vmware-80u2-and-80u3-updates branch May 16, 2025 19:09
@blueorangutan

[SF] Trillian test result (tid-13328)
Environment: vmware-80u2 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 75117 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10586-t13328-vmware-80u2.zip
Smoke tests completed. 137 look OK, 4 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_deploy_more_vms_than_limit_allows Error 196.18 test_deploy_vms_in_parallel.py
test_01_deployVMInSharedNetwork Failure 3604.36 test_network.py
ContextSuite context=TestSharedNetworkWithConfigDrive>:teardown Error 3605.64 test_network.py
test_03_restore_vm_with_disk_offering_custom_size Error 57.38 test_restore_vm.py
test_01_scale_vm Error 1.53 test_scale_vm.py
test_02_scale_vm_negative_offering_disable_scaling Error 1.49 test_scale_vm.py
test_03_scale_vm_negative_vm_disable_scaling Error 1.46 test_scale_vm.py

dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jun 19, 2025
* VMware - Ignore disk not found error on cleanup when the VM disk doesn't exists

* VMware - Retry powerOn on lock issues

* addressed comments

* Update CPVM reboot tests - wait for the agent to Disconnect and back Up

* Retry moveDatastoreFile when any file access issue while creating volume from snapshot

* Update full clone flag when restoring vm using root disk offering with more size than the template size

* refactored (mainly,for diskInfo - causing NPE in some cases)

* Retry moveDatastoreFile when there is any file access issue
