@SadiJr
Contributor

@SadiJr SadiJr commented Feb 14, 2023

Description

With the VMware hypervisor, when a volume is migrated or resized to a different disk offering (with or without an IOPS limit), the volume keeps the IOPS configuration of the original offering; the new configuration is applied only after the volume is detached and reattached. This PR fixes that behavior so that the new IOPS configuration is applied when a volume is migrated or resized with a change of disk offering.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

It was tested in a local lab:

  1. I created a new VM and three new disk offerings: one without an IOPS limit, one with a 3000 IOPS limit, and one with a 5000 IOPS limit;
  2. I migrated/resized this VM's volume using each of the three new disk offerings;
  3. Before the changes, the IOPS configuration in the VM definition in vCenter did not change;
  4. With the changes, the IOPS configuration in the VM definition in vCenter is updated.

@codecov

codecov bot commented Feb 14, 2023

Codecov Report

Attention: Patch coverage is 0% with 88 lines in your changes missing coverage. Please review.

Project coverage is 16.60%. Comparing base (41b4f0a) to head (1684347).
Report is 31 commits behind head on main.

Files with missing lines | Patch % | Lines
...oud/hypervisor/vmware/resource/VmwareResource.java | 0.00% | 34 Missing ⚠️
...m/cloud/agent/api/storage/ResizeVolumeCommand.java | 0.00% | 12 Missing ⚠️
...n/java/com/cloud/storage/VolumeApiServiceImpl.java | 0.00% | 9 Missing and 3 partials ⚠️
...ck/storage/motion/VmwareStorageMotionStrategy.java | 0.00% | 7 Missing ⚠️
.../cloud/agent/api/storage/MigrateVolumeCommand.java | 0.00% | 6 Missing ⚠️
...tack/storage/motion/AncientDataMotionStrategy.java | 0.00% | 6 Missing ⚠️
...stack/engine/orchestration/VolumeOrchestrator.java | 0.00% | 5 Missing ⚠️
...m/cloud/hypervisor/vmware/mo/VirtualMachineMO.java | 0.00% | 3 Missing ⚠️
...e/driver/CloudStackPrimaryDataStoreDriverImpl.java | 0.00% | 2 Missing ⚠️
.../apache/cloudstack/vm/UnmanagedVMsManagerImpl.java | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #7226      +/-   ##
============================================
+ Coverage     16.57%   16.60%   +0.02%     
- Complexity    13870    13924      +54     
============================================
  Files          5719     5730      +11     
  Lines        507200   508166     +966     
  Branches      61574    61783     +209     
============================================
+ Hits          84093    84384     +291     
- Misses       413688   414345     +657     
- Partials       9419     9437      +18     
Flag | Coverage Δ
uitests | 3.93% <ø> (-0.03%) ⬇️
unittests | 17.49% <0.00%> (+0.04%) ⬆️


String attachedVmName;
Volume.Type volumeType;
String hostGuidInTargetCluster;
Long newIops;
Member

Can you explain why newIops is added, rather than newIopsRead/newIopsWrite?

Contributor Author

The VMware hypervisor does not allow specifying separate IOPS limits for read and write operations. You can only specify a single IOPS limit for the disk.

image

Member

Thanks @SadiJr.
MigrateVolumeCommand is a class in the core module. It might be used if there are similar issues with KVM and/or XenServer, which support separate read/write IOPS.

Contributor

@hsato03 Do you think that it makes sense to separate it into read and write IOPS? As @weizhouapache said, this is a hypervisor-agnostic command, so it should be treated as such (meaning separate read and write IOPS).
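
One way the suggestion above could look, as a minimal sketch under assumptions: the command carries optional total, read, and write values, and each hypervisor resource reads only the ones it supports. `VolumeIopsChange` and its accessors are hypothetical names used for illustration; they are not classes or methods from this PR.

```java
/** Hypothetical carrier for the new IOPS values; not part of the PR. */
final class VolumeIopsChange {
    private final Long totalIops; // single per-disk limit, e.g. for VMware
    private final Long readIops;  // separate read limit, where supported
    private final Long writeIops; // separate write limit, where supported

    private VolumeIopsChange(Long totalIops, Long readIops, Long writeIops) {
        this.totalIops = totalIops;
        this.readIops = readIops;
        this.writeIops = writeIops;
    }

    /** For hypervisors that accept only one limit per disk. */
    static VolumeIopsChange ofTotal(Long totalIops) {
        return new VolumeIopsChange(totalIops, null, null);
    }

    /** For hypervisors that accept separate read and write limits. */
    static VolumeIopsChange ofReadWrite(Long readIops, Long writeIops) {
        return new VolumeIopsChange(null, readIops, writeIops);
    }

    // A null return value means "not specified" for that field.
    Long getTotalIops() { return totalIops; }
    Long getReadIops() { return readIops; }
    Long getWriteIops() { return writeIops; }
}
```

With a shape like this, a VMware resource would read only `getTotalIops()`, while a resource for a hypervisor with separate limits would read `getReadIops()` and `getWriteIops()`.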

@sonarqubecloud

SonarCloud Quality Gate failed.

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

0.0% Coverage
0.0% Duplication

@DaanHoogland DaanHoogland added this to the 4.19.0.0 milestone Jun 22, 2023
@github-actions

github-actions bot commented Jul 7, 2023

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@shwstppr
Contributor

shwstppr commented Oct 9, 2023

@SadiJr can you please check the review comments?

@github-actions

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7542

@DaanHoogland
Contributor

@SadiJr can you answer any questions/address any comments, please?

@DaanHoogland
Contributor

ping @SadiJr

@SadiJr
Contributor Author

SadiJr commented Nov 21, 2023

@DaanHoogland Sorry for the delay, I will review the comments and work on this PR.

@shwstppr shwstppr modified the milestones: 4.19.0.0, 4.19.1.0 Dec 14, 2023
@SadiJr SadiJr marked this pull request as draft December 15, 2023 13:40
@github-actions

github-actions bot commented Feb 8, 2024

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@BryanMLima BryanMLima self-assigned this Mar 1, 2024
@hsato03
Collaborator

hsato03 commented Jun 5, 2025

@blueorangutan package

@blueorangutan

@hsato03 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13640

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-13484)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 65136 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7226-t13484-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_01_deployVMInSharedNetwork | Failure | 451.83 | test_network.py

@DaanHoogland
Contributor

@hsato03 @BryanMLima, apart from the test_network failure, what is the status of this PR?

@hsato03
Collaborator

hsato03 commented Jun 13, 2025

@DaanHoogland The PR is ready to be reviewed and tested. Also, I have reproduced the tests from the description and they work as expected.

Regarding the test failure, it seems to be unrelated to the PR.

FAIL: test_01_deployVMInSharedNetwork (tests.smoke.test_network.TestSharedNetworkWithConfigDrive)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/marvin/tests/smoke/test_network.py", line 2395, in test_01_deployVMInSharedNetwork
    self._umount_config_drive(ssh, mount_path)
  File "/marvin/tests/smoke/test_network.py", line 2319, in _umount_config_drive
    "but contains: %s" % result)
AssertionError: False is not true : After umount directory should be empty but contains: ['sudo: unable to resolve host VM-a1746bba-97eb-47b9-9d6a-70ac9c03262b: Temporary failure in name resolution']

cc @weizhouapache @sureshanaparti @JoaoJandre

@hsato03
Collaborator

hsato03 commented Jun 13, 2025

@blueorangutan package

@blueorangutan

@hsato03 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13772

@DaanHoogland
Contributor

@blueorangutan test ol8 vmware-70u3

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-70u3) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-13524)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 57050 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7226-t13524-vmware-70u3.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_01_prepare_and_cancel_maintenance | Error | 0.14 | test_ms_maintenance_and_safe_shutdown.py

@hsato03
Collaborator

hsato03 commented Jul 7, 2025

@DaanHoogland I think the test error is not related to the PR. Could you run the tests again, please?

ERROR: test_01_prepare_and_cancel_maintenance (tests.smoke.test_ms_maintenance_and_safe_shutdown.TestMSMaintenanceAndSafeShutdown)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/marvin/tests/smoke/test_ms_maintenance_and_safe_shutdown.py", line 138, in test_01_prepare_and_cancel_maintenance
    self.apiclient.cancelMaintenance(cancel_maintenance_cmd)
  File "/usr/local/lib/python3.6/site-packages/marvin/cloudstackAPI/cloudstackAPIClient.py", line 1936, in cancelMaintenance
    response = self.connection.marvinRequest(command, response_type=response, method=method)
  File "/usr/local/lib/python3.6/site-packages/marvin/cloudstackConnection.py", line 381, in marvinRequest
    raise e
  File "/usr/local/lib/python3.6/site-packages/marvin/cloudstackConnection.py", line 376, in marvinRequest
    raise self.__lastError
  File "/usr/local/lib/python3.6/site-packages/marvin/cloudstackConnection.py", line 310, in __parseAndGetResponse
    response_cls)
  File "/usr/local/lib/python3.6/site-packages/marvin/jsonHelper.py", line 155, in getResultObj
    raise cloudstackException.CloudstackAPIException(respname, errMsg)
marvin.cloudstackException.CloudstackAPIException: Execute cmd: cancelmaintenance failed, due to: errorCode: 530, errorText:Management server is not in the right state to cancel maintenance

@DaanHoogland
Contributor

@blueorangutan test ol8 vmware-70u3

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-70u3) has been kicked to run smoke tests

@DaanHoogland
Contributor

> @DaanHoogland I think the test error is not related to the PR. Could you run the tests again, please?

I am running it again, but the root cause may well be in the test itself or in the order in which tests are executed. I agree it does not look related to your changes.

@blueorangutan

[SF] Trillian test result (tid-13714)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 80577 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7226-t13714-vmware-70u3.zip
Smoke tests completed. 127 look OK, 4 have errors, 10 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_deploy_more_vms_than_limit_allows | Error | 144.78 | test_deploy_vms_in_parallel.py
test_01_prepare_and_cancel_maintenance | Error | 0.13 | test_ms_maintenance_and_safe_shutdown.py
test_01_deploy_vm_on_specific_host | Error | 3606.16 | test_vm_deployment_planner.py
test_02_deploy_vm_on_specific_cluster | Error | 4.42 | test_vm_deployment_planner.py
test_03_deploy_vm_on_specific_pod | Error | 4.50 | test_vm_deployment_planner.py
test_04_deploy_vm_on_host_override_pod_and_cluster | Error | 4.43 | test_vm_deployment_planner.py
test_05_deploy_vm_on_cluster_override_pod | Error | 19.81 | test_vm_deployment_planner.py
ContextSuite context=TestMigrateVMStrictTags>:setup | Error | 0.00 | test_vm_strict_host_tags.py
ContextSuite context=TestRestoreVMStrictTags>:setup | Error | 0.00 | test_vm_strict_host_tags.py
ContextSuite context=TestScaleVMStrictTags>:setup | Error | 0.00 | test_vm_strict_host_tags.py
ContextSuite context=TestVMDeploymentPlannerStrictTags>:setup | Error | 0.00 | test_vm_strict_host_tags.py
all_test_vnf_templates | Skipped | --- | test_vnf_templates.py
all_test_volumes | Skipped | --- | test_volumes.py
all_test_vpc_ipv6 | Skipped | --- | test_vpc_ipv6.py
all_test_vpc_redundant | Skipped | --- | test_vpc_redundant.py
all_test_vpc_router_nics | Skipped | --- | test_vpc_router_nics.py
all_test_vpc_vpn | Skipped | --- | test_vpc_vpn.py
all_test_webhook_delivery | Skipped | --- | test_webhook_delivery.py
all_test_webhook_lifecycle | Skipped | --- | test_webhook_lifecycle.py
all_test_host_maintenance | Skipped | --- | test_host_maintenance.py
all_test_hostha_kvm | Skipped | --- | test_hostha_kvm.py

Comment on lines +36 to +37
private Long newMaxIops;
private Long newMinIops;
Contributor

Why do we have max and min IOPS here, but only a single IOPS value in the migrate volume command?

Comment on lines +5216 to +5218
* Sets the disk IOPS limitation, if the {@link MigrateVolumeCommand} did not specify this limitation, then it is set to -1 (unlimited).
*/
private void setDiskIops(MigrateVolumeCommand cmd, VirtualMachineMO vmMo, String volumePath) throws Exception {
Contributor

Are we sure that the IOPS limitation is passed every time a volume is migrated?
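
The question matters because of the fallback described in the quoted Javadoc: when the command carries no IOPS value, the limit becomes -1 (unlimited). The sketch below illustrates that fallback next to one possible alternative that keeps the current limit instead. `DiskIopsLimit` and both method names are hypothetical and are not code from the PR.

```java
/** Hypothetical helper illustrating the fallback; not part of the PR. */
final class DiskIopsLimit {
    /** Value that the quoted Javadoc describes as "unlimited". */
    static final long UNLIMITED = -1L;

    private DiskIopsLimit() { }

    /** Behavior described in the Javadoc: a missing value resets the disk to unlimited. */
    static long unlimitedWhenUnspecified(Long requestedIops) {
        return requestedIops == null ? UNLIMITED : requestedIops;
    }

    /** Possible alternative (an assumption): a missing value leaves the current limit untouched. */
    static long keepCurrentWhenUnspecified(Long requestedIops, long currentLimit) {
        return requestedIops == null ? currentLimit : requestedIops;
    }
}
```

If some migration path does not populate the field, the first variant silently removes an existing limit, while the second preserves it.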

Comment on lines +1256 to +1257
newMinIops = newDiskOffering.getMinIops() != null ? newDiskOffering.getMinIops() : newDiskOffering.getIopsReadRate();
newMaxIops = newDiskOffering.getMaxIops() != null ? newDiskOffering.getMaxIops() : newDiskOffering.getIopsWriteRate();
Contributor

We should extract this into a method and add Javadoc explaining why it is done this way.
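
A minimal sketch of such an extraction, assuming only the four getters visible in the quoted lines. `IopsSource` and `NewIopsResolver` are hypothetical names; the Javadoc states what the code does, and the reason for the fallback is left for the author to fill in, since the thread does not state it.

```java
/** Hypothetical stand-in exposing only the getters used in the quoted lines. */
interface IopsSource {
    Long getMinIops();
    Long getMaxIops();
    Long getIopsReadRate();
    Long getIopsWriteRate();
}

/** Hypothetical helper; not part of the PR. */
final class NewIopsResolver {
    private NewIopsResolver() { }

    /**
     * Returns the offering's minimum IOPS, or its IOPS read rate when no minimum is set.
     * The rationale for this fallback should be documented here by the author.
     */
    static Long resolveNewMinIops(IopsSource offering) {
        return offering.getMinIops() != null ? offering.getMinIops() : offering.getIopsReadRate();
    }

    /**
     * Returns the offering's maximum IOPS, or its IOPS write rate when no maximum is set.
     * The rationale for this fallback should be documented here by the author.
     */
    static Long resolveNewMaxIops(IopsSource offering) {
        return offering.getMaxIops() != null ? offering.getMaxIops() : offering.getIopsWriteRate();
    }
}
```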

Comment on lines +1325 to +1326
boolean volumeResizeRequired = currentSize != newSize || !compareEqualsIncludingNullOrZero(newMaxIops, volume.getMaxIops()) || !compareEqualsIncludingNullOrZero(newMinIops, volume.getMinIops())
|| !compareEqualsIncludingNullOrZero(newMaxIops, diskOffering.getIopsWriteRate()) || !compareEqualsIncludingNullOrZero(newMinIops, diskOffering.getIopsReadRate());
Contributor

I really dislike assuming that newMaxIops has to equal the write rate and newMinIops has to equal the read rate. We should change this logic to avoid future problems; it would be very easy for another developer to mix these up.
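
One way to address this concern, sketched under assumptions: compare min/max IOPS only with min/max IOPS, and read/write rates only with read/write rates, in separately named checks. `IopsChangeCheck` is hypothetical, and `equalsTreatingNullAsZero` only assumes what the name `compareEqualsIncludingNullOrZero` suggests (null and 0 compare as equal); the actual implementation is not shown in this thread.

```java
/** Hypothetical helper; not part of the PR. */
final class IopsChangeCheck {
    private IopsChangeCheck() { }

    /** Assumed semantics: null and 0 both mean "not set" and compare as equal. */
    static boolean equalsTreatingNullAsZero(Long a, Long b) {
        long left = a == null ? 0L : a;
        long right = b == null ? 0L : b;
        return left == right;
    }

    /** Compares min/max IOPS only against min/max IOPS. */
    static boolean minMaxIopsChanged(Long newMin, Long newMax, Long currentMin, Long currentMax) {
        return !equalsTreatingNullAsZero(newMin, currentMin)
                || !equalsTreatingNullAsZero(newMax, currentMax);
    }

    /** Compares read/write rates only against read/write rates. */
    static boolean readWriteRateChanged(Long newRead, Long newWrite, Long currentRead, Long currentWrite) {
        return !equalsTreatingNullAsZero(newRead, currentRead)
                || !equalsTreatingNullAsZero(newWrite, currentWrite);
    }
}
```

Keeping the two pairs in separate, explicitly named parameters makes it harder to pass a write rate where a maximum is expected.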

Comment on lines +2788 to +2793
public Pair<VirtualDisk, String> getDiskDevice(String vmdkDatastorePath, boolean matchExactly, boolean ignoreDotOnPath) throws Exception {
    List<VirtualDevice> devices = _context.getVimClient().getDynamicProperty(_mor, "config.hardware.device");

    if (ignoreDotOnPath) {
        vmdkDatastorePath = vmdkDatastorePath + ".";
    }
Contributor

Could you explain this change?
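
The thread does not include the author's answer. As an illustration only, here is one possible effect of appending a dot before a non-exact match: it prevents a base name from matching a longer name that merely starts with it. `DiskPathMatch`, the `contains` checks, and the sample paths are all assumptions made for this sketch; the real matching logic in `getDiskDevice` is not shown in the quoted lines.

```java
/** Hypothetical illustration of prefix collisions; not the PR's matching logic. */
final class DiskPathMatch {
    private DiskPathMatch() { }

    /** Loose match: true whenever the device file name contains the base name. */
    static boolean matchesLoosely(String deviceFileName, String baseName) {
        return deviceFileName.contains(baseName);
    }

    /** Same check with a trailing dot, so "ROOT-5." cannot match "ROOT-50.vmdk". */
    static boolean matchesWithDot(String deviceFileName, String baseName) {
        return deviceFileName.contains(baseName + ".");
    }
}
```

For example, a loose match of `ROOT-5` against `[ds1] vm/ROOT-50.vmdk` succeeds, whereas the dotted variant matches only `[ds1] vm/ROOT-5.vmdk`.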

@DaanHoogland
Contributor

@JoaoJandre @hsato03 are you guys still looking at this?
