Skip to content

Conversation

Nambrok
Copy link
Contributor

@Nambrok Nambrok commented Apr 2, 2025

The tests create a VDI, connect it to Dom0 then use the tapdev to write random data somewhere in it.
It then compare the data of the VDI to the original data we have to see if it has changed.
The test create snapshot/clone before deleting them and waiting for the coalesce to have happened by observing sm-config:vhd-parent before checking the integrity of the data.

Also add --vdi-type parameter that default to vhd to prepare to tests multiple VDI types, i.e. qcow2.

@Nambrok Nambrok requested a review from Wescoeur April 2, 2025 11:16
@Nambrok Nambrok marked this pull request as ready for review April 2, 2025 11:16
@Nambrok Nambrok force-pushed the dtt-coalesce branch 4 times, most recently from 72580ab to 8cdbea9 Compare April 2, 2025 14:44
@Nambrok Nambrok requested review from Wescoeur and AnthoineB April 2, 2025 14:44
@Nambrok
Copy link
Contributor Author

Nambrok commented Apr 2, 2025

For the failing test: waiting for #291 to be resolved.

action="append",
default=[],
help="Format of VDI to execute tests on."
"Example: vhd,qcow2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this a CLI parameter means you're deporting the test job definition outside pytest. When rishi wanted to do the same for thin vs thick in the context of XOSTOR tests, we asked him to rework the tests so they actually test both formats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to allow to test only one type by hand, the objectives for tests is to keep the default which at the moment is vhd but will be ["vhd", "qcow2"] eventually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pytest already has test selection mechanisms:

  • path to test files
  • -k
  • -m

Copy link
Member

@stormi stormi Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the parameter is not necessary in jobs.py, fine by me. We won't have SR types that only support VHD or only support QCOW2 at some point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter default to VHD when not given meaning it's not needed to change jobs.py. But I will need to add the new coalesce tests in jobs.py either way and it's the only test at the moment that will use vdi-type
I'm not sure to see how the other selection mechanisms from pytest could work in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point to enable "qcow2", this will be required to make modifications in jobs.py

@Nambrok while doing similar for thin vs thick, we did

@pytest.fixture(params=["thin"], scope="session")
def provisioning_type(request):
    return request.param

So here in this case, if we do -

@pytest.fixture(params=["vhd"], scope="session")
def vdi_image_format(request):
    return request.param

and

@pytest.fixture(scope='package')
def ext_sr(host, sr_disk, vdi_image_format):
   """ An EXT SR on first host. """
   sr = host.sr_create('ext', "EXT-local-SR-test", {
       'device': '/dev/' + sr_disk,
       'preferred-image-formats': vdi_image_format
   })
   yield sr
   # teardown
   sr.destroy()

the tests can be upgraded for new types in future for supported SR.

As the whole SR tests needs to be tested for both vhd & qcow2, it makes it suitable per SR tests package.

I'm fine with both approach, to get this added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just change the default parameter value in my case, but it also allows to run only one type of image_format when started manually.
Your method always run both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/xcp-ng/xcp-ng-tests/pull/290/files#diff-a31c7ed5d35f5ed8233994868c54d625b18e6bacb6794344c4531e62bd9dde59R111
We can just add qcow2 on this line to add qcow2 to also run by default without modifying jobs.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works, I think we can move ahead with this merge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your method always run both.

Well, there is still -k and -m for filtering as Sam mentioned. We may also want to avoid doing similar things different ways.

@@ -75,6 +75,13 @@ def pytest_addoption(parser):
"4KiB blocksize to be formatted and used in storage tests. "
"Set it to 'auto' to let the fixtures auto-detect available disks."
)
parser.addoption(
"--image-format",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you work in the context of storage everyday, maybe it's a clear name, but otherwise it might be too generic a name. Both image and format can be many things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's why I added vdi-image-format as suggestion.
But I wonder if we should not use the naming volume-image-format instead.

No preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And vdi-type looked consistent with sm's terminology 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# xe vdi-list params=type | sort | uniq
type ( RO)    : CBT metadata
type ( RO)    : HA statefile
type ( RO)    : Redo log
type ( RO)    : User

Not really consistent. In this example here, type is completly different.

  • On the XAPI side, we have a vdi_type stored in the sm-config attribute. Here you can have the VHD/AIO value. However there are weird situations like: sm-config (MRO) : type: raw; vdi_type: aio.
  • Regarding the new QCOW2 format we discussed internally to use a new attribute image-format. type is completely ambiguous depending on what you are talking about and the context in which it is used.
  • In the SMAPIv3, vdi type is completly removed (same for VDI, we use volume instead). And image-format is used.

@glehmann
Copy link
Member

glehmann commented Apr 2, 2025

For the failing test: waiting for #291 to be resolved.

I thought it would take a bit longer before it showed again!

@Nambrok Nambrok force-pushed the dtt-coalesce branch 2 times, most recently from 2ec1e33 to d5ec0a3 Compare May 20, 2025 12:01
@Nambrok Nambrok force-pushed the dtt-coalesce branch 2 times, most recently from f07746f to ca0d349 Compare June 4, 2025 16:40
@Nambrok Nambrok force-pushed the dtt-coalesce branch 3 times, most recently from 4f33a49 to b96297e Compare July 3, 2025 07:47
Comment on lines +110 to +119
if len(image_format) == 0:
image_format = ["vhd"] # Not giving image-format will default to doing tests on vhd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"vhd,qcow2" are considered as one single option by pytest when passing like --image-format=vhd,qcow2 thus need to convert it into list

image_format = image_format[0].split(",") if image_format else ["vhd"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but it's because different input are made into a list so --image-format=vhd --image-format=qcow2 => ["vhd", "qcow2"].
Some fixture are doing both but I didn't explicitely, we could do both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's cleaner to just use --image-format=vhd --image-format=qcow2 like this, but then example given in --help needs fixing, as it does suggest the other way.

@Nambrok Nambrok requested a review from ydirson July 9, 2025 12:20
@Nambrok Nambrok force-pushed the dtt-coalesce branch 2 times, most recently from dc0ecdd to 46374df Compare July 24, 2025 09:57
Copy link
Contributor

@ydirson ydirson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass for the big first commit

action="append",
default=[],
help="Format of VDI to execute tests on."
"Example: vhd,qcow2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your method always run both.

Well, there is still -k and -m for filtering as Sam mentioned. We may also want to avoid doing similar things different ways.

Comment on lines +110 to +119
if len(image_format) == 0:
image_format = ["vhd"] # Not giving image-format will default to doing tests on vhd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's cleaner to just use --image-format=vhd --image-format=qcow2 like this, but then example given in --help needs fixing, as it does suggest the other way.

Comment on lines 40 to 41
logging.info(f"<< Unplugging VDI {vdi_uuid} from Dom0")
host.xe("vbd-unplug", {"uuid": vbd_uuid})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC vbd-unplug says/does nothing if already unplugged, wouldn't it be useful to first assert that the VBD is indeed plugged?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little afraid of potential side effects personally, but I have nothing against a warning log. On the other hand, we don't check whether the VDI to unplug is in the self.vdis list. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little afraid of potential side effects

I'm myself a bit afraid of what can happen if we proceed while the system is not in the expected state

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expected state being VBD unplugged so we can then remove it. Calling it and it doing nothing is not a problem, if it errors it will be raised.

@Nambrok Nambrok force-pushed the dtt-coalesce branch 4 times, most recently from a9e09ed to add8109 Compare July 29, 2025 08:47
Nambrok added 2 commits July 29, 2025 11:33
vdi.py: add getting the image_format if available
sr.py: add choosing a image_format on vdi creation
sr.py add get_type to get the type of the SR

Signed-off-by: Damien Thenot <[email protected]>
Add a fixture `shared_sr` to get a shared SR
Also modify local_sr_* function to log the SR type

Signed-off-by: Damien Thenot <[email protected]>
@Nambrok Nambrok force-pushed the dtt-coalesce branch 5 times, most recently from 5287d97 to 84f281d Compare July 29, 2025 10:05
lib/basevm.py Outdated
Comment on lines 26 to 28
# Doesn't work with Dom0 since `vm-disk-list` doesn't work on it so we create empty list
if e.stdout == "Error: No matching VMs found":
logging.info("Couldn't get disks list. We are Dom0. Continuing...")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rather something like "Couldn't get disks list (maybe we are dom0?). Continuing..." ?

lib/basevm.py Outdated
Comment on lines 62 to 73
def vdi_uuids(self, sr_uuid=None):
def vdi_uuids(self, sr_uuid=None) -> List[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also type sr_uuid

lib/vm.py Outdated
Comment on lines 291 to 309
def connect_vdi(self, vdi: VDI, device: str = "autodetect") -> str:
logging.info(f">> Plugging VDI {vdi.uuid} on VM {self.uuid}")
vbd_uuid = self.host.xe("vbd-create", {
"vdi-uuid": vdi.uuid,
"vm-uuid": self.uuid,
"device": device,
})
self.host.xe("vbd-plug", {"uuid": vbd_uuid})

self.vdis.append(vdi)

return vbd_uuid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if vbd-plug fails we may want to remove the vbd?

It looks like 2 fixtures for create and plug would be a good fit, except that it would not allow to store self.vdis, but I don't see why we need to store it (note: this advocates for finer commit granularity, where we have a better opportunity to explain why we're adding such a feature)... and creating a VDI object requires at least on XAPI call, which can make this sensibly slower especially for a VM with many VDIs

Comment on lines +43 to +50
@pytest.fixture(scope="module")
def tapdev(local_sr_on_hostA1, vdi_with_vbd_on_dom0):
"""
A tapdev is a blockdevice allowing access to a VDI from the Dom0.

It is usually used to give access to the VDI to Qemu for emulating devices
before PV driver are loaded in the guest.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does not really explain how it is useful for testing.

Maybe naming it like tapdev_on_vdi_on_local_sr would be sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be mentioned somewhere that tapdev is used for a better test performance, to avoid instantiating a new VM to access the vdi

assert compare_data(host, tapdev, data_file_on_host, offset, length)

@pytest.mark.parametrize("vdi_op", ["snapshot", "clone"])
def test_coalesce(self, host: "Host", tapdev: str, vdi_with_vbd_on_dom0: "VDI", data_file_on_host: str, vdi_op):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to use vdi_with_vbd_on_dom0 to get access to the VDI behind tapdev is confusing.
Maybe a TapDev object to encapsulate all relevant info would improve this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have to change because tapdev like this won't be available in the future, it's something made available for the moment but will be lost with removing the blktap2 kernel patch in 9.0. We already have plan to redo this part.
I might had a way to obtain an access from Dom0 directly on the VDI object.

@Nambrok Nambrok force-pushed the dtt-coalesce branch 3 times, most recently from befcbf5 to d4099df Compare August 7, 2025 14:51
Copy link
Member

@stormi stormi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good to merge for me, as we need to move on and I don't think remaining unadressed comments add too much technical debt.

However, I'd prefer if, before the merge, some of the remaining comments were addressed.

And this has to be run on CI before the merge.

@@ -152,13 +154,21 @@ def is_shared(self):
{'uuid': self.uuid, 'param-name': 'shared'}))
return self._is_shared

def create_vdi(self, name_label, virtual_size=64):
def get_type(self) -> str:
if self._type is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use functools.lru_cache decorator to simplify the caching (more a note for the future).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure lru_cache is relevant here, as it's an LRU cache that's used and updated based on the parameters given to the function: here, the cache will contain a maximum of one value, as we only pass self as an argument. It's a little overkill. 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is @functools.cache in this case :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO I think having own decorator to handle 0 arg (const_cache?) or continuing with an explicit private variable is better. @functools.cache uses a dict in its implementation to deal with arguments. (I think I don't like scripted languages with a tendency to consume more resources where we wouldn't do that in a binary :p )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decorator is only applied at (byte-)compile time, so a clever python could get rid of the dict and incur zero runtime cost. No clue if it does 😉

lib/host.py Outdated
def get_dom0_uuid(self):
return self.inventory["CONTROL_DOMAIN_UUID"]

def get_dom0_VM(self) -> VM:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_dom0_vm to avoid mixing case


import lib.commands as commands
from lib.common import prefix_object_name, safe_split, strtobool, wait_for, wait_for_not
from lib.common import (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why this is reformatted

from lib.vdi import VDI

class Test:
def test_write_data(self, host: "Host", tapdev: str, data_file_on_host: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the goal of that test—is it somehow related to the coalesce feature?

Comment on lines +43 to +50
@pytest.fixture(scope="module")
def tapdev(local_sr_on_hostA1, vdi_with_vbd_on_dom0):
"""
A tapdev is a blockdevice allowing access to a VDI from the Dom0.

It is usually used to give access to the VDI to Qemu for emulating devices
before PV driver are loaded in the guest.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be mentioned somewhere that tapdev is used for a better test performance, to avoid instantiating a new VM to access the vdi

assert compare_data(host, tapdev, data_file_on_host, offset, length)

@pytest.mark.parametrize("vdi_op", ["snapshot", "clone"])
def test_coalesce(self, host: "Host", tapdev: str, vdi_with_vbd_on_dom0: "VDI", data_file_on_host: str, vdi_op):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid generating a file on the dom0 if we:

  • generate a file with random content on the vdi directly;
  • compute the check sum and retrieve it locally
  • trigger the coalesce
  • recompute the check sum and verify it's the same as before

Copy link
Contributor

@ydirson ydirson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for merging

@glehmann
Copy link
Member

5 approvals—not something we see every day 😄

@Nambrok Nambrok force-pushed the dtt-coalesce branch 2 times, most recently from 21673d2 to 1b92265 Compare August 21, 2025 11:49
Add a list of vdis object to a VM
Add get_dom0_vm in host.py
Add function to connect/disconnect a VDI from a VM
Add a path to `Snapshot.revert()` to reset the VDIs list of the VM
object that created it since a revert would change the VDI UUIDs of the
VM.

Signed-off-by: Damien Thenot <[email protected]>
Add a vdi-type parameter to parametrize the vdi_type fixture,
it defaults to vhd at the moment, will be used to run test on others vdi
types, e.g. qcow2.

The tests create a VDI, connect it to Dom0 then use the tapdev to write
random data somewhere in it.
It then compares the data of the VDI to the original data we have to see
if it has changed.
The test creates snapshot/clone before deleting them and waiting for the
coalesce to have happened by observing sm-config:vhd-parent before
checking the integrity of the data.

Signed-off-by: Damien Thenot <[email protected]>
@ydirson ydirson merged commit 7d55cf3 into master Aug 28, 2025
9 checks passed
@ydirson ydirson deleted the dtt-coalesce branch August 28, 2025 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants