42 changes: 42 additions & 0 deletions tests/storage/linstor/test_linstor_sr.py
Member
If the test fails while the SR is in a failed state, could this leave the pool in a bad state?

Contributor Author
If the test fails and the SR goes bad, I don't see an easy way to recover from it; scenario-based troubleshooting may be needed before cleaning up. However, in principle a single host failure in XOSTOR should be tolerable, and this test should catch such failures.

We should probably use nested tests for this, so that if the pool goes bad we can just wipe it and start with a clean new one.

Contributor
Or should the SR used for physical tests be a throw-away one too?

Contributor Author
There are block devices involved (LVM, DRBD, tapdisk, which gets blocked on I/O), and an improper teardown needs either careful inspection and recovery, or a harsh wipe of everything plus reboots. A host failure is still tolerated better than a disk/LVM failure.

An XOSTOR SR is hardly a throw-away.

Contributor
Well, if we use those block devices for nothing other than this test, we can easily blank them to restart from scratch. We do that for all local SRs, and in a way a Linstor SR is "local to the pool". If our test pool is the sole user, it looks throw-away to me.

Contributor Author
Yes, this happens (manually) when the test needs a clean start. If that's acceptable, we can add it into prepare_test or similar so that a manual script is not required; a rough sketch follows below.
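
A minimal sketch of what such a reset step could look like, assuming the block devices are dedicated to the test pool. The LINSTOR_DISKS list and the blank_linstor_disks helper are hypothetical names; only host.ssh() and pool.hosts come from the existing test code:

```python
# Hypothetical sketch: blank the dedicated block devices on every host so the
# Linstor SR can be recreated from scratch before the physical tests run.
LINSTOR_DISKS = ["/dev/nvme0n1"]  # assumed: devices used exclusively by the test pool

def blank_linstor_disks(pool):
    for host in pool.hosts:
        for disk in LINSTOR_DISKS:
            # wipefs -a removes LVM/DRBD signatures so SR creation starts clean
            host.ssh(["wipefs", "-a", disk])
```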

@@ -131,6 +131,48 @@ def test_linstor_missing(self, linstor_sr, host):
            if not linstor_installed:
                host.yum_install([LINSTOR_PACKAGE])

    @pytest.mark.reboot
    @pytest.mark.small_vm
    def test_linstor_sr_fail_host(self, linstor_sr, host, vm_on_linstor_sr):
        """
        Fail a non-master host in the pool hosting the Linstor SR.
        Ensure that the VM is able to boot and shut down on all hosts.
        """
        import random

Member

random is a common module; perhaps we should make it a global import in case a future function in this module also needs it?
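
What the reviewer suggests would look roughly like this at the top of the module; the exact set of surrounding imports is an assumption (logging and pytest are at least used by this test):

```python
# Hypothetical module-level imports for tests/storage/linstor/test_linstor_sr.py
import logging
import random

import pytest
```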

        sr = linstor_sr
        vm = vm_on_linstor_sr
        # Ensure that this is a multi-host pool, not a single-host one
        assert len(host.pool.hosts) > 2, "This test requires a pool with more than 2 hosts"

        # Remove the master from the hosts list to avoid XAPI call failures
        hosts = list(sr.pool.hosts)
        hosts.remove(sr.pool.master)
        # Crash a randomly chosen non-master host
        try:
            random_host = random.choice(hosts)  # TBD: Choose Linstor Diskfull node

Member

Suggested change:
-            random_host = random.choice(hosts)  # TBD: Choose Linstor Diskfull node
+            random_host = random.choice(hosts)  # TBD: Choose Linstor Diskful node

logging.info("Working on %s", random_host.hostname_or_ip)
random_host.ssh(['echo', 'c', '>', '/proc/sysrq-trigger'])
except Exception as e:
logging.info("Host %s could be crashed with output %s.", random_host.hostname_or_ip, e.stdout)

        # Ensure that the VM is able to start on all hosts except the failed one
        for h in sr.pool.hosts:
            logging.info("Checking VM on host %s", h.hostname_or_ip)
            if h.hostname_or_ip != random_host.hostname_or_ip:
                vm.start(on=h.uuid)
                vm.wait_for_os_booted()
                vm.shutdown(verify=True)

        # Wait for random_host to come back online
        wait_for(random_host.is_enabled, "Wait for crashed host to be enabled", timeout_secs=30 * 60)

        # Ensure that the VM is able to run on the crashed host as well.
        vm.start(on=random_host.uuid)
        vm.wait_for_os_booted()
        vm.shutdown(verify=True)

        sr.scan()

    # *** End of tests with reboots

    # --- Test diskless resources --------------------------------------------------