Conversation

AlexCK-STFC
Contributor

openstack_compute_volume_attach_v2.data_volume_attach.device is often wrong at provision time and should not be trusted

Instead, we should take the same approach used in azimuth-images and find the device from the volume ID.
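For illustration, a minimal Python sketch of the volume-ID lookup. It assumes the common KVM-cloud convention that the hypervisor exposes the Cinder volume UUID, truncated to 20 characters, as the disk serial, so the volume appears under /dev/disk/by-id with a virtio- or scsi-0QEMU_QEMU_HARDDISK_ prefix; the exact prefixes and truncation should be verified against the target hypervisor (the function names here are hypothetical, not from azimuth-images):

```python
import os

def candidate_by_id_paths(volume_id):
    """Build the /dev/disk/by-id symlink names a Cinder volume may appear under.

    Assumption: the disk serial is the first 20 characters of the volume UUID;
    virtio and QEMU SCSI buses prefix it differently.
    """
    serial = volume_id[:20]
    return [
        f"/dev/disk/by-id/virtio-{serial}",
        f"/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_{serial}",
    ]

def resolve_device(volume_id):
    """Return the first candidate by-id path that exists on this host, or None."""
    for path in candidate_by_id_paths(volume_id):
        if os.path.exists(path):
            return path
    return None
```

Because the by-id symlink is derived from the volume ID rather than from the order in which the hypervisor attached disks, it stays correct even when /dev/vdb and /dev/vdc swap between boots.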

This avoids errors like:

TASK [azimuth_cloud.azimuth_ops.k3s : Fail if k3s storage device is missing] **************************************************************************************************************************
fatal: [azimuth-stfc-dev]: FAILED! => {"changed": false, "msg": "K3s storage device not found at /dev/vdb"}

when the hypervisor attaches the volume at a different device path than the one Terraform/OpenStack is expecting.


@AlexCK-STFC AlexCK-STFC requested a review from a team as a code owner August 14, 2025 10:53
@AlexCK-STFC
Contributor Author

I wasn't sure why k3s_storage_device was set in /defaults/main.yml, so I removed it; it should be set by Terraform, and I'm not sure when this default would ever be used.

@m-bull m-bull added the bug Something isn't working label Aug 15, 2025
@m-bull m-bull changed the title BUG: Find k3s device from volume ID instead of trusting openstack Find k3s device from volume ID instead of trusting openstack Aug 15, 2025
Contributor

@m-bull m-bull left a comment


Thanks for the patch - it's a good idea and we should've done it this way from the start.

General comment on the implementation - I would value the view of @JohnGarbutt and @sd109 here too.

    /dev/{{
      candidate_device_paths_stat.results |
        selectattr("stat.exists") |
        map(attribute = "stat.lnk_source") |
Contributor


Thinking more about this, I think we want the actual /dev/disk/by-id path in /etc/fstab (which happens in the mount filesystem task below); otherwise I think we could end up in a situation like this:

  1. Attach a volume; it is /dev/disk/by-id/SOME-UUID => /dev/vdb.
  2. Ansible here adds /dev/vdb to /etc/fstab.
  3. Reboot the machine; libvirt shuffles the cards and now /dev/disk/by-id/SOME-UUID => /dev/vdc.
  4. The machine never boots because it is missing a device it expects to see in /etc/fstab.
  5. (Or the machine miraculously boots, but then we run the Ansible again and end up with both /dev/vdb and /dev/vdc in /etc/fstab.)

I think this task should look more like:

    - name: Set volume block device name
      ansible.builtin.set_fact:
        k3s_storage_device: >-
          {{
            candidate_device_paths_stat.results |
              selectattr("stat.exists") |
              map(attribute = "stat.path") |
              first |
              default("")
          }}

then k3s_storage_device will end up with the /dev/disk/by-id path, and we should be fully immune from libvirt shuffling.

If you go this way, then you can set when: k3s_storage_device == "" in the next task, which is somewhat cleaner I think.
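For illustration, the Jinja filter chain in the suggested task behaves like this Python sketch (the stat_results shape follows Ansible's stat module output; the function name is hypothetical):

```python
def pick_device(stat_results):
    """Python equivalent of the Jinja chain:
    results | selectattr("stat.exists") | map(attribute="stat.path")
            | first | default("")
    """
    existing = [r["stat"]["path"] for r in stat_results if r["stat"]["exists"]]
    # `first | default("")` yields an empty string when no candidate path
    # exists, which is what makes `when: k3s_storage_device == ""` usable
    # as the failure condition in the next task.
    return existing[0] if existing else ""
```

Note that `first` on an empty sequence is undefined in Jinja, so the `default("")` is what turns "no device found" into an empty string rather than an error.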

Contributor


Note to self - if we move to this method, we should do it in a release where we change the base image; that way we don't have to worry about the oldstyle (/dev/vdb) and newstyle (/dev/disk/by-id/) entries coexisting in /etc/fstab, because we'll be rolling the image and will have a fresh fstab.
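To illustrate the coexistence problem, a hypothetical /etc/fstab after running both styles against the same volume (mount point, filesystem, and UUID are made up for the example):

```
# oldstyle entry left over from a previous run (kernel-assigned name, unstable):
/dev/vdb  /var/lib/k3s-storage  ext4  defaults  0  2
# newstyle entry written by the by-id method (stable across reboots):
/dev/disk/by-id/virtio-9c19dd51-3a5d-4ff4-9  /var/lib/k3s-storage  ext4  defaults  0  2
```

Both lines refer to the same disk, so the second mount attempt fails at boot; starting from a fresh image avoids ever having the oldstyle line present.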

@AlexCK-STFC
Contributor Author

AlexCK-STFC commented Aug 28, 2025

I don't have enough time to adopt this change myself right now; can this PR be taken over, @m-bull?
