fix: convert vSphere UUID to Talos format for hostname persistence #23

Open
ohauer wants to merge 5 commits into siderolabs:main from ohauer:fix/use-meaningful-vm-hostname

@ohauer (Contributor) commented Feb 28, 2026

Problem

VMs created via Omni got the correct names in vSphere, but their hostnames did not persist in Talos after a reboot. The root cause was a UUID byte-order (endianness) mismatch between vSphere and Talos.

Root Cause

vSphere and Talos report UUIDs in different byte-order formats:

  • vSphere UUID: 422413c3-57c8-96d1-c481-c58dbb837d2d
  • Talos UUID: c3132442-c857-d196-c481-c58dbb837d2d

The first three UUID groups are byte-swapped (the byte order within each group is reversed); the last two groups match. This caused:

  1. Config patch created with vSphere UUID
  2. Machine joins Omni with Talos UUID (different!)
  3. Config patch not linked to machine
  4. Hostname lost after reboot
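To make the "endianness" wording concrete: the two strings contain the same bytes, just read in opposite order. A minimal illustration, assuming only the example UUIDs above (the helper readBothOrders is hypothetical and not part of this PR), reads the four bytes of the first group both ways:

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// readBothOrders decodes a 4-byte hex group (such as the first UUID group)
// and interprets the same bytes once as big-endian and once as little-endian.
// Hypothetical helper for illustration only.
func readBothOrders(group string) (be, le uint32, err error) {
	raw, err := hex.DecodeString(group)
	if err != nil {
		return 0, 0, err
	}
	if len(raw) != 4 {
		return 0, 0, fmt.Errorf("expected 4 bytes, got %d", len(raw))
	}
	return binary.BigEndian.Uint32(raw), binary.LittleEndian.Uint32(raw), nil
}

func main() {
	// First group of the vSphere UUID 422413c3-57c8-96d1-c481-c58dbb837d2d.
	be, le, err := readBothOrders("422413c3")
	if err != nil {
		panic(err)
	}
	fmt.Printf("big-endian:    %08x\n", be) // 422413c3 (vSphere form)
	fmt.Printf("little-endian: %08x\n", le) // c3132442 (Talos form)
}
```

The same holds for the 2-byte second and third groups (binary.BigEndian.Uint16 vs binary.LittleEndian.Uint16), while the last two groups are compared as plain byte sequences, which is consistent with them matching in the example.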

Solution

This PR adds UUID conversion to handle the endianness difference:

Changes

  • Add ConvertVSphereUUIDToTalosFormat() function to convert UUIDs
  • Convert vSphere UUID to Talos format before setting MachineUUID
  • Create config patch after UUID is set (timing fix)
  • Add comprehensive tests for UUID conversion and provisioning flow
  • Update Go version to 1.25.7 and fix code quality issues
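A minimal sketch of such a conversion, assuming a standard 36-character hyphenated UUID string. The name convertUUIDByteOrder is hypothetical; this is not the PR's ConvertVSphereUUIDToTalosFormat() and its signature may differ. Lowercasing the input is also an assumption about the canonical form Talos reports:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// groupLengths is the number of hex characters in each of the 5 UUID groups.
var groupLengths = [5]int{8, 4, 4, 4, 12}

// convertUUIDByteOrder (hypothetical) reverses the byte order within each of
// the first three groups of a hyphenated UUID and leaves the last two groups
// unchanged. Applying it twice returns the original (lowercased) UUID.
func convertUUIDByteOrder(id string) (string, error) {
	groups := strings.Split(strings.ToLower(id), "-")
	if len(groups) != len(groupLengths) {
		return "", fmt.Errorf("invalid UUID %q: expected 5 groups, got %d", id, len(groups))
	}
	for i, g := range groups {
		raw, err := hex.DecodeString(g)
		if err != nil || len(g) != groupLengths[i] {
			return "", fmt.Errorf("invalid UUID %q: bad group %d", id, i+1)
		}
		if i < 3 {
			// Reverse the bytes of this group in place.
			for l, r := 0, len(raw)-1; l < r; l, r = l+1, r-1 {
				raw[l], raw[r] = raw[r], raw[l]
			}
			groups[i] = hex.EncodeToString(raw)
		}
	}
	return strings.Join(groups, "-"), nil
}

func main() {
	talos, err := convertUUIDByteOrder("422413c3-57c8-96d1-c481-c58dbb837d2d")
	if err != nil {
		panic(err)
	}
	fmt.Println(talos) // c3132442-c857-d196-c481-c58dbb837d2d
}
```

Because the swap is its own inverse, the same function also maps a Talos-format UUID back to the vSphere form, which is convenient for round-trip tests.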

Files Added

  • internal/pkg/provider/uuid.go - UUID conversion utility
  • internal/pkg/provider/uuid_test.go - UUID conversion tests
  • internal/pkg/provider/provision_test.go - Provisioning flow tests

Files Modified

  • internal/pkg/provider/provision.go - Convert UUID before setting MachineUUID
  • Makefile and go.mod - Update Go version to 1.25.7
  • README.md - Clean up docker command

Testing

  • Tested with: Talos v1.12.4 and Omni v1.5.7
  • Requirements: Talos v1.12.0+ (multi-document configuration support)
  • All tests passing ✅

Result

✅ VM name: talos-workers-kv28hl
✅ Initial hostname: talos-workers-kv28hl
✅ Hostname after reboot: talos-workers-kv28hl (persists!)
✅ Only one machine in Omni (no duplicates)


Supersedes #22 (closed due to force-push after git author correction)

vSphere reports UUIDs with byte-swapped first 3 groups compared to
what Talos reports. This causes config patches to not be linked to
the correct machine, resulting in hostname loss after reboot.

Example:
- vSphere: 422413c3-57c8-96d1-c481-c58dbb837d2d
- Talos:   c3132442-c857-d196-c481-c58dbb837d2d

This fix:
- Adds convertVSphereUUIDToTalosFormat() to swap bytes correctly
- Uses the converted UUID when setting MachineUUID
- Creates config patch after UUID is set (timing fix)
- Adds tests to verify the conversion

This ensures the hostname config patch is properly linked to the
machine that actually joins Omni, making hostnames persist across
reboots.

- Document hostname persistence fix with Talos v1.12.0+ requirement
- Add all changes since v0.1.0-alpha.0
- Tested with Talos v1.12.4 and Omni v1.5.7

The UUID conversion and config patch creation code was missing from
provision.go. This adds:
- UUID conversion from vSphere to Talos format
- Setting MachineUUID with converted UUID
- Creating config patch after UUID is set
- Logging both UUIDs for debugging

- Bump Go version from 1.25.6 to 1.25.7 in Makefile and go.mod
- Fix variable shadowing in provision.go (use resizeErr, netErr)
- Export ConvertVSphereUUIDToTalosFormat for external testing
- Improve test package isolation (use provider_test package)
- Standardize import ordering and fix whitespace issues
- Clean up docker run command in README (remove unused -e USER flag)

@talos-bot moved this to In Review in Planning on Feb 28, 2026
@ohauer (Contributor, Author) commented Feb 28, 2026

Update Notice

This PR replaces #22 which was closed after a force-push to correct git commit author information.

What's New Since #22

In addition to the original UUID conversion fix for hostname persistence, this PR now includes:

  • Go version update: Bumped from 1.25.6 to 1.25.7 in Makefile and go.mod
  • Code quality improvements:
    • Fixed variable shadowing in provision.go
    • Improved test package isolation
    • Standardized import ordering
    • Fixed linting issues

All changes have been tested and make lint passes successfully. The core functionality (UUID conversion for hostname persistence) remains unchanged and fully tested.

Ready for review.

@rsmitty (Member) commented Mar 3, 2026

Super nice. Thx for doing this, @ohauer. I'll take a look and we can merge. Following that, I'll create a new alpha release.

@rsmitty (Member) commented Mar 3, 2026

@ohauer I'm curious about your thoughts on setting the UUID manually for both Talos and vSphere. We could generate a UUID and pass it to Talos as well as to vSphere in the spec here. That would save us from doing all the conversion, but I'm not sure whether there's other fallout we might see going that route.

What do you think?
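The generation half of this idea needs nothing beyond the standard library. A minimal sketch, assuming a random version 4 UUID is acceptable (newRandomUUID is a hypothetical helper; github.com/google/uuid's uuid.New() is the usual choice in practice). It does not cover how the value would be handed to vSphere or Talos, and whether the guest would then report the same UUID that vSphere shows is exactly the byte-order question discussed in this PR:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newRandomUUID (hypothetical) returns a random version 4 UUID with the
// RFC 4122 variant bits set, formatted as 8-4-4-4-12 lowercase hex.
func newRandomUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4 in the high nibble of byte 6
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant (binary 10xx) in byte 8
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newRandomUUID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```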

@rsmitty (Member) commented Mar 3, 2026

Eh, maybe not. This may not be exposed in machine config, just in META partition overrides. Nvm.

@rsmitty (Member) commented Mar 3, 2026

@ohauer I'm now curious how you were hitting this issue initially. From what's on main now, I can't seem to reproduce the initial failure after creating a cluster and rebooting a worker machine through the Talos API and with vSphere's UI. The machine always seems to come back online and hostname shows as the same in vSphere, Omni, and Talos dashboard.

Maybe vSphere version difference or something? I'm on 8.0.2u3.

@ohauer (Contributor, Author) commented Mar 3, 2026

Hi @rsmitty, at the moment I'm running this on older 6.7 and 7.5 clusters. Even with the hostname fix for created VMs, the vSphere VM name and the Talos host/node name differed, and that is where I need this UUID patch.

I was surprised myself when I noticed the UUID difference as shown in the patch.
In the coming weeks I will have access to vSphere 8.0u3 or 8.5.x, then I can report whether the behavior changes in these versions.

Here is an example from a test I just created.

UUID in Omni: [screenshot]

UUID on vSphere

govc vm.info talos-UUID-workers-9ml55h | grep UUID
Name:           talos-workers-9ml55h
  UUID:         42249637-4e9e-7ce3-e0fa-358a6e4c77db

govc vm.info -json talos-UUID-workers-9ml55h | jq '.virtualMachines[0].config.uuid'
"42249637-4e9e-7ce3-e0fa-358a6e4c77db"

If you have a vSphere test environment, I'd be interested to see whether it shows the same behavior.
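If the same byte-order rule applies to this VM, the UUID Omni shows should be the govc config.uuid with its first three groups byte-swapped. A quick way to compute the expected value for comparison, sketched with a hypothetical helper that works on the raw 16 bytes rather than on string groups (it only strips hyphens, so it does not validate their positions):

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// swapFirstThreeFields (hypothetical) parses a UUID into 16 bytes and reverses
// the byte order of its first three fields (bytes 0-3, 4-5 and 6-7).
func swapFirstThreeFields(id string) (string, error) {
	raw, err := hex.DecodeString(strings.ReplaceAll(id, "-", ""))
	if err != nil {
		return "", err
	}
	if len(raw) != 16 {
		return "", fmt.Errorf("expected 16 bytes, got %d", len(raw))
	}
	raw[0], raw[1], raw[2], raw[3] = raw[3], raw[2], raw[1], raw[0]
	raw[4], raw[5] = raw[5], raw[4]
	raw[6], raw[7] = raw[7], raw[6]
	return fmt.Sprintf("%x-%x-%x-%x-%x", raw[0:4], raw[4:6], raw[6:8], raw[8:10], raw[10:16]), nil
}

func main() {
	// config.uuid reported by govc for talos-UUID-workers-9ml55h.
	talos, err := swapFirstThreeFields("42249637-4e9e-7ce3-e0fa-358a6e4c77db")
	if err != nil {
		panic(err)
	}
	fmt.Println(talos) // 37962442-9e4e-e37c-e0fa-358a6e4c77db
}
```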

@rsmitty (Member) commented Mar 4, 2026

I do see that there's a difference in what vSphere reports as the UUID vs what Talos thinks it is.

That said, I don't see anything that breaks because of this. As long as the UUID that Talos and Omni agree on stays constant (which should simply be whatever UUID Talos reports to Omni), hostname patches seem to work fine for me across reboots.


Labels

None yet

Projects

Status: In Review

2 participants