fix: convert vSphere UUID to Talos format for hostname persistence #23
ohauer wants to merge 5 commits into siderolabs:main
vSphere reports UUIDs with the first 3 groups byte-swapped compared to what Talos reports. This causes config patches to not be linked to the correct machine, resulting in hostname loss after reboot.

Example:
- vSphere: 422413c3-57c8-96d1-c481-c58dbb837d2d
- Talos: c3132442-c857-d196-c481-c58dbb837d2d

This fix:
- Adds convertVSphereUUIDToTalosFormat() to swap bytes correctly
- Uses the converted UUID when setting MachineUUID
- Creates config patch after UUID is set (timing fix)
- Adds tests to verify the conversion

This ensures the hostname config patch is properly linked to the machine that actually joins Omni, making hostnames persist across reboots.
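To make the mismatch concrete, here is a minimal sketch of a byte-level check that a vSphere-reported UUID and a Talos-reported UUID refer to the same machine. `SameMachine`, `parseUUID`, and `swapLeadingFields` are hypothetical names used for illustration only; they are not part of this PR. The sketch assumes the only difference is the byte order of the first three fields, as in the example above.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"strings"
)

// parseUUID decodes a canonical UUID string into its 16 bytes,
// in the order they are written in the string.
func parseUUID(s string) ([]byte, error) {
	raw, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil {
		return nil, fmt.Errorf("invalid UUID %q: %w", s, err)
	}
	if len(raw) != 16 {
		return nil, fmt.Errorf("invalid UUID %q: got %d bytes, want 16", s, len(raw))
	}
	return raw, nil
}

// swapLeadingFields reverses, in place, the bytes of the first three
// UUID fields (4 + 2 + 2 bytes). The remaining 8 bytes are untouched.
func swapLeadingFields(u []byte) {
	u[0], u[1], u[2], u[3] = u[3], u[2], u[1], u[0]
	u[4], u[5] = u[5], u[4]
	u[6], u[7] = u[7], u[6]
}

// SameMachine (hypothetical helper) reports whether the UUID shown by
// vSphere matches the UUID shown by Talos once the first three fields
// of the vSphere value are byte-swapped.
func SameMachine(vsphereUUID, talosUUID string) (bool, error) {
	v, err := parseUUID(vsphereUUID)
	if err != nil {
		return false, err
	}
	t, err := parseUUID(talosUUID)
	if err != nil {
		return false, err
	}
	swapLeadingFields(v)
	return bytes.Equal(v, t), nil
}

func main() {
	ok, err := SameMachine(
		"422413c3-57c8-96d1-c481-c58dbb837d2d", // vSphere
		"c3132442-c857-d196-c481-c58dbb837d2d", // Talos
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // true
}
```

Comparing decoded bytes rather than strings also makes the check insensitive to upper-case versus lower-case hex digits.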
- Document hostname persistence fix with Talos v1.12.0+ requirement
- Add all changes since v0.1.0-alpha.0
- Tested with Talos v1.12.4 and Omni v1.5.7
The UUID conversion and config patch creation code was missing from provision.go. This adds:
- UUID conversion from vSphere to Talos format
- Setting MachineUUID with converted UUID
- Creating config patch after UUID is set
- Logging both UUIDs for debugging
- Bump Go version from 1.25.6 to 1.25.7 in Makefile and go.mod
- Fix variable shadowing in provision.go (use resizeErr, netErr)
- Export ConvertVSphereUUIDToTalosFormat for external testing
- Improve test package isolation (use provider_test package)
- Standardize import ordering and fix whitespace issues
- Clean up docker run command in README (remove unused -e USER flag)
Update Notice
This PR replaces #22, which was closed after a force-push to correct git commit author information.
What's New Since #22
In addition to the original UUID conversion fix for hostname persistence, this PR now includes:
All changes have been tested and are ready for review.
Super nice. Thx for doing this, @ohauer. I'll take a look and we can merge. Following that, I'll create a new alpha release.
@ohauer I'm curious about your thoughts on setting the UUID manually for both Talos and vSphere. We could gen a UUID and pass it to Talos as well as vSphere in the spec here. That would get us out of having to do all the conversion, but I'm not sure if there's other fallout we may see in going that route. What do you think?
Eh, maybe not. This may not be exposed in machine config, just in META partition overrides. Nvm.
@ohauer I'm now curious how you were hitting this issue initially. From what's on main now, I can't seem to reproduce the initial failure after creating a cluster and rebooting a worker machine, both through the Talos API and with vSphere's UI. The machine always seems to come back online and the hostname shows as the same in vSphere, Omni, and the Talos dashboard. Maybe a vSphere version difference or something? I'm on 8.0.2u3.
Hi @rsmitty, at the moment I am running this on older 6.7 and 7.5 clusters. Even with the fix for hostname setting on created VMs, I got different hostnames between the vSphere VM name and the Talos host/node name, and that is where I need this UUID patch. I was surprised myself when I noticed the UUID difference shown in the patch. Here is an example from a test that I just created.
UUID in Omni
UUID on vSphere
govc vm.info talos-UUID-workers-9ml55h | grep UUID
Name: talos-workers-9ml55h
UUID: 42249637-4e9e-7ce3-e0fa-358a6e4c77db
govc vm.info -json talos-UUID-workers-9ml55h | jq '.virtualMachines[0].config.uuid'
"42249637-4e9e-7ce3-e0fa-358a6e4c77db"
In case you have a vSphere test environment, I'm interested to see if it is the same behavior.
I do see that there's a difference in what vSphere reports as the UUID vs what Talos thinks it is. That said, I don't see anything that breaks because of this. As long as whatever Talos and Omni think is the UUID remains constant (which should just be what Talos tells Omni its UUID is), that seems to allow hostname patches to work fine for me across reboots.


Problem
VMs created via Omni had correct names in vSphere but hostnames didn't persist in Talos systems after reboot. The root cause was a UUID endianness mismatch between vSphere and Talos.
Root Cause
vSphere and Talos report UUIDs in different byte-order formats:
- vSphere: 422413c3-57c8-96d1-c481-c58dbb837d2d
- Talos: c3132442-c857-d196-c481-c58dbb837d2d
The first 3 UUID groups are byte-swapped; the last 2 groups match. This caused:
Solution
This PR adds UUID conversion to handle the endianness difference:
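A minimal sketch of such a conversion, operating on the string form of the UUID. The function name `ConvertVSphereUUIDToTalosFormat` comes from this PR, but the signature, the validation, and the `reverseHexBytes` helper are assumptions made for illustration and may differ from the actual code in `uuid.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// reverseHexBytes reverses the byte order of an even-length hex string,
// two hex digits at a time, e.g. "422413c3" -> "c3132442".
func reverseHexBytes(group string) string {
	var b strings.Builder
	for i := len(group); i >= 2; i -= 2 {
		b.WriteString(group[i-2 : i])
	}
	return b.String()
}

// ConvertVSphereUUIDToTalosFormat byte-swaps the first three groups of a
// canonical UUID string and leaves the last two groups unchanged.
// The (string, error) signature is an assumption, not taken from the PR.
func ConvertVSphereUUIDToTalosFormat(vsphereUUID string) (string, error) {
	groups := strings.Split(strings.ToLower(vsphereUUID), "-")
	wantLens := []int{8, 4, 4, 4, 12}
	if len(groups) != len(wantLens) {
		return "", fmt.Errorf("invalid UUID %q: expected %d groups, got %d",
			vsphereUUID, len(wantLens), len(groups))
	}
	for i, g := range groups {
		if len(g) != wantLens[i] {
			return "", fmt.Errorf("invalid UUID %q: group %d has length %d, want %d",
				vsphereUUID, i+1, len(g), wantLens[i])
		}
		for j := 0; j < len(g); j++ {
			if !strings.ContainsRune("0123456789abcdef", rune(g[j])) {
				return "", fmt.Errorf("invalid UUID %q: non-hex character in group %d",
					vsphereUUID, i+1)
			}
		}
	}
	// Only the first three groups differ in byte order.
	for i := 0; i < 3; i++ {
		groups[i] = reverseHexBytes(groups[i])
	}
	return strings.Join(groups, "-"), nil
}

func main() {
	talosUUID, err := ConvertVSphereUUIDToTalosFormat("422413c3-57c8-96d1-c481-c58dbb837d2d")
	if err != nil {
		panic(err)
	}
	fmt.Println(talosUUID) // c3132442-c857-d196-c481-c58dbb837d2d
}
```

Returning an error for malformed input, rather than passing the string through unchanged, keeps a bad UUID from being silently stored as the MachineUUID; whether the PR makes the same choice is not shown here.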
Changes
- ConvertVSphereUUIDToTalosFormat() function to convert UUIDs
Files Added
- internal/pkg/provider/uuid.go - UUID conversion utility
- internal/pkg/provider/uuid_test.go - UUID conversion tests
- internal/pkg/provider/provision_test.go - Provisioning flow tests
Files Modified
- internal/pkg/provider/provision.go - Convert UUID before setting MachineUUID
- Makefile and go.mod - Update Go version to 1.25.7
- README.md - Clean up docker command
Testing
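As an illustration of what a conversion check for the Testing section could cover, here is a table-driven sketch written as a standalone program. The `convert` function is a compact, hypothetical stand-in for `ConvertVSphereUUIDToTalosFormat` that assumes well-formed input; `checkCases` is likewise an illustrative name, and none of this is taken from `uuid_test.go`. Besides the example pair from this PR, it checks that converting twice returns the original value, since reversing the same bytes twice undoes the swap.

```go
package main

import (
	"fmt"
	"strings"
)

// convert is a compact stand-in for ConvertVSphereUUIDToTalosFormat.
// It assumes a well-formed UUID and performs no validation.
func convert(uuid string) string {
	groups := strings.Split(uuid, "-")
	for i := 0; i < 3 && i < len(groups); i++ {
		g := groups[i]
		var b strings.Builder
		for j := len(g); j >= 2; j -= 2 {
			b.WriteString(g[j-2 : j])
		}
		groups[i] = b.String()
	}
	return strings.Join(groups, "-")
}

// checkCases returns an error describing the first failing case, or nil.
func checkCases() error {
	cases := []struct {
		name    string
		vsphere string
		talos   string
	}{
		{
			name:    "example from this PR",
			vsphere: "422413c3-57c8-96d1-c481-c58dbb837d2d",
			talos:   "c3132442-c857-d196-c481-c58dbb837d2d",
		},
		{
			name:    "all zeros is unchanged",
			vsphere: "00000000-0000-0000-0000-000000000000",
			talos:   "00000000-0000-0000-0000-000000000000",
		},
	}
	for _, tc := range cases {
		if got := convert(tc.vsphere); got != tc.talos {
			return fmt.Errorf("%s: got %s, want %s", tc.name, got, tc.talos)
		}
		// The swap is its own inverse: applying it twice is a no-op.
		if back := convert(convert(tc.vsphere)); back != tc.vsphere {
			return fmt.Errorf("%s: round trip gave %s, want %s", tc.name, back, tc.vsphere)
		}
	}
	return nil
}

func main() {
	if err := checkCases(); err != nil {
		panic(err)
	}
	fmt.Println("all cases passed")
}
```

In a real test file the same table would normally live in a `TestXxx(t *testing.T)` function and report failures with `t.Errorf`.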
Result
✅ VM name: talos-workers-kv28hl
✅ Initial hostname: talos-workers-kv28hl
✅ Hostname after reboot: talos-workers-kv28hl (persists!)
✅ Only one machine in Omni (no duplicates)
Supersedes #22 (closed due to force-push after git author correction)