fix: instance rejoin against legacy join server by nklaassen · Pull Request #64599 · gravitational/teleport

nklaassen · 2026-03-13T05:25:32Z

This commit fixes an issue with Instance identity "self-healing". When starting up with additional enabled services which have no corresponding role in the current Instance identity, the node attempts a new cluster join request with the currently configured join token to get an Instance identity with all required roles.

There's currently a bug if all of the following are true:

- the node is new enough to attempt joining via the new join service and fall back to the legacy join service
- the Auth server is old enough that it doesn't support the new join service
- the token used for the rejoin includes all of the original instance cert roles and the newly required roles

The bug exists because `state.IdentityID.HostUUID` is overloaded. When originally joining, it is expected to be populated with a plain UUID. When it is parsed from an existing certificate, it includes a `.<clustername>` suffix. So when the rejoin attempt passes in the current `IdentityID` from the existing identity, the value already includes the clustername suffix. Auth then adds the suffix again during the rejoin, and you end up with a double suffix.

This doesn't affect joining via the new join service because the node doesn't explicitly pass its desired `HostUUID` at all; Auth extracts it from the authenticated identity used for the rejoin request.

The fix is to call `state.IdentityID.HostID()`, which strips the clustername suffix. This will fix new 18.7.x+ nodes rejoining to older 18.2.10- auth servers.
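The double suffix and the effect of stripping it can be illustrated with a minimal, standalone sketch. It is not the Teleport implementation: `legacySignHostName` and `stripClusterSuffix` are hypothetical helpers that only model the behavior described above (Auth appending `.<clustername>` on the legacy path, and `HostID()` removing that suffix), and the UUID and cluster name are made-up values.

```go
package main

import (
	"fmt"
	"strings"
)

// legacySignHostName is a hypothetical stand-in for the legacy join path on
// Auth, which appends the cluster name to whatever host ID the node sends.
func legacySignHostName(requestedHostID, clusterName string) string {
	return requestedHostID + "." + clusterName
}

// stripClusterSuffix is a hypothetical helper modeling what the PR says
// HostID() does: it removes a trailing ".<clustername>" if one is present.
func stripClusterSuffix(hostUUID, clusterName string) string {
	return strings.TrimSuffix(hostUUID, "."+clusterName)
}

func main() {
	const cluster = "example.com"                          // made-up cluster name
	const plainUUID = "11111111-2222-3333-4444-555555555555" // made-up host UUID

	// HostUUID as parsed from an existing certificate already has the suffix.
	fromCert := plainUUID + "." + cluster

	// Before the fix: the suffixed value is sent as-is and gets suffixed again.
	fmt.Println(legacySignHostName(fromCert, cluster))

	// After the fix: the suffix is stripped before the rejoin request.
	fmt.Println(legacySignHostName(stripClusterSuffix(fromCert, cluster), cluster))
}
```

The first line printed ends in `.example.com.example.com`; the second ends in a single `.example.com`, matching the host name in the original certificate.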

Added test coverage for four cases:

1. rejoining with new join service with token including all required roles
2. rejoining with new join service with token including only the newly required role
3. rejoining with legacy join service with token including all required roles
4. rejoining with legacy join service with token including only the newly required role

The bug currently affects only case 3; all other cases still work.
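The four cases form a 2x2 matrix over join service and token contents. The sketch below only enumerates that matrix and encodes the statement that the legacy-service, all-roles combination is the one that fails before the fix. The `rejoinCase` type and `hitByBug` function are hypothetical names for illustration and are not taken from the actual test code in this PR.

```go
package main

import "fmt"

// rejoinCase is a hypothetical description of one rejoin scenario.
type rejoinCase struct {
	name              string
	legacyJoinService bool // true if the node falls back to the legacy join service
	tokenHasAllRoles  bool // true if the token includes original and new roles
}

// rejoinCases lists the four scenarios in the same order as the PR description.
func rejoinCases() []rejoinCase {
	return []rejoinCase{
		{"new join service, token with all required roles", false, true},
		{"new join service, token with only the new role", false, false},
		{"legacy join service, token with all required roles", true, true},
		{"legacy join service, token with only the new role", true, false},
	}
}

// hitByBug encodes the PR's statement that only the legacy join service
// combined with an all-roles token fails before the fix.
func hitByBug(c rejoinCase) bool {
	return c.legacyJoinService && c.tokenHasAllRoles
}

func main() {
	for i, c := range rejoinCases() {
		fmt.Printf("case %d: %s: affected before fix = %t\n", i+1, c.name, hitByBug(c))
	}
}
```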

changelog: fixed a bug affecting nodes on v18.3.0+ rejoining with new system roles to clusters with Auth services on v18.2.10-

Manual Test Plan

Test Environment

Cluster with Auth server running v18.2.10 or older.
Node with this fix that has already joined with the Node (ssh) role only.

Test Cases

Add the App role to the join token. Enable the app service in the node config. Restart the node. It should rejoin and work.
Upgrade the Auth server to this branch and retest the above.

Fixes #64598

nklaassen marked this pull request as ready for review March 13, 2026 05:44

github-actions bot added the size/sm label Mar 13, 2026

nklaassen added the backport/branch/v18 label Mar 13, 2026

github-actions bot requested review from capnspacehook and hugoShaka March 13, 2026 05:45

nklaassen wants to merge 1 commit into master from nklaassen/fix-legacyrejoin
