
Refresh remote session during long migrations #6889

Draft
gthvn1 wants to merge 1 commit into xapi-project:master from xcp-ng:gtn-refresh-session-master

gthvn1 commented Feb 4, 2026

Here is what we want to solve

During a migration, the host where the VM is running (the source) needs to interact with the host the VM is migrating to (the destination). The SM endpoint is used, and it requires an authenticated session. By default, this session ID has a time to live of 24 hours, which can be changed by setting inactive_session_timeout in xapi.conf.

This session reference is passed as an argument to all API calls. Every session has an associated last active timestamp, which is updated on every API call.
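
To make the timeout rule concrete, here is a minimal sketch of an inactivity check as described above. It is an illustrative model only, not xapi's actual code: the `session` record and the `touch` and `is_expired` functions are hypothetical names, and the timeout is assumed to be expressed in seconds.

```ocaml
(* Hypothetical model of an inactivity timeout; not xapi's real data types. *)
type session = {mutable last_active: float}

(* The 24-hour default mentioned above, assumed here to be in seconds. *)
let default_timeout = 24. *. 3600.

(* Every API call made with the session refreshes its timestamp. *)
let touch session ~now = session.last_active <- now

(* A session expires once it has been idle for longer than the timeout. *)
let is_expired ?(timeout = default_timeout) session ~now =
  now -. session.last_active > timeout
```

In this model, a session that makes no API call for more than 24 hours is reported as expired, while any call in between (`touch`) restarts the countdown.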

The problem is that when everything is set up to transfer the data of a VDI from the source to the destination (mirror and snapshot are ready), the source creates a thread to do the copy (sparse_dd) and blocks until the thread ends. This works fine when the transfer takes less than 24 hours, but for large VDIs (for example 2TB) it can take longer, and the session will time out. When the copy is finished, the source then tries to call the API, but the session has already expired.

To avoid this, we need to either refresh the session while the copy is in progress, or create a new session once the copy is done so that the host can still interact with the destination. We could also set inactive_session_timeout to a larger value, which is the current workaround proposed to the customer. But it makes sense to refresh the session automatically, because increasing the timeout for all sessions may create new problems, such as too many sessions being open at the same time.

Long VM migrations can exceed the default SM session timeout (24h), causing the
destination host to fail when using an expired session. This is especially
problematic for large VDI copies.

This patch ensures migrations do not fail due to session expiration by
refreshing the remote session periodically while the data copy is in progress.
The session is passed from the XAPI "core" layer into the storage layer so it can be
refreshed as needed during long-running migrations.

Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
gthvn1 force-pushed the gtn-refresh-session-master branch from da9aa17 to 80f31c1 on February 4, 2026 15:57
gthvn1 commented Feb 4, 2026

Any suggestions for solving the issue are welcome. I'm not sure that passing the remote session is the correct approach.

}
in
let mirror_to_remote new_dp =
let task =
Member

Why can't the refresh thread be created here and wait until the migration is complete?

Currently you've done a lot of plumbing to refresh the session for SMAPIv1 plugins, but it doesn't work for SMAPIv3 ones, and you'll need to duplicate the effort there as well.

Contributor Author

Because the problem is that nothing is called here while the data is being copied. As I understand it, we are blocked until sparse_dd finishes copying the data. I can't find a place for a hook/callback here. The only place I found to regularly call the session refresh is in the thread callback of sparse_dd, but that is on the other side of the layer, in the storage part.

Contributor Author

And yes I agree that for smapiv3 we will need to do that as well if it is the solution.

Member

psafont commented Feb 5, 2026

"can't find a place for a hook/callback here."

As a synchronisation mechanism, a ref bool plus a mutex could be used: the child thread sets the flag to true once the migration has finished, and the main thread, which has been looping and refreshing the session every X minutes, then calls Thread.join on the child and continues execution.
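
The suggestion above could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the PR: `with_periodic_refresh`, `work`, `refresh`, `interval` and `poll` are hypothetical names, `work` stands in for the blocking copy, and `refresh` stands in for whatever API call keeps the remote session alive. It uses only the standard `Thread`, `Mutex` and `Fun` modules and needs the threads library (for example `threads.posix` in dune).

```ocaml
(* Run [work] in a child thread. Meanwhile the calling thread calls
   [refresh] every [interval] seconds, checking every [poll] seconds
   whether the child has finished. All names here are hypothetical. *)
let with_periodic_refresh ?(poll = 1.0) ~interval ~refresh work =
  let finished = ref false in
  let m = Mutex.create () in
  let child =
    Thread.create
      (fun () ->
        (* Set the flag even if [work] raises, so the parent never
           loops forever. *)
        Fun.protect
          ~finally:(fun () ->
            Mutex.lock m ;
            finished := true ;
            Mutex.unlock m)
          work)
      ()
  in
  let is_finished () =
    Mutex.lock m ;
    let v = !finished in
    Mutex.unlock m ;
    v
  in
  (* [since_refresh] accumulates the time slept since the last refresh. *)
  let rec wait since_refresh =
    if is_finished () then
      Thread.join child
    else begin
      Thread.delay poll ;
      let since_refresh = since_refresh +. poll in
      if since_refresh >= interval then begin
        refresh () ;
        wait 0.
      end
      else
        wait since_refresh
    end
  in
  wait 0.
```

Polling with a short `poll` step, rather than sleeping for the whole `interval`, is a design choice of this sketch: it lets the main thread notice completion quickly instead of waiting out a full refresh period. A condition variable would avoid polling altogether, at the cost of a slightly more involved timed wait.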

Contributor Author

Yes, it is probably better than passing the session through layers 👍
