Contributor

chunjiez commented Jan 9, 2026

In some corner cases, the physical-device-path xenstore watch event fires before the slave tapback process is ready to handle xenstore watch events. The slave tapback process then misses the event, and the blktap I/O datapath fails to be established.

On the xenopsd side, the vbd-script waits for the tapback slave process to become ready by checking /var/run/tapback..statefile. If the file is present and contains the string "ping", the vbd-script overwrites it with "pong" and proceeds to update xenstore; otherwise it keeps waiting.

On the tapback slave process side, once the process is ready to handle xenstore watch events, it writes the string "ping" to /var/run/tapback..statefile and then waits for an acknowledgement by checking whether the file contains "pong". After seeing "pong", it removes /var/run/tapback..statefile and continues.
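The handshake described above can be sketched in C roughly as follows. This is an illustrative sketch, not the code from this PR: the function names are hypothetical, the state file path is passed in by the caller, and the vbd-script side is modelled in C only so that both halves can be read together.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Overwrite the state file with a short token such as "ping" or "pong".
 * Returns 0 on success, -1 on failure. */
int state_write(const char *path, const char *token)
{
    FILE *fp = fopen(path, "w");
    if (!fp)
        return -1;
    int rc = (fputs(token, fp) == EOF) ? -1 : 0;
    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}

/* Return 1 if the file exists and its content is exactly `token`, else 0. */
int state_is(const char *path, const char *token)
{
    char buf[16];
    FILE *fp = fopen(path, "r");
    if (!fp)
        return 0;
    size_t n = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return strcmp(buf, token) == 0;
}

/* tapback side, step 1: announce that watches can now be handled. */
int tapback_announce(const char *path)
{
    return state_write(path, "ping");
}

/* vbd-script side (hypothetical C model): acknowledge only after "ping".
 * Returns 1 if acknowledged, 0 if the caller should keep waiting,
 * -1 if writing the acknowledgement failed. */
int script_try_ack(const char *path)
{
    if (!state_is(path, "ping"))
        return 0;
    return state_write(path, "pong") == 0 ? 1 : -1;
}

/* tapback side, step 2: once "pong" is seen, remove the state file.
 * Returns 1 when the handshake is complete, 0 to keep waiting,
 * -1 if the file could not be removed. */
int tapback_try_finish(const char *path)
{
    if (!state_is(path, "pong"))
        return 0;
    return unlink(path) == 0 ? 1 : -1;
}
```

Each side would call its `*_try_*` function in a polling loop. Because `fopen(path, "w")` truncates the file before the new token is written, a reader can briefly observe an empty file; in this sketch that simply reads as "not ready yet" and the caller retries.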

Contributor Author

chunjiez commented Jan 9, 2026

The corresponding xenopsd-side change is in xapi-project/xen-api#6825.

Contributor Author

chunjiez commented Jan 9, 2026

Adding @MarkSymsCtx, @TimSmithCtx, and @LunfanZhang as reviewers.

Contributor

MarkSymsCtx left a comment

This all feels like it is adding complexity and risk to an already fragile part of the VM start process. I can easily see the tapback process getting stuck waiting for the response. I think we need to find a better way of achieving the aims of this pair of PRs.
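To make the "stuck waiting for the response" risk concrete, here is a minimal sketch of a wait that is bounded by a deadline instead of looping forever. It is not taken from either PR: `wait_for_token`, `file_has_token`, the 10 ms polling interval, and the timeout argument are all hypothetical, and what tapback should do after a timeout is left open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Return 1 if the file at `path` contains exactly `token`, else 0. */
int file_has_token(const char *path, const char *token)
{
    char buf[16];
    FILE *fp = fopen(path, "r");
    if (!fp)
        return 0;
    size_t n = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return strcmp(buf, token) == 0;
}

/* Current time in milliseconds on a clock that never jumps backwards. */
long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll the file until it contains `token` or `timeout_ms` has elapsed.
 * Returns 0 when the token is seen, or -1 with errno set to ETIMEDOUT. */
int wait_for_token(const char *path, const char *token, long timeout_ms)
{
    const struct timespec pause = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    long long deadline = now_ms() + timeout_ms;

    for (;;) {
        if (file_has_token(path, token))
            return 0;
        if (now_ms() >= deadline) {
            errno = ETIMEDOUT;
            return -1;
        }
        nanosleep(&pause, NULL);
    }
}
```

A deadline only turns a hang into a detectable failure; it does not by itself reduce the added complexity raised in this review.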

err = errno;
goto fail;
}
err = fprintf(fp, "ping");
Contributor

This should follow the precedent set by the PV ring handshake and use integer state values.
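As a rough illustration of this suggestion, the sketch below replaces the "ping" and "pong" strings with small integers, in the spirit of the XenbusState values that the PV ring handshake writes to xenstore as decimal numbers. The enum, its names, and its numbering are hypothetical and are not defined by tapback or Xen.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handshake states, written to the state file as decimal text. */
enum sync_state {
    SYNC_UNKNOWN = 0, /* file missing, empty, or not a known value */
    SYNC_READY   = 1, /* would replace "ping" */
    SYNC_ACKED   = 2  /* would replace "pong" */
};

/* Format a state as decimal text. Returns 0 on success, -1 if `buf` is too small. */
int sync_state_format(enum sync_state s, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%d", (int)s);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Parse decimal text back into a state. Anything unexpected maps to
 * SYNC_UNKNOWN, so a half-written file is treated as "not ready". */
enum sync_state sync_state_parse(const char *text)
{
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (errno != 0 || end == text)
        return SYNC_UNKNOWN;
    if (*end == '\n') /* tolerate a single trailing newline */
        end++;
    if (*end != '\0')
        return SYNC_UNKNOWN;
    if (v != SYNC_READY && v != SYNC_ACKED)
        return SYNC_UNKNOWN;
    return (enum sync_state)v;
}
```

With integer states, a reader compares numbers rather than matching strings, and further states can be added later without inventing new keywords.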

lindig commented Jan 9, 2026

Independent of the correct solution, the explanation that is part of this PR should be part of the commit message.
