CA-395093: sync between tapback slave process and xenopsd #6825

chunjiez · 2026-01-09T07:44:28Z

In some corner cases, the physical-device-path xenstore watch event fires before the tapback slave process is ready to handle xenstore watch events. The slave process then misses the event, and the blktap I/O datapath fails to be established.

On the xenopsd side, the vbd-script waits for the tapback slave process to become ready by checking /var/run/tapback..statefile. If the file is present and contains the string "ping", the vbd-script writes "pong" to the file and goes on to update xenstore; otherwise, it keeps waiting.

On the tapback slave process side, once the process is prepared to handle xenstore watch events, it writes the string "ping" to /var/run/tapback..statefile and then waits for an acknowledgement by checking whether the file contains "pong". After seeing "pong", it removes /var/run/tapback..statefile and continues its work.
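For reviewers who want to try the handshake in isolation, here is a minimal sketch of both sides in bash. It is not the code in this PR: the function names, the `STATEFILE` variable, the temporary path and the retry bound are all assumptions made for illustration, and the real tapback side is not a shell script (see xapi-project/blktap#435). The description above waits without a limit; the bound here only keeps the demo from hanging.

```shell
#!/bin/bash
# Illustrative sketch of the ping/pong statefile handshake.
# All names below are hypothetical; the path is a stand-in for the
# per-domain statefile under /var/run that the PR uses.
STATEFILE="${STATEFILE:-/tmp/tapback.demo.$$.statefile}"

# tapback side: announce readiness, wait for the acknowledgement,
# then remove the statefile.
tapback_announce_ready() {
    local tries=0
    echo "ping" > "$STATEFILE"
    until grep -q "pong" "$STATEFILE" 2>/dev/null; do
        tries=$((tries + 1))
        # Assumed bound (50 x 0.1s) so the sketch cannot hang forever.
        if [ "$tries" -ge 50 ]; then
            return 1
        fi
        sleep 0.1
    done
    rm -f "$STATEFILE"
}

# vbd-script side: wait until tapback has written "ping",
# then acknowledge with "pong".
vbd_wait_tapback_ready() {
    local tries=0
    until grep -q "ping" "$STATEFILE" 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge 50 ]; then
            return 1
        fi
        sleep 0.1
    done
    echo "pong" > "$STATEFILE"
}
```

To exercise it, run `tapback_announce_ready &` in the background, then call `vbd_wait_tapback_ready` and `wait`; both should succeed and the statefile should be gone afterwards.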

ocaml/xenopsd/scripts/block

Signed-off-by: Chunjie Zhu <[email protected]>

chunjiez · 2026-01-09T08:12:09Z

The tapback-side code change is in xapi-project/blktap#435.

chunjiez · 2026-01-09T08:14:41Z

add @MarkSymsCtx @TimSmithCtx @LunfanZhang @minglumlu @BengangY to review

BengangY · 2026-01-09T09:20:22Z

ocaml/xenopsd/scripts/block

+wait_tapback_ready()
+{
+	local statefile="/var/run/tapback.${DOMID}.statefile"
+	while true; do


Is it possible that the while loop never exits if the statefile fails to be created? Consider adding a timeout or a maximum number of retries.

A simple way would be:

seq 120 | while read i; do ...; sleep 1; done

This would iterate at most 120 times (about 2 minutes).
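As a rough sketch of what a bounded wait could look like in bash: the function name, its parameters and the error message below are hypothetical, not taken from the PR. It uses a counter loop rather than `seq 120 | while read`, because the body of a piped `while` runs in a subshell in bash, so a `return` there would not leave the enclosing function directly.

```shell
#!/bin/bash
# Hypothetical bounded variant of the wait loop; names and defaults are
# illustrative only.
# Usage: wait_tapback_ready_bounded STATEFILE [MAX_TRIES] [INTERVAL_SECONDS]
wait_tapback_ready_bounded() {
    local statefile="$1"
    local max_tries="${2:-120}"   # 120 tries x 1s is roughly 2 minutes
    local interval="${3:-1}"
    local i

    for ((i = 0; i < max_tries; i++)); do
        # Ready when the statefile exists and contains "ping".
        if [ -f "$statefile" ] && grep -q "ping" "$statefile"; then
            echo "pong" > "$statefile"
            return 0
        fi
        sleep "$interval"
    done

    # Give up instead of blocking forever.
    echo "tapback not ready after ${max_tries} attempts: ${statefile}" >&2
    return 1
}
```

A caller could then fail the script explicitly, for example `wait_tapback_ready_bounded "$statefile" 120 1 || exit 1`.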

I've already commented on the related tapdisk PR that I don't think this is the correct approach. It just adds even more complexity and fragility to the system, which will increase the cost of maintenance going forward.

@MarkSymsCtx could you link to that PR? What would be a generally better approach?

xapi-project/blktap#435 @lindig

It seems correct to me that this side needs to be sure that tapback is ready, as otherwise tapback won't be able to process events. So checking for readiness for a limited time, and failing if tapback is not ready, does not seem fragile to me, and I would be curious how it could be avoided @MarkSymsCtx.

If we generally expect tapback to be available and ready, we should only wait briefly before we fail. I agree with @BengangY that we should not wait indefinitely.

github-advanced-security bot found potential problems Jan 9, 2026


chunjiez mentioned this pull request Jan 9, 2026

CA-395093: sync between tapback slave process and xenopsd xapi-project/blktap#435

Open

CA-395093: sync between tapback slave process and xenopsd

ce46367

Signed-off-by: Chunjie Zhu <[email protected]>

chunjiez force-pushed the master branch from 296ccfa to ce46367 Compare January 9, 2026 07:52

BengangY reviewed Jan 9, 2026
