-
Notifications
You must be signed in to change notification settings - Fork 296
CA-395093: sync between tapback slave process and xenopsd #6825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Chunjie Zhu <[email protected]>
|
The tapback side code update, xapi-project/blktap#435 |
|
add @MarkSymsCtx @TimSmithCtx @LunfanZhang @minglumlu @BengangY to review |
| wait_tapback_ready() | ||
| { | ||
| local statefile="/var/run/tapback.${DOMID}.statefile" | ||
| while true; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible the while never exits if statefile fails to be created? Consider to add a timeout for max number of retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simple way would be:
seq 120 | while read i; do
...
sleep 1
done
This would iterate at most 120 times or (2min)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already commented on the related PR to tapdisk that I don't think this is the correct approach and it just adds even more complexity and fargility to the system which will induce even more cost of maintenance going forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarkSymsCtx could you link to that PR. What is a generally better approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems correct to me that this side needs to be sure that tapback is ready as otherwise tapback won't be able to process events. So checking for readiness for a limited time and failing if tapback is not ready seems to me not fragile and I would be curious how it could be avoided @MarkSymsCtx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we generally expect tapback to be available and ready, we should only wait briefly before we fail. I agree with @BengangY that we should not wait indefinitely.
In some corner case, physical-device-path xenstore watch event is fired before slave tapback process ready to process xenstore watch event, thus, slave tapback process would miss xenstore watch event, then blktap io datapath fails to establish.
In xenopsd side, the vbd-script waits for tapback slave process ready by checking /var/run/tapback..statefile, if the file is present and file contains "ping" string, then vbd-script updates the file, writes "pong" to the file and continues to update xenstore, otherwise, just wait.
In tapback slave process side, once it get prepared to process xenstore watch event, it writes "ping" string to /var/run/tapback..statefile, then waits for acknowledge by checking if the file contains "pong" string, after seeing "pong" string, it removes /var/run/tapback..statefile and continues to work.