-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Try to make UFFD handlers more robust #5097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to make UFFD handlers more robust #5097
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5097 +/- ##
=======================================
Coverage 83.14% 83.14%
=======================================
Files 248 248
Lines 26925 26925
=======================================
Hits 22388 22388
Misses 4537 4537
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
According to our UFFD protocol, UFFD handlers negotiate with Firecracker during initialization and wait for it to send over a UDS the UFFD file descriptor along with the memory mappings that are being handled over the UFFD. During this handshake, our (testing only/not production grade) UFFD handlers issue what essentially is a `recvmsg` that should return with the UFFD fd and the mappings. Some times instead of the file descriptor, the `recvmsg` wrapper returns a `None` value for the file descriptor. When this happens, the UFFD handler crashes and Firecracker process hangs. According to `man recv(2)`: ``` Datagram sockets in various domains (e.g., the UNIX and Internet domains) permit zero-length datagrams. When such a datagram is received, the return value is 0. ``` which means it is possible to receive a zero-length message (we are communicating with Firecracker over a UDS). Add logic to our UFFD handlers to retry the negotiation with Firecracker up to 5 times before giving up. This helps making them (slightly) more robust. Also, we add some logging in the receive logic so that we can inspect failures post-mortem. Signed-off-by: Babis Chalios <[email protected]>
a3328e2 to
4c5359a
Compare
roypat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no chance that the body and the fds simply arrive in two distinct packets?
I would assume that |
Changes
Add logic to our UFFD handlers to retry the negotiation with Firecracker up to 5 times before giving up. This helps making them (slightly) more robust. Also, we add some logging in the receive logic so that we can inspect failures post-mortem.
Reason
According to our UFFD protocol, UFFD handlers negotiate with Firecracker during initialization and wait for it to send over a UDS the UFFD file descriptor along with the memory mappings that are being handled over the UFFD.
During this handshake, our (testing only/not production grade) UFFD handlers issue what essentially is a
recvmsgthat should return with the UFFD fd and the mappings. Some times instead of the file descriptor, therecvmsgwrapper returns aNonevalue for the file descriptor. When this happens, the UFFD handler crashes and Firecracker process hangs.According to
man recv(2):which means it is possible to receive a zero-length message (we are communicating with Firecracker over a UDS).
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.