-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[lldb-dap] Retry unbuffered reads. #165823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
We need to use an unbuffered socket / pipe to the lldb-dap subprocess to make sure that when we perform a select, there is no partial messages in the file buffer. If there are partial messages in the file buffer we could end up doing a read that crosses the buffer size and results in a hang until a new message is sent, that may or may not occur. As a result, when we perform a read we can get less than the requested number of bytes. We need to retry the read operation if this occurs.
|
@llvm/pr-subscribers-lldb Author: John Harrison (ashgti) ChangesWe need to use an unbuffered socket / pipe to the lldb-dap subprocess to make sure that when we perform a select, there is no partial messages in the file buffer. If there are partial messages in the file buffer we could end up doing a read that crosses the buffer size and results in a hang until a new message is sent, that may or may not occur. As a result, when we perform a read we can get less than the requested number of bytes. We need to retry the read operation if this occurs. I think this should resolve issue #165784 Full diff: https://github.com/llvm/llvm-project/pull/165823.diff 1 Files Affected:
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index 8f3652172dfdf..5000a8ec0a0ce 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -220,6 +220,11 @@ def _read_packet(
followed by the JSON bytes from self.recv. Returns None on EOF.
"""
+ # NOTE: We open the socket or pipe to the subprocess in an unbuffered
+ # mode to ensure we do not end up with a partial message in the buffer
+ # when we perform our select. Otherwise, we may run into a case where we
+ # attempt a read and the buffer is partially full but a new message is
+ # not sent to fill in the requested buffer size.
ready = self.selector.select(timeout)
if not ready:
warnings.warn(
@@ -243,10 +248,16 @@ def _read_packet(
if separator != "":
Exception("malformed DAP content header, unexpected line: " + separator)
# Read JSON bytes
- json_str = self.recv.read(length).decode()
+ # NOTE: Because the read channel is unbuffered we may receive less
+ # than the requested number of bytes. In, which case we need to
+ # perform a new read.
+ json_str = b""
+ while len(json_str) < length:
+ json_str += self.recv.read(length - len(json_str))
+
if self.trace_file:
self.trace_file.write(
- "%s from adapter:\n%s\n" % (time.time(), json_str)
+ "%s from adapter:\n%s\n" % (time.time(), json_str.decode())
)
# Decode the JSON bytes into a python dictionary
return json.loads(json_str)
|
DavidSpickett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this in theory makes the tests more stable and does not defeat the point of them in the process, fine with me.
Someone more familiar with lldb-dap should approve.
(I'm still going to make changes to Linaro's Arm bot, but stability has been an issue elsewhere so I'm sure this change will help some anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change LGTM, but it makes me wonder if we should adopt non-blocking reads like we do in JSONTransport. We already have a thread queueing up ProtocolMessages, so maybe it's time to double down on that and prevent anyone from doing a blocking read directly?
That would also help with the randomized delay idea I posted yesterday (where we could add the messages to the queue after a random amount of time to simulate slow CI nodes and non-deterministic ordering between request/responses and events).
|
A further improvement would be to switch to the python Its a bit of a refactor to push the async operations down from the read but you can use |
|
Is there a middle ground that does async reads but still provides an abstraction over that that's not necessarily async, or is that just going to leave us in a "worst-of-both-worlds" situation? While I'm sure that using an async model would be a better fit for the kinds of tests we write for DAP, I'm also somewhat worried that it's not a common paradigm in LLDB. I think a contributing factor to the instability of the DAP tests is that we (LLDB developers) are not really used to writing tests that have this asynchronous nature. Finding a principled way to do this (with something like |
|
I also think that it would be beneficial to move towards waiting on specific state instead of on specific events. E.g. most of the places we use When we trigger a request, any events emitted during the request COULD come before or after the response, depending on the timing. Moving to tracking state specifically, could help improve this since we could then better handle out of order packets. |
We need to use an unbuffered socket / pipe to the lldb-dap subprocess to make sure that when we perform a select, there is no partial messages in the file buffer. If there are partial messages in the file buffer we could end up doing a read that crosses the buffer size and results in a hang until a new message is sent, that may or may not occur.
As a result, when we perform a read we can get less than the requested number of bytes. We need to retry the read operation if this occurs.
I think this should resolve issue #165784