[lldb-dap] Retry unbuffered reads. #165823

ashgti · 2025-10-31T04:04:47Z

We need to use an unbuffered socket / pipe to the lldb-dap subprocess to make sure that when we perform a select, there is no partial messages in the file buffer. If there are partial messages in the file buffer we could end up doing a read that crosses the buffer size and results in a hang until a new message is sent, that may or may not occur.

As a result, when we perform a read we can get less than the requested number of bytes. We need to retry the read operation if this occurs.

I think this should resolve issue #165784

We need to use an unbuffered socket / pipe to the lldb-dap subprocess to make sure that when we perform a select, there is no partial messages in the file buffer. If there are partial messages in the file buffer we could end up doing a read that crosses the buffer size and results in a hang until a new message is sent, that may or may not occur. As a result, when we perform a read we can get less than the requested number of bytes. We need to retry the read operation if this occurs.

llvmbot · 2025-10-31T04:05:23Z

@llvm/pr-subscribers-lldb

Author: John Harrison (ashgti)

Changes

We need to use an unbuffered socket / pipe to the lldb-dap subprocess to make sure that when we perform a select, there is no partial messages in the file buffer. If there are partial messages in the file buffer we could end up doing a read that crosses the buffer size and results in a hang until a new message is sent, that may or may not occur.

As a result, when we perform a read we can get less than the requested number of bytes. We need to retry the read operation if this occurs.

I think this should resolve issue #165784

Full diff: https://github.com/llvm/llvm-project/pull/165823.diff

1 Files Affected:

(modified) lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py (+13-2)

diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index 8f3652172dfdf..5000a8ec0a0ce 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -220,6 +220,11 @@ def _read_packet(
         followed by the JSON bytes from self.recv. Returns None on EOF.
         """
 
+        # NOTE: We open the socket or pipe to the subprocess in an unbuffered
+        # mode to ensure we do not end up with a partial message in the buffer
+        # when we perform our select. Otherwise, we may run into a case where we
+        # attempt a read and the buffer is partially full but a new message is
+        # not sent to fill in the requested buffer size.
         ready = self.selector.select(timeout)
         if not ready:
             warnings.warn(
@@ -243,10 +248,16 @@ def _read_packet(
             if separator != "":
                 Exception("malformed DAP content header, unexpected line: " + separator)
             # Read JSON bytes
-            json_str = self.recv.read(length).decode()
+            # NOTE: Because the read channel is unbuffered we may receive less
+            # than the requested number of bytes. In, which case we need to
+            # perform a new read.
+            json_str = b""
+            while len(json_str) < length:
+                json_str += self.recv.read(length - len(json_str))
+
             if self.trace_file:
                 self.trace_file.write(
-                    "%s from adapter:\n%s\n" % (time.time(), json_str)
+                    "%s from adapter:\n%s\n" % (time.time(), json_str.decode())
                 )
             # Decode the JSON bytes into a python dictionary
             return json.loads(json_str)

DavidSpickett

If this in theory makes the tests more stable and does not defeat the point of them in the process, fine with me.

Someone more familiar with lldb-dap should approve.

(I'm still going to make changes to Linaro's Arm bot, but stability has been an issue elsewhere so I'm sure this change will help some anyway)

JDevlieghere

The change LGTM, but it makes me wonder if we should adopt non-blocking reads like we do in JSONTransport. We already have a thread queueing up ProtocolMessages, so maybe it's time to double down on that and prevent anyone from doing a blocking read directly?

That would also help with the randomized delay idea I posted yesterday (where we could add the messages to the queue after a random amount of time to simulate slow CI nodes and non-deterministic ordering between request/responses and events).

ashgti · 2025-10-31T17:08:54Z

A further improvement would be to switch to the python asyncio library for making the reads truly async. I had prototyped that some in ashgti@4468af2 but it was getting to be pretty big/invasive so I was trying to make this into a more targeted change, which is where I switched to using the selectors module to do this without any of the other async parts.

Its a bit of a refactor to push the async operations down from the read but you can use asyncio.loop.run_until_complete docs to block a thread until a task/coroutine is complete.

JDevlieghere · 2025-10-31T17:19:12Z

Is there a middle ground that does async reads but still provides an abstraction over that that's not necessarily async, or is that just going to leave us in a "worst-of-both-worlds" situation? While I'm sure that using an async model would be a better fit for the kinds of tests we write for DAP, I'm also somewhat worried that it's not a common paradigm in LLDB.

I think a contributing factor to the instability of the DAP tests is that we (LLDB developers) are not really used to writing tests that have this asynchronous nature. Finding a principled way to do this (with something like asyncio) may force that, or it may lead to the same kind of issues if we use it wrong. I'm not arguing against it, just trying to think out loud about how we can make DAP tests easy to write correctly and hard to write incorrectly.

ashgti · 2025-10-31T17:36:29Z

I also think that it would be beneficial to move towards waiting on specific state instead of on specific events. E.g. most of the places we use wait_for_event could be updated to track and check a specific state.

When we trigger a request, any events emitted during the request COULD come before or after the response, depending on the timing.

Moving to tracking state specifically, could help improve this since we could then better handle out of order packets.

ashgti requested a review from JDevlieghere as a code owner October 31, 2025 04:04

ashgti requested a review from da-viper October 31, 2025 04:04

llvmbot added the lldb label Oct 31, 2025

ashgti assigned DavidSpickett and unassigned DavidSpickett Oct 31, 2025

ashgti requested a review from DavidSpickett October 31, 2025 04:05

DavidSpickett reviewed Oct 31, 2025

View reviewed changes

da-viper approved these changes Oct 31, 2025

View reviewed changes

JDevlieghere approved these changes Oct 31, 2025

View reviewed changes

ashgti mentioned this pull request Oct 31, 2025

LLDB tests hanging on premerge runners #165887

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lldb-dap] Retry unbuffered reads. #165823

[lldb-dap] Retry unbuffered reads. #165823

ashgti commented Oct 31, 2025

Uh oh!

llvmbot commented Oct 31, 2025

Uh oh!

DavidSpickett left a comment

Uh oh!

JDevlieghere left a comment •

edited

Loading

Uh oh!

ashgti commented Oct 31, 2025

Uh oh!

JDevlieghere commented Oct 31, 2025

Uh oh!

ashgti commented Oct 31, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[lldb-dap] Retry unbuffered reads. #165823

Are you sure you want to change the base?

[lldb-dap] Retry unbuffered reads. #165823

Conversation

ashgti commented Oct 31, 2025

Uh oh!

llvmbot commented Oct 31, 2025

Uh oh!

DavidSpickett left a comment

Choose a reason for hiding this comment

Uh oh!

JDevlieghere left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ashgti commented Oct 31, 2025

Uh oh!

JDevlieghere commented Oct 31, 2025

Uh oh!

ashgti commented Oct 31, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

JDevlieghere left a comment •

edited

Loading