Skip to content

Commit 88558d5

Browse files
authored
Avoid stalls when MainLoop::Interrupt fails to wake up the MainLoop (#164905)
Turns out there's a bug in the current lldb sources that if you fork, set the stdio file handles to close on exec and then exec lldb with some commands and the `--batch` flag, lldb will stall on exit. The first cause of the bug is that the Python session handler - and probably other places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles, and open and close and dup them as needed. NB: I am NOT trying to fix that bug. I'm not convinced running the lldb driver headless is worth a lot of effort, it's just as easy to redirect them to /dev/null, which does work. But I would like to keep lldb from stalling on the way out when this happens. The reason we stall is that we have a MainLoop waiting for signals, and we try to Interrupt it, but because stdio was closed, the interrupt pipe for the MainLoop gets the file descriptor 0, which gets closed by the Python session handler if you run some script command. So the Interrupt fails. We were running the Write to the interrupt pipe wrapped in `llvm::cantFail`, but in a no asserts build that just drops the error on the floor. So then lldb went on to call std::thread::join on the still active MainLoop, and that stalls I made Interrupt (and AddCallback & AddPendingCallback) return a bool for "interrupt success" instead. All the places where code was requesting termination, I added checks for that failure, and skip the std::thread::join call on the MainLoop thread, since that is almost certainly going to stall at this point. I didn't do the same for the Windows MainLoop, as I don't know if/when the WSASetEvent call can fail, so I always return true here. I also didn't turn the test off for Windows. According to the Python docs all the API's I used should work on Windows... If that turns out not to be true I'll make the test Darwin/Unix only.
1 parent 263377a commit 88558d5

File tree

14 files changed

+133
-45
lines changed

14 files changed

+133
-45
lines changed

lldb/include/lldb/Host/MainLoopBase.h

Lines changed: 14 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -57,18 +57,23 @@ class MainLoopBase {
5757
// Add a pending callback that will be executed once after all the pending
5858
// events are processed. The callback will be executed even if termination
5959
// was requested.
60-
void AddPendingCallback(const Callback &callback) {
61-
AddCallback(callback, std::chrono::steady_clock::time_point());
60+
// Returns false if an interrupt was needed to get the loop to act on the new
61+
// callback, but the interrupt failed, true otherwise. Mostly used when the
62+
// pending callback is a RequestTermination, since if the interrupt fails for
63+
// that callback, waiting for the MainLoop thread to terminate could stall.
64+
bool AddPendingCallback(const Callback &callback) {
65+
return AddCallback(callback, std::chrono::steady_clock::time_point());
6266
}
6367

6468
// Add a callback that will be executed after a certain amount of time has
65-
// passed.
66-
void AddCallback(const Callback &callback, std::chrono::nanoseconds delay) {
67-
AddCallback(callback, std::chrono::steady_clock::now() + delay);
69+
// passed. See AddPendingCallback comment for the return value.
70+
bool AddCallback(const Callback &callback, std::chrono::nanoseconds delay) {
71+
return AddCallback(callback, std::chrono::steady_clock::now() + delay);
6872
}
6973

7074
// Add a callback that will be executed after a given point in time.
71-
void AddCallback(const Callback &callback, TimePoint point);
75+
// See AddPendingCallback comment for the return value.
76+
bool AddCallback(const Callback &callback, TimePoint point);
7277

7378
// Waits for registered events and invoke the proper callbacks. Returns when
7479
// all callbacks deregister themselves or when someone requests termination.
@@ -85,8 +90,9 @@ class MainLoopBase {
8590

8691
virtual void UnregisterReadObject(IOObject::WaitableHandle handle) = 0;
8792

88-
// Interrupt the loop that is currently waiting for events.
89-
virtual void Interrupt() = 0;
93+
/// Interrupt the loop that is currently waiting for events. Return true if
94+
/// the interrupt succeeded, false if it failed.
95+
virtual bool Interrupt() = 0;
9096

9197
void ProcessCallbacks();
9298

lldb/include/lldb/Host/posix/MainLoopPosix.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,7 @@ class MainLoopPosix : public MainLoopBase {
5454
void UnregisterReadObject(IOObject::WaitableHandle handle) override;
5555
void UnregisterSignal(int signo, std::list<Callback>::iterator callback_it);
5656

57-
void Interrupt() override;
57+
bool Interrupt() override;
5858

5959
private:
6060
void ProcessReadObject(IOObject::WaitableHandle handle);

lldb/include/lldb/Host/windows/MainLoopWindows.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,7 @@ class MainLoopWindows : public MainLoopBase {
5050
protected:
5151
void UnregisterReadObject(IOObject::WaitableHandle handle) override;
5252

53-
void Interrupt() override;
53+
bool Interrupt() override;
5454

5555
private:
5656
llvm::Expected<size_t> Poll();

lldb/source/Host/common/MainLoopBase.cpp

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -12,8 +12,9 @@
1212
using namespace lldb;
1313
using namespace lldb_private;
1414

15-
void MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
15+
bool MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
1616
bool interrupt_needed;
17+
bool interrupt_succeeded = true;
1718
{
1819
std::lock_guard<std::mutex> lock{m_callback_mutex};
1920
// We need to interrupt the main thread if this callback is scheduled to
@@ -22,7 +23,8 @@ void MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
2223
m_callbacks.emplace(point, callback);
2324
}
2425
if (interrupt_needed)
25-
Interrupt();
26+
interrupt_succeeded = Interrupt();
27+
return interrupt_succeeded;
2628
}
2729

2830
void MainLoopBase::ProcessCallbacks() {

lldb/source/Host/posix/MainLoopPosix.cpp

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -387,10 +387,11 @@ void MainLoopPosix::ProcessSignal(int signo) {
387387
}
388388
}
389389

390-
void MainLoopPosix::Interrupt() {
390+
bool MainLoopPosix::Interrupt() {
391391
if (m_interrupting.exchange(true))
392-
return;
392+
return true;
393393

394394
char c = '.';
395-
cantFail(m_interrupt_pipe.Write(&c, 1));
395+
llvm::Expected<size_t> result = m_interrupt_pipe.Write(&c, 1);
396+
return result && *result != 0;
396397
}

lldb/source/Host/windows/MainLoopWindows.cpp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -272,4 +272,6 @@ Status MainLoopWindows::Run() {
272272
return Status();
273273
}
274274

275-
void MainLoopWindows::Interrupt() { WSASetEvent(m_interrupt_event); }
275+
bool MainLoopWindows::Interrupt() {
276+
return WSASetEvent(m_interrupt_event);
277+
}

lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -133,11 +133,12 @@ llvm::Error ProtocolServerMCP::Stop() {
133133
}
134134

135135
// Stop the main loop.
136-
m_loop.AddPendingCallback(
136+
bool addition_succeeded = m_loop.AddPendingCallback(
137137
[](lldb_private::MainLoopBase &loop) { loop.RequestTermination(); });
138138

139-
// Wait for the main loop to exit.
140-
if (m_loop_thread.joinable())
139+
// Wait for the main loop to exit, but not if we didn't succeed in inserting
140+
// our pending callback or we'll wait forever.
141+
if (addition_succeeded && m_loop_thread.joinable())
141142
m_loop_thread.join();
142143

143144
m_accept_handles.clear();
Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
"""
2+
Test that if you exec lldb with the stdio file handles
3+
closed, it is able to exit without hanging.
4+
"""
5+
6+
7+
import lldb
8+
import os
9+
import sys
10+
import socket
11+
import fcntl
12+
13+
import lldbsuite.test.lldbutil as lldbutil
14+
from lldbsuite.test.lldbtest import *
15+
16+
17+
class TestDriverWithClosedSTDIO(TestBase):
18+
# If your test case doesn't stress debug info, then
19+
# set this to true. That way it won't be run once for
20+
# each debug info format.
21+
NO_DEBUG_INFO_TESTCASE = True
22+
23+
def test_run_lldb_and_wait(self):
24+
"""This test forks, closes the stdio channels and exec's lldb.
25+
Then it waits for it to exit and asserts it did that successfully"""
26+
pid = os.fork()
27+
if pid == 0:
28+
fcntl.fcntl(sys.stdin, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
29+
fcntl.fcntl(sys.stdout, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
30+
fcntl.fcntl(sys.stderr, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
31+
lldb = lldbtest_config.lldbExec
32+
print(f"About to run: {lldb}")
33+
os.execlp(
34+
lldb,
35+
lldb,
36+
"-x",
37+
"-o",
38+
"script print(lldb.debugger.GetNumTargets())",
39+
"--batch",
40+
)
41+
else:
42+
if pid == -1:
43+
print("Couldn't fork a process.")
44+
return
45+
ret_pid, status = os.waitpid(pid, 0)
46+
# We're really just checking that lldb doesn't stall.
47+
# At the time this test was written, if you close stdin
48+
# in an asserts build, lldb aborts. So handle both
49+
# of those cases. The failure will just be that the
50+
# waitpid doesn't return, and the test times out.
51+
self.assertFalse(os.WIFSTOPPED(status), "We either exited or crashed.")

lldb/tools/driver/Driver.cpp

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -902,9 +902,10 @@ int main(int argc, char const *argv[]) {
902902
}
903903

904904
#if !defined(_WIN32)
905-
signal_loop.AddPendingCallback(
906-
[](MainLoopBase &loop) { loop.RequestTermination(); });
907-
signal_thread.join();
905+
// Try to interrupt the signal thread. If that succeeds, wait for it to exit.
906+
if (signal_loop.AddPendingCallback(
907+
[](MainLoopBase &loop) { loop.RequestTermination(); }))
908+
signal_thread.join();
908909
#endif
909910

910911
return exit_code;

lldb/tools/lldb-dap/DAP.cpp

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1104,9 +1104,11 @@ llvm::Error DAP::Loop() {
11041104
"unhandled packet");
11051105
}
11061106

1107-
m_loop.AddPendingCallback(
1108-
[](MainLoopBase &loop) { loop.RequestTermination(); });
1109-
thread.join();
1107+
// Don't wait to join the mainloop thread if our callback wasn't added
1108+
// successfully, or we'll wait forever.
1109+
if (m_loop.AddPendingCallback(
1110+
[](MainLoopBase &loop) { loop.RequestTermination(); }))
1111+
thread.join();
11101112

11111113
if (m_error_occurred)
11121114
return llvm::createStringError(llvm::inconvertibleErrorCode(),

0 commit comments

Comments
 (0)