Skip to content

Conversation

@GeorgeHuyubo
Copy link
Contributor

Fixes a deadlock that occurs during parallel module loading when symbol indexing
operations conflict with module list access. The deadlock happens when:

  • Thread A (symbol loading): holds Module.m_mutex → tries ModuleList.m_modules_mutex
  • Thread B (parallel module loading): holds ModuleList.m_modules_mutex → tries Module.m_mutex

Solution: Create a dedicated symbol thread pool separate from the main thread pool
used for parallel module loading. This isolates DWARF indexing operations from
module loading operations, preventing the mutex ordering conflict.

Changes:

  • Add Debugger::GetSymbolThreadPool() alongside existing GetThreadPool()
  • Update ManualDWARFIndex to use symbol thread pool for indexing operations
  • Initialize/cleanup symbol thread pool in Debugger::Initialize/Terminate

This maintains performance while eliminating the deadlock without major
architectural changes to the threading model.

@GeorgeHuyubo GeorgeHuyubo force-pushed the merge_bug branch 2 times, most recently from b4f6da6 to f9edaec Compare September 23, 2025 04:01
@llvm llvm deleted a comment from github-actions bot Sep 23, 2025
@GeorgeHuyubo GeorgeHuyubo marked this pull request as ready for review September 23, 2025 04:19
@llvmbot llvmbot added the lldb label Sep 23, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 23, 2025

@llvm/pr-subscribers-lldb

Author: None (GeorgeHuyubo)

Changes

Fixes a deadlock that occurs during parallel module loading when symbol indexing
operations conflict with module list access. The deadlock happens when:

  • Thread A (symbol loading): holds Module.m_mutex → tries ModuleList.m_modules_mutex
  • Thread B (parallel module loading): holds ModuleList.m_modules_mutex → tries Module.m_mutex

Solution: Create a dedicated symbol thread pool separate from the main thread pool
used for parallel module loading. This isolates DWARF indexing operations from
module loading operations, preventing the mutex ordering conflict.

Changes:

  • Add Debugger::GetSymbolThreadPool() alongside existing GetThreadPool()
  • Update ManualDWARFIndex to use symbol thread pool for indexing operations
  • Initialize/cleanup symbol thread pool in Debugger::Initialize/Terminate

This maintains performance while eliminating the deadlock without major
architectural changes to the threading model.


Full diff: https://github.com/llvm/llvm-project/pull/160225.diff

3 Files Affected:

  • (modified) lldb/include/lldb/Core/Debugger.h (+5)
  • (modified) lldb/source/Core/Debugger.cpp (+16)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp (+4-2)
diff --git a/lldb/include/lldb/Core/Debugger.h b/lldb/include/lldb/Core/Debugger.h
index 250ad64b76d9a..c3019ccb3077d 100644
--- a/lldb/include/lldb/Core/Debugger.h
+++ b/lldb/include/lldb/Core/Debugger.h
@@ -506,6 +506,11 @@ class Debugger : public std::enable_shared_from_this<Debugger>,
   /// Shared thread pool. Use only with ThreadPoolTaskGroup.
   static llvm::ThreadPoolInterface &GetThreadPool();
 
+  /// Dedicated symbol thread pool to prevent deadlock with module loading.
+  /// Use this for symbol indexing operations that might need to access
+  /// the shared module list while holding module mutexes.
+  static llvm::ThreadPoolInterface &GetSymbolThreadPool();
+
   /// Report warning events.
   ///
   /// Warning events will be delivered to any debuggers that have listeners
diff --git a/lldb/source/Core/Debugger.cpp b/lldb/source/Core/Debugger.cpp
index ed674ee1275c7..9e662d5b6a024 100644
--- a/lldb/source/Core/Debugger.cpp
+++ b/lldb/source/Core/Debugger.cpp
@@ -110,6 +110,7 @@ static std::recursive_mutex *g_debugger_list_mutex_ptr =
 static Debugger::DebuggerList *g_debugger_list_ptr =
     nullptr; // NOTE: intentional leak to avoid issues with C++ destructor chain
 static llvm::DefaultThreadPool *g_thread_pool = nullptr;
+static llvm::DefaultThreadPool *g_symbol_thread_pool = nullptr;
 
 static constexpr OptionEnumValueElement g_show_disassembly_enum_values[] = {
     {
@@ -715,6 +716,8 @@ void Debugger::Initialize(LoadPluginCallbackType load_plugin_callback) {
   g_debugger_list_mutex_ptr = new std::recursive_mutex();
   g_debugger_list_ptr = new DebuggerList();
   g_thread_pool = new llvm::DefaultThreadPool(llvm::optimal_concurrency());
+  g_symbol_thread_pool =
+      new llvm::DefaultThreadPool(llvm::optimal_concurrency());
   g_load_plugin_callback = load_plugin_callback;
 }
 
@@ -731,6 +734,13 @@ void Debugger::Terminate() {
   if (g_thread_pool) {
     // The destructor will wait for all the threads to complete.
     delete g_thread_pool;
+    g_thread_pool = nullptr;
+  }
+
+  if (g_symbol_thread_pool) {
+    // The destructor will wait for all the threads to complete.
+    delete g_symbol_thread_pool;
+    g_symbol_thread_pool = nullptr;
   }
 
   if (g_debugger_list_ptr && g_debugger_list_mutex_ptr) {
@@ -2383,3 +2393,9 @@ llvm::ThreadPoolInterface &Debugger::GetThreadPool() {
          "Debugger::GetThreadPool called before Debugger::Initialize");
   return *g_thread_pool;
 }
+
+llvm::ThreadPoolInterface &Debugger::GetSymbolThreadPool() {
+  assert(g_symbol_thread_pool &&
+         "Debugger::GetSymbolThreadPool called before Debugger::Initialize");
+  return *g_symbol_thread_pool;
+}
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
index d90108f687f84..27298be688261 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
@@ -86,10 +86,12 @@ void ManualDWARFIndex::Index() {
                     total_progress, /*debugger=*/nullptr,
                     Progress::kDefaultHighFrequencyReportTime);
 
+  // Use separate symbol thread pool to avoid deadlock with module loading
   // Share one thread pool across operations to avoid the overhead of
   // recreating the threads.
-  llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
-  const size_t num_threads = Debugger::GetThreadPool().getMaxConcurrency();
+  llvm::ThreadPoolTaskGroup task_group(Debugger::GetSymbolThreadPool());
+  const size_t num_threads =
+      Debugger::GetSymbolThreadPool().getMaxConcurrency();
 
   // Run a function for each compile unit in parallel using as many threads as
   // are available. This is significantly faster than submiting a new task for

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how having a separate thread pool addresses the issue? It's unclear to me how that would avoid the deadlock.

Regardless, I don't think we should have two thread pools as that pretty much defeats the purpose of using a pool in the first place. I think we should address the architectural issue. Would changing the ModuleList to use a R/W lock address this problem? That's something that has repeatedly come up in the past.

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, I would change this patch to use a read/write lock. We definitely don't want a new thread pool, we must always use the one global thread pool for operations.

@GeorgeHuyubo
Copy link
Contributor Author

Fixes a deadlock that occurs during parallel module loading when symbol indexing operations conflict with module list access. The deadlock happens when:

  • Thread A (symbol loading): holds Module.m_mutex → tries ModuleList.m_modules_mutex
  • Thread B (parallel module loading): holds ModuleList.m_modules_mutex → tries Module.m_mutex

@JDevlieghere @clayborg More context for the deadlock.
bt.txt

Thread 1:
Module::PreloadSymbols()->context switch-> ModuleList::GetSharedModule()
Thread 2:
ModuleList::GetSharedModule() -> Module::GetUUID()

So separating the thread pool will avoid thread 1 from context switching hence fixing this deadlock.
I am not sure how switching to read/write lock will fix this though?
Module::PreloadSymbols() – need write lock
ModuleList::GetSharedModule() – need write lock
Module::GetUUID() – need write lock(Unless we don’t do lazy initialization?)

@jimingham
Copy link
Collaborator

It's a little tricky to deal with API's that are clearly "reader" API's, and only require the writer side of the lock if they haven't been filled in already.

I did a similar conversion (for much the same reason) with the StackFrameList a while back, where for things like GetFrameAtIndex, you would get the reader lock, then check whether the frame was realized, and only switch to the writer side if it wasn't.

You can probably do something of the same sort for these locks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants