Skip to content

Conversation

@lhames
Copy link
Contributor

@lhames lhames commented Oct 20, 2025

…fixes.

This reapplies c8c86ef, which was reverted in 13ca872 due to bot failures. Those failures were compilation errors exposed when LLVM is built with LLVM_ENABLE_EXPENSIVE_CHECKS=On. They have been fixed in this commit.

@github-actions
Copy link

github-actions bot commented Oct 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

This reapplies c8c86ef, which was reverted in 13ca872 due to bot
failures. Those failures were compilation errors exposed when LLVM is built
with LLVM_ENABLE_EXPENSIVE_CHECKS=On. They have been fixed in this commit.
@lhames lhames force-pushed the orc-waitingongraph branch from ff8e0fb to 1cf8ffd Compare October 20, 2025 23:51
@lhames lhames merged commit 9173846 into llvm:main Oct 21, 2025
10 checks passed
@lhames lhames deleted the orc-waitingongraph branch October 21, 2025 00:40
googlewalt added a commit to googlewalt/llvm-project that referenced this pull request Oct 21, 2025
This fixes the following error we're getting internally, perhaps only
with the latest clang:

llvm/include/llvm/ExecutionEngine/Orc/WaitingOnGraph.h:136:24: error: missing '#include <__fwd/vector.h>'; default argument of 'vector' must be defined before it is used

This is a fix for llvm#164340.
googlewalt added a commit that referenced this pull request Oct 21, 2025
This fixes the following error we're getting internally, perhaps only
with the latest clang:

llvm/include/llvm/ExecutionEngine/Orc/WaitingOnGraph.h:136:24: error:
missing '#include <__fwd/vector.h>'; default argument of 'vector' must
be defined before it is used

This is a fix for #164340.
googlewalt added a commit to googlewalt/llvm-project that referenced this pull request Oct 21, 2025
This fixes a potentially unused variable that's only used in an
assert.

This is a fix for llvm#164340.
googlewalt added a commit that referenced this pull request Oct 21, 2025
This fixes a potentially unused variable that's only used in an assert.

This is a fix for #164340.
@googlewalt
Copy link
Contributor

googlewalt commented Oct 24, 2025

We're seeing test failures when tsan is enabled. Attached is a scrubbed backtrace with the relevant information: test.log. Can you please take a look?

@lhames
Copy link
Contributor Author

lhames commented Oct 24, 2025

We're seeing test failures when tsan is enabled. Attached is a scrubbed backtrace with the relevant information: test.log. Can you please take a look?

Hi @googlewalt. I'm traveling at the moment and not in a position to reproduce, but I suspect this is caused by the assert at the top of AsynchronousSymbolQuery::handleComplete. Any chance you could try the following:

diff --git a/llvm/lib/ExecutionEngine/Orc/Core.cpp b/llvm/lib/ExecutionEngine/Orc/Core.cpp
index 8d413a35f5a9..b82d27b096e6 100644
--- a/llvm/lib/ExecutionEngine/Orc/Core.cpp
+++ b/llvm/lib/ExecutionEngine/Orc/Core.cpp
@@ -230,8 +230,12 @@ void AsynchronousSymbolQuery::notifySymbolMetRequiredState(
 }
 
 void AsynchronousSymbolQuery::handleComplete(ExecutionSession &ES) {
-  assert(OutstandingSymbolsCount == 0 &&
+#ifndef NDEBUG
+  ES.runSessionLocked([this]() {
+    assert(OutstandingSymbolsCount == 0 &&
          "Symbols remain, handleComplete called prematurely");
+  });
+#endif // NDEBUG
 
   class RunQueryCompleteTask : public Task {
   public:

@rupprecht
Copy link
Collaborator

The race is from decrementing OutstandingSymbolsCount here:

--OutstandingSymbolsCount;

Which happens after a different thread reads it as part of UQ->isComplete():

for (auto &UQ : EmitQueries->Updated)
if (UQ->isComplete())
UQ->handleComplete(*this);

Previously, that call to isComplete() was only in an assert (assert(Q->isComplete() && "Q is not complete")), so I wonder if this is exposing an existing race issue -- we only run these tests in tsan mode w/ assertions disabled.

@rupprecht
Copy link
Collaborator

Previously, that call to isComplete() was only in an assert (assert(Q->isComplete() && "Q is not complete")), so I wonder if this is exposing an existing race issue -- we only run these tests in tsan mode w/ assertions disabled.

Nope. Prior to this change, tsan tests pass even with that assert changed to this:

  for (auto &Q : *CompletedQueries) {
    if (!Q->isComplete())
      dbgs() << "Q is not complete";
    Q->handleComplete(*this);
  }

So this seems like a new race somehow.

Anyway, at trunk, this fix seems to make the tsan errors go away, although I have no clue if it's a good change:

  runSessionLocked([&]() {
    for (auto &UQ : EmitQueries->Updated)
      if (UQ->isComplete())
        UQ->handleComplete(*this);
  });

WDYT?

@lhames
Copy link
Contributor Author

lhames commented Oct 24, 2025

Which happens after a different thread reads it as part of UQ->isComplete():...

Ahh. Thanks for identifying that!

Anyway, at trunk, this fix seems to make the tsan errors go away, although I have no clue if it's a good change:

  runSessionLocked([&]() {
    for (auto &UQ : EmitQueries->Updated)
      if (UQ->isComplete())
        UQ->handleComplete(*this);
  });

WDYT?

We don't want that -- query handlers need to be run outside the session lock.

I think we'll just need to filter the EmitQueries->Updated list by Q->IsComplete() at the end of IL_emit just before we return. So that member should become EmitQueries::CompletedQueries, and the loop outside can become:

for (auto &CQ : EmitQueries->CompletedQueries)
  CQ->handleComplete(*this);

@rupprecht
Copy link
Collaborator

WDYT?

We don't want that -- query handlers need to be run outside the session lock.

I didn't think so, but it was worth a shot ;)

I think we'll just need to filter the EmitQueries->Updated list by Q->IsComplete() at the end of IL_emit just before we return. So that member should become EmitQueries::CompletedQueries, and the loop outside can become:

for (auto &CQ : EmitQueries->CompletedQueries)
  CQ->handleComplete(*this);

Created #165063 to do that, and it fixes the tsan issues.

@lhames
Copy link
Contributor Author

lhames commented Oct 25, 2025

Thanks very much for the fix!

googlewalt pushed a commit that referenced this pull request Oct 25, 2025
After #164340 there is a tsan race on `OutstandingSymbolsCount` when
decrementing it in `notifySymbolMetRequiredState` vs reading it in
`isComplete()`. Fix this by having `IL_emit` filter out non-completed
queries when it has the lock to do so, and that way we avoid needing to
call `isComplete()` later.
dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
After llvm#164340 there is a tsan race on `OutstandingSymbolsCount` when
decrementing it in `notifySymbolMetRequiredState` vs reading it in
`isComplete()`. Fix this by having `IL_emit` filter out non-completed
queries when it has the lock to do so, and that way we avoid needing to
call `isComplete()` later.
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
…ith … (llvm#164340)

…fixes.

This reapplies c8c86ef, which was reverted in 13ca872 due to bot
failures. Those failures were compilation errors exposed when LLVM is
built with LLVM_ENABLE_EXPENSIVE_CHECKS=On. They have been fixed in this
commit.
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
This fixes the following error we're getting internally, perhaps only
with the latest clang:

llvm/include/llvm/ExecutionEngine/Orc/WaitingOnGraph.h:136:24: error:
missing '#include <__fwd/vector.h>'; default argument of 'vector' must
be defined before it is used

This is a fix for llvm#164340.
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
This fixes a potentially unused variable that's only used in an assert.

This is a fix for llvm#164340.
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
After llvm#164340 there is a tsan race on `OutstandingSymbolsCount` when
decrementing it in `notifySymbolMetRequiredState` vs reading it in
`isComplete()`. Fix this by having `IL_emit` filter out non-completed
queries when it has the lock to do so, and that way we avoid needing to
call `isComplete()` later.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…ith … (llvm#164340)

…fixes.

This reapplies c8c86ef, which was reverted in 13ca872 due to bot
failures. Those failures were compilation errors exposed when LLVM is
built with LLVM_ENABLE_EXPENSIVE_CHECKS=On. They have been fixed in this
commit.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
This fixes the following error we're getting internally, perhaps only
with the latest clang:

llvm/include/llvm/ExecutionEngine/Orc/WaitingOnGraph.h:136:24: error:
missing '#include <__fwd/vector.h>'; default argument of 'vector' must
be defined before it is used

This is a fix for llvm#164340.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
This fixes a potentially unused variable that's only used in an assert.

This is a fix for llvm#164340.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
After llvm#164340 there is a tsan race on `OutstandingSymbolsCount` when
decrementing it in `notifySymbolMetRequiredState` vs reading it in
`isComplete()`. Fix this by having `IL_emit` filter out non-completed
queries when it has the lock to do so, and that way we avoid needing to
call `isComplete()` later.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants