Skip to content

Conversation

guitargeek
Copy link
Contributor

@guitargeek guitargeek commented Sep 1, 2025

The RecurseCountsThreadLocal makes the assumption that only two instances are ever created per process. Therefore, its implementation can be optimized to the point that it should be as fast or faster than RecurseCountsTBBUnique, avoiding the Core dependence on TBB without compromising performance.

@guitargeek guitargeek self-assigned this Sep 1, 2025
@guitargeek guitargeek requested a review from dpiparo as a code owner September 1, 2025 23:43
Copy link

github-actions bot commented Sep 2, 2025

Test Results

    16 files      16 suites   2d 13h 47m 55s ⏱️
 3 618 tests  3 617 ✅ 0 💤 1 ❌
57 538 runs  57 537 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit d7797bc.

♻️ This comment has been updated with latest results.

@guitargeek guitargeek requested a review from pcanal as a code owner September 2, 2025 11:58
@guitargeek guitargeek changed the title [ci] Check what happens when building with tbb=ON [core] Implement RecurseCountsThreadLocal to be used in gCoreMutex Sep 2, 2025
@bendavid
Copy link
Contributor

bendavid commented Sep 2, 2025

I'm confused, doesn't RecurseCountsThreadLocal still depend on tbb::enumerable_thread_specific in this case?

The `RecurseCountsThreadLocal` makes the assumption that only two
instances are ever created per process. Therefore, its implementation
can be optimized to the point that it *should* be as fast or faster than
`RecurseCountsTBBUnique`, avoiding the Core dependence on TBB without
compromising performance.
As described in root-project#19798, we have an old build option that was supposedly
removed, but it is still considered in the implementation of
`ROOT::gCoreMutex`.

It would be good to know what happens when testing ROOT with it (no
differences are expected from the performance optimization with TBB, and
actually it was always enabled when building with `builtin_tbb=ON` as
well, as we do for macOS and Windows).
@guitargeek
Copy link
Contributor Author

Oh, pardon I forgot to remove the tbb::enumerable_thread_specific<LocalCounts> fLocalCounts data member! It's updated now.

@bendavid
Copy link
Contributor

bendavid commented Sep 2, 2025

So there's a maybe longstanding issue here related to bd1894b and discussed in #6919 (comment)

Given that this is similar to the old implementation in

struct UniqueLockRecurseCount {
using Hint_t = TVirtualRWMutex::Hint_t;
struct LocalCounts {
size_t fReadersCount = 0;
bool fIsWriter = false;
};
size_t fWriteRecurse = 0; ///<! Number of re-entry in the lock by the same thread.
UniqueLockRecurseCount();
using local_t = LocalCounts*;
local_t GetLocal(){
TTHREAD_TLS_DECL(LocalCounts, gLocal);
return &gLocal;
}
Hint_t *IncrementReadCount(local_t &local) {
++(local->fReadersCount);
return reinterpret_cast<TVirtualRWMutex::Hint_t *>(&(local->fReadersCount));
}
template <typename MutexT>
Hint_t *IncrementReadCount(local_t &local, MutexT &) {
return IncrementReadCount(local);
}
Hint_t *DecrementReadCount(local_t &local) {
--(local->fReadersCount);
return reinterpret_cast<TVirtualRWMutex::Hint_t *>(&(local->fReadersCount));
}
template <typename MutexT>
Hint_t *DecrementReadCount(local_t &local, MutexT &) {
return DecrementReadCount(local);
}
void ResetReadCount(local_t &local, int newvalue) {
local->fReadersCount = newvalue;
}
bool IsCurrentWriter(local_t &local) { return local->fIsWriter; }
bool IsNotCurrentWriter(local_t &local) { return !local->fIsWriter; }
void SetIsWriter(local_t &local)
{
// if (fWriteRecurse == std::numeric_limits<decltype(fWriteRecurse)>::max()) {
// ::Fatal("TRWSpinLock::WriteLock", "Too many recursions in TRWSpinLock!");
// }
++fWriteRecurse;
local->fIsWriter = true;
}
void DecrementWriteCount() { --fWriteRecurse; }
void ResetIsWriter(local_t &local) { local->fIsWriter = false; }
size_t &GetLocalReadersCount(local_t &local) { return local->fReadersCount; }
};

@guitargeek
Copy link
Contributor Author

Oh! I see. Interesting

@guitargeek
Copy link
Contributor Author

I wonder how TBB avoids this problem that Philippe described...

@guitargeek guitargeek marked this pull request as draft September 2, 2025 12:52
@pcanal
Copy link
Member

pcanal commented Sep 2, 2025

I wonder how TBB avoids this problem that Philippe described...

The difference between UniqueLockRecurseCount and RecurseCounts (and likely the TBB implementation) was switching from using thread_local (which, if I recall correctly, requires/required via a call to tls_get_addr_tail to indirectly take the same lock as ...something else maybe related to library loading or more likely static initialization) to only using std::this_thread::get_id().

The real challenge with this area is that seeing it work in most (all?) cases does not tell us we don't have the problem ... as I was unable at the time to deterministically reproduce the issue (see log bd1894b).

At the very least, one should revert bd1894b (i.e. using the old RecurseCounts) to check if we see the problem. If we see the problem then we have some degree of confidence that having the alternative code not failing means that they might not have the same problem. If we do not see the problem then ... we can not tell :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants