Skip to content

Conversation

@davidben
Copy link
Contributor

guard_acquire is a helper function used to implement TSan's __cxa_guard_acquire and pthread_once interceptors. https://reviews.llvm.org/D54664 introduced optional hooks to support cooperative multi-threading. It worked by marking the entire guard_acquire call as a potentially blocking region.

In principle, only the contended case needs to be a potentially blocking region. This didn't matter for __cxa_guard_acquire because the compiler emits an inline fast path before calling __cxa_guard_acquire. That is, once we call __cxa_guard_acquire at all, we know we're in the contended case.

https://reviews.llvm.org/D107359 then unified the __cxa_guard_acquire and pthread_once interceptors, adding the hooks to pthread_once. However, unlike __cxa_guard_acquire, pthread_once callers are not expected to have an inline fast path. The fast path is inside the function.

As a result, TSan unnecessarily calls into the cooperative multi-threading engine on every pthread_once call, despite applications generally expecting pthread_once to be fast after initialization. Fix this by deferring the hooks to the contended case inside guard_acquire.

… region

guard_acquire is a helper function used to implement TSan's
__cxa_guard_acquire and pthread_once interceptors.
https://reviews.llvm.org/D54664 introduced optional hooks to support
cooperative multi-threading. It worked by marking the entire
guard_acquire call as a potentially blocking region.

In principle, only the contended case needs to be a potentially blocking
region. This didn't matter for __cxa_guard_acquire because the compiler
emits an inline fast path before calling __cxa_guard_acquire. That is,
once we call __cxa_guard_acquire at all, we know we're in the contended
case.

https://reviews.llvm.org/D107359 then unified the __cxa_guard_acquire
and pthread_once interceptors, adding the hooks to pthread_once.
However, unlike __cxa_guard_acquire, pthread_once callers are not
expected to have an inline fast path. The fast path is inside the
function.

As a result, TSan unnecessarily calls into the cooperative
multi-threading engine on every pthread_once call, despite applications
generally expecting pthread_once to be fast after initialization. Fix
this by deferring the hooks to the contended case inside guard_acquire.
@llvmbot
Copy link
Member

llvmbot commented Mar 21, 2025

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: David Benjamin (davidben)

Changes

guard_acquire is a helper function used to implement TSan's __cxa_guard_acquire and pthread_once interceptors. https://reviews.llvm.org/D54664 introduced optional hooks to support cooperative multi-threading. It worked by marking the entire guard_acquire call as a potentially blocking region.

In principle, only the contended case needs to be a potentially blocking region. This didn't matter for __cxa_guard_acquire because the compiler emits an inline fast path before calling __cxa_guard_acquire. That is, once we call __cxa_guard_acquire at all, we know we're in the contended case.

https://reviews.llvm.org/D107359 then unified the __cxa_guard_acquire and pthread_once interceptors, adding the hooks to pthread_once. However, unlike __cxa_guard_acquire, pthread_once callers are not expected to have an inline fast path. The fast path is inside the function.

As a result, TSan unnecessarily calls into the cooperative multi-threading engine on every pthread_once call, despite applications generally expecting pthread_once to be fast after initialization. Fix this by deferring the hooks to the contended case inside guard_acquire.


Full diff: https://github.com/llvm/llvm-project/pull/132477.diff

2 Files Affected:

  • (modified) compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp (+9-5)
  • (modified) compiler-rt/test/tsan/cxa_guard_acquire.cpp (+70-6)
diff --git a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
index f671c8167a3c4..6d79b80593379 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
@@ -892,10 +892,9 @@ constexpr u32 kGuardWaiter = 1 << 17;
 
 static int guard_acquire(ThreadState *thr, uptr pc, atomic_uint32_t *g,
                          bool blocking_hooks = true) {
-  if (blocking_hooks)
-    OnPotentiallyBlockingRegionBegin();
-  auto on_exit = at_scope_exit([blocking_hooks] {
-    if (blocking_hooks)
+  bool in_potentially_blocking_region = false;
+  auto on_exit = at_scope_exit([&] {
+    if (in_potentially_blocking_region)
       OnPotentiallyBlockingRegionEnd();
   });
 
@@ -912,8 +911,13 @@ static int guard_acquire(ThreadState *thr, uptr pc, atomic_uint32_t *g,
     } else {
       if ((cmp & kGuardWaiter) ||
           atomic_compare_exchange_strong(g, &cmp, cmp | kGuardWaiter,
-                                         memory_order_relaxed))
+                                         memory_order_relaxed)) {
+        if (blocking_hooks && !in_potentially_blocking_region) {
+          in_potentially_blocking_region = true;
+          OnPotentiallyBlockingRegionBegin();
+        }
         FutexWait(g, cmp | kGuardWaiter);
+      }
     }
   }
 }
diff --git a/compiler-rt/test/tsan/cxa_guard_acquire.cpp b/compiler-rt/test/tsan/cxa_guard_acquire.cpp
index 100a40b281410..fc407259e8968 100644
--- a/compiler-rt/test/tsan/cxa_guard_acquire.cpp
+++ b/compiler-rt/test/tsan/cxa_guard_acquire.cpp
@@ -1,31 +1,95 @@
 // RUN: %clangxx_tsan -O1 %s -o %t && %run %t 2>&1 | FileCheck %s
 
+#include "test.h"
+#include <assert.h>
 #include <sanitizer/tsan_interface.h>
 #include <stdio.h>
 
+// We only enter a potentially blocking region on thread contention. To reliably
+// trigger this, we force the initialization function to block until another
+// thread has entered the potentially blocking region.
+
+static bool init_done = false;
+
 namespace __tsan {
 
 #if (__APPLE__)
 __attribute__((weak))
 #endif
 void OnPotentiallyBlockingRegionBegin() {
-  printf("Enter __cxa_guard_acquire\n");
+  assert(!init_done);
+  printf("Enter potentially blocking region\n");
+  // Signal the other thread to finish initialization.
+  barrier_wait(&barrier);
 }
 
 #if (__APPLE__)
 __attribute__((weak))
 #endif
-void OnPotentiallyBlockingRegionEnd() { printf("Exit __cxa_guard_acquire\n"); }
+void OnPotentiallyBlockingRegionEnd() {
+  printf("Exit potentially blocking region\n");
+}
 
 } // namespace __tsan
 
+struct LazyInit {
+  LazyInit() {
+    assert(!init_done);
+    printf("Enter constructor\n");
+    // Wait for the other thread to get to the blocking region.
+    barrier_wait(&barrier);
+    printf("Exit constructor\n");
+  }
+};
+
+const LazyInit &get_lazy_init() {
+  static const LazyInit lazy_init;
+  return lazy_init;
+}
+
+void *thread(void *arg) {
+  get_lazy_init();
+  return nullptr;
+}
+
+struct LazyInit2 {
+  LazyInit2() { printf("Enter constructor 2\n"); }
+};
+
+const LazyInit2 &get_lazy_init2() {
+  static const LazyInit2 lazy_init2;
+  return lazy_init2;
+}
+
 int main(int argc, char **argv) {
   // CHECK: Enter main
   printf("Enter main\n");
-  // CHECK-NEXT: Enter __cxa_guard_acquire
-  // CHECK-NEXT: Exit __cxa_guard_acquire
-  static int s = argc;
-  (void)s;
+
+  // If initialization is contended, the blocked thread should enter a
+  // potentially blocking region.
+  //
+  // CHECK-NEXT: Enter constructor
+  // CHECK-NEXT: Enter potentially blocking region
+  // CHECK-NEXT: Exit constructor
+  // CHECK-NEXT: Exit potentially blocking region
+  barrier_init(&barrier, 2);
+  pthread_t th1, th2;
+  pthread_create(&th1, nullptr, thread, nullptr);
+  pthread_create(&th2, nullptr, thread, nullptr);
+  pthread_join(th1, nullptr);
+  pthread_join(th2, nullptr);
+
+  // Now that the value has been initialized, subsequent calls should not enter
+  // a potentially blocking region.
+  init_done = true;
+  get_lazy_init();
+
+  // If uncontended, there is no potentially blocking region.
+  //
+  // CHECK-NEXT: Enter constructor 2
+  get_lazy_init2();
+  get_lazy_init2();
+
   // CHECK-NEXT: Exit main
   printf("Exit main\n");
   return 0;

@nico nico merged commit e6de45a into llvm:main Mar 24, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants