-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[tsan] Don't treat uncontended pthread_once as a potentially blocking region #132477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… region guard_acquire is a helper function used to implement TSan's __cxa_guard_acquire and pthread_once interceptors. https://reviews.llvm.org/D54664 introduced optional hooks to support cooperative multi-threading. It worked by marking the entire guard_acquire call as a potentially blocking region. In principle, only the contended case needs to be a potentially blocking region. This didn't matter for __cxa_guard_acquire because the compiler emits an inline fast path before calling __cxa_guard_acquire. That is, once we call __cxa_guard_acquire at all, we know we're in the contended case. https://reviews.llvm.org/D107359 then unified the __cxa_guard_acquire and pthread_once interceptors, adding the hooks to pthread_once. However, unlike __cxa_guard_acquire, pthread_once callers are not expected to have an inline fast path. The fast path is inside the function. As a result, TSan unnecessarily calls into the cooperative multi-threading engine on every pthread_once call, despite applications generally expecting pthread_once to be fast after initialization. Fix this by deferring the hooks to the contended case inside guard_acquire.
|
@llvm/pr-subscribers-compiler-rt-sanitizer Author: David Benjamin (davidben) Changesguard_acquire is a helper function used to implement TSan's __cxa_guard_acquire and pthread_once interceptors. https://reviews.llvm.org/D54664 introduced optional hooks to support cooperative multi-threading. It worked by marking the entire guard_acquire call as a potentially blocking region. In principle, only the contended case needs to be a potentially blocking region. This didn't matter for __cxa_guard_acquire because the compiler emits an inline fast path before calling __cxa_guard_acquire. That is, once we call __cxa_guard_acquire at all, we know we're in the contended case. https://reviews.llvm.org/D107359 then unified the __cxa_guard_acquire and pthread_once interceptors, adding the hooks to pthread_once. However, unlike __cxa_guard_acquire, pthread_once callers are not expected to have an inline fast path. The fast path is inside the function. As a result, TSan unnecessarily calls into the cooperative multi-threading engine on every pthread_once call, despite applications generally expecting pthread_once to be fast after initialization. Fix this by deferring the hooks to the contended case inside guard_acquire. Full diff: https://github.com/llvm/llvm-project/pull/132477.diff 2 Files Affected:
diff --git a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
index f671c8167a3c4..6d79b80593379 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
@@ -892,10 +892,9 @@ constexpr u32 kGuardWaiter = 1 << 17;
static int guard_acquire(ThreadState *thr, uptr pc, atomic_uint32_t *g,
bool blocking_hooks = true) {
- if (blocking_hooks)
- OnPotentiallyBlockingRegionBegin();
- auto on_exit = at_scope_exit([blocking_hooks] {
- if (blocking_hooks)
+ bool in_potentially_blocking_region = false;
+ auto on_exit = at_scope_exit([&] {
+ if (in_potentially_blocking_region)
OnPotentiallyBlockingRegionEnd();
});
@@ -912,8 +911,13 @@ static int guard_acquire(ThreadState *thr, uptr pc, atomic_uint32_t *g,
} else {
if ((cmp & kGuardWaiter) ||
atomic_compare_exchange_strong(g, &cmp, cmp | kGuardWaiter,
- memory_order_relaxed))
+ memory_order_relaxed)) {
+ if (blocking_hooks && !in_potentially_blocking_region) {
+ in_potentially_blocking_region = true;
+ OnPotentiallyBlockingRegionBegin();
+ }
FutexWait(g, cmp | kGuardWaiter);
+ }
}
}
}
diff --git a/compiler-rt/test/tsan/cxa_guard_acquire.cpp b/compiler-rt/test/tsan/cxa_guard_acquire.cpp
index 100a40b281410..fc407259e8968 100644
--- a/compiler-rt/test/tsan/cxa_guard_acquire.cpp
+++ b/compiler-rt/test/tsan/cxa_guard_acquire.cpp
@@ -1,31 +1,95 @@
// RUN: %clangxx_tsan -O1 %s -o %t && %run %t 2>&1 | FileCheck %s
+#include "test.h"
+#include <assert.h>
#include <sanitizer/tsan_interface.h>
#include <stdio.h>
+// We only enter a potentially blocking region on thread contention. To reliably
+// trigger this, we force the initialization function to block until another
+// thread has entered the potentially blocking region.
+
+static bool init_done = false;
+
namespace __tsan {
#if (__APPLE__)
__attribute__((weak))
#endif
void OnPotentiallyBlockingRegionBegin() {
- printf("Enter __cxa_guard_acquire\n");
+ assert(!init_done);
+ printf("Enter potentially blocking region\n");
+ // Signal the other thread to finish initialization.
+ barrier_wait(&barrier);
}
#if (__APPLE__)
__attribute__((weak))
#endif
-void OnPotentiallyBlockingRegionEnd() { printf("Exit __cxa_guard_acquire\n"); }
+void OnPotentiallyBlockingRegionEnd() {
+ printf("Exit potentially blocking region\n");
+}
} // namespace __tsan
+struct LazyInit {
+ LazyInit() {
+ assert(!init_done);
+ printf("Enter constructor\n");
+ // Wait for the other thread to get to the blocking region.
+ barrier_wait(&barrier);
+ printf("Exit constructor\n");
+ }
+};
+
+const LazyInit &get_lazy_init() {
+ static const LazyInit lazy_init;
+ return lazy_init;
+}
+
+void *thread(void *arg) {
+ get_lazy_init();
+ return nullptr;
+}
+
+struct LazyInit2 {
+ LazyInit2() { printf("Enter constructor 2\n"); }
+};
+
+const LazyInit2 &get_lazy_init2() {
+ static const LazyInit2 lazy_init2;
+ return lazy_init2;
+}
+
int main(int argc, char **argv) {
// CHECK: Enter main
printf("Enter main\n");
- // CHECK-NEXT: Enter __cxa_guard_acquire
- // CHECK-NEXT: Exit __cxa_guard_acquire
- static int s = argc;
- (void)s;
+
+ // If initialization is contended, the blocked thread should enter a
+ // potentially blocking region.
+ //
+ // CHECK-NEXT: Enter constructor
+ // CHECK-NEXT: Enter potentially blocking region
+ // CHECK-NEXT: Exit constructor
+ // CHECK-NEXT: Exit potentially blocking region
+ barrier_init(&barrier, 2);
+ pthread_t th1, th2;
+ pthread_create(&th1, nullptr, thread, nullptr);
+ pthread_create(&th2, nullptr, thread, nullptr);
+ pthread_join(th1, nullptr);
+ pthread_join(th2, nullptr);
+
+ // Now that the value has been initialized, subsequent calls should not enter
+ // a potentially blocking region.
+ init_done = true;
+ get_lazy_init();
+
+ // If uncontended, there is no potentially blocking region.
+ //
+ // CHECK-NEXT: Enter constructor 2
+ get_lazy_init2();
+ get_lazy_init2();
+
// CHECK-NEXT: Exit main
printf("Exit main\n");
return 0;
|
guard_acquire is a helper function used to implement TSan's __cxa_guard_acquire and pthread_once interceptors. https://reviews.llvm.org/D54664 introduced optional hooks to support cooperative multi-threading. It worked by marking the entire guard_acquire call as a potentially blocking region.
In principle, only the contended case needs to be a potentially blocking region. This didn't matter for __cxa_guard_acquire because the compiler emits an inline fast path before calling __cxa_guard_acquire. That is, once we call __cxa_guard_acquire at all, we know we're in the contended case.
https://reviews.llvm.org/D107359 then unified the __cxa_guard_acquire and pthread_once interceptors, adding the hooks to pthread_once. However, unlike __cxa_guard_acquire, pthread_once callers are not expected to have an inline fast path. The fast path is inside the function.
As a result, TSan unnecessarily calls into the cooperative multi-threading engine on every pthread_once call, despite applications generally expecting pthread_once to be fast after initialization. Fix this by deferring the hooks to the contended case inside guard_acquire.