-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[libc][patch 2/n] provide _malloc_thread_cleanup option
#133729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
SchrodingerZhu
wants to merge
77
commits into
llvm:main
from
SchrodingerZhu:libc/malloc-thread-cleanup
Closed
[libc][patch 2/n] provide _malloc_thread_cleanup option
#133729
SchrodingerZhu
wants to merge
77
commits into
llvm:main
from
SchrodingerZhu:libc/malloc-thread-cleanup
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Member
|
@llvm/pr-subscribers-libc Author: Schrodinger ZHU Yifan (SchrodingerZhu) ChangesOn top of #133641 Full diff: https://github.com/llvm/llvm-project/pull/133729.diff 12 Files Affected:
diff --git a/libc/config/config.json b/libc/config/config.json
index d738aade74427..b2688f1b29309 100644
--- a/libc/config/config.json
+++ b/libc/config/config.json
@@ -89,6 +89,10 @@
"LIBC_CONF_RWLOCK_DEFAULT_SPIN_COUNT": {
"value": 100,
"doc": "Default number of spins before blocking if a rwlock is in contention (default to 100)."
+ },
+ "LIBC_CONF_ENABLE_MALLOC_THREAD_CLEANUP": {
+ "value": false,
+ "doc": "Enable the `_malloc_thread_cleanup` weak symbol. When defined, this is function is called after `__cxa` and pthread-specific dtors. On main thread, this will be called after `atexit` functions and `.fini` dtors, right before TLS tearing down. This function can be overridden by allocators to perform cleanup. Allocators can use this symbol to avoid registering thread dtors using potentially reentrant routines."
}
},
"math": {
diff --git a/libc/docs/configure.rst b/libc/docs/configure.rst
index dee9a63101eb9..182d373c075f6 100644
--- a/libc/docs/configure.rst
+++ b/libc/docs/configure.rst
@@ -47,6 +47,7 @@ to learn about the defaults for your platform and target.
- ``LIBC_CONF_PRINTF_FLOAT_TO_STR_USE_MEGA_LONG_DOUBLE_TABLE``: Use large table for better printf long double performance.
- ``LIBC_CONF_PRINTF_RUNTIME_DISPATCH``: Use dynamic dispatch for the output mechanism to reduce code size.
* **"pthread" options**
+ - ``LIBC_CONF_ENABLE_MALLOC_THREAD_CLEANUP``: Enable the `_malloc_thread_cleanup` weak symbol. When defined, this is function is called after `__cxa` and pthread-specific dtors. On main thread, this will be called after `atexit` functions and `.fini` dtors, right before TLS tearing down. This function can be overridden by allocators to perform cleanup. Allocators can use this symbol to avoid registering thread dtors using potentially reentrant routines.
- ``LIBC_CONF_RAW_MUTEX_DEFAULT_SPIN_COUNT``: Default number of spins before blocking if a mutex is in contention (default to 100).
- ``LIBC_CONF_RWLOCK_DEFAULT_SPIN_COUNT``: Default number of spins before blocking if a rwlock is in contention (default to 100).
- ``LIBC_CONF_TIMEOUT_ENSURE_MONOTONICITY``: Automatically adjust timeout to CLOCK_MONOTONIC (default to true). POSIX API may require CLOCK_REALTIME, which can be unstable and leading to unexpected behavior. This option will convert the real-time timestamp to monotonic timestamp relative to the time of call.
diff --git a/libc/src/__support/threads/linux/CMakeLists.txt b/libc/src/__support/threads/linux/CMakeLists.txt
index 364e7e2b90585..3e7c16afe0f6e 100644
--- a/libc/src/__support/threads/linux/CMakeLists.txt
+++ b/libc/src/__support/threads/linux/CMakeLists.txt
@@ -71,6 +71,12 @@ add_header_library(
libc.src.__support.threads.mutex_common
)
+if (LIBC_CONF_ENABLE_MALLOC_THREAD_CLEANUP)
+ set(malloc_cleanup_flags -DLIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP)
+else()
+ set(malloc_cleanup_flags)
+endif()
+
add_object_library(
thread
SRCS
@@ -89,6 +95,7 @@ add_object_library(
libc.src.__support.threads.thread_common
COMPILE_OPTIONS
${libc_opt_high_flag}
+ ${malloc_cleanup_flags}
-fno-omit-frame-pointer # This allows us to sniff out the thread args from
# the new thread's stack reliably.
-Wno-frame-address # Yes, calling __builtin_return_address with a
diff --git a/libc/src/__support/threads/linux/thread.cpp b/libc/src/__support/threads/linux/thread.cpp
index c531d74c53355..2d6d4e517064d 100644
--- a/libc/src/__support/threads/linux/thread.cpp
+++ b/libc/src/__support/threads/linux/thread.cpp
@@ -482,6 +482,10 @@ int Thread::get_name(cpp::StringStream &name) const {
return 0;
}
+#ifdef LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+extern "C" [[gnu::weak]] void _malloc_thread_cleanup();
+#endif // LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+
void thread_exit(ThreadReturnValue retval, ThreadStyle style) {
auto attrib = self.attrib;
@@ -494,6 +498,11 @@ void thread_exit(ThreadReturnValue retval, ThreadStyle style) {
// different thread. The destructors of thread local and TSS objects should
// be called by the thread which owns them.
internal::call_atexit_callbacks(attrib);
+#ifdef LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+ // call _malloc_thread_cleanup after the atexit callbacks
+ if (_malloc_thread_cleanup)
+ _malloc_thread_cleanup();
+#endif // LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
uint32_t joinable_state = uint32_t(DetachState::JOINABLE);
if (!attrib->detach_state.compare_exchange_strong(
diff --git a/libc/src/__support/threads/thread.cpp b/libc/src/__support/threads/thread.cpp
index 6f6b75be5766d..c7135596622c6 100644
--- a/libc/src/__support/threads/thread.cpp
+++ b/libc/src/__support/threads/thread.cpp
@@ -154,6 +154,9 @@ ThreadAtExitCallbackMgr *get_thread_atexit_callback_mgr() {
}
void call_atexit_callbacks(ThreadAttributes *attrib) {
+ if (attrib->dtors_called)
+ return;
+ attrib->dtors_called = true;
attrib->atexit_callback_mgr->call();
for (size_t i = 0; i < TSS_KEY_COUNT; ++i) {
TSSValueUnit &unit = tss_values[i];
@@ -163,6 +166,8 @@ void call_atexit_callbacks(ThreadAttributes *attrib) {
}
}
+extern "C" void __cxa_thread_finalize() { call_atexit_callbacks(self.attrib); }
+
} // namespace internal
cpp::optional<unsigned int> new_tss_key(TSSDtor *dtor) {
diff --git a/libc/src/__support/threads/thread.h b/libc/src/__support/threads/thread.h
index f2b1f6bbb253d..f7710fde2c70d 100644
--- a/libc/src/__support/threads/thread.h
+++ b/libc/src/__support/threads/thread.h
@@ -109,12 +109,14 @@ struct alignas(STACK_ALIGNMENT) ThreadAttributes {
ThreadReturnValue retval;
ThreadAtExitCallbackMgr *atexit_callback_mgr;
void *platform_data;
+ bool dtors_called;
- constexpr ThreadAttributes()
+ LIBC_INLINE constexpr ThreadAttributes()
: detach_state(uint32_t(DetachState::DETACHED)), stack(nullptr),
stacksize(0), guardsize(0), tls(0), tls_size(0), owned_stack(false),
tid(-1), style(ThreadStyle::POSIX), retval(),
- atexit_callback_mgr(nullptr), platform_data(nullptr) {}
+ atexit_callback_mgr(nullptr), platform_data(nullptr),
+ dtors_called(false) {}
};
using TSSDtor = void(void *);
diff --git a/libc/src/stdlib/CMakeLists.txt b/libc/src/stdlib/CMakeLists.txt
index 74ae864f72e23..7dd0c969cf9b2 100644
--- a/libc/src/stdlib/CMakeLists.txt
+++ b/libc/src/stdlib/CMakeLists.txt
@@ -589,6 +589,12 @@ add_header_library(
)
endif()
+if (LIBC_CONF_ENABLE_MALLOC_THREAD_CLEANUP)
+ set(malloc_cleanup_flags -DLIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP)
+else()
+ set(malloc_cleanup_flags)
+endif()
+
add_entrypoint_object(
atexit
SRCS
@@ -599,6 +605,8 @@ add_entrypoint_object(
20 # For constinit
DEPENDS
.exit_handler
+ COMPILE_OPTIONS
+ ${malloc_cleanup_flags}
)
add_entrypoint_object(
diff --git a/libc/src/stdlib/atexit.cpp b/libc/src/stdlib/atexit.cpp
index 799aad136bda5..ac375edc98257 100644
--- a/libc/src/stdlib/atexit.cpp
+++ b/libc/src/stdlib/atexit.cpp
@@ -18,6 +18,10 @@ constinit ExitCallbackList atexit_callbacks;
Mutex handler_list_mtx(false, false, false, false);
[[gnu::weak]] extern void teardown_main_tls();
+#ifdef LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+extern "C" [[gnu::weak]] void _malloc_thread_cleanup();
+#endif // LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+
extern "C" {
int __cxa_atexit(AtExitCallback *callback, void *payload, void *) {
@@ -27,6 +31,11 @@ int __cxa_atexit(AtExitCallback *callback, void *payload, void *) {
void __cxa_finalize(void *dso) {
if (!dso) {
call_exit_callbacks(atexit_callbacks);
+#ifdef LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+ // clean up malloc TLS before TLS teardown.
+ if (_malloc_thread_cleanup)
+ _malloc_thread_cleanup();
+#endif // LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
if (teardown_main_tls)
teardown_main_tls();
}
diff --git a/libc/src/stdlib/exit.cpp b/libc/src/stdlib/exit.cpp
index 28a6f8a63c0c6..097a52339e5e8 100644
--- a/libc/src/stdlib/exit.cpp
+++ b/libc/src/stdlib/exit.cpp
@@ -14,8 +14,12 @@
namespace LIBC_NAMESPACE_DECL {
extern "C" void __cxa_finalize(void *);
+extern "C" [[gnu::weak]] void __cxa_thread_finalize();
+// TODO: use recursive mutex to protect this routine.
[[noreturn]] LLVM_LIBC_FUNCTION(void, exit, (int status)) {
+ if (__cxa_thread_finalize)
+ __cxa_thread_finalize();
__cxa_finalize(nullptr);
internal::exit(status);
}
diff --git a/libc/test/integration/src/__support/threads/CMakeLists.txt b/libc/test/integration/src/__support/threads/CMakeLists.txt
index 5a12d28ada3fd..40e96681b1207 100644
--- a/libc/test/integration/src/__support/threads/CMakeLists.txt
+++ b/libc/test/integration/src/__support/threads/CMakeLists.txt
@@ -25,3 +25,24 @@ add_integration_test(
DEPENDS
libc.src.__support.threads.thread
)
+
+add_integration_test(
+ main_exit_test
+ SUITE
+ libc-support-threads-integration-tests
+ SRCS
+ main_exit_test.cpp
+ DEPENDS
+ libc.src.__support.threads.thread
+)
+
+add_integration_test(
+ double_exit_test
+ SUITE
+ libc-support-threads-integration-tests
+ SRCS
+ double_exit_test.cpp
+ DEPENDS
+ libc.src.__support.threads.thread
+ libc.src.stdlib.exit
+)
diff --git a/libc/test/integration/src/__support/threads/double_exit_test.cpp b/libc/test/integration/src/__support/threads/double_exit_test.cpp
new file mode 100644
index 0000000000000..e4a163644a970
--- /dev/null
+++ b/libc/test/integration/src/__support/threads/double_exit_test.cpp
@@ -0,0 +1,23 @@
+//===-- Test handling of thread local data --------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/__support/threads/thread.h"
+#include "src/stdlib/exit.h"
+#include "test/IntegrationTest/test.h"
+
+extern "C" {
+[[gnu::weak]]
+void *__dso_handle = nullptr;
+int __cxa_thread_atexit_impl(void (*func)(void *), void *arg, void *dso);
+}
+
+TEST_MAIN() {
+ __cxa_thread_atexit_impl([](void *) { LIBC_NAMESPACE::exit(0); }, nullptr,
+ __dso_handle);
+ return 0;
+}
diff --git a/libc/test/integration/src/__support/threads/main_exit_test.cpp b/libc/test/integration/src/__support/threads/main_exit_test.cpp
new file mode 100644
index 0000000000000..c90e4e569cfba
--- /dev/null
+++ b/libc/test/integration/src/__support/threads/main_exit_test.cpp
@@ -0,0 +1,30 @@
+//===-- Test handling of thread local data --------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/__support/threads/thread.h"
+#include "test/IntegrationTest/test.h"
+
+bool called = false;
+
+extern "C" {
+[[gnu::weak]]
+void *__dso_handle = nullptr;
+int __cxa_thread_atexit_impl(void (*func)(void *), void *arg, void *dso);
+}
+
+[[gnu::destructor]]
+void destructor() {
+ if (!called)
+ __builtin_trap();
+}
+
+TEST_MAIN() {
+ __cxa_thread_atexit_impl([](void *) { called = true; }, nullptr,
+ __dso_handle);
+ return 0;
+}
|
…#129781) This is a follow-up patch of llvm#125756 In this PR, static-data-splitter pass produces the aggregated profile counts of constants for constant pools in a global state (`StateDataProfileInfo`), and asm printer consumes the profile counts to produce `.hot` or `.unlikely` prefixes. This implementation covers both x86 and aarch64 asm printer.
…3561) Closes llvm#132696 before the patch like this: ``` ---------------------------- *** Reducing GlobalObjects... ---------------------------- *** Reducing GV Initializers... ---------------------------- *** Reducing GlobalVariables... ---------------------------- ``` after the patch like this: ``` ---------------------------- *** Reducing GlobalObjects (global-objects)... ---------------------------- *** Reducing GV Initializers (global-initializers)... ---------------------------- *** Reducing GlobalVariables (global-variables)... ---------------------------- ```
This macro is only ever used inside the definiton for the various visibility macros on windows. There, it's defined in multiple places with different expansions, which makes it more confusing than helpful when trying to figure out what macro expands to what.
This reverts a single file from ad1ba15. llvm::append_range in this context fails to compile with recent Clang and libc++: libcxx/include/__algorithm/copy_backward.h:221:68: error: invalid operands to binary expression ('llvm::SuccIterator<llvm::Instruction, llvm::BasicBlock>' and 'long') ... llvm-project/llvm/lib/Target/X86/X86WinEHState.cpp:724:11: note: in instantiation of function template specialization 'llvm::append_range<std::deque<llvm::BasicBlock *>, llvm::iterator_range<llvm::SuccIterator<llvm::Instruction, llvm::BasicBlock >>>' requested here 724 | llvm::append_range(Worklist, successors(BB)); | ^
…nfo (llvm#133637) Query the correct TTI for the current target instead of constructing some random default one. Also query the pass manager for ProfileSummaryInfo. This should only change the printing, not the actual result.
AUIPCTarget as a relocatable expression cannot have a SubSym or @-specifier.
Remove unused declarations after llvm#132569. Simplify some code as we no longer use MCSymbolRefExpr::VariantKind.
…llvm#68997) The issue is caused by [D133860](https://reviews.llvm.org/D133860). The guard would be inserted in wrong place in some cases, like the test case showed below. This patch fixed the issue by using `isInTailCallPosition()` to verify whether the tail call is in right position.
…`modernize-` checks (llvm#133525) Improved "options" sections of `bugprone-` and `modernize-` checks: 1. Added `Options` keyword to be a delimiter between "body" and "options" parts of docs 2. Added default values where was absent. 3. Improved readability of some default values by converting `1` to `true`.
This PR add `DenseMap::insert_range` to `DenseMap` for consistency with existing `DenseSet::insert_range`, `SmallSet::insert_range` and `std::map::insert_range`.
Similar to previous migration done for all other ELF targets. Switch from the confusing `VariantKind` to `Specifier`, which aligns with Arm and IBM AIX's documentation. Moving forward, relocation specifiers should be integrated into AMDGPUMCExpr rather than MCSymbolRefExpr::SubclassData. (Note: the term AMDGPUMCExpr::VariantKind is for expressions without relocation specifiers: llvm#82022 It's up to AMDGPU maintainers to integrate these constants into Specifier. ) Pull Request: llvm#133608
Lowering of mpi.all_reduce to LLVM function call
Required for mingw-w64, which uses the alias attribute in its CRT. Follows ARM64EC mangling rules by mangling the alias symbol and emitting an unmangled anti-dependency alias. Since metadata is not allowed on GlobalAlias objects, extend arm64ec_unmangled_name to support multiple unmangled names and attach the alias anti-dependency name to the target function's metadata.
…llvm#121156) The patch splits the store-load forwarding distance analysis from other dependency analysis in LAA. Currently it supports only power-of-2 distances, required to support non-power-of-2 distances in future. Part of llvm#100755
…lvm#132174) I recently added a new option to update_test_checks.py that can filter out all CHECK lines after a certain point. We usually don't care about checking for the original scalar loop after the vector loop because it doesn't change. Cutting out unnecessary CHECK lines makes the files smaller and hopefully the tests run quicker.
…#133705) Reduces diff for an updated version of llvm#133083
After f4ec179, AbsImm is no longer signed and cannot be < 0.
…CT_SUBVECTOR(INSERT_SUBVECTOR()) -> BITCAST fold (llvm#133695) Always allow later folds to try to match as well.
before
```Verilog
wait fork
;
wait fork
;
wait fork
;
```
after
```Verilog
wait fork;
wait fork;
wait fork;
```
The `wait fork` statement should not start a block. Previously the
formatter treated the `fork` part as the start of a new block. Now the
problem is fixed.
…2941) There is some code to make sure that C++ keywords that are identifiers in the other languages are not treated as keywords. Right now, the kind is set to identifier, and the identifier info is cleared. The latter is probably so that the code for identifying C++ structures does not recognize those structures by mistake when formatting a language that does not have those structures. But we did not find an instance where the language can have the sequence of tokens, the code tries to parse the structure as if it is C++ using the identifier info instead of the token kind, but without checking for the language setting. However, there are places where the code checks whether the identifier info field is null or not. They are places where an identifier and a keyword are treated the same way. For example, the name of a function in JavaScript. This patch removes the lines that clear the identifier info. This way, a C++ keyword gets treated in the same way as an identifier in those places. JavaScript New ```JavaScript async function union( myparamnameiswaytooloooong) { } ``` Old ```JavaScript async function union( myparamnameiswaytooloooong) { } ``` Java New ```Java enum union { ABC, CDE } ``` Old ```Java enum union { ABC, CDE } ```
Before: ``` offset of on non-POD type ``` After: ``` offsetof on non-POD type ``` --------- Co-authored-by: Aaron Ballman <[email protected]>
…oadcast)) pattern identified in llvm#133083 Infinite loop check
For example for the following situation: %6:gpr = SLLI %2:gpr, 2 %7:gpr = ADDI killed %6:gpr, 24 %8:gpr = ADD %0:gpr, %7:gpr If we swap the two add instrucions we can merge the shift and add. The final code will look something like this: %7 = SH2ADD %0, %2 %8 = ADDI %7, 24
… and m[no-]evex512 (llvm#132542) The 256-bit maximum vector register size control was removed from AVX10 whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343 - Re-target m[no-]avx10.1 to enable AVX10.1 with 512-bit maximum vector register size; - Emit warning for mavx10.x-256, noting AVX10/256 is not supported; - Emit warning for mavx10.x-512, noting to use m[no-]avx10.x instead; - Emit warning for m[no-]evex512, noting AVX10/256 is not supported; This patch only changes Clang driver behavior. The features avx10.x-256/512 keep unchanged and will be removed in the next release.
Implement hypot for Float16 along with tests.
…#128918) We only emits v_mov_b32/64_dpp. Don't combine t16 instructions with mov dpp. Update the test inputs to be legal. It is future work to emit v_mov_b16_dpp, and then update GCNDPPCombine to combine it with the 16-bit instructions.
This turns on the unnecessary-virtual-specifier warning in general, but disables it when building LLVM. It also tweaks the warning description to be slightly more accurate. Background: I've been working on cleaning up this warning in two codebases: LLVM and chromium (plus its dependencies). The chromium cleanup has been straightforward. Git archaeology shows that there are two reasons for the warnings: classes to which `final` was added after they were initially committed, and classes with virtual destructors that nobody remarks on. Presumably the latter case is because people are just very used to destructors being virtual. The LLVM cleanup was more surprising: I discovered that we have an [old policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers) about including out-of-line virtual functions in every class with a vtable, even `final` ones. This means our codebase has many virtual "anchor" functions which do nothing except control where the vtable is emitted, and which trigger the warning. I looked into alternatives to satisfy the policy, such as using destructors instead of introducing a new function, but it wasn't clear if they had larger implications. Overall, it seems like the warning is genuinely useful in most codebases (evidenced by chromium and its dependencies), and LLVM is an unusual case. Therefore we should enable the warning by default, and turn it off only for LLVM builds.
This adds DWARF generation for fixed-point types. This feature is needed by Ada. Note that a pre-existing GNU extension is used in one case. This has been emitted by GCC for years, and is needed because standard DWARF is otherwise incapable of representing these types.
5b24a53 to
c4c11c3
Compare
Contributor
Author
|
history clash |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
On top of #133641