Skip to content

Conversation

@SchrodingerZhu
Copy link
Contributor

@SchrodingerZhu SchrodingerZhu commented Mar 31, 2025

On top of #133641

@llvmbot llvmbot added the libc label Mar 31, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 31, 2025

@llvm/pr-subscribers-libc

Author: Schrodinger ZHU Yifan (SchrodingerZhu)

Changes

On top of #133641


Full diff: https://github.com/llvm/llvm-project/pull/133729.diff

12 Files Affected:

  • (modified) libc/config/config.json (+4)
  • (modified) libc/docs/configure.rst (+1)
  • (modified) libc/src/__support/threads/linux/CMakeLists.txt (+7)
  • (modified) libc/src/__support/threads/linux/thread.cpp (+9)
  • (modified) libc/src/__support/threads/thread.cpp (+5)
  • (modified) libc/src/__support/threads/thread.h (+4-2)
  • (modified) libc/src/stdlib/CMakeLists.txt (+8)
  • (modified) libc/src/stdlib/atexit.cpp (+9)
  • (modified) libc/src/stdlib/exit.cpp (+4)
  • (modified) libc/test/integration/src/__support/threads/CMakeLists.txt (+21)
  • (added) libc/test/integration/src/__support/threads/double_exit_test.cpp (+23)
  • (added) libc/test/integration/src/__support/threads/main_exit_test.cpp (+30)
diff --git a/libc/config/config.json b/libc/config/config.json
index d738aade74427..b2688f1b29309 100644
--- a/libc/config/config.json
+++ b/libc/config/config.json
@@ -89,6 +89,10 @@
     "LIBC_CONF_RWLOCK_DEFAULT_SPIN_COUNT": {
       "value": 100,
       "doc": "Default number of spins before blocking if a rwlock is in contention (default to 100)."
+    },
+    "LIBC_CONF_ENABLE_MALLOC_THREAD_CLEANUP": {
+      "value": false,
+      "doc": "Enable the `_malloc_thread_cleanup` weak symbol. When defined, this is function is called after `__cxa` and pthread-specific dtors. On main thread, this will be called after `atexit` functions and `.fini` dtors, right before TLS tearing down. This function can be overridden by allocators to perform cleanup. Allocators can use this symbol to avoid registering thread dtors using potentially reentrant routines."
     }
   },
   "math": {
diff --git a/libc/docs/configure.rst b/libc/docs/configure.rst
index dee9a63101eb9..182d373c075f6 100644
--- a/libc/docs/configure.rst
+++ b/libc/docs/configure.rst
@@ -47,6 +47,7 @@ to learn about the defaults for your platform and target.
     - ``LIBC_CONF_PRINTF_FLOAT_TO_STR_USE_MEGA_LONG_DOUBLE_TABLE``: Use large table for better printf long double performance.
     - ``LIBC_CONF_PRINTF_RUNTIME_DISPATCH``: Use dynamic dispatch for the output mechanism to reduce code size.
 * **"pthread" options**
+    - ``LIBC_CONF_ENABLE_MALLOC_THREAD_CLEANUP``: Enable the `_malloc_thread_cleanup` weak symbol. When defined, this is function is called after `__cxa` and pthread-specific dtors. On main thread, this will be called after `atexit` functions and `.fini` dtors, right before TLS tearing down. This function can be overridden by allocators to perform cleanup. Allocators can use this symbol to avoid registering thread dtors using potentially reentrant routines.
     - ``LIBC_CONF_RAW_MUTEX_DEFAULT_SPIN_COUNT``: Default number of spins before blocking if a mutex is in contention (default to 100).
     - ``LIBC_CONF_RWLOCK_DEFAULT_SPIN_COUNT``: Default number of spins before blocking if a rwlock is in contention (default to 100).
     - ``LIBC_CONF_TIMEOUT_ENSURE_MONOTONICITY``: Automatically adjust timeout to CLOCK_MONOTONIC (default to true). POSIX API may require CLOCK_REALTIME, which can be unstable and leading to unexpected behavior. This option will convert the real-time timestamp to monotonic timestamp relative to the time of call.
diff --git a/libc/src/__support/threads/linux/CMakeLists.txt b/libc/src/__support/threads/linux/CMakeLists.txt
index 364e7e2b90585..3e7c16afe0f6e 100644
--- a/libc/src/__support/threads/linux/CMakeLists.txt
+++ b/libc/src/__support/threads/linux/CMakeLists.txt
@@ -71,6 +71,12 @@ add_header_library(
     libc.src.__support.threads.mutex_common
 )
 
+if (LIBC_CONF_ENABLE_MALLOC_THREAD_CLEANUP)
+  set(malloc_cleanup_flags -DLIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP)
+else()
+  set(malloc_cleanup_flags)
+endif()
+
 add_object_library(
   thread
   SRCS
@@ -89,6 +95,7 @@ add_object_library(
     libc.src.__support.threads.thread_common
   COMPILE_OPTIONS
     ${libc_opt_high_flag}
+    ${malloc_cleanup_flags}
     -fno-omit-frame-pointer # This allows us to sniff out the thread args from
                             # the new thread's stack reliably.
     -Wno-frame-address      # Yes, calling __builtin_return_address with a
diff --git a/libc/src/__support/threads/linux/thread.cpp b/libc/src/__support/threads/linux/thread.cpp
index c531d74c53355..2d6d4e517064d 100644
--- a/libc/src/__support/threads/linux/thread.cpp
+++ b/libc/src/__support/threads/linux/thread.cpp
@@ -482,6 +482,10 @@ int Thread::get_name(cpp::StringStream &name) const {
   return 0;
 }
 
+#ifdef LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+extern "C" [[gnu::weak]] void _malloc_thread_cleanup();
+#endif // LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+
 void thread_exit(ThreadReturnValue retval, ThreadStyle style) {
   auto attrib = self.attrib;
 
@@ -494,6 +498,11 @@ void thread_exit(ThreadReturnValue retval, ThreadStyle style) {
   // different thread. The destructors of thread local and TSS objects should
   // be called by the thread which owns them.
   internal::call_atexit_callbacks(attrib);
+#ifdef LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+  // call _malloc_thread_cleanup after the atexit callbacks
+  if (_malloc_thread_cleanup)
+    _malloc_thread_cleanup();
+#endif // LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
 
   uint32_t joinable_state = uint32_t(DetachState::JOINABLE);
   if (!attrib->detach_state.compare_exchange_strong(
diff --git a/libc/src/__support/threads/thread.cpp b/libc/src/__support/threads/thread.cpp
index 6f6b75be5766d..c7135596622c6 100644
--- a/libc/src/__support/threads/thread.cpp
+++ b/libc/src/__support/threads/thread.cpp
@@ -154,6 +154,9 @@ ThreadAtExitCallbackMgr *get_thread_atexit_callback_mgr() {
 }
 
 void call_atexit_callbacks(ThreadAttributes *attrib) {
+  if (attrib->dtors_called)
+    return;
+  attrib->dtors_called = true;
   attrib->atexit_callback_mgr->call();
   for (size_t i = 0; i < TSS_KEY_COUNT; ++i) {
     TSSValueUnit &unit = tss_values[i];
@@ -163,6 +166,8 @@ void call_atexit_callbacks(ThreadAttributes *attrib) {
   }
 }
 
+extern "C" void __cxa_thread_finalize() { call_atexit_callbacks(self.attrib); }
+
 } // namespace internal
 
 cpp::optional<unsigned int> new_tss_key(TSSDtor *dtor) {
diff --git a/libc/src/__support/threads/thread.h b/libc/src/__support/threads/thread.h
index f2b1f6bbb253d..f7710fde2c70d 100644
--- a/libc/src/__support/threads/thread.h
+++ b/libc/src/__support/threads/thread.h
@@ -109,12 +109,14 @@ struct alignas(STACK_ALIGNMENT) ThreadAttributes {
   ThreadReturnValue retval;
   ThreadAtExitCallbackMgr *atexit_callback_mgr;
   void *platform_data;
+  bool dtors_called;
 
-  constexpr ThreadAttributes()
+  LIBC_INLINE constexpr ThreadAttributes()
       : detach_state(uint32_t(DetachState::DETACHED)), stack(nullptr),
         stacksize(0), guardsize(0), tls(0), tls_size(0), owned_stack(false),
         tid(-1), style(ThreadStyle::POSIX), retval(),
-        atexit_callback_mgr(nullptr), platform_data(nullptr) {}
+        atexit_callback_mgr(nullptr), platform_data(nullptr),
+        dtors_called(false) {}
 };
 
 using TSSDtor = void(void *);
diff --git a/libc/src/stdlib/CMakeLists.txt b/libc/src/stdlib/CMakeLists.txt
index 74ae864f72e23..7dd0c969cf9b2 100644
--- a/libc/src/stdlib/CMakeLists.txt
+++ b/libc/src/stdlib/CMakeLists.txt
@@ -589,6 +589,12 @@ add_header_library(
 )
 endif()
 
+if (LIBC_CONF_ENABLE_MALLOC_THREAD_CLEANUP)
+  set(malloc_cleanup_flags -DLIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP)
+else()
+  set(malloc_cleanup_flags)
+endif()
+
 add_entrypoint_object(
   atexit
   SRCS
@@ -599,6 +605,8 @@ add_entrypoint_object(
     20 # For constinit
   DEPENDS
     .exit_handler
+  COMPILE_OPTIONS
+    ${malloc_cleanup_flags}
 )
 
 add_entrypoint_object(
diff --git a/libc/src/stdlib/atexit.cpp b/libc/src/stdlib/atexit.cpp
index 799aad136bda5..ac375edc98257 100644
--- a/libc/src/stdlib/atexit.cpp
+++ b/libc/src/stdlib/atexit.cpp
@@ -18,6 +18,10 @@ constinit ExitCallbackList atexit_callbacks;
 Mutex handler_list_mtx(false, false, false, false);
 [[gnu::weak]] extern void teardown_main_tls();
 
+#ifdef LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+extern "C" [[gnu::weak]] void _malloc_thread_cleanup();
+#endif // LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+
 extern "C" {
 
 int __cxa_atexit(AtExitCallback *callback, void *payload, void *) {
@@ -27,6 +31,11 @@ int __cxa_atexit(AtExitCallback *callback, void *payload, void *) {
 void __cxa_finalize(void *dso) {
   if (!dso) {
     call_exit_callbacks(atexit_callbacks);
+#ifdef LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
+    // clean up malloc TLS before TLS teardown.
+    if (_malloc_thread_cleanup)
+      _malloc_thread_cleanup();
+#endif // LIBC_COPT_ENABLE_MALLOC_THREAD_CLEANUP
     if (teardown_main_tls)
       teardown_main_tls();
   }
diff --git a/libc/src/stdlib/exit.cpp b/libc/src/stdlib/exit.cpp
index 28a6f8a63c0c6..097a52339e5e8 100644
--- a/libc/src/stdlib/exit.cpp
+++ b/libc/src/stdlib/exit.cpp
@@ -14,8 +14,12 @@
 namespace LIBC_NAMESPACE_DECL {
 
 extern "C" void __cxa_finalize(void *);
+extern "C" [[gnu::weak]] void __cxa_thread_finalize();
 
+// TODO: use recursive mutex to protect this routine.
 [[noreturn]] LLVM_LIBC_FUNCTION(void, exit, (int status)) {
+  if (__cxa_thread_finalize)
+    __cxa_thread_finalize();
   __cxa_finalize(nullptr);
   internal::exit(status);
 }
diff --git a/libc/test/integration/src/__support/threads/CMakeLists.txt b/libc/test/integration/src/__support/threads/CMakeLists.txt
index 5a12d28ada3fd..40e96681b1207 100644
--- a/libc/test/integration/src/__support/threads/CMakeLists.txt
+++ b/libc/test/integration/src/__support/threads/CMakeLists.txt
@@ -25,3 +25,24 @@ add_integration_test(
   DEPENDS
     libc.src.__support.threads.thread
 )
+
+add_integration_test(
+  main_exit_test
+  SUITE
+    libc-support-threads-integration-tests
+  SRCS
+    main_exit_test.cpp
+  DEPENDS
+    libc.src.__support.threads.thread
+)
+
+add_integration_test(
+  double_exit_test
+  SUITE
+    libc-support-threads-integration-tests
+  SRCS
+    double_exit_test.cpp
+  DEPENDS
+    libc.src.__support.threads.thread
+    libc.src.stdlib.exit
+)
diff --git a/libc/test/integration/src/__support/threads/double_exit_test.cpp b/libc/test/integration/src/__support/threads/double_exit_test.cpp
new file mode 100644
index 0000000000000..e4a163644a970
--- /dev/null
+++ b/libc/test/integration/src/__support/threads/double_exit_test.cpp
@@ -0,0 +1,23 @@
+//===-- Test handling of thread local data --------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/__support/threads/thread.h"
+#include "src/stdlib/exit.h"
+#include "test/IntegrationTest/test.h"
+
+extern "C" {
+[[gnu::weak]]
+void *__dso_handle = nullptr;
+int __cxa_thread_atexit_impl(void (*func)(void *), void *arg, void *dso);
+}
+
+TEST_MAIN() {
+  __cxa_thread_atexit_impl([](void *) { LIBC_NAMESPACE::exit(0); }, nullptr,
+                           __dso_handle);
+  return 0;
+}
diff --git a/libc/test/integration/src/__support/threads/main_exit_test.cpp b/libc/test/integration/src/__support/threads/main_exit_test.cpp
new file mode 100644
index 0000000000000..c90e4e569cfba
--- /dev/null
+++ b/libc/test/integration/src/__support/threads/main_exit_test.cpp
@@ -0,0 +1,30 @@
+//===-- Test handling of thread local data --------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/__support/threads/thread.h"
+#include "test/IntegrationTest/test.h"
+
+bool called = false;
+
+extern "C" {
+[[gnu::weak]]
+void *__dso_handle = nullptr;
+int __cxa_thread_atexit_impl(void (*func)(void *), void *arg, void *dso);
+}
+
+[[gnu::destructor]]
+void destructor() {
+  if (!called)
+    __builtin_trap();
+}
+
+TEST_MAIN() {
+  __cxa_thread_atexit_impl([](void *) { called = true; }, nullptr,
+                           __dso_handle);
+  return 0;
+}

mizvekov and others added 21 commits March 31, 2025 10:46
…#129781)

This is a follow-up patch of
llvm#125756

In this PR, static-data-splitter pass produces the aggregated profile
counts of constants for constant pools in a global state
(`StateDataProfileInfo`), and asm printer consumes the profile counts to
produce `.hot` or `.unlikely` prefixes.

This implementation covers both x86 and aarch64 asm printer.
…3561)

Closes llvm#132696

before the patch like this:
```
----------------------------
*** Reducing GlobalObjects...
----------------------------
*** Reducing GV Initializers...
----------------------------
*** Reducing GlobalVariables...
----------------------------
```
after the patch like this:
```
----------------------------
*** Reducing GlobalObjects (global-objects)...
----------------------------
*** Reducing GV Initializers (global-initializers)...
----------------------------
*** Reducing GlobalVariables (global-variables)...
----------------------------
```
This macro is only ever used inside the definiton for the various
visibility macros on windows. There, it's defined in multiple places
with different expansions, which makes it more confusing than helpful
when trying to figure out what macro expands to what.
This reverts a single file from ad1ba15.
llvm::append_range in this context fails to compile with recent Clang and
libc++:

libcxx/include/__algorithm/copy_backward.h:221:68: error: invalid operands to
binary expression ('llvm::SuccIterator<llvm::Instruction, llvm::BasicBlock>'
and 'long')
...
llvm-project/llvm/lib/Target/X86/X86WinEHState.cpp:724:11: note: in
instantiation of function template specialization
'llvm::append_range<std::deque<llvm::BasicBlock *>,
llvm::iterator_range<llvm::SuccIterator<llvm::Instruction, llvm::BasicBlock
>>>' requested here
  724 |     llvm::append_range(Worklist, successors(BB));
      |           ^
…nfo (llvm#133637)

Query the correct TTI for the current target instead of constructing
some random default one. Also query the pass manager for
ProfileSummaryInfo.
This should only change the printing, not the actual result.
AUIPCTarget as a relocatable expression cannot have a SubSym or
@-specifier.
Remove unused declarations after llvm#132569.
Simplify some code as we no longer use MCSymbolRefExpr::VariantKind.
…llvm#68997)

The issue is caused by [D133860](https://reviews.llvm.org/D133860).
The guard would be inserted in wrong place in some cases, like the test
case showed below.
This patch fixed the issue by using `isInTailCallPosition()` to verify
whether the tail call is in right position.
…`modernize-` checks (llvm#133525)

Improved "options" sections of `bugprone-` and `modernize-` checks:

1. Added `Options` keyword to be a delimiter between "body" and
"options" parts of docs
2. Added default values where was absent.
3. Improved readability of some default values by converting `1` to
`true`.
This PR add `DenseMap::insert_range` to `DenseMap` for consistency with
existing `DenseSet::insert_range`, `SmallSet::insert_range` and
`std::map::insert_range`.
Similar to previous migration done for all other ELF targets.
Switch from the confusing `VariantKind` to `Specifier`, which aligns
with Arm and IBM AIX's documentation.

Moving forward, relocation specifiers should be integrated into
AMDGPUMCExpr rather than MCSymbolRefExpr::SubclassData.

(Note: the term AMDGPUMCExpr::VariantKind is for expressions
without relocation specifiers:
llvm#82022

It's up to AMDGPU maintainers to integrate these constants into Specifier.
)

Pull Request: llvm#133608
fschlimb and others added 20 commits March 31, 2025 10:46
Lowering of mpi.all_reduce to LLVM function call
Required for mingw-w64, which uses the alias attribute in its CRT.

Follows ARM64EC mangling rules by mangling the alias symbol and emitting
an unmangled anti-dependency alias. Since metadata is not allowed on
GlobalAlias objects, extend arm64ec_unmangled_name to support multiple
unmangled names and attach the alias anti-dependency name to the target
function's metadata.
…llvm#121156)

The patch splits the store-load forwarding distance analysis from other
dependency analysis in LAA. Currently it supports only power-of-2
distances, required to support non-power-of-2 distances in future.

Part of llvm#100755
…lvm#132174)

I recently added a new option to update_test_checks.py that can
filter out all CHECK lines after a certain point. We usually don't
care about checking for the original scalar loop after the vector
loop because it doesn't change. Cutting out unnecessary CHECK
lines makes the files smaller and hopefully the tests run quicker.
After f4ec179, AbsImm is no longer signed and cannot be < 0.
…CT_SUBVECTOR(INSERT_SUBVECTOR()) -> BITCAST fold (llvm#133695)

Always allow later folds to try to match as well.
before

```Verilog
wait fork
  ;
  wait fork
    ;
    wait fork
      ;
```

after

```Verilog
wait fork;
wait fork;
wait fork;
```

The `wait fork` statement should not start a block. Previously the
formatter treated the `fork` part as the start of a new block. Now the
problem is fixed.
…2941)

There is some code to make sure that C++ keywords that are identifiers
in the other languages are not treated as keywords.  Right now, the kind
is set to identifier, and the identifier info is cleared.  The latter is
probably so that the code for identifying C++ structures does not
recognize those structures by mistake when formatting a language that
does not have those structures.  But we did not find an instance where
the language can have the sequence of tokens, the code tries to parse
the structure as if it is C++ using the identifier info instead of the
token kind, but without checking for the language setting.  However,
there are places where the code checks whether the identifier info field
is null or not.  They are places where an identifier and a keyword are
treated the same way.  For example, the name of a function in
JavaScript.  This patch removes the lines that clear the identifier
info.  This way, a C++ keyword gets treated in the same way as an
identifier in those places.

JavaScript

New

```JavaScript
async function
union(
    myparamnameiswaytooloooong) {
}
```

Old

```JavaScript
async function
    union(
        myparamnameiswaytooloooong) {
}
```

Java

New

```Java
enum union { ABC, CDE }
```

Old

```Java
enum
union { ABC, CDE }
```
Before:
```
offset of on non-POD type
```
After:
```
offsetof on non-POD type
```

---------

Co-authored-by: Aaron Ballman <[email protected]>
For example for the following situation:
  %6:gpr = SLLI %2:gpr, 2
  %7:gpr = ADDI killed %6:gpr, 24
  %8:gpr = ADD %0:gpr, %7:gpr

If we swap the two add instrucions we can merge the shift and add. The
final code will look something like this:
  %7 = SH2ADD %0, %2
  %8 = ADDI %7, 24
… and m[no-]evex512 (llvm#132542)

The 256-bit maximum vector register size control was removed from AVX10
whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343

- Re-target m[no-]avx10.1 to enable AVX10.1 with 512-bit maximum vector
register size;
- Emit warning for mavx10.x-256, noting AVX10/256 is not supported;
- Emit warning for mavx10.x-512, noting to use m[no-]avx10.x instead;
- Emit warning for m[no-]evex512, noting AVX10/256 is not supported;

This patch only changes Clang driver behavior. The features
avx10.x-256/512 keep unchanged and will be removed in the next release.
Implement hypot for Float16 along with tests.
…#128918)

We only emits v_mov_b32/64_dpp. Don't combine t16 instructions with mov
dpp. Update the test inputs to be legal.

It is future work to emit v_mov_b16_dpp, and then update GCNDPPCombine
to combine it with the 16-bit instructions.
This turns on the unnecessary-virtual-specifier warning in general, but
disables it when building LLVM. It also tweaks the warning description
to be slightly more accurate.

Background: I've been working on cleaning up this warning in two
codebases: LLVM and chromium (plus its dependencies). The chromium
cleanup has been straightforward. Git archaeology shows that there are
two reasons for the warnings: classes to which `final` was added after
they were initially committed, and classes with virtual destructors that
nobody remarks on. Presumably the latter case is because people are just
very used to destructors being virtual.

The LLVM cleanup was more surprising: I discovered that we have an [old
policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers)
about including out-of-line virtual functions in every class with a
vtable, even `final` ones. This means our codebase has many virtual
"anchor" functions which do nothing except control where the vtable is
emitted, and which trigger the warning. I looked into alternatives to
satisfy the policy, such as using destructors instead of introducing a
new function, but it wasn't clear if they had larger implications.

Overall, it seems like the warning is genuinely useful in most codebases
(evidenced by chromium and its dependencies), and LLVM is an unusual
case. Therefore we should enable the warning by default, and turn it off
only for LLVM builds.
This adds DWARF generation for fixed-point types. This feature is needed
by Ada.

Note that a pre-existing GNU extension is used in one case. This has
been emitted by GCC for years, and is needed because standard DWARF is
otherwise incapable of representing these types.
@SchrodingerZhu
Copy link
Contributor Author

history clash

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.