Build pthreadpool with hidden visibility on Apple #14838

GregoryComer · 2025-10-07T00:02:05Z

Summary

We are seeing pthreadpool-related crashes on Mac when running with pybindings. This appears to be due to XNNPACK using the Google fork of pthreadpool and extension/threadpool using the pthreadpool in libtorch_cpu. See #14321 for more details.

Beyond the obvious One Definition Rule (ODR) issues, the specific failure happens because the pthreadpool functions in the copy of pthreadpool built with ET are marked as weak on Apple platforms. The functions are not marked as weak in source code or in the build, and the behavior appears to be specific to Apple's toolchain.

Weak symbols are compiled as indirect calls and can be overridden at runtime by strong symbols in another dylib. For reasons that I don't fully understand, the pthreadpool symbols in libtorch_cpu are strong, so they take precedence for the calls made from extension/threadpool. The calls in XNNPACK, however, prefer the symbols from the local pthreadpool, which leaves the two callers on different implementations.

This PR works around the issue by building pthreadpool with -fvisibility=hidden, which keeps the symbols from being exported from the final dylib, so they no longer end up in the indirect symbol table. Instead, the call to pthreadpool_create in extension_threadpool is compiled as a direct call to the pthreadpool_create in the pthreadpool built by ExecuTorch.
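As a rough illustration of what the flag changes: `-fvisibility=hidden` makes hidden visibility the default for every symbol in the translation units it is applied to, which has the same effect as annotating each function by hand. The sketch below does that annotation explicitly for GCC/Clang. `demo_pool_create` and `demo_pool_size` are hypothetical stand-ins made up for this example, not pthreadpool or ExecuTorch APIs.

```cpp
#include <cstddef>

// Per-symbol equivalents of -fvisibility=hidden / -fvisibility=default.
// These attributes are GCC/Clang extensions; other compilers get no-ops.
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_HIDDEN __attribute__((visibility("hidden")))
#define DEMO_EXPORT __attribute__((visibility("default")))
#else
#define DEMO_HIDDEN
#define DEMO_EXPORT
#endif

// Hypothetical stand-in for a function such as pthreadpool_create.
// With hidden visibility it is not exported from the shared library it is
// linked into, so callers inside that library bind to this definition.
extern "C" DEMO_HIDDEN std::size_t demo_pool_create(std::size_t threads) {
  return threads == 0 ? 1 : threads;  // treat 0 as "use one thread"
}

// Hypothetical exported entry point. Its call to demo_pool_create cannot be
// redirected to a same-named symbol exported by another library.
extern "C" DEMO_EXPORT std::size_t demo_pool_size(std::size_t requested) {
  return demo_pool_create(requested);
}
```

When built into a dylib, only `demo_pool_size` would be expected to show up among the exported symbols, which is the kind of check the test plan performs with nm and otool.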

This isn't a proper fix for the issue, as there are still two pthreadpool implementations in the process whenever we link libtorch_cpu. However, it does appear to mitigate the symptoms and thus prevent crashes. Long-term, we'll need to find a proper solution, such as namespacing the pthreadpool fork.
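One way namespacing a C library's symbols is sometimes done is with a prefix macro, so that two copies in the same process export different names. The sketch below only illustrates that idea under stated assumptions: the `etfork_` prefix, the `DEMO_FORK_*` macros, and `pool_create` are all hypothetical and do not describe how pthreadpool or ExecuTorch will actually address this.

```cpp
#include <cstddef>

// Hypothetical prefix; a real fork would choose its own.
#define DEMO_FORK_PREFIX etfork_
// Two-level concatenation so DEMO_FORK_PREFIX is expanded before pasting.
#define DEMO_FORK_CONCAT_(a, b) a##b
#define DEMO_FORK_CONCAT(a, b) DEMO_FORK_CONCAT_(a, b)
#define DEMO_FORK_NAME(name) DEMO_FORK_CONCAT(DEMO_FORK_PREFIX, name)

// Expands to: extern "C" std::size_t etfork_pool_create(std::size_t threads)
// The exported symbol no longer collides with an unprefixed pool_create.
extern "C" std::size_t DEMO_FORK_NAME(pool_create)(std::size_t threads) {
  return threads;
}

// Code inside the fork keeps calling through the macro, so the prefix can be
// changed in one place.
inline std::size_t demo_make_pool(std::size_t threads) {
  return DEMO_FORK_NAME(pool_create)(threads);
}
```

With distinct symbol names, neither copy can be substituted for the other at link or load time, regardless of symbol strength or visibility.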

Test plan

In addition to validating this change on CI (including trunk CI), I manually verified the fix by testing the repro in #14321 before and after the change. I verified that ASan does not trip upon resetting the threadpool. I also verified with `nm` and `otool` that `pthreadpool_create` does not show up in the indirect symbol table, and thus cannot (to my knowledge) be overridden at runtime by the implementation in libtorch_cpu.

pytorch-bot · 2025-10-07T00:02:10Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14838

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Driver update on H100 and A100 instances

❌ 1 New Failure, 1 Cancelled Job, 1 Unrelated Failure

As of commit eba8e39 with merge base bba9d26:

NEW FAILURE - The following job has failed:

pull / test-samsung-models-linux / linux-job (gh)
RuntimeError: Command docker exec -t 00f4b3709d2192ad4bde02b776186ec38694dd4b628fef74fa5543644b75b34e /exec failed with exit code 1

CANCELLED JOB - The following job was cancelled. Please retry:

pull / unittest-buck / macos / macos-job (gh)

BROKEN TRUNK - The following job failed, but the failure was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Apple / build-demo-ios / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 74

This comment was automatically generated by Dr. CI and updates every 15 minutes.

GregoryComer · 2025-10-07T04:54:59Z

Samsung model failure is due to running on a fork. Demo app build is broken on trunk, so CI is otherwise green on this PR.

mergennachin

In executorch/extension/threadpool/threadpool.cpp, line 54 in 1b8d380:

threadpool_.reset(pthreadpool_create(new_thread_count));

What are your thoughts on adding this logging to see whether the old and new threadpool addresses are drastically different?

+  void* old_pool = threadpool_.get();
  threadpool_.reset(pthreadpool_create(new_thread_count));
+  ET_LOG(Info, "Reset threadpool from %p to %p (size %u)", old_pool, threadpool_.get(), new_thread_count);

kimishpatel · 2025-10-07T14:07:29Z

what does that help with?

GregoryComer · 2025-10-07T21:12:29Z

Since the google and upstream pthreadpool use the same allocator for the threadpool state, I wouldn't expect to be able to extract much from the addresses. We should probably have some logging though, so I'll add some. Thanks.
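A minimal sketch of what size-based logging around a reset could look like, given that the addresses are not expected to be informative. Everything here is a stand-in invented for the example: `DemoPool`, `demo_state_create`, `demo_state_destroy`, and `DemoThreadpool` are hypothetical, and `std::fprintf` is used in place of ExecuTorch's logging macro.

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>

// Hypothetical stand-in for the opaque pool state.
struct DemoPool {
  std::size_t thread_count;
};

// Hypothetical create/destroy pair; creation fails for a zero-sized pool.
inline DemoPool* demo_state_create(std::size_t n) {
  return n == 0 ? nullptr : new DemoPool{n};
}
inline void demo_state_destroy(DemoPool* p) { delete p; }

class DemoThreadpool {
 public:
  explicit DemoThreadpool(std::size_t n)
      : pool_(demo_state_create(n), demo_state_destroy) {}

  std::size_t thread_count() const { return pool_ ? pool_->thread_count : 0; }

  // Replaces the pool and logs the old and requested sizes plus the outcome.
  bool reset(std::size_t new_thread_count) {
    const std::size_t old_count = thread_count();  // read before the swap
    pool_.reset(demo_state_create(new_thread_count));
    std::fprintf(stderr, "Reset threadpool: %zu -> %zu threads (%s)\n",
                 old_count, new_thread_count,
                 pool_ ? "ok" : "creation failed");
    return pool_ != nullptr;
  }

 private:
  // unique_ptr with a custom deleter destroys the old pool on reset().
  std::unique_ptr<DemoPool, void (*)(DemoPool*)> pool_;
};
```

Logging the thread counts and whether creation succeeded records the information a reader of the log can act on, without relying on pointer values.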

… (#14842) This reverts commit 750cba7. Re-applying the better threadpool size defaults from #14090 with the fix from #14838. This gives a 2-4x speedup for many models and platforms (I measured a 4x speedup on M1 with MobileNet V3 + XNNPACK). On high core count server platforms (doing evals, for example), this can give a 100x speedup out of the box.

digantdesai · 2025-10-09T12:23:30Z

by building pthreadpool with -fvisibility=hidden

Clever!

GregoryComer · 2025-10-09T16:53:43Z

@pytorchbot cherry-pick --onto release/1.0 -c critical

(cherry picked from commit 1da530d)

pytorchbot · 2025-10-09T16:56:53Z

Cherry picking #14838

The cherry pick PR is at #14951 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

[v1.0.0] Release Tracker #14288 (comment)

Details for Dev Infra team

Raised by workflow job

Build pthreadpool with hidden visibility on Apple

99bd291

meta-cla bot added the CLA Signed label Oct 7, 2025

GregoryComer requested review from digantdesai, kimishpatel and swolchok October 7, 2025 00:10

GregoryComer added the release notes: none and ciflow/trunk labels Oct 7, 2025

GregoryComer mentioned this pull request Oct 7, 2025

Out-of-bounds access in pthreadpool when reducing threadpool size on macos #14321

Closed

GregoryComer requested a review from mergennachin October 7, 2025 00:11

GregoryComer mentioned this pull request Oct 7, 2025

Reapply "Add EXECUTORCH_THREADPOOL_SIZE options, default to u… (#14307) #14842

Merged

GregoryComer marked this pull request as ready for review October 7, 2025 04:55

GregoryComer requested review from kirklandsign and larryliu0820 as code owners October 7, 2025 04:55

mergennachin requested a review from metascroy October 7, 2025 13:37

mergennachin approved these changes Oct 7, 2025

GregoryComer force-pushed the fix-apple-pthreadpool branch from 644a021 to 3dec676 Compare October 7, 2025 21:21

Merge branch 'main' into fix-apple-pthreadpool

8e76199

GregoryComer force-pushed the fix-apple-pthreadpool branch from 3dec676 to 8e76199 Compare October 7, 2025 21:39

Merge branch 'main' into fix-apple-pthreadpool

eba8e39

GregoryComer merged commit 1da530d into pytorch:main Oct 8, 2025
360 of 373 checks passed

pytorchbot mentioned this pull request Oct 9, 2025

[v1.0.0] Release Tracker #14288

Open
