-
Notifications
You must be signed in to change notification settings - Fork 688
Build pthreadpool with hidden visibility on Apple #14838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14838
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ 1 New Failure, 1 Cancelled Job, 1 Unrelated FailureAs of commit eba8e39 with merge base bba9d26 ( NEW FAILURE - The following job has failed:
CANCELLED JOB - The following job was cancelled. Please retry:
BROKEN TRUNK - The following job failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Samsung model failure is due to running on a fork. Demo app build is broken on trunk, so CI is otherwise green on this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In
threadpool_.reset(pthreadpool_create(new_thread_count)); |
What are your thoughts on adding this logging to see if the old and new threadpool address space is drastically different.
+ void* old_pool = threadpool_.get();
threadpool_.reset(pthreadpool_create(new_thread_count));
+ ET_LOG(Info, "Reset threadpool from %p to %p (size %u)", old_pool, threadpool_.get(), new_thread_count);
what does that help with? |
Since the google and upstream pthreadpool use the same allocator for the threadpool state, I wouldn't expect to be able to extract much from the addresses. We should probably have some logging though, so I'll add some. Thanks. |
644a021
to
3dec676
Compare
3dec676
to
8e76199
Compare
… (#14842) This reverts commit 750cba7. Re-applying the better threadpool size defaults from #14090 with the fix from #14838. This gives a 2-4x speedup for many models and platforms (I measured 4x speedup on M1 with MobileNet V3 + XNNPACK). On high core count server platforms (doing evals, for example), this can give a 100x speedup out of box.
Clever! |
@pytorchbot cherry-pick --onto release/1.0 -c critical |
### Summary We are seeing pthreadpool-related crashes on Mac when running with pybindings. This appears to be due to XNNPACK using the Google fork of pthreadpool and extension/threadpool using the pthreadpool in libtorch_cpu. See #14321 for more details. Beyond the obvious one definition rule issues, the specific failure happens because the pthreadpool functions in the copy of pthreadpool built with ET are marked as weak on Apple platforms. The functions are not marked as weak in source code or in the build, and the behavior appears to be specific to Apple's toolchain. Weak symbols are compiled as indirect calls and can be overridden at runtime by strong symbols in another dylib. For reasons that I don't fully understand, the pthreadpool symbols in libtorch_cpu are strong. Also, the calls in XNNPACK prefer the symbols from the local pthreadpool This PR works around the issue by building pthreadpool with -fvisibility=hidden, which causes the symbols to not be exposed in the final dylib, and thus not end up in the symbol table as an indirect symbol. Instead, the call to pthreadpool_create in extension_threadpool is compiled as a direct call to the pthreadpool_create in the pthreadpool built by executorch. This isn't a proper fix for the issue, as there are still two pthreadpool implementations in the process whenever we link libtorch_cpu. However, it does appear to mitigate the symptoms and thus prevent crashes. Long-term, we'll need to find a proper solution, such as namespacing the pthreadpool fork. ### Test plan In addition to validating this change on CI (including trunk CI), I manually verified the fix by testing the repro in #14321 before and after the change. I verified that ASan does not trip upon resetting the threadpool. I also verified with `nm` and `otool` that `pthreadpool_create` does not show up in the indirect symbol table, and thus cannot (to my knowledge) be overridden at runtime by the implementation in libtorch_cpu. (cherry picked from commit 1da530d)
Cherry picking #14838The cherry pick PR is at #14951 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated: Details for Dev Infra teamRaised by workflow job |
Summary
We are seeing pthreadpool-related crashes on Mac when running with pybindings. This appears to be due to XNNPACK using the Google fork of pthreadpool and extension/threadpool using the pthreadpool in libtorch_cpu. See #14321 for more details.
Beyond the obvious one definition rule issues, the specific failure happens because the pthreadpool functions in the copy of pthreadpool built with ET are marked as weak on Apple platforms. The functions are not marked as weak in source code or in the build, and the behavior appears to be specific to Apple's toolchain.
Weak symbols are compiled as indirect calls and can be overridden at runtime by strong symbols in another dylib. For reasons that I don't fully understand, the pthreadpool symbols in libtorch_cpu are strong. Also, the calls in XNNPACK prefer the symbols from the local pthreadpool
This PR works around the issue by building pthreadpool with -fvisibility=hidden, which causes the symbols to not be exposed in the final dylib, and thus not end up in the symbol table as an indirect symbol. Instead, the call to pthreadpool_create in extension_threadpool is compiled as a direct call to the pthreadpool_create in the pthreadpool built by executorch.
This isn't a proper fix for the issue, as there are still two pthreadpool implementations in the process whenever we link libtorch_cpu. However, it does appear to mitigate the symptoms and thus prevent crashes. Long-term, we'll need to find a proper solution, such as namespacing the pthreadpool fork.
Test plan
In addition to validating this change on CI (including trunk CI), I manually verified the fix by testing the repro in #14321 before and after the change. I verified that ASan does not trip upon resetting the threadpool. I also verified with
nm
andotool
thatpthreadpool_create
does not show up in the indirect symbol table, and thus cannot (to my knowledge) be overridden at runtime by the implementation in libtorch_cpu.