Delay destruction of mutexes #26204
Move function-local static mutexes to file-scope statics. This is to delay the destruction of mutexes, as in libc++ the destructor is non-trivial. When a function-local static variable with a non-trivial destructor is defined, the compiler generates code to register its destruction at program exit. This can lead to order of destruction issues, especially in a multi-threaded environment. By moving the mutex to a file-scope static, we ensure that it is initialized at program startup and its destructor is called at program exit, but in a more controlled manner. This avoids potential race conditions and other issues related to the order of destruction of static variables.
Will get a macOS machine and test this change.
I don't know what problem to fix ...
#endif

#ifndef ORT_CONSTINIT
#define ORT_CONSTINIT
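For readers following the diff fragment above, here is a minimal sketch of how such a macro could be wired up. Only the empty fallback definition of ORT_CONSTINIT appears in the diff; the feature-test branch, the mutex name `g_registry_mutex`, and the `IncrementCounter` helper are assumptions made for illustration, not code from this PR.

```cpp
#include <mutex>

// Assumed reconstruction: expand to `constinit` only when the compiler
// advertises support for it; otherwise fall back to the empty definition
// shown in the diff.
#ifndef ORT_CONSTINIT
#if defined(__cpp_constinit) && __cpp_constinit >= 201907L
#define ORT_CONSTINIT constinit
#else
#define ORT_CONSTINIT
#endif
#endif

// Hypothetical file-scope mutex. With constinit the compiler rejects the
// declaration unless the mutex is constant-initialized, so it cannot take
// part in dynamic initialization order problems.
ORT_CONSTINIT static std::mutex g_registry_mutex;
static int g_counter = 0;

// Hypothetical helper that uses the mutex to guard shared state.
int IncrementCounter() {
  std::lock_guard<std::mutex> lock(g_registry_mutex);
  return ++g_counter;
}
```

When the macro expands to nothing, the declaration still compiles, but nothing verifies that the mutex is constant-initialized, which is the concern raised in the review comment that follows.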
Overall, this might work, but we need to resolve the non-C++20 case, because without constinit, file-scope static mutexes are worse than function-local statics.
Currently only the macOS platform has this problem, and the macOS build uses C++20.
I will continue to work on upgrading all pipelines to use C++20. I would prefer to get this PR merged before that work is done, since a lot of users are waiting for it. The constinit keyword is a sanity check, which should not impact functionality.
Instead of an empty definition could we instead use #ifdef's in the files? If constinit is available, use that with an #ifdef around the file scope declaration. Otherwise have an #ifdef around the existing function scope declaration.
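A minimal sketch of what this suggestion could look like in one file. The accessor `GetMutex`, the variable `g_mutex`, and the `AddToValue` helper are hypothetical names, and the `__cpp_constinit` feature-test check is an assumption about how "constinit is available" would be detected.

```cpp
#include <mutex>

#if defined(__cpp_constinit) && __cpp_constinit >= 201907L
// C++20 builds: file-scope mutex, checked by the compiler to be
// constant-initialized.
constinit static std::mutex g_mutex;
static std::mutex& GetMutex() { return g_mutex; }
#else
// Older builds: keep the existing function-scope static, which is
// initialized on first use.
static std::mutex& GetMutex() {
  static std::mutex mutex;
  return mutex;
}
#endif

static int g_value = 0;

// Callers are identical in both configurations.
int AddToValue(int delta) {
  std::lock_guard<std::mutex> lock(GetMutex());
  g_value += delta;
  return g_value;
}
```

The trade-off is that the two branches give the mutex different lifetimes, so destruction order differs between C++20 and non-C++20 builds.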
Then it will have very different behavior on different platforms, since function local statics are deallocated earlier than global vars. Then it will increase the complexity further.
The ideal way to deal with it is to move the mutex into the structure it is trying to protect and not to have it static. I realize it may not be possible in every case, but I can see it is possible in some cases.
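As a rough illustration of that idea, the sketch below makes the mutex a member of the object it guards, so both share one lifetime and no static mutex is needed. `ProviderRegistry` and its methods are hypothetical and not taken from the onnxruntime code base.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical structure that owns both the data and the mutex guarding it.
class ProviderRegistry {
 public:
  void Set(const std::string& key, int value) {
    std::lock_guard<std::mutex> lock(mutex_);
    entries_[key] = value;
  }

  // Returns true and writes to `out` if the key exists.
  bool TryGet(const std::string& key, int& out) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = entries_.find(key);
    if (it == entries_.end()) return false;
    out = it->second;
    return true;
  }

 private:
  // `mutable` so that const readers can still lock.
  mutable std::mutex mutex_;
  std::unordered_map<std::string, int> entries_;
};
```

Because the mutex is constructed and destroyed together with the registry, it can never be destroyed while the data it protects is still alive.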
For example, in onnxruntime/core/providers/shared_library/provider_bridge_provider.cc, the lifetime of the mutex is shorter than the object that it protects. So, clearly the current code is wrong.
> Then it will have very different behavior on different platforms, since function local statics are deallocated earlier than global vars. Then it will increase the complexity further.
Isn't the problem we're trying to address limited to macOS or is that not the case?
For macOS we have C++20. If we use the ifdef's the current behavior on non-C++20 builds is unchanged, and we migrate to constinit automatically as soon as we build with C++20 which should be the safe long term solution.
Is that not more predictable than changing function scope mutexes to file scope for non-C++20 builds?
Abandoned.
If anyone has a better fix, welcome to propose it.
Move function-local static mutexes to file-scope statics.
This is to delay the destruction of mutexes, as in libc++ the destructor is non-trivial.
When a function-local static variable with a non-trivial destructor is defined, the compiler generates code to register its destruction at program exit. This can lead to order of destruction issues. By moving the mutex to a file-scope static, we ensure that it is initialized at program startup and its destructor is called at program exit, but in a more controlled manner. This avoids potential race conditions and other issues related to the order of destruction of static variables. The construction and destruction of std::mutex do not have any dependency other than the standard C++ runtime, therefore this change is safe.
Previously we made the mutexes function-local because, before VS 17.10, std::mutex's constructor was not constexpr (so the mutexes could not be initialized at compile time). It would therefore have caused constructor order problems if we had not made them function-local.
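To make the lifetime difference concrete, here is a small sketch that records when a file-scope static and a function-local static are constructed. `Tracer`, `Log`, `g_file_scope`, and `FunctionLocal` are illustrative names only. The log itself is deliberately leaked so that destructors running at exit can still write to it.

```cpp
#include <string>
#include <vector>

// Leaked on purpose: never destroyed, so it is safe to use from the
// destructors of other static objects during program exit.
std::vector<std::string>& Log() {
  static auto* log = new std::vector<std::string>();
  return *log;
}

// Records its own construction and destruction.
struct Tracer {
  const char* name;
  explicit Tracer(const char* n) : name(n) {
    Log().push_back(std::string("ctor ") + name);
  }
  ~Tracer() { Log().push_back(std::string("dtor ") + name); }
};

// File-scope static: constructed during static initialization at startup.
static Tracer g_file_scope("file_scope");

// Function-local static: constructed the first time the function is called.
Tracer& FunctionLocal() {
  static Tracer tracer("function_local");
  return tracer;
}
```

Objects with static storage duration are destroyed in the reverse order of the completion of their constructors. When `FunctionLocal()` is first called after startup, its object is constructed after `g_file_scope` and is therefore destroyed before it, which is why a file-scope mutex outlives a function-local one.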
This PR replaces #25770.