16 changes: 16 additions & 0 deletions include/onnxruntime/core/common/common.h
@@ -66,6 +66,22 @@ using TimePoint = std::chrono::high_resolution_clock::time_point;
 #define ORT_ATTRIBUTE_UNUSED
 #endif
 
+// ORT_CONSTINIT
+//
+// C++20 constinit keyword ensures that a variable is initialized at compile time
+// and can be safely used in static initialization contexts
+#if defined(__cpp_constinit) && __cpp_constinit >= 201907L
+#define ORT_CONSTINIT constinit
+#elif defined(__clang__) && defined(__has_cpp_attribute)
+#if __has_cpp_attribute(clang::require_constant_initialization)
+#define ORT_CONSTINIT [[clang::require_constant_initialization]]
+#endif
+#endif
+
+#ifndef ORT_CONSTINIT
+#define ORT_CONSTINIT
Member:
Overall, this might work, but we need to resolve the non-C++20 case, because without constinit, static mutexes are worse than function-local statics.

Member Author:
Currently only the macOS platform has this problem, and the macOS build already uses C++20.
I will continue working on upgrading all pipelines to C++20, but I would prefer to get this PR merged before that work is done, since a lot of users are waiting for it. The constinit keyword is a sanity check; it should not affect functionality.

Contributor:

Instead of an empty definition, could we use #ifdefs in the files? If constinit is available, use it, with an #ifdef around the file-scope declaration; otherwise, keep an #ifdef around the existing function-scope declaration.

Member Author:

Then it would behave very differently across platforms, since function-local statics are destroyed earlier than global variables, and that would increase the complexity further.

Member (@yuslepukhin, Oct 10, 2025):
The ideal way to deal with this is to move the mutex into the structure it is trying to protect, rather than making it static. I realize that may not be possible in every case, but I can see that it is possible in some.

Member Author:

For example, in onnxruntime/core/providers/shared_library/provider_bridge_provider.cc, the lifetime of the mutex is shorter than that of the object it protects. So the current code is clearly wrong.

Contributor:

> Then it would behave very differently across platforms, since function-local statics are destroyed earlier than global variables, and that would increase the complexity further.

Isn't the problem we're trying to address limited to macOS, or is that not the case?

For macOS we have C++20. If we use the #ifdefs, the current behavior on non-C++20 builds is unchanged, and we migrate to constinit automatically as soon as we build with C++20, which should be the safe long-term solution.

Isn't that more predictable than changing function-scope mutexes to file-scope for non-C++20 builds?

+#endif
 
 #ifdef ORT_NO_EXCEPTIONS
 // Print the given final message, the message must be a null terminated char*
 // ORT will abort after printing the message.
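For readers unfamiliar with the keyword, a brief illustrative sketch (editorial, not part of the PR): constinit makes the compiler reject any global whose initialization is not a constant expression, and because constant initialization happens before all dynamic initialization, such an object is also destroyed after every dynamically initialized static object. std::mutex has a constexpr default constructor, so a file-scope mutex qualifies.

```cpp
#include <cstdlib>
#include <mutex>

// OK: std::mutex has a constexpr default constructor, so this is
// constant-initialized. Constant initialization happens before any dynamic
// initialization, so this mutex also outlives every dynamically initialized
// static object during shutdown.
constinit std::mutex g_mutex;

// Error if uncommented: std::rand() is not a constant expression, so
// constinit rejects the declaration at compile time instead of silently
// falling back to runtime (dynamic) initialization.
// constinit int g_seed = std::rand();

// The hazard the PR is working around: a function-local static is constructed
// on first use, so it can be constructed after -- and therefore destroyed
// before -- another static object whose destructor still tries to lock it.
std::mutex& LazyMutex() {
  static std::mutex m;
  return m;
}

int main() {
  std::lock_guard<std::mutex> lock(g_mutex);
  LazyMutex();
  return 0;
}
```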
5 changes: 3 additions & 2 deletions onnxruntime/core/common/logging/logging.cc
@@ -64,9 +64,10 @@ LoggingManager* LoggingManager::GetDefaultInstance() {
 #pragma warning(disable : 26426)
 #endif
 
+ORT_CONSTINIT static std::mutex default_logger_mutex;
+
 static std::mutex& DefaultLoggerMutex() noexcept {
-  static std::mutex mutex;
-  return mutex;
+  return default_logger_mutex;
 }
 
 Logger* LoggingManager::s_default_logger_ = nullptr;
5 changes: 4 additions & 1 deletion onnxruntime/core/framework/model_metadef_id_generator.cc
@@ -7,11 +7,14 @@
 #include "core/framework/murmurhash3.h"
 
 namespace onnxruntime {
+namespace {
+ORT_CONSTINIT static std::mutex mutex;
+}
+
 int ModelMetadefIdGenerator::GenerateId(const onnxruntime::GraphViewer& graph_viewer,
                                         HashValue& model_hash) const {
   // if the EP is shared across multiple sessions there's a very small potential for concurrency issues.
   // use a lock when generating an id to be paranoid
-  static std::mutex mutex;
   std::lock_guard<std::mutex> lock(mutex);
   model_hash = 0;
@@ -290,9 +290,12 @@ TensorrtLogger& GetTensorrtLogger(bool verbose_log) {
   return trt_logger;
 }
 
+namespace {
+ORT_CONSTINIT std::mutex trt_api_lock;
+}  // namespace
+
 std::unique_lock<std::mutex> NvExecutionProvider::GetApiLock() const {
-  static std::mutex singleton;
-  return std::unique_lock<std::mutex>(singleton);
+  return std::unique_lock<std::mutex>(trt_api_lock);
 }
 
 /*
@@ -21,6 +21,11 @@
 #endif
 
 namespace onnxruntime {
+
+namespace {
+ORT_CONSTINIT static std::mutex trt_custom_op_mutex;
+}  // namespace
+
 extern TensorrtLogger& GetTensorrtLogger(bool verbose);
 
 /*
@@ -41,8 +46,7 @@ extern TensorrtLogger& GetTensorrtLogger(bool verbose);
 common::Status CreateTensorRTCustomOpDomainList(std::vector<OrtCustomOpDomain*>& domain_list, const std::string extra_plugin_lib_paths) {
   static std::unique_ptr<OrtCustomOpDomain> custom_op_domain = std::make_unique<OrtCustomOpDomain>();
   static std::vector<std::unique_ptr<TensorRTCustomOp>> created_custom_op_list;
-  static std::mutex mutex;
-  std::lock_guard<std::mutex> lock(mutex);
+  std::lock_guard<std::mutex> lock(trt_custom_op_mutex);
   if (custom_op_domain->domain_ != "" && custom_op_domain->custom_ops_.size() > 0) {
     domain_list.push_back(custom_op_domain.get());
     return Status::OK();
7 changes: 5 additions & 2 deletions onnxruntime/core/providers/qnn/builder/qnn_utils.cc
@@ -790,16 +790,19 @@ Status GetQnnDataType(const bool is_quantized_tensor, const ONNX_NAMESPACE::Type
   return Status::OK();
 }
 
+namespace {
+ORT_CONSTINIT static std::mutex counter_mutex;
+}  // namespace
+
 std::string GetUniqueName(const std::string& base, std::string_view suffix) {
   std::string name = base;
   if (!suffix.empty()) {
     name += suffix;
   }
   {
-    static std::unordered_map<std::string, int> counter;
-    static std::mutex counter_mutex;
     std::lock_guard<std::mutex> lock(counter_mutex);
+
+    static std::unordered_map<std::string, int> counter;
     int& count = counter[name];
     if (count++ > 0) {
       return name + "_" + std::to_string(count);
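For clarity, the counter logic behaves as in the self-contained sketch below, which mirrors the diff; the tail of GetUniqueName is truncated above, so the final `return name;` for the first-use path is an assumption.

```cpp
#include <iostream>
#include <mutex>
#include <string>
#include <string_view>
#include <unordered_map>

namespace {
std::mutex counter_mutex;  // ORT_CONSTINIT in the actual patch

std::string GetUniqueName(const std::string& base, std::string_view suffix) {
  std::string name = base;
  if (!suffix.empty()) {
    name += suffix;
  }
  std::lock_guard<std::mutex> lock(counter_mutex);
  static std::unordered_map<std::string, int> counter;
  int& count = counter[name];
  if (count++ > 0) {
    return name + "_" + std::to_string(count);  // second use of "x" yields "x_2"
  }
  return name;  // assumed: first use of a name returns it unchanged
}
}  // namespace

int main() {
  std::cout << GetUniqueName("conv", "") << "\n";    // conv
  std::cout << GetUniqueName("conv", "") << "\n";    // conv_2
  std::cout << GetUniqueName("conv", "_w") << "\n";  // conv_w
  return 0;
}
```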
8 changes: 6 additions & 2 deletions onnxruntime/core/providers/qnn/ort_api.cc
@@ -10,11 +10,15 @@
 namespace onnxruntime {
 
 #if BUILD_QNN_EP_STATIC_LIB
+
+namespace {
+ORT_CONSTINIT static std::mutex run_on_unload_mutex;
+}  // namespace
+
 static std::unique_ptr<std::vector<std::function<void()>>> s_run_on_unload_;
 
 void RunOnUnload(std::function<void()> function) {
-  static std::mutex mutex;
-  std::lock_guard<std::mutex> guard(mutex);
+  std::lock_guard<std::mutex> guard(run_on_unload_mutex);
   if (!s_run_on_unload_) {
     s_run_on_unload_ = std::make_unique<std::vector<std::function<void()>>>();
   }
onnxruntime/core/providers/shared_library/provider_bridge_provider.cc
@@ -81,6 +81,44 @@
 #endif
 
 namespace onnxruntime {
+
+namespace {
+
+class OnUnloadManager {
+ public:
+  ~OnUnloadManager() {
+    std::unique_ptr<std::vector<std::function<void()>>> funcs;
+    {
+      std::lock_guard<std::mutex> guard(mutex_);
+      funcs = std::move(run_on_unload_functions_);
+    }
+
+    if (!funcs) {
+      return;
+    }
+
+    for (auto& function : *funcs) {
+      function();
+    }
+  }
+
+  void Add(std::function<void()> function) {
+    std::lock_guard<std::mutex> guard(mutex_);
+    if (!run_on_unload_functions_) {
+      run_on_unload_functions_ = std::make_unique<std::vector<std::function<void()>>>();
+    }
+    run_on_unload_functions_->push_back(std::move(function));
+  }
+
+ private:
+  std::mutex mutex_;
+  std::unique_ptr<std::vector<std::function<void()>>> run_on_unload_functions_;
+};
+
+static OnUnloadManager g_on_unload_manager;
+
+}  // namespace
+
 #if defined(_MSC_VER) && !defined(__clang__)
 #pragma warning(push)
 // "Global initializer calls a non-constexpr function."
@@ -91,30 +129,11 @@ ProviderHostCPU& g_host_cpu = g_host->GetProviderHostCPU();
 #if defined(_MSC_VER) && !defined(__clang__)
 #pragma warning(pop)
 #endif
-static std::unique_ptr<std::vector<std::function<void()>>> s_run_on_unload_;
-
 void RunOnUnload(std::function<void()> function) {
-  static std::mutex mutex;
-  std::lock_guard<std::mutex> guard{mutex};
-  if (!s_run_on_unload_)
-    s_run_on_unload_ = std::make_unique<std::vector<std::function<void()>>>();
-  s_run_on_unload_->push_back(std::move(function));
+  g_on_unload_manager.Add(std::move(function));
 }
 
-// This object is destroyed as part of the DLL unloading code and handles running all of the RunOnLoad functions
-struct OnUnload {
-  ~OnUnload() {
-    if (!s_run_on_unload_)
-      return;
-
-    for (auto& function : *s_run_on_unload_)
-      function();
-
-    s_run_on_unload_.reset();
-  }
-
-} g_on_unload;
-
 void* CPUAllocator::Alloc(size_t size) { return g_host->CPUAllocator__Alloc(this, size); }
 void CPUAllocator::Free(void* p) { g_host->CPUAllocator__Free(this, p); }
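This change implements the reviewer's suggestion above: the mutex now lives inside the object it protects, so the lock and the callback list share one lifetime and cannot be torn down independently. A condensed, standalone sketch of the same pattern (names mirror the diff; the demo in main() is illustrative only):

```cpp
#include <functional>
#include <iostream>
#include <mutex>
#include <utility>
#include <vector>

namespace {
// Condensed OnUnloadManager: the mutex is a member, so it is destroyed
// together with the data it guards, never before it.
class OnUnloadManager {
 public:
  ~OnUnloadManager() {
    for (auto& fn : callbacks_) fn();  // runs during static destruction
  }
  void Add(std::function<void()> fn) {
    std::lock_guard<std::mutex> guard(mutex_);
    callbacks_.push_back(std::move(fn));
  }

 private:
  std::mutex mutex_;
  std::vector<std::function<void()>> callbacks_;
};

OnUnloadManager g_on_unload_manager;
}  // namespace

void RunOnUnload(std::function<void()> fn) { g_on_unload_manager.Add(std::move(fn)); }

int main() {
  RunOnUnload([] { std::cout << "cleanup runs after main returns\n"; });
  return 0;  // g_on_unload_manager is destroyed afterwards and fires the callback
}
```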
@@ -452,9 +452,12 @@ TensorrtLogger& GetTensorrtLogger(bool verbose_log) {
   return trt_logger;
 }
 
+namespace {
+ORT_CONSTINIT std::mutex trt_api_lock;
+}  // namespace
+
 std::unique_lock<std::mutex> TensorrtExecutionProvider::GetApiLock() const {
-  static std::mutex singleton;
-  return std::unique_lock<std::mutex>(singleton);
+  return std::unique_lock<std::mutex>(trt_api_lock);
 }
 
 /*
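The GetApiLock() idiom used in both TRT providers returns a std::unique_lock by value, handing lock ownership to the caller. A minimal standalone sketch under assumed names (Provider and api_lock are illustrative):

```cpp
#include <iostream>
#include <mutex>

namespace {
std::mutex api_lock;  // ORT_CONSTINIT in the actual patch
}  // namespace

class Provider {
 public:
  // unique_lock is movable, so the caller receives a live, locked handle and
  // releases it automatically when the handle goes out of scope.
  std::unique_lock<std::mutex> GetApiLock() const {
    return std::unique_lock<std::mutex>(api_lock);
  }
};

int main() {
  Provider p;
  {
    auto lock = p.GetApiLock();  // serialized region begins
    std::cout << "API calls that must not run concurrently go here\n";
  }  // lock released here
  return 0;
}
```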
onnxruntime/core/providers/tensorrt/tensorrt_execution_provider_custom_ops.cc
@@ -27,6 +27,39 @@
 #define ORT_DEF2STR(x) ORT_DEF2STR_HELPER(x)
 
 namespace onnxruntime {
+
+namespace {
+class TrtCustomOpDomainStateLocker;
+
+struct TrtCustomOpDomainState {
+  friend class TrtCustomOpDomainStateLocker;
+
+ private:
+  std::mutex mutex;
+
+ public:
+  std::unique_ptr<OrtCustomOpDomain> custom_op_domain{std::make_unique<OrtCustomOpDomain>()};
+  std::vector<std::unique_ptr<TensorRTCustomOp>> created_custom_op_list;
+  bool is_loaded{false};
+};
+
+class TrtCustomOpDomainStateLocker {
+ public:
+  explicit TrtCustomOpDomainStateLocker(TrtCustomOpDomainState& state) : state_{state}, lock_{state_.mutex} {}
+  TrtCustomOpDomainState* operator->() { return &state_; }
+  const TrtCustomOpDomainState* operator->() const { return &state_; }
+
+ private:
+  TrtCustomOpDomainState& state_;
+  std::lock_guard<std::mutex> lock_;
+};
+
+TrtCustomOpDomainState& GetTrtCustomOpDomainState() {
+  static TrtCustomOpDomainState state;
+  return state;
+}
+}  // namespace
+
 extern TensorrtLogger& GetTensorrtLogger(bool verbose);

[cpplint] onnxruntime/core/providers/tensorrt/tensorrt_execution_provider_custom_ops.cc:41: Add #include <memory> for unique_ptr<> [build/include_what_you_use] [4]
[cpplint] onnxruntime/core/providers/tensorrt/tensorrt_execution_provider_custom_ops.cc:42: Add #include <vector> for vector<> [build/include_what_you_use] [4]

 /*
@@ -45,21 +78,17 @@
  * So, TensorRTCustomOp uses variadic inputs/outputs to pass ONNX graph validation.
  */
 common::Status CreateTensorRTCustomOpDomainList(std::vector<OrtCustomOpDomain*>& domain_list, const std::string extra_plugin_lib_paths) {
-  static std::unique_ptr<OrtCustomOpDomain> custom_op_domain = std::make_unique<OrtCustomOpDomain>();
-  static std::vector<std::unique_ptr<TensorRTCustomOp>> created_custom_op_list;
-  static std::mutex mutex;
-  std::lock_guard<std::mutex> lock(mutex);
-  if (custom_op_domain->domain_ != "" && custom_op_domain->custom_ops_.size() > 0) {
-    domain_list.push_back(custom_op_domain.get());
+  TrtCustomOpDomainStateLocker state(GetTrtCustomOpDomainState());
+  if (state->custom_op_domain->domain_ != "" && state->custom_op_domain->custom_ops_.size() > 0) {
+    domain_list.push_back(state->custom_op_domain.get());
     return Status::OK();
   }
 
   // Load any extra TRT plugin library if any.
   // When the TRT plugin library is loaded, the global static object is created and the plugin is registered to TRT registry.
   // This is done through macro, for example, REGISTER_TENSORRT_PLUGIN(VisionTransformerPluginCreator).
   // extra_plugin_lib_paths has the format of "path_1;path_2....;path_n"
-  static bool is_loaded = false;
-  if (!extra_plugin_lib_paths.empty() && !is_loaded) {
+  if (!extra_plugin_lib_paths.empty() && !state->is_loaded) {
     std::stringstream extra_plugin_libs(extra_plugin_lib_paths);
     std::string lib;
     while (std::getline(extra_plugin_libs, lib, ';')) {
@@ -70,7 +99,7 @@
         LOGS_DEFAULT(WARNING) << "[TensorRT EP]" << status.ToString();
       }
     }
-    is_loaded = true;
+    state->is_loaded = true;
   }
 
   try {
@@ -133,14 +162,14 @@
         continue;
       }
 
-      created_custom_op_list.push_back(std::make_unique<TensorRTCustomOp>(onnxruntime::kTensorrtExecutionProvider, nullptr));  // Make sure TensorRTCustomOp object won't be cleaned up
-      created_custom_op_list.back().get()->SetName(plugin_name);
-      custom_op_domain->custom_ops_.push_back(created_custom_op_list.back().get());
+      state->created_custom_op_list.push_back(std::make_unique<TensorRTCustomOp>(onnxruntime::kTensorrtExecutionProvider, nullptr));  // Make sure TensorRTCustomOp object won't be cleaned up
+      state->created_custom_op_list.back().get()->SetName(plugin_name);
+      state->custom_op_domain->custom_ops_.push_back(state->created_custom_op_list.back().get());
       registered_plugin_names.insert(plugin_name);
     }
 
-    custom_op_domain->domain_ = "trt.plugins";
-    domain_list.push_back(custom_op_domain.get());
+    state->custom_op_domain->domain_ = "trt.plugins";
+    domain_list.push_back(state->custom_op_domain.get());
   } catch (const std::exception&) {
     LOGS_DEFAULT(WARNING) << "[TensorRT EP] Failed to get TRT plugins from TRT plugin registration. Therefore, TRT EP can't create custom ops for TRT plugins";
   }
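The TrtCustomOpDomainStateLocker introduced above is a small "locking accessor": constructing it acquires the state's own mutex, and every access through operator-> then happens under that lock, with the state and its mutex sharing one lifetime. A stripped-down standalone sketch of the same pattern (State, StateLocker, and value are illustrative names):

```cpp
#include <iostream>
#include <mutex>

struct State {
  friend class StateLocker;

 private:
  std::mutex mutex;  // owned by the data it guards, as in the PR

 public:
  int value{0};
};

class StateLocker {
 public:
  explicit StateLocker(State& state) : state_{state}, lock_{state_.mutex} {}
  State* operator->() { return &state_; }

 private:
  State& state_;
  std::lock_guard<std::mutex> lock_;  // held for the locker's whole lifetime
};

State& GetState() {
  static State state;  // a function-local static is fine here: it owns its own mutex
  return state;
}

int main() {
  StateLocker state(GetState());
  state->value += 1;  // every -> access happens under the lock
  std::cout << state->value << "\n";
  return 0;
}
```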