
Commit b1ce3e9

GregoryComer authored and facebook-github-bot committed
Add XNNPACK backend option for workspace sharing (re-land) (#13934)
Summary:

**Note: This is a re-land, fixing a use-after-free which occurred when destroying a delegate instance. The executor is destroyed, which frees the workspace. The mutex that raii_lock points to is owned by the workspace, so there is a use-after-free when raii_lock goes out of scope. This is fixed by taking an owning reference to the workspace in destroy.**

Add a backend option for XNNPACK to enable runtime control of workspace sharing. I've added three mode options: Disabled, PerModel, and Global. PerModel shares the workspace between all CALL_DELEGATE instances in a model, keyed by memory allocator address (see below). Global uses a single workspace instance.

I've written the code to allow the active workspace mode to be safely changed at any time. The workspace instance is resolved at delegate instance init time (model load) and is stored in the XNNExecutor instance. This design will also allow us to set per-model sharing options in the future. I've introduced a wrapper class (XNNWorkspace) to help with synchronization.

With regard to the PerModel behavior, I am using the address of the runtime allocator to disambiguate the model. This is not ideal in the long run, but there is a larger discussion around generating IDs in a coherent manner in multithreaded environments without synchronization in the core runtime. This might require PAL changes (exposing a thread ID, for example), so I intend to come back to this. It should be possible to transparently update this logic in the future. The program ID can collide or change without affecting correctness, but collisions may increase memory use, and instability between delegate instances in a method may force extra synchronization.

I'd like to add a PerMethod mode as a follow-up. This should be keyed to the specific method instance (not name), such that multiple method instances for the same method can be loaded for execution on different threads without forcing synchronization, while still allowing sharing between call delegate instances in each method instance. This will require a unique method identifier.

Reviewed By: digantdesai

Differential Revision: D81647105
1 parent 07d1092 commit b1ce3e9

File tree

14 files changed (+999, −81 lines)


backends/test/multi_method_delegate_test.cpp

Lines changed: 61 additions & 25 deletions
@@ -5,13 +5,22 @@
 #include <thread>
 #include <vector>
 
+#include <executorch/backends/xnnpack/runtime/XNNPACKBackend.h>
+
+#include <executorch/runtime/backend/interface.h>
+#include <executorch/runtime/backend/options.h>
 #include <executorch/runtime/executor/program.h>
 #include <executorch/runtime/platform/runtime.h>
 
 #include <executorch/extension/data_loader/file_data_loader.h>
 #include <executorch/extension/memory_allocator/malloc_memory_allocator.h>
 #include <executorch/extension/runner_util/inputs.h>
 
+using executorch::backends::xnnpack::workspace_sharing_mode_option_key;
+using executorch::backends::xnnpack::WorkspaceSharingMode;
+using executorch::backends::xnnpack::xnnpack_backend_key;
+
+using executorch::runtime::BackendOptions;
 using executorch::runtime::Error;
 using executorch::runtime::EValue;
 using executorch::runtime::HierarchicalAllocator;
@@ -126,34 +135,61 @@ class XNNPACKMultiDelegateTest : public ETPTEMethodRunBaseTest {
     num_threads = 40;
     kMethodName = "forward";
   }
-};
 
-// This test is to validate the assumption that the delegate is thread safe.
-// That includes the following:
-// 1. The delegate can be initilized by multiple threads in parallel.
-// 2. The delegate can be executed by multiple threads in parallel.
-// 3. The delegate can be destroyed by multiple threads in parallel.
-// Regardless of the underlying implementation of the delegate.
-// This is particularly important when we have shared resources across
-// delegate instances through a singleton backend instance.
-TEST_F(XNNPACKMultiDelegateTest, MultipleThreads) {
-  ASSERT_NE(kTestPTE1Path.size(), 0);
-  ASSERT_NE(kTestPTE2Path.size(), 0);
-  ASSERT_NE(num_threads, 0);
-  ASSERT_NE(kMethodName.size(), 0);
-
-  std::vector<std::thread> threads(num_threads);
-  std::atomic<size_t> count{0};
-
-  for (int i = 0; i < num_threads; i++) {
-    threads[i] = std::thread([&, i]() {
-      run(i, i % 7 ? kTestPTE1Path : kTestPTE2Path, kMethodName, count);
-    });
+  // This test is to validate the assumption that the delegate is thread safe.
+  // That includes the following:
+  // 1. The delegate can be initilized by multiple threads in parallel.
+  // 2. The delegate can be executed by multiple threads in parallel.
+  // 3. The delegate can be destroyed by multiple threads in parallel.
+  // Regardless of the underlying implementation of the delegate.
+  // This is particularly important when we have shared resources across
+  // delegate instances through a singleton backend instance.
+  void runStressTest() {
+    ASSERT_NE(kTestPTE1Path.size(), 0);
+    ASSERT_NE(kTestPTE2Path.size(), 0);
+    ASSERT_NE(num_threads, 0);
+    ASSERT_NE(kMethodName.size(), 0);
+
+    std::vector<std::thread> threads(num_threads);
+    std::atomic<size_t> count{0};
+
+    for (int i = 0; i < num_threads; i++) {
+      threads[i] = std::thread([&, i]() {
+        run(i, i % 7 ? kTestPTE1Path : kTestPTE2Path, kMethodName, count);
+      });
+    }
+    for (int i = 0; i < num_threads; i++) {
+      threads[i].join();
+    }
+    ASSERT_EQ(count, num_threads);
   }
-  for (int i = 0; i < num_threads; i++) {
-    threads[i].join();
+
+  void setWorkspaceSharingMode(WorkspaceSharingMode mode) {
+    executorch::runtime::runtime_init();
+
+    BackendOptions<1> backend_options;
+    backend_options.set_option(
+        workspace_sharing_mode_option_key, static_cast<int>(mode));
+
+    auto status = executorch::runtime::set_option(
+        xnnpack_backend_key, backend_options.view());
+    ASSERT_EQ(status, Error::Ok);
   }
-  ASSERT_EQ(count, num_threads);
+};
+
+TEST_F(XNNPACKMultiDelegateTest, MultipleThreadsSharingDisabled) {
+  setWorkspaceSharingMode(WorkspaceSharingMode::Disabled);
+  runStressTest();
+}
+
+TEST_F(XNNPACKMultiDelegateTest, MultipleThreadsPerModelSharing) {
+  setWorkspaceSharingMode(WorkspaceSharingMode::PerModel);
+  runStressTest();
+}
+
+TEST_F(XNNPACKMultiDelegateTest, MultipleThreadsGlobalSharing) {
+  setWorkspaceSharingMode(WorkspaceSharingMode::Global);
+  runStressTest();
 }
 
 // TODO(T208989291): Add more tests here. For example,

backends/xnnpack/runtime/XNNCompiler.cpp

Lines changed: 3 additions & 11 deletions
@@ -1895,24 +1895,16 @@ ET_NODISCARD Error XNNCompiler::compileModel(
   xnn_weights_cache_t weights_cache_ptr = nullptr;
 #endif
 
-#ifdef ENABLE_XNNPACK_SHARED_WORKSPACE
-  ET_CHECK_OR_RETURN_ERROR(
-      workspace != nullptr, Internal, "Failed to initialize XNNPACK workspace");
+  // NOLINTBEGIN(facebook-hte-NullableDereference) - weights cache is allowed to
+  // be null
   status = xnn_create_runtime_v4(
       subgraph.get(),
       weights_cache_ptr,
       workspace,
       ::executorch::extension::threadpool::get_pthreadpool(),
       runtime_flags,
       &runtime_ptr);
-#else
-  status = xnn_create_runtime_v3(
-      subgraph.get(),
-      weights_cache_ptr,
-      ::executorch::extension::threadpool::get_pthreadpool(),
-      runtime_flags,
-      &runtime_ptr);
-#endif
+  // NOLINTEND(facebook-hte-NullableDereference)
 
   ET_CHECK_OR_RETURN_ERROR(
       xnn_status_success == status,

backends/xnnpack/runtime/XNNExecutor.h

Lines changed: 8 additions & 2 deletions
@@ -9,13 +9,13 @@
 #pragma once
 
 #include <executorch/backends/xnnpack/runtime/XNNStatus.h>
+#include <executorch/backends/xnnpack/runtime/XNNWorkspace.h>
 #include <executorch/backends/xnnpack/runtime/profiling/XNNProfiler.h>
 #include <executorch/runtime/backend/interface.h>
 #include <executorch/runtime/core/error.h>
 #include <executorch/runtime/core/exec_aten/util/tensor_util.h>
 
 #include <xnnpack.h>
-#include <map>
 #include <memory>
 #include <vector>
 
@@ -35,9 +35,11 @@ class XNNExecutor {
   std::vector<uint32_t> output_ids_;
   std::vector<xnn_external_value> externals_;
   std::vector<std::string> packed_data_names_;
+  std::shared_ptr<XNNWorkspace> workspace_;
 
  public:
-  XNNExecutor() = default;
+  XNNExecutor(std::shared_ptr<XNNWorkspace> workspace)
+      : workspace_(workspace) {}
 
   inline size_t getNumInputs() {
     return input_ids_.size();
@@ -51,6 +53,10 @@ class XNNExecutor {
     return packed_data_names_;
   }
 
+  inline std::shared_ptr<XNNWorkspace> get_workspace() {
+    return workspace_;
+  }
+
   /**
    * Initialize the XNNExecutor with a given runtime and input/output ids.
    * The input/output ids are expected to be sorted in order of their
