
Conversation

@GregoryComer (Member) commented Jun 17, 2025

Summary

Add a backend option for XNNPACK to enable runtime control of workspace sharing. I've added 3 mode options - Disabled, PerModel, and Global. PerModel shares the workspace between all CALL_DELEGATE instances in a model, keyed by memory allocator address (see below). Global uses a single workspace instance.
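As a rough illustration (not the backend's actual enum or option values), the three modes map to something like:

// Illustrative only; the real option values exposed by the backend may differ.
enum class WorkspaceSharingMode {
  Disabled,  // Each delegate instance gets its own workspace.
  PerModel,  // Shared across CALL_DELEGATE instances within one model.
  Global,    // A single workspace shared by all delegate instances.
};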

I've written the code to allow for the active workspace mode to be safely changed at any time. The workspace instance is resolved at delegate instance init time (model load) and is stored in the XNNExecutor instance. This design will also allow us to set per-model sharing options in the future. I've introduced a wrapper class (XNNWorkspace) to help with synchronization.
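For illustration only, a minimal sketch of what a wrapper along these lines might look like, assuming WorkspacePtr is a unique_ptr that uses xnn_release_workspace as its deleter; the names and layout here are assumptions, not the actual ExecuTorch class:

#include <memory>
#include <mutex>
#include <xnnpack.h>

class XNNWorkspace {
 public:
  using WorkspacePtr =
      std::unique_ptr<struct xnn_workspace, decltype(&xnn_release_workspace)>;

  explicit XNNWorkspace(WorkspacePtr workspace)
      : workspace_(std::move(workspace)) {}

  // Delegate instances that share this workspace lock around setup/execution.
  std::unique_lock<std::mutex> lock() {
    return std::unique_lock<std::mutex>(mutex_);
  }

  xnn_workspace_t unsafe_get() {
    return workspace_.get();
  }

  // Construction mirrors the snippet discussed in review below, e.g.:
  //   std::make_shared<XNNWorkspace>(
  //       WorkspacePtr(workspace, &xnn_release_workspace));

 private:
  WorkspacePtr workspace_;
  std::mutex mutex_;
};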

With regard to the PerModel behavior, I am using the address of the runtime allocator to disambiguate the model. This is not ideal in the long run, but there is a larger discussion around generating IDs in a coherent manner in multithreaded environments without synchronization in the core runtime. This might require PAL changes (exposing a thread ID, for example), so I intend to come back to this.

It should be possible to transparently update this logic in the future. The program ID can collide or change without affecting correctness, but may increase memory (for collisions) or enforce extra synchronization (if unstable between delegate instances in a method).

I'd like to add a PerMethod mode as a follow-up. This should be keyed to the specific method instance (not name), such that multiple method instances for the same method can be loaded for execution on different threads without forcing synchronization, but still allow sharing between call delegate instances in each method instance. This will require a unique method identifier.

Test plan

CI. I've also added a set of dedicated tests for getting/setting the option, running PTEs in each mode, and switching modes at runtime, and I've updated the multithreaded stress test to run in each mode.

@pytorch-bot (bot) commented Jun 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11748

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 1 Unrelated Failure

As of commit f7686a6 with merge base c3f8d64:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Jun 17, 2025
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch 2 times, most recently from f12f1ff to 0fd2a34 Compare June 17, 2025 03:44
@GregoryComer GregoryComer added the release notes: none label Jun 17, 2025
@facebook-github-bot (Contributor)

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch 3 times, most recently from 3ed0e10 to 3c08efe Compare June 17, 2025 05:45
@facebook-github-bot (Contributor)

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Jun 17, 2025
…#11748)

Summary:
Refactor the XNN backend workspace sharing logic to allow runtime gating. I've also added a temporary (marked experimental) API to enable workspace sharing. This will be replaced with backend options once available.

Pull Request resolved: pytorch#11748

Test Plan:
CI

Rollback Plan:

Differential Revision: D76789804

Pulled By: GregoryComer
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 3c08efe to af3e5b7 Compare June 17, 2025 20:36
@GregoryComer GregoryComer changed the title (WIP) Refactor XNN workspace sharing to allow runtime gating Refactor XNN workspace sharing to allow runtime gating Jun 17, 2025
@GregoryComer GregoryComer marked this pull request as ready for review June 17, 2025 20:37
@mergennachin mergennachin requested a review from cccclai June 18, 2025 17:17
@digantdesai (Contributor) left a comment

Looks good. Please add at least a test to make sure we don't regress on the API and its semantics?

@cccclai (Contributor) commented Jun 25, 2025

backend options are landed btw

@GregoryComer (Member, Author)

backend options are landed btw

Thanks. I'll rework to use those.

@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from af3e5b7 to 86e49ba Compare July 9, 2025 19:54
GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Jul 9, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 86e49ba to 6ae2330 Compare July 30, 2025 22:58
GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Jul 31, 2025
Summary:
Refactor the XNN backend workspace sharing logic to allow runtime gating using the backend option interface.


Test Plan:
CI

Rollback Plan:

Differential Revision: D76789804

Pulled By: GregoryComer
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 6ae2330 to f7945d3 Compare July 31, 2025 00:53
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Jul 31, 2025
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from f7945d3 to c73db0e Compare July 31, 2025 00:56
@facebook-github-bot (Contributor)

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D76789804.

@GregoryComer GregoryComer changed the title Refactor XNN workspace sharing to allow runtime gating Add XNNPACK backend option for workspace sharing Jul 31, 2025
@GregoryComer (Member, Author)

Changes are code complete. I'm waiting on a full CI run, but this should be ready for review again.

CC @mcr229 @digantdesai

@digantdesai (Contributor) left a comment

Still reviewing, but I want to drop this before I get back to it.

backend_options) override {
if (backend_options.size() > 0) {
for (const auto& option : backend_options) {
if (strcmp(option.key, xnnpack::workspace_sharing_mode_option_key) ==
Contributor

Feels like this needs a bit of restructuring. Should we wrap the XNNPACK options in a wrapper which also implements a setter and getter? Then this just becomes a lookup and set on it.
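One possible shape for that wrapper, sketched hypothetically (this is not an existing ExecuTorch API, and the key string below just stands in for xnnpack::workspace_sharing_mode_option_key):

#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

class XnnpackOptions {
 public:
  XnnpackOptions() {
    // Register a setter per known option key.
    setters_["workspace_sharing_mode"] = [this](std::string_view value) {
      workspace_sharing_mode_ = std::string(value);
      return true;
    };
  }

  // set_option() then reduces to a lookup; unknown keys return false so the
  // caller can surface an error.
  bool set(std::string_view key, std::string_view value) {
    auto it = setters_.find(std::string(key));
    return it != setters_.end() && it->second(value);
  }

 private:
  std::unordered_map<std::string, std::function<bool(std::string_view)>>
      setters_;
  std::string workspace_sharing_mode_ = "disabled";
};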

@GregoryComer (Member, Author) commented Jul 31, 2025

Yeah, I was thinking about introducing an abstraction around backend options, but didn't want to bloat this PR. I can take this as a follow-up, if no objection.

Contributor

TODO(#issue) please

Member Author

Filed as #13190. I added some brief comments in the issue, but it would be nice to create a reusable abstraction that multiple backends can leverage. But I don't know if we can get it in the core runtime (space + embedded constraints), and I don't want to add it as an extension (I don't think it makes sense). Any thoughts?

@digantdesai (Contributor) left a comment

Some more comments. I will work with you to make sure we can land this today

Comment on lines +51 to +59
return std::make_shared<XNNWorkspace>(
    WorkspacePtr(workspace, &xnn_release_workspace));
Contributor

I think this works well (1) for global when the ref count goes to 0, and (2) for per_model when the individual user ref count goes to 0. If we are switching modes, when will these "duplicated" workspaces get released? Should we explicitly detect a mode switch and release the old one? This will make the switch cost high but keep peak memory low.

Contributor

Alternative is to not allow switching after init :p

@GregoryComer (Member, Author) commented Aug 7, 2025

My intent is that workspaces get freed whenever all users are unloaded. Each executor instance holds a shared_ptr, and the top-level backend uses weak pointers to allow them to get cleaned up automatically. It would be nice to assert this behavior in a test. I might refactor the workspace management logic into a dedicated class to make this easier.

The global workspace is currently not released, but it probably should be. I'll update to hold a weak pointer and re-create when needed.
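A rough, self-contained sketch of that weak_ptr bookkeeping, with illustrative names (the real backend keys per-model workspaces on the allocator address and stores XNNPACK workspaces rather than this stand-in struct):

#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

struct Workspace {};  // Stand-in for the shared workspace object.

class WorkspaceRegistry {
 public:
  std::shared_ptr<Workspace> get_or_create(uintptr_t program_id) {
    std::scoped_lock<std::mutex> lock(mutex_);
    std::weak_ptr<Workspace>& slot = workspaces_[program_id];
    if (auto existing = slot.lock()) {
      return existing;  // A live workspace already exists for this key.
    }
    auto created = std::make_shared<Workspace>();
    slot = created;  // Track weakly; the executors that use it keep it alive.
    return created;
  }

 private:
  std::mutex mutex_;
  // weak_ptr entries expire automatically once every user is unloaded.
  std::unordered_map<uintptr_t, std::weak_ptr<Workspace>> workspaces_;
};

With this shape, switching modes never invalidates a live workspace; an old entry simply expires once its last user unloads and is re-created on the next lookup.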

@GregoryComer (Member, Author) commented Aug 7, 2025

Updated to allow release and re-creation of the global workspace.

std::scoped_lock<std::mutex> lock(workspace_meta_mutex_);

// Check for an existing (live) workspace for this program.
auto match = model_workspaces_.find(program_id);
Contributor

Thinking out loud: worst case, if we have a collision in the program_id, we will end up clobbering memory during inference, since two runtimes will go update the workspace.

@GregoryComer (Member, Author) commented Aug 7, 2025

Yeah, I don't love this, but it should be safe in all cases, at least. Either you end up using more memory than needed or it enforces extra synchronization. I'd like to push more on the method/program ID.


for (auto i = 0; i < modes.size(); ++i) {
  for (auto j = i + 1; j < modes.size(); ++j) {
    run_and_validate_two_models(modes[i], modes[j]);
Contributor

Nice!

GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Aug 7, 2025
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from c73db0e to 629e084 Compare August 7, 2025 17:44
@facebook-github-bot (Contributor)

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D76789804.

@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 629e084 to 40cbb40 Compare August 7, 2025 21:01
GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Aug 7, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Aug 7, 2025
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 40cbb40 to 9dc1e66 Compare August 7, 2025 21:19
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

// A global workspace for all delegate instances, if global sharing is
// enabled. Lazy initialized. Stored as a weak pointer to allow automatic
// cleanup when all references are released.
mutable std::weak_ptr<XNNWorkspace> global_workspace_;
Contributor

nit: can this be in model_workspaces_ with program_id<uintptr_t> == 0 or something? This would simplify the get_or_create methods, and also other logic which treats the global workspace differently when it is really just a shared program_id.

set_and_check_workspace_sharing_mode(*mode1);
}

Module mod1(std::getenv("ET_XNNPACK_GENERATED_ADD_LARGE_PTE_PATH"));
@digantdesai (Contributor) commented Aug 19, 2025

nit mod/mode/model/module :P

Suggested change
Module mod1(std::getenv("ET_XNNPACK_GENERATED_ADD_LARGE_PTE_PATH"));
Module module_for_pte_1(std::getenv("ET_XNNPACK_GENERATED_ADD_LARGE_PTE_PATH"));


auto program_id =
    reinterpret_cast<uintptr_t>(context.get_runtime_allocator());
auto workspace = ET_UNWRAP(get_or_create_workspace(program_id));
Contributor

So for a given delegate, once a workspace is created, there won't be any impact of subsequent mode changes in the process until it is delegated, right?

Member Author

Yeah, the workspace is effectively "locked in" at the time of the delegate instance creation. So it should remain in that mode indefinitely. That seemed like the easiest way to handle the global mode option safely, though I'm open to suggestions.

@digantdesai (Contributor) left a comment

Thanks @GregoryComer

@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 9dc1e66 to f7686a6 Compare September 2, 2025 17:47
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

@facebook-github-bot facebook-github-bot merged commit 43bd889 into pytorch:main Sep 3, 2025
111 of 115 checks passed
GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Sep 4, 2025
…1748)

Summary:
**Note: This is a re-land, fixing a use after free which occurred when destroying a delegate instance. The executor is destroyed, which frees the workspace. The mutex that raii_lock points to is owned by the workspace. There is then a use after free when raii_lock goes out of scope. This is fixed by taking an owning reference to the workspace in destroy.**


Differential Revision: D81646781

Pulled By: GregoryComer

Labels

CLA Signed, fb-exported, release notes: none


5 participants