
Conversation

@GregoryComer (Member) commented Jun 17, 2025

Summary

Add a backend option for XNNPACK to enable runtime control of workspace sharing. I've added 3 mode options - Disabled, PerModel, and Global. PerModel shares the workspace between all CALL_DELEGATE instances in a model, keyed by memory allocator address (see below). Global uses a single workspace instance.
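As a rough illustration (not the backend's actual enum or option values), the three modes map to something like:

// Illustrative only; the real option values exposed by the backend may differ.
enum class WorkspaceSharingMode {
  Disabled,  // Each delegate instance gets its own workspace.
  PerModel,  // Shared across CALL_DELEGATE instances within one model.
  Global,    // A single workspace shared by all delegate instances.
};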

I've written the code to allow for the active workspace mode to be safely changed at any time. The workspace instance is resolved at delegate instance init time (model load) and is stored in the XNNExecutor instance. This design will also allow us to set per-model sharing options in the future. I've introduced a wrapper class (XNNWorkspace) to help with synchronization.
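For illustration only, a minimal sketch of what a wrapper along these lines might look like, assuming WorkspacePtr is a unique_ptr that uses xnn_release_workspace as its deleter; the names and layout here are assumptions, not the actual ExecuTorch class:

#include <memory>
#include <mutex>
#include <xnnpack.h>

class XNNWorkspace {
 public:
  using WorkspacePtr =
      std::unique_ptr<struct xnn_workspace, decltype(&xnn_release_workspace)>;

  explicit XNNWorkspace(WorkspacePtr workspace)
      : workspace_(std::move(workspace)) {}

  // Delegate instances that share this workspace lock around setup/execution.
  std::unique_lock<std::mutex> lock() {
    return std::unique_lock<std::mutex>(mutex_);
  }

  xnn_workspace_t unsafe_get() {
    return workspace_.get();
  }

  // Construction mirrors the snippet discussed in review below, e.g.:
  //   std::make_shared<XNNWorkspace>(
  //       WorkspacePtr(workspace, &xnn_release_workspace));

 private:
  WorkspacePtr workspace_;
  std::mutex mutex_;
};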

With regard to the PerModel behavior, I am using the address of the runtime allocator to disambiguate the model. This is not ideal in the long run, but there is a larger discussion around generating IDs in a coherent manner in multithreaded environments without synchronization in the core runtime. This might require PAL changes (exposing a thread ID, for example), so I intend to come back to this.

It should be possible to transparently update this logic in the future. The program ID can collide or change without affecting correctness, but may increase memory (for collisions) or enforce extra synchronization (if unstable between delegate instances in a method).

I'd like to add a PerMethod mode as a follow-up. This should be keyed to the specific method instance (not name), such that multiple method instances for the same method can be loaded for execution on different threads without forcing synchronization, but still allow sharing between call delegate instances in each method instance. This will require a unique method identifier.

Test plan

CI. I've also added a set of dedicated tests for getting/setting the option, running PTEs in each mode, and switching modes at runtime, and I've updated the multithreaded stress test to run in each mode.

@pytorch-bot (bot) commented Jun 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11748

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 1 Unrelated Failure

As of commit f7686a6 with merge base c3f8d64:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Jun 17, 2025
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch 2 times, most recently from f12f1ff to 0fd2a34 Compare June 17, 2025 03:44
@GregoryComer GregoryComer added the release notes: none label Jun 17, 2025
@facebook-github-bot (Contributor)

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch 3 times, most recently from 3ed0e10 to 3c08efe Compare June 17, 2025 05:45
@facebook-github-bot (Contributor)

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Jun 17, 2025
…#11748)

Summary:
Refactor the XNN backend workspace sharing logic to allow runtime gating. I've also added a temporary (marked experimental) API to enable workspace sharing. This will be replaced with backend options once available.

Pull Request resolved: pytorch#11748

Test Plan:
CI

Rollback Plan:

Differential Revision: D76789804

Pulled By: GregoryComer
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 3c08efe to af3e5b7 Compare June 17, 2025 20:36
@GregoryComer GregoryComer changed the title (WIP) Refactor XNN workspace sharing to allow runtime gating Refactor XNN workspace sharing to allow runtime gating Jun 17, 2025
@GregoryComer GregoryComer marked this pull request as ready for review June 17, 2025 20:37
@mergennachin mergennachin requested a review from cccclai June 18, 2025 17:17
@digantdesai (Contributor) left a comment

Looks good. Please add at least a test to make sure we don't regress on the API and its semantics?

@cccclai (Contributor) commented Jun 25, 2025

backend options are landed btw

@GregoryComer (Member, Author)

backend options are landed btw

Thanks. I'll rework to use those.

@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from af3e5b7 to 86e49ba Compare July 9, 2025 19:54
GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Jul 9, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 86e49ba to 6ae2330 Compare July 30, 2025 22:58
GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Jul 31, 2025
Summary:
Refactor the XNN backend workspace sharing logic to allow runtime gating using the backend option interface.


Test Plan:
CI

Rollback Plan:

Differential Revision: D76789804

Pulled By: GregoryComer
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 6ae2330 to f7945d3 Compare July 31, 2025 00:53
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Jul 31, 2025
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from f7945d3 to c73db0e Compare July 31, 2025 00:56
@facebook-github-bot (Contributor)

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D76789804.

@GregoryComer GregoryComer changed the title Refactor XNN workspace sharing to allow runtime gating Add XNNPACK backend option for workspace sharing Jul 31, 2025
@GregoryComer (Member, Author)

Changes are code complete. I'm waiting on a full CI run, but this should be ready for review again.

CC @mcr229 @digantdesai

@digantdesai (Contributor) left a comment

Still reviewing, but I want to drop this before I get back to it.

backend_options) override {
if (backend_options.size() > 0) {
for (const auto& option : backend_options) {
if (strcmp(option.key, xnnpack::workspace_sharing_mode_option_key) ==
Contributor

Feels like this needs a bit of restructuring. Should we wrap the XNNPACK options in a wrapper which also implements a setter and getter? Then this just becomes a lookup and set on it.
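One possible shape for that wrapper, sketched hypothetically (this is not an existing ExecuTorch API, and the key string below just stands in for xnnpack::workspace_sharing_mode_option_key):

#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>

class XnnpackOptions {
 public:
  XnnpackOptions() {
    // Register a setter per known option key.
    setters_["workspace_sharing_mode"] = [this](std::string_view value) {
      workspace_sharing_mode_ = std::string(value);
      return true;
    };
  }

  // set_option() then reduces to a lookup; unknown keys return false so the
  // caller can surface an error.
  bool set(std::string_view key, std::string_view value) {
    auto it = setters_.find(std::string(key));
    return it != setters_.end() && it->second(value);
  }

 private:
  std::unordered_map<std::string, std::function<bool(std::string_view)>>
      setters_;
  std::string workspace_sharing_mode_ = "disabled";
};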

@GregoryComer (Member, Author) commented Jul 31, 2025

Yeah, I was thinking about introducing an abstraction around backend options, but didn't want to bloat this PR. I can take this as a follow-up, if no objection.

Contributor

TODO(#issue) please

Member Author

Filed as #13190. I added some brief comments in the issue, but it would be nice to create a reusable abstraction that multiple backends can leverage. But I don't know if we can get it in the core runtime (space + embedded constraints), and I don't want to add it as an extension (I don't think it makes sense). Any thoughts?

@digantdesai (Contributor) left a comment

Some more comments. I will work with you to make sure we can land this today

Comment on lines +51 to +59
return std::make_shared<XNNWorkspace>(
    WorkspacePtr(workspace, &xnn_release_workspace));
Contributor

I think this works well (1) for global when the ref count goes to 0, and (2) for per_model when the individual user ref count goes to 0. If we are switching modes, when will these "duplicated" workspaces get released? Should we explicitly detect a mode switch and release the old one? This will make the switch cost high but keep peak memory low.

Contributor

Alternative is to not allow switching after init :p

@GregoryComer (Member, Author) commented Aug 7, 2025

My intent is that workspaces get freed whenever all users are unloaded. Each executor instance holds a shared_ptr, and the top-level backend uses weak pointers to allow them to get cleaned up automatically. It would be nice to assert this behavior in a test. I might refactor the workspace management logic into a dedicated class to make this easier.

The global workspace is currently not released, but it probably should be. I'll update to hold a weak pointer and re-create when needed.
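A rough, self-contained sketch of that weak_ptr bookkeeping, with illustrative names (the real backend keys per-model workspaces on the allocator address and stores XNNPACK workspaces rather than this stand-in struct):

#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

struct Workspace {};  // Stand-in for the shared workspace object.

class WorkspaceRegistry {
 public:
  std::shared_ptr<Workspace> get_or_create(uintptr_t program_id) {
    std::scoped_lock<std::mutex> lock(mutex_);
    std::weak_ptr<Workspace>& slot = workspaces_[program_id];
    if (auto existing = slot.lock()) {
      return existing;  // A live workspace already exists for this key.
    }
    auto created = std::make_shared<Workspace>();
    slot = created;  // Track weakly; the executors that use it keep it alive.
    return created;
  }

 private:
  std::mutex mutex_;
  // weak_ptr entries expire automatically once every user is unloaded.
  std::unordered_map<uintptr_t, std::weak_ptr<Workspace>> workspaces_;
};

With this shape, switching modes never invalidates a live workspace; an old entry simply expires once its last user unloads and is re-created on the next lookup.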

@GregoryComer (Member, Author) commented Aug 7, 2025

Updated to allow release and re-creation of the global workspace.

std::scoped_lock<std::mutex> lock(workspace_meta_mutex_);

// Check for an existing (live) workspace for this program.
auto match = model_workspaces_.find(program_id);
Contributor

Thinking out loud: worst case, if we have a collision in the program_id, we will end up clobbering memory during inference, since two runtimes will go update the workspace.

@GregoryComer (Member, Author) commented Aug 7, 2025

Yeah, I don't love this, but it should be safe in all cases, at least. Either you end up using more memory than needed or it enforces extra synchronization. I'd like to push more on the method/program ID.


for (auto i = 0; i < modes.size(); ++i) {
  for (auto j = i + 1; j < modes.size(); ++j) {
    run_and_validate_two_models(modes[i], modes[j]);
Contributor

Nice!

GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Aug 7, 2025
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from c73db0e to 629e084 Compare August 7, 2025 17:44
@facebook-github-bot (Contributor)

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D76789804.

@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 629e084 to 40cbb40 Compare August 7, 2025 21:01
GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Aug 7, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Aug 7, 2025
@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 40cbb40 to 9dc1e66 Compare August 7, 2025 21:19
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

// A global workspace for all delegate instances, if global sharing is
// enabled. Lazy initialized. Stored as a weak pointer to allow automatic
// cleanup when all references are released.
mutable std::weak_ptr<XNNWorkspace> global_workspace_;
Contributor

nit: can this be in model_workspaces_ with program_id<uintptr_t> == 0 or something? This would simplify the get_or_create methods, and also other logic which treats the global workspace differently when it is really just a shared program_id.

set_and_check_workspace_sharing_mode(*mode1);
}

Module mod1(std::getenv("ET_XNNPACK_GENERATED_ADD_LARGE_PTE_PATH"));
@digantdesai (Contributor) commented Aug 19, 2025

nit mod/mode/model/module :P

Suggested change
Module mod1(std::getenv("ET_XNNPACK_GENERATED_ADD_LARGE_PTE_PATH"));
Module module_for_pte_1(std::getenv("ET_XNNPACK_GENERATED_ADD_LARGE_PTE_PATH"));


auto program_id =
    reinterpret_cast<uintptr_t>(context.get_runtime_allocator());
auto workspace = ET_UNWRAP(get_or_create_workspace(program_id));
Contributor

So for a given delegate, once a workspace is created, there won't be any impact of subsequent mode changes in the process until it is delegated, right?

Member Author

Yeah, the workspace is effectively "locked in" at the time of the delegate instance creation. So it should remain in that mode indefinitely. That seemed like the easiest way to handle the global mode option safely, though I'm open to suggestions.

@digantdesai (Contributor) left a comment

Thanks @GregoryComer

@GregoryComer GregoryComer force-pushed the xnnpack-runtime-workspace-sharing branch from 9dc1e66 to f7686a6 Compare September 2, 2025 17:47
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76789804

@facebook-github-bot facebook-github-bot merged commit 43bd889 into pytorch:main Sep 3, 2025
111 of 115 checks passed
GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Sep 4, 2025
…1748)

Summary:
**Note: This is a re-land, fixing a use after free which occurred when destroying a delegate instance. The executor is destroyed, which frees the workspace. The mutex that raii_lock points to is owned by the workspace. There is then a use after free when raii_lock goes out of scope. This is fixed by taking an owning reference to the workspace in destroy.**


Differential Revision: D81646781

Pulled By: GregoryComer

Labels

CLA Signed, fb-exported, release notes: none


5 participants