Skip to content

Conversation

@GregoryComer
Copy link
Member

Summary:
Note: This is a re-land, fixing a use after free which occurred when destroying a delegate instance. The executor is destroyed, which frees the workspace. The mutex that raii_lock points to is owned by the workspace. There is then a use after free when raii_lock goes out of scope. This is fixed by taking an owning reference to the workspace in destroy.

Add a backend option for XNNPACK to enable runtime control of workspace sharing. I've added 3 mode options - Disabled, PerModel, and Global. PerModel shares the workspace between all CALL_DELEGATE instances in a model, keyed by memory allocator address (see below). Global uses a single workspace instance.

I've written the code to allow for the active workspace mode to be safely changed at any time. The workspace instance is resolved at delegate instance init time (model load) and is stored in the XNNExecutor instance. This design will also allow us to set per-model sharing options in the future. I've introduced a wrapper class (XNNWorkspace) to help with synchronization.

With regard to the PerModel behavior, I am using the address of the runtime allocator to disambiguate the model. This is not ideal in the long-run, but there is some larger discussion around generating IDs in a coherent manner in multithreaded environments without synchronization in the core runtime. This might require PAL changes (exposing a thread ID, for example), so I intend to come back to this.

It should be possible to transparently update this logic in the future. The program ID can collide or change without affecting correctness, but may increase memory (for collisions) or enforce extra synchronization (if unstable between delegate instances in a method).

I'd like to add a PerMethod mode as a follow-up. This should be keyed to the specific method instance (not name), such that multiple method instances for the same method can be loaded for execution on different threads without forcing synchronization, but still allow sharing between call delegate instances in each method instance. This will require a unique method identifier.

Differential Revision: D81647105

@pytorch-bot
Copy link

pytorch-bot bot commented Sep 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13934

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit b1ce3e9 with merge base 07d1092 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 4, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D81647105

@GregoryComer GregoryComer added the release notes: xnnpack Changes to the XNNPack backend delegate label Sep 4, 2025
@GregoryComer
Copy link
Member Author

@digantdesai Any concerns with the changes I've made here?

// shared_ptr, as the pointer in the executor is freed, which includes
// the mutex referenced by raii_lock.
auto workspace = executor->get_workspace();
auto [raii_lock, _] = workspace->acquire();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the new change w/ re-land?

Copy link
Member Author

@GregoryComer GregoryComer Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's correct. It keeps a shared_ptr handle to the XnnWorkspace object alive during destruction (the mutex is inside this). I'm open to other ideas, as well. It's slightly awkward with the current design.

Copy link
Contributor

@digantdesai digantdesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let's do it. Good luck!

GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Sep 11, 2025
…3934)

Summary:

**Note: This is a re-land, fixing a use after free which occurred when destroying a delegate instance. The executor is destroyed, which frees the workspace. The mutex that raii_lock points to is owned by the workspace. There is then a use after free when raii_lock goes out of scope. This is fixed by taking an owning reference to the workspace in destroy.**

Add a backend option for XNNPACK to enable runtime control of workspace sharing. I've added 3 mode options - Disabled, PerModel, and Global. PerModel shares the workspace between all CALL_DELEGATE instances in a model, keyed by memory allocator address (see below). Global uses a single workspace instance.

I've written the code to allow for the active workspace mode to be safely changed at any time. The workspace instance is resolved at delegate instance init time (model load) and is stored in the XNNExecutor instance. This design will also allow us to set per-model sharing options in the future. I've introduced a wrapper class (XNNWorkspace) to help with synchronization.

With regard to the PerModel behavior, I am using the address of the runtime allocator to disambiguate the model. This is not ideal in the long-run, but there is some larger discussion around generating IDs in a coherent manner in multithreaded environments without synchronization in the core runtime. This might require PAL changes (exposing a thread ID, for example), so I intend to come back to this.

It should be possible to transparently update this logic in the future. The program ID can collide or change without affecting correctness, but may increase memory (for collisions) or enforce extra synchronization (if unstable between delegate instances in a method).

I'd like to add a PerMethod mode as a follow-up. This should be keyed to the specific method instance (not name), such that multiple method instances for the same method can be loaded for execution on different threads without forcing synchronization, but still allow sharing between call delegate instances in each method instance. This will require a unique method identifier.

Reviewed By: digantdesai

Differential Revision: D81647105
@facebook-github-bot
Copy link
Contributor

@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating diff in D81647105.

…3934)

Summary:

**Note: This is a re-land, fixing a use after free which occurred when destroying a delegate instance. The executor is destroyed, which frees the workspace. The mutex that raii_lock points to is owned by the workspace. There is then a use after free when raii_lock goes out of scope. This is fixed by taking an owning reference to the workspace in destroy.**

Add a backend option for XNNPACK to enable runtime control of workspace sharing. I've added 3 mode options - Disabled, PerModel, and Global. PerModel shares the workspace between all CALL_DELEGATE instances in a model, keyed by memory allocator address (see below). Global uses a single workspace instance.

I've written the code to allow for the active workspace mode to be safely changed at any time. The workspace instance is resolved at delegate instance init time (model load) and is stored in the XNNExecutor instance. This design will also allow us to set per-model sharing options in the future. I've introduced a wrapper class (XNNWorkspace) to help with synchronization.

With regard to the PerModel behavior, I am using the address of the runtime allocator to disambiguate the model. This is not ideal in the long-run, but there is some larger discussion around generating IDs in a coherent manner in multithreaded environments without synchronization in the core runtime. This might require PAL changes (exposing a thread ID, for example), so I intend to come back to this.

It should be possible to transparently update this logic in the future. The program ID can collide or change without affecting correctness, but may increase memory (for collisions) or enforce extra synchronization (if unstable between delegate instances in a method).

I'd like to add a PerMethod mode as a follow-up. This should be keyed to the specific method instance (not name), such that multiple method instances for the same method can be loaded for execution on different threads without forcing synchronization, but still allow sharing between call delegate instances in each method instance. This will require a unique method identifier.

Reviewed By: digantdesai

Differential Revision: D81647105
@facebook-github-bot
Copy link
Contributor

@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating diff in D81647105.

@facebook-github-bot facebook-github-bot merged commit a523306 into pytorch:main Sep 20, 2025
125 of 128 checks passed
StrycekSimon pushed a commit to nxp-upstream/executorch that referenced this pull request Sep 23, 2025
Differential Revision: D81647105

Pull Request resolved: pytorch#13934
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported release notes: xnnpack Changes to the XNNPack backend delegate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants