-
Notifications
You must be signed in to change notification settings - Fork 749
Add XNNPACK backend option for workspace sharing (re-land) #13934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add XNNPACK backend option for workspace sharing (re-land) #13934
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13934
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 New FailureAs of commit b1ce3e9 with merge base 07d1092 ( NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
This pull request was exported from Phabricator. Differential Revision: D81647105 |
|
@digantdesai Any concerns with the changes I've made here? |
| // shared_ptr, as the pointer in the executor is freed, which includes | ||
| // the mutex referenced by raii_lock. | ||
| auto workspace = executor->get_workspace(); | ||
| auto [raii_lock, _] = workspace->acquire(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the new change w/ re-land?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's correct. It keeps a shared_ptr handle to the XnnWorkspace object alive during destruction (the mutex is inside this). I'm open to other ideas, as well. It's slightly awkward with the current design.
digantdesai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, let's do it. Good luck!
…3934) Summary: **Note: This is a re-land, fixing a use after free which occurred when destroying a delegate instance. The executor is destroyed, which frees the workspace. The mutex that raii_lock points to is owned by the workspace. There is then a use after free when raii_lock goes out of scope. This is fixed by taking an owning reference to the workspace in destroy.** Add a backend option for XNNPACK to enable runtime control of workspace sharing. I've added 3 mode options - Disabled, PerModel, and Global. PerModel shares the workspace between all CALL_DELEGATE instances in a model, keyed by memory allocator address (see below). Global uses a single workspace instance. I've written the code to allow for the active workspace mode to be safely changed at any time. The workspace instance is resolved at delegate instance init time (model load) and is stored in the XNNExecutor instance. This design will also allow us to set per-model sharing options in the future. I've introduced a wrapper class (XNNWorkspace) to help with synchronization. With regard to the PerModel behavior, I am using the address of the runtime allocator to disambiguate the model. This is not ideal in the long-run, but there is some larger discussion around generating IDs in a coherent manner in multithreaded environments without synchronization in the core runtime. This might require PAL changes (exposing a thread ID, for example), so I intend to come back to this. It should be possible to transparently update this logic in the future. The program ID can collide or change without affecting correctness, but may increase memory (for collisions) or enforce extra synchronization (if unstable between delegate instances in a method). I'd like to add a PerMethod mode as a follow-up. This should be keyed to the specific method instance (not name), such that multiple method instances for the same method can be loaded for execution on different threads without forcing synchronization, but still allow sharing between call delegate instances in each method instance. This will require a unique method identifier. Reviewed By: digantdesai Differential Revision: D81647105
e27afff to
52ff3e2
Compare
|
@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating diff in D81647105. |
…3934) Summary: **Note: This is a re-land, fixing a use after free which occurred when destroying a delegate instance. The executor is destroyed, which frees the workspace. The mutex that raii_lock points to is owned by the workspace. There is then a use after free when raii_lock goes out of scope. This is fixed by taking an owning reference to the workspace in destroy.** Add a backend option for XNNPACK to enable runtime control of workspace sharing. I've added 3 mode options - Disabled, PerModel, and Global. PerModel shares the workspace between all CALL_DELEGATE instances in a model, keyed by memory allocator address (see below). Global uses a single workspace instance. I've written the code to allow for the active workspace mode to be safely changed at any time. The workspace instance is resolved at delegate instance init time (model load) and is stored in the XNNExecutor instance. This design will also allow us to set per-model sharing options in the future. I've introduced a wrapper class (XNNWorkspace) to help with synchronization. With regard to the PerModel behavior, I am using the address of the runtime allocator to disambiguate the model. This is not ideal in the long-run, but there is some larger discussion around generating IDs in a coherent manner in multithreaded environments without synchronization in the core runtime. This might require PAL changes (exposing a thread ID, for example), so I intend to come back to this. It should be possible to transparently update this logic in the future. The program ID can collide or change without affecting correctness, but may increase memory (for collisions) or enforce extra synchronization (if unstable between delegate instances in a method). I'd like to add a PerMethod mode as a follow-up. This should be keyed to the specific method instance (not name), such that multiple method instances for the same method can be loaded for execution on different threads without forcing synchronization, but still allow sharing between call delegate instances in each method instance. This will require a unique method identifier. Reviewed By: digantdesai Differential Revision: D81647105
52ff3e2 to
b1ce3e9
Compare
|
@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating diff in D81647105. |
Differential Revision: D81647105 Pull Request resolved: pytorch#13934
Summary:
Note: This is a re-land, fixing a use after free which occurred when destroying a delegate instance. The executor is destroyed, which frees the workspace. The mutex that raii_lock points to is owned by the workspace. There is then a use after free when raii_lock goes out of scope. This is fixed by taking an owning reference to the workspace in destroy.
Add a backend option for XNNPACK to enable runtime control of workspace sharing. I've added 3 mode options - Disabled, PerModel, and Global. PerModel shares the workspace between all CALL_DELEGATE instances in a model, keyed by memory allocator address (see below). Global uses a single workspace instance.
I've written the code to allow for the active workspace mode to be safely changed at any time. The workspace instance is resolved at delegate instance init time (model load) and is stored in the XNNExecutor instance. This design will also allow us to set per-model sharing options in the future. I've introduced a wrapper class (XNNWorkspace) to help with synchronization.
With regard to the PerModel behavior, I am using the address of the runtime allocator to disambiguate the model. This is not ideal in the long-run, but there is some larger discussion around generating IDs in a coherent manner in multithreaded environments without synchronization in the core runtime. This might require PAL changes (exposing a thread ID, for example), so I intend to come back to this.
It should be possible to transparently update this logic in the future. The program ID can collide or change without affecting correctness, but may increase memory (for collisions) or enforce extra synchronization (if unstable between delegate instances in a method).
I'd like to add a PerMethod mode as a follow-up. This should be keyed to the specific method instance (not name), such that multiple method instances for the same method can be loaded for execution on different threads without forcing synchronization, but still allow sharing between call delegate instances in each method instance. This will require a unique method identifier.
Differential Revision: D81647105