Add XNNPACK backend option for workspace sharing #11748
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11748. Note: links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 1 New Failure, 1 Unrelated Failure as of commit f7686a6 with merge base c3f8d64.
NEW FAILURE: the following job has failed.
BROKEN TRUNK: the following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from f12f1ff to 0fd2a34
@GregoryComer has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 3ed0e10 to 3c08efe
This pull request was exported from Phabricator. Differential Revision: D76789804
Add XNNPACK backend option for workspace sharing (#11748)

Summary: Refactor the XNN backend workspace sharing logic to allow runtime gating. I've also added a temporary (marked experimental) API to enable workspace sharing. This will be replaced with backend options once available.

Pull Request resolved: pytorch#11748
Test Plan: CI
Differential Revision: D76789804
Pulled By: GregoryComer
Force-pushed from 3c08efe to af3e5b7
digantdesai
left a comment
Looks good. Please add a test, at least, to make sure we don't regress on the API and its semantics.
backend options are landed, btw
Thanks. I'll rework to use those.
Force-pushed from af3e5b7 to 86e49ba
Force-pushed from 86e49ba to 6ae2330
Summary: Refactor the XNN backend workspace sharing logic to allow runtime gating using the backend option interface.

Test Plan: CI
Differential Revision: D76789804
Pulled By: GregoryComer
Force-pushed from 6ae2330 to f7945d3
Force-pushed from f7945d3 to c73db0e
@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D76789804.
Changes are code complete. I'm waiting on a full CI run, but this should be ready for review again.
digantdesai
left a comment
Still reviewing, but I want to drop this before I get back to it:
```cpp
    backend_options) override {
  if (backend_options.size() > 0) {
    for (const auto& option : backend_options) {
      if (strcmp(option.key, xnnpack::workspace_sharing_mode_option_key) ==
```
Feels like this needs a bit of restructuring. Should we wrap the XNNPACK options in a wrapper which also implements a setter and getter? Then this just becomes a lookup and set on it.
Yeah, I was thinking about introducing an abstraction around backend options, but didn't want to bloat this PR. I can take this as a follow-up, if no objection.
TODO(#issue) please
Filed as #13190. I added some brief comments in the issue, but it would be nice to create a reusable abstraction that multiple backends can leverage. But I don't know if we can get it in the core runtime (space + embedded constraints), and I don't want to add it as an extension (I don't think it makes sense). Any thoughts?
digantdesai
left a comment
Some more comments. I will work with you to make sure we can land this today.
```cpp
return std::make_shared<XNNWorkspace>(
    WorkspacePtr(workspace, &xnn_release_workspace));
```
I think this works well (1) for global when the ref count goes to 0, and (2) for per_model when the individual user ref count goes to 0. If we are switching modes, when will these "duplicated" workspaces get released? Should we explicitly detect a mode switch and release the old one? This would make the switch cost high but peak memory low.
Alternative is to not allow switching after init :p
My intent is that workspaces get freed whenever all users are unloaded. Each executor instance holds a shared_ptr, and the top-level backend uses weak pointers to allow them to get cleaned up automatically. It would be nice to assert this behavior in a test. I might refactor the workspace management logic into a dedicated class to make this easier.
The global workspace is currently not released, but it probably should be. I'll update to hold a weak pointer and re-create when needed.
Updated to allow release and re-creation of the global workspace.
```cpp
std::scoped_lock<std::mutex> lock(workspace_meta_mutex_);

// Check for an existing (live) workspace for this program.
auto match = model_workspaces_.find(program_id);
```
Thinking out loud: worst case, if we have a collision in the program_id, we will end up clobbering memory during inference, since two runtimes will go update the workspace.
Yeah, I don't love this, but it should be safe in all cases, at least. Either you end up using more memory than needed or it enforces extra synchronization. I'd like to push more on the method/program ID.
```cpp
for (auto i = 0; i < modes.size(); ++i) {
  for (auto j = i + 1; j < modes.size(); ++j) {
    run_and_validate_two_models(modes[i], modes[j]);
```
Nice!
Force-pushed from c73db0e to 629e084
Force-pushed from 629e084 to 40cbb40
Force-pushed from 40cbb40 to 9dc1e66
```cpp
// A global workspace for all delegate instances, if global sharing is
// enabled. Lazily initialized. Stored as a weak pointer to allow automatic
// cleanup when all references are released.
mutable std::weak_ptr<XNNWorkspace> global_workspace_;
```
nit: can this be in model_workspaces_ with program_id<uintptr_t> == 0 or something? This would simplify the get_or_create methods, and also other logic that treats the global workspace differently when it is really just a shared program_id.
```cpp
  set_and_check_workspace_sharing_mode(*mode1);
}

Module mod1(std::getenv("ET_XNNPACK_GENERATED_ADD_LARGE_PTE_PATH"));
```
nit: mod/mode/model/module :P
Suggested change:
```cpp
Module module_for_pte_1(std::getenv("ET_XNNPACK_GENERATED_ADD_LARGE_PTE_PATH"));
```
```cpp
auto program_id =
    reinterpret_cast<uintptr_t>(context.get_runtime_allocator());
auto workspace = ET_UNWRAP(get_or_create_workspace(program_id));
```
So for a given delegate, once a workspace is created, there won't be any impact from subsequent mode changes in the process until it is re-delegated, right?
Yeah, the workspace is effectively "locked in" at the time of the delegate instance creation. So it should remain in that mode indefinitely. That seemed like the easiest way to handle the global mode option safely, though I'm open to suggestions.
digantdesai
left a comment
Thanks @GregoryComer
Reviewed By: digantdesai
Force-pushed from 9dc1e66 to f7686a6
Add XNNPACK backend option for workspace sharing (#11748)

Summary:
**Note: This is a re-land, fixing a use-after-free which occurred when destroying a delegate instance. The executor is destroyed, which frees the workspace. The mutex that raii_lock points to is owned by the workspace. There is then a use-after-free when raii_lock goes out of scope. This is fixed by taking an owning reference to the workspace in destroy.**

(The remainder of the commit message is identical to the PR summary below.)

Test Plan: CI
Differential Revision: D81646781
Pulled By: GregoryComer
Summary
Add a backend option for XNNPACK to enable runtime control of workspace sharing. I've added three mode options: Disabled, PerModel, and Global. PerModel shares the workspace between all CALL_DELEGATE instances in a model, keyed by memory allocator address (see below). Global uses a single workspace instance.
I've written the code to allow for the active workspace mode to be safely changed at any time. The workspace instance is resolved at delegate instance init time (model load) and is stored in the XNNExecutor instance. This design will also allow us to set per-model sharing options in the future. I've introduced a wrapper class (XNNWorkspace) to help with synchronization.
With regard to the PerModel behavior, I am using the address of the runtime allocator to disambiguate the model. This is not ideal in the long run, but there is some larger discussion around generating IDs in a coherent manner in multithreaded environments without synchronization in the core runtime. This might require PAL changes (exposing a thread ID, for example), so I intend to come back to this.
It should be possible to transparently update this logic in the future. The program ID can collide or change without affecting correctness, but may increase memory (for collisions) or enforce extra synchronization (if unstable between delegate instances in a method).
I'd like to add a PerMethod mode as a follow-up. This should be keyed to the specific method instance (not name), such that multiple method instances for the same method can be loaded for execution on different threads without forcing synchronization, but still allow sharing between call delegate instances in each method instance. This will require a unique method identifier.
Test plan
CI. I've also added a set of dedicated tests for getting/setting the option, running PTEs in each mode, and switching modes at runtime, and I've updated the multithreaded stress test to run in each mode.