Add AOTI backend skeleton code #13123
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13123
Note: Links to docs will display an error until the docs builds have been completed.
❌ 221 New Failures, 6 Cancelled Jobs as of commit f93d194 with merge base 2640a86.
NEW FAILURES - The following jobs have failed:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
exir/lowered_backend_module.py
Outdated
    ),
  )
],
example_inputs=owning_program.example_inputs,
This is a hack. We can't make sure the partitioned graph module has the same input as the original graph module.
Sounds like a good thing to reflect in a code comment?
Force-pushed from 26fccab to e14146c
AOTInductorModelContainerGetNumOutputsFunc
    AOTInductorModelContainerGetNumOutputs = nullptr;
AOTInductorModelContainerRunFunc AOTInductorModelContainerRun = nullptr;
std::unordered_map<Tensor*, std::vector<int64_t>> tensor_to_sizes;
I'm assuming this is a temporary hack. I don't think we can reasonably leave these in global state like this. You probably have to do AOTITensorHandle = extension::Tensor*
You probably have to do AOTITensorHandle = extension::Tensor*
The only thing stopping us from doing so is the different size types on ATen tensor and ET tensor (int64 vs int32)
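The int64 vs int32 mismatch mentioned above is concrete: the AOTInductor ABI traffics in `int64_t` size arrays, while ExecuTorch tensors report `int32_t` sizes, so aliasing `AOTITensorHandle` to `extension::Tensor*` needs a widening copy at the boundary. A minimal sketch of that bridging step (the helper name `widen_sizes` is hypothetical, not from the PR):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bridging helper: ExecuTorch tensors expose sizes as
// int32_t, while the AOTInductor shim expects int64_t*. Until
// AOTITensorHandle can simply alias extension::Tensor*, each call
// across the boundary needs a widened copy of the size array.
std::vector<int64_t> widen_sizes(const std::vector<int32_t>& et_sizes) {
  // Copy-convert; every int32_t fits in int64_t, so this is lossless.
  return std::vector<int64_t>(et_sizes.begin(), et_sizes.end());
}
```

The copy is what the `tensor_to_sizes` map above is caching; with a direct pointer alias the cache (and the global state) could go away.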
Since ET desktop also has an interest in the shim impl, can that live outside the aoti backend?
if(EXECUTORCH_BUILD_CORTEX_M)
  add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/backends/cortex_m)
  list(APPEND _executorch_backends coretex_m_backend)
this target doesn't seem to exist (even if I assume it's a typo for "cortex_m_backend").
backends/aoti/CMakeLists.txt
Outdated
# include(${EXECUTORCH_ROOT}/build/Utils.cmake)

find_package(CUDAToolkit REQUIRED)
Presumably we need separate configuration knobs for CUDA and Metal (not blocking).
backends/aoti/CMakeLists.txt
Outdated
set(_aoti_sources runtime/AotiBackend.cpp)
add_library(aoti_backend STATIC ${_aoti_sources})
nit: just inline _aoti_sources; no need to make a variable.
backends/aoti/CMakeLists.txt
Outdated
  $<BUILD_INTERFACE:${EXECUTORCH_ROOT}>
  $<INSTALL_INTERFACE:include>
)
target_compile_options(aoti_backend PUBLIC -fexceptions -frtti -fPIC)
-fexceptions and -frtti are default-on, no need to specify. -fPIC should be gated by CMAKE_POSITION_INDEPENDENT_CODE; why mess with it?
backends/aoti/CMakeLists.txt
Outdated
)
target_compile_options(aoti_backend PUBLIC -fexceptions -frtti -fPIC)
# Ensure symbols are exported properly
target_link_options(aoti_backend PUBLIC -Wl,--export-dynamic)
- this is a linux-ism
- this is for building shared libraries, but you've configured aoti_backend as a static library. Are you sure we need this?
- if aoti_backend was a shared library, I think this would be passed by default
AOTIDelegateHandle* handle = (AOTIDelegateHandle*)handle_;

size_t num_inputs;
AOTInductorModelContainerGetNumInputs(
must check all error codes
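The AOTInductor shim functions return a status code, so a checking wrapper keeps any call from being silently dropped. A sketch of that pattern, where `AOTI_CHECK_OK`, `fake_get_num_inputs`, and `query_inputs` are illustrative stand-ins (the real container API returns an `AOTIRuntimeError`-style integer, and the real backend would propagate an ExecuTorch `Error` rather than a bool):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical status type mirroring the AOTInductor shim convention:
// 0 indicates success, non-zero an error.
using AOTIRuntimeError = int;

// Checking wrapper so no shim return code is ignored.
#define AOTI_CHECK_OK(expr)                                  \
  do {                                                       \
    AOTIRuntimeError err__ = (expr);                         \
    if (err__ != 0) {                                        \
      std::fprintf(stderr, "%s failed: %d\n", #expr, err__); \
      return false;                                          \
    }                                                        \
  } while (0)

// Stand-in for AOTInductorModelContainerGetNumInputs.
AOTIRuntimeError fake_get_num_inputs(size_t* out) {
  *out = 3;
  return 0;
}

bool query_inputs(size_t* num_inputs) {
  AOTI_CHECK_OK(fake_get_num_inputs(num_inputs));
  return true;
}
```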
}

void destroy(DelegateHandle* handle_) const override {
  AOTIDelegateHandle* handle = (AOTIDelegateHandle*)handle_;
can handle_ be null?
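If `init()` can fail and hand back a null handle, the cast in `destroy()` should be guarded. A defensive sketch (types and the demo counter are illustrative, not the PR's actual definitions):

```cpp
#include <cassert>

int g_destroyed = 0;  // demo-only counter to observe cleanup

struct AOTIDelegateHandle {
  ~AOTIDelegateHandle() { ++g_destroyed; }
};

// Null-check before casting: a failed init() may yield a null
// DelegateHandle*, and teardown must not crash on it.
void destroy(void* handle_) {
  if (handle_ == nullptr) {
    return;  // nothing to clean up
  }
  delete static_cast<AOTIDelegateHandle*>(handle_);
}
```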
tensor_to_sizes.clear();
tensor_to_strides.clear();
this is incorrect in the presence of other threads that might have loaded their own DSOs; probably need to make these thread_local
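The suggested fix can be sketched as follows: declaring the caches `thread_local` gives each thread (and therefore each independently loaded DSO) its own map, so one thread's `clear()` cannot wipe another's entries. The key/value types and `record_sizes` helper here are illustrative:

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <unordered_map>
#include <vector>

// thread_local: each thread gets its own cache, so threads that load
// their own DSOs no longer clobber each other's size metadata.
thread_local std::unordered_map<const void*, std::vector<int64_t>>
    tensor_to_sizes;

void record_sizes(const void* tensor, std::vector<int64_t> sizes) {
  tensor_to_sizes[tensor] = std::move(sizes);
}
```

A `clear()` on a worker thread only empties that thread's instance; the main thread's entries survive.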
export_and_run_aoti.sh
Outdated
  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
  ..
cd ..
cmake --build cmake-out -j9
Why not let cmake choose parallelism appropriately?
install_requirements.py
Outdated
 # The pip repository that hosts nightly torch packages.
-TORCH_NIGHTLY_URL = "https://download.pytorch.org/whl/nightly/cpu"
+TORCH_NIGHTLY_URL = "https://download.pytorch.org/whl/nightly/cu126"
any particular reason not to use a more recent version?
backends/aoti/aoti_partitioner.py
Outdated
@@ -0,0 +1,76 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
I am wondering if we could split out the partitioner (or, more generally, any other part of this PR) to help the review/land/iterate process go faster.
Force-pushed from 4732087 to 7542cae
Summary
Test plan