
Conversation

larryliu0820 (Contributor)

Summary

[PLEASE REMOVE] See CONTRIBUTING.md's Pull Requests for ExecuTorch PR guidelines.

[PLEASE REMOVE] If this PR closes an issue, please add a Fixes #<issue-id> line.

[PLEASE REMOVE] If this PR introduces a fix or feature that should be in the upcoming release notes, please add a "Release notes: " label. For a list of available release notes labels, check out CONTRIBUTING.md's Pull Requests.

Test plan

[PLEASE REMOVE] How did you test this PR? Please write down any manual commands you used and note down tests that you have written if applicable.


pytorch-bot bot commented Aug 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13123

Note: Links to docs will display an error until the docs builds have been completed.

❌ 221 New Failures, 6 Cancelled Jobs

As of commit f93d194 with merge base 2640a86:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label on Aug 5, 2025
larryliu0820 changed the title from "Add skeleton code" to "Add AOTI backend skeleton code" on Aug 5, 2025

github-actions bot commented Aug 5, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track of your change and include it in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

larryliu0820 marked this pull request as draft on August 5, 2025 at 05:55
),
)
],
example_inputs=owning_program.example_inputs,
Contributor Author

This is a hack. We can't guarantee that the partitioned graph module has the same inputs as the original graph module.

Contributor

sounds like a good thing to reflect in a code comment?
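For illustration, the kind of code comment being asked for might read as follows (a sketch against the diff above, not text from this PR):

# HACK: we can't guarantee the partitioned graph module takes the same
# inputs as the original graph module, so reuse the owning program's
# example_inputs for now.
example_inputs=owning_program.example_inputs,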

AOTInductorModelContainerGetNumOutputsFunc
AOTInductorModelContainerGetNumOutputs = nullptr;
AOTInductorModelContainerRunFunc AOTInductorModelContainerRun = nullptr;
std::unordered_map<Tensor*, std::vector<int64_t>> tensor_to_sizes;
Contributor

I'm assuming this is a temporary hack. I don't think we can reasonably leave these in global state like this. You probably have to do AOTITensorHandle = extension::Tensor*.

Contributor Author

"You probably have to do AOTITensorHandle = extension::Tensor*"

The only thing stopping us from doing so is the different size types on ATen tensors and ET tensors (int64 vs. int32).
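A minimal sketch of the widening that mismatch forces (helper name is hypothetical; core ExecuTorch tensor sizes are int32_t, while the AOTI container ABI traffics in int64_t):

#include <cstdint>
#include <vector>

// Hypothetical helper (not in this PR): copy an ExecuTorch tensor's
// int32_t sizes into the int64_t vector the AOTI container ABI expects.
std::vector<int64_t> aoti_sizes_from_et(const executorch::aten::Tensor& t) {
  std::vector<int64_t> sizes;
  sizes.reserve(t.dim());
  for (auto s : t.sizes()) { // sizes() elements are int32_t in core ExecuTorch
    sizes.push_back(static_cast<int64_t>(s));
  }
  return sizes;
}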

Contributor

Since ET desktop also has an interest in the shim impl, can that live outside the AOTI backend?


if(EXECUTORCH_BUILD_CORTEX_M)
add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/backends/cortex_m)
list(APPEND _executorch_backends coretex_m_backend)
Contributor

this target doesn't seem to exist (even if I assume it's a typo for "cortex_m_backend").
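For reference, the corrected spelling would be something like this (assuming the subdirectory actually defines a cortex_m_backend target):

if(EXECUTORCH_BUILD_CORTEX_M)
  add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/backends/cortex_m)
  list(APPEND _executorch_backends cortex_m_backend)
endif()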


# include(${EXECUTORCH_ROOT}/build/Utils.cmake)

find_package(CUDAToolkit REQUIRED)
swolchok (Contributor) commented Aug 15, 2025

presumably we need separate configuration knobs for CUDA and Metal (not blocking)
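One possible shape for such knobs (option names here are hypothetical, not from this PR):

option(EXECUTORCH_BUILD_AOTI_CUDA "Build the AOTI backend against CUDA" OFF)
option(EXECUTORCH_BUILD_AOTI_METAL "Build the AOTI backend against Metal" OFF)
if(EXECUTORCH_BUILD_AOTI_CUDA)
  # Only require the CUDA toolkit when the CUDA flavor is requested.
  find_package(CUDAToolkit REQUIRED)
endif()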

Comment on lines 28 to 29
set(_aoti_sources runtime/AotiBackend.cpp)
add_library(aoti_backend STATIC ${_aoti_sources})
swolchok (Contributor) commented Aug 15, 2025

nit: just inline _aoti_sources, no need to make a variable
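i.e., the suggested inlined form:

add_library(aoti_backend STATIC runtime/AotiBackend.cpp)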

$<BUILD_INTERFACE:${EXECUTORCH_ROOT}>
$<INSTALL_INTERFACE:include>
)
target_compile_options(aoti_backend PUBLIC -fexceptions -frtti -fPIC)
Contributor

-fexceptions and -frtti are default-on, no need to specify. -fPIC should be gated by CMAKE_POSITION_INDEPENDENT_CODE; why mess with it?
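For reference, the portable spelling of the -fPIC part is the target property (or the global CMAKE_POSITION_INDEPENDENT_CODE variable), rather than a raw compiler flag:

# Lets CMake emit -fPIC (or the platform equivalent) only where it applies.
set_target_properties(aoti_backend PROPERTIES POSITION_INDEPENDENT_CODE ON)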

)
target_compile_options(aoti_backend PUBLIC -fexceptions -frtti -fPIC)
# Ensure symbols are exported properly
target_link_options(aoti_backend PUBLIC -Wl,--export-dynamic)
Contributor

  1. this is a linux-ism
  2. this is for building shared libraries, but you've configured aoti_backend as a static library. Are you sure we need this?
  3. if aoti_backend was a shared library, I think this would be passed by default
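If the export does turn out to be necessary, a sketch of gating the linux-ism so it doesn't leak to other platforms (not a recommendation to keep it):

if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
  target_link_options(aoti_backend PUBLIC -Wl,--export-dynamic)
endif()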

AOTIDelegateHandle* handle = (AOTIDelegateHandle*)handle_;

size_t num_inputs;
AOTInductorModelContainerGetNumInputs(
Contributor

must check all error codes
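A sketch of the checking being asked for (container_handle is an assumed field name on this PR's handle type; AOTInductorModelContainerGetNumInputs itself returns an AOTIRuntimeError):

size_t num_inputs = 0;
// Nonzero AOTIRuntimeError means failure and must abort the call.
AOTIRuntimeError err = AOTInductorModelContainerGetNumInputs(
    handle->container_handle, &num_inputs);
if (err != AOTI_RUNTIME_SUCCESS) {
  ET_LOG(Error, "AOTInductorModelContainerGetNumInputs failed: %d", err);
  return Error::Internal;
}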

}

void destroy(DelegateHandle* handle_) const override {
AOTIDelegateHandle* handle = (AOTIDelegateHandle*)handle_;
Contributor

can handle_ be null?
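If a null handle is possible here, a sketch of the guard:

void destroy(DelegateHandle* handle_) const override {
  if (handle_ == nullptr) {
    return; // nothing to tear down
  }
  AOTIDelegateHandle* handle = static_cast<AOTIDelegateHandle*>(handle_);
  // ... existing cleanup ...
}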

Comment on lines 546 to 547
tensor_to_sizes.clear();
tensor_to_strides.clear();
Contributor

this is incorrect in the presence of other threads that might have loaded their own DSOs; probably need to make these thread_local
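i.e., roughly (a sketch; whether these maps should exist at all, rather than moving into the delegate handle, is the open question above):

// Per-thread bookkeeping: one thread's DSO no longer clobbers another's maps.
thread_local std::unordered_map<Tensor*, std::vector<int64_t>> tensor_to_sizes;
thread_local std::unordered_map<Tensor*, std::vector<int64_t>> tensor_to_strides;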

-DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
..
cd ..
cmake --build cmake-out -j9
Contributor

why not let cmake choose parallelism appropriately?
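e.g., omitting the job count lets the native build tool pick its own default:

cmake --build cmake-out --parallel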


# The pip repository that hosts nightly torch packages.
TORCH_NIGHTLY_URL = "https://download.pytorch.org/whl/nightly/cpu"
TORCH_NIGHTLY_URL = "https://download.pytorch.org/whl/nightly/cu126"
Contributor

any particular reason not to use a more recent version?
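e.g., assuming a newer CUDA nightly index (such as cu128) is published at the time of the change:

TORCH_NIGHTLY_URL = "https://download.pytorch.org/whl/nightly/cu128"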


@@ -0,0 +1,76 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Contributor

I am wondering if we could split the partitioner (or, more generally, any other part of this PR) to help the review/land/iterate process go faster


Labels

CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.


4 participants