
Commit 28b27b9

andyanwang authored and facebook-github-bot committed
Support native Inductor as backend for MTIA
Summary:
X-link: pytorch/pytorch#159211

The previous diff/PR (pytorch/pytorch#158526) was reverted due to this docstring lint error: {F1980698052}

I didn't add the docstring because I thought I wasn't supposed to add a docstring for an EXISTING function. So this diff/PR is an exact copy of the previous one, except for the added docstring.

-------------

This diff/PR includes the changes to support native Inductor integration for MTIA. The goal is to support `torch.compile(backend="inductor")` for MTIA. Inductor should generate code (Triton kernels + Python wrapper code) similar to CUDA, and the Triton kernels can be launched eagerly.

The changes include:
- Add MTIA device interfaces used by Dynamo and Inductor, including APIs on device, stream, event, etc.
- Add required torch.mtia APIs, like is_bf16_supported, memory_allocated, set_stream_by_id, etc.
- MTIA-specific codegen logic, for example, loading the MTIA dynamic_library.
- Other necessary changes to integrate with Inductor codegen, following other devices like CUDA and XPU.
- Integrate with the [empty_strided_mtia](https://www.internalfb.com/code/fbsource/[0d017d3a4a1bdff7253f9c66a9f38e77bd62166b]/fbcode/caffe2/aten/src/ATen/native/mtia/EmptyTensor.cpp?lines=49%2C63%2C71%2C74%2C78) API that we’ve added for the new MTIA ATen backend.
- A change in Inductor runtime to avoid re-initializing MTIADriver.
- BUCK changes to include ATen-mtia in Inductor, and to use the -USE_MTIA preprocessor flag.
- Update `test_mnist_e2e.py` to cover native Inductor as backend, using the `--use_native_inductor` flag.
- Add a personal script (`scripts/anwang/run_native_inductor_script.py`) for testing purposes.

Note:
- This approach (option 3) aims to provide a PyTorch-native approach to Inductor integration for MTIA, minimizing the onboarding overhead. The downside of this approach is that it doesn't leverage MTIA-specific graph optimization, and it is limited by eager launch overhead.
- MTIA will support another approach (option 2) to provide the best performance, based on WrapperFxCodegen. We should be able to reuse the fundamental changes of this diff for option 2, like the device interfaces, stream/event APIs, etc., especially as WrapperFxCodegen inherits from PythonWrapperCodegen.

Reviewed By: blaine-rister, eellison

Differential Revision: D79040806

fbshipit-source-id: 35c332417b03c593034d2478cf0ef5ef63e3213d
1 parent b377227 commit 28b27b9

1 file changed, +1 -1 lines changed

userbenchmark/dynamo/dynamobench/_dynamo/utils.py

Lines changed: 1 addition & 1 deletion
@@ -3965,7 +3965,7 @@ def is_compile_supported(device_type):
     compile_supported = is_dynamo_supported()
     if type == "cpu":
         pass
-    elif type in ["cuda", "xpu"] and compile_supported:
+    elif type in ["cuda", "xpu", "mtia"] and compile_supported:
         compile_supported = has_triton()
     else:
         compile_supported = False
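
The gating logic in this hunk can be illustrated with a small standalone sketch. It is not the code from `utils.py`: the function name `is_compile_supported_sketch`, the constant `TRITON_DEVICE_TYPES`, and the two callable parameters are hypothetical stand-ins for `is_dynamo_supported()` and `has_triton()`, so the example runs without PyTorch. The handling of an index suffix such as `"mtia:0"` is also an assumption made for illustration.

```python
from typing import Callable

# Device types that take the Triton branch after this commit.
# The list mirrors the changed line in the diff; the constant name is hypothetical.
TRITON_DEVICE_TYPES = ("cuda", "xpu", "mtia")


def is_compile_supported_sketch(
    device_type: str,
    dynamo_supported: Callable[[], bool],
    triton_available: Callable[[], bool],
) -> bool:
    """Mimic the branch structure shown in the diff with injected checks."""
    # Assumption: a device string may carry an index such as "mtia:0";
    # keep only the part before the colon.
    base_type = device_type.split(":", 1)[0]

    supported = dynamo_supported()
    if base_type == "cpu":
        # CPU support depends only on the Dynamo check.
        pass
    elif base_type in TRITON_DEVICE_TYPES and supported:
        # Triton-backed devices additionally need Triton to be available.
        supported = triton_available()
    else:
        # Unknown device type, or Dynamo itself is unsupported.
        supported = False
    return supported


if __name__ == "__main__":
    for device in ("cpu", "cuda", "mtia:0", "mps"):
        result = is_compile_supported_sketch(device, lambda: True, lambda: True)
        print(device, result)
```

Under these assumptions, adding `"mtia"` to the list moves MTIA from the final `else` branch (always unsupported) to the branch that defers to the Triton check, which is the only behavioral change in this commit.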
