
Conversation

hunhoffe (Collaborator) commented Nov 25, 2025

Summary

This PR begins the process of consolidating runtime support code into one implementation that supports tracing, JIT, programming examples, and tests.

Apologies that the PR is so large, but many components of the repo were good candidates to use shared runtime library functions!

Details

The primary goals of this work are to:

  1. Deduplicate logic between the JIT runtime code (e.g., the NPUKernel class in IRON), the XRT helper code (e.g., the AIE_Application class in the aie.utils.xrt module), and the amd/IRON runtime code (e.g., the AIEDeviceManager class in https://github.com/amd/IRON/blob/devel/applications/llama_3.2_1b/src/aie_device_manager.py). This helps maintainability and ensures that efficiency improvements (e.g., in pre-loading, buffer handling, etc.) are used consistently (see the illustrative sketch after this list).
  2. Continue abstracting specifics of XRT away from the conceptual use of a runtime. This is a follow-on to one or two previous PRs on iron.Tensors.
  3. Bring tracing capabilities into JIT.
  4. Add caching for insts buffer objects and XRT contexts. This is not fully optimized yet.
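For orientation, here is a minimal sketch of the IRON JIT flow that the consolidated runtime sits underneath. The decorator and tensor helpers below follow existing IRON programming examples as I understand them; exact signatures may differ, and the kernel body is elided.

import numpy as np
import aie.iron as iron

@iron.jit(is_placed=False)
def passthrough(inp, out):
    # The decorated function builds and returns the IRON program; the
    # shared host runtime then compiles, caches, and launches it.
    ...

# Tensors are allocated through iron so the runtime, not the caller,
# owns the underlying XRT buffer objects.
inp = iron.randint(0, 100, (1024,), dtype=np.int32, device="npu")
out = iron.zeros_like(inp)

# passthrough(inp, out)  # compiled on first call, cached afterwards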

Other miscellaneous things included in this PR:

  • Rename some pytest files in test/python to follow the traditional test_*.py naming convention. Make sure lit test files are not named with this convention. This allows one to run pytest test/python and successfully run all the pytests in that directory. This runs locally much faster than trying to use lit.
  • Some small steps towards consolidating tracing information into a TraceConfig object (an illustrative sketch follows this list). Hopefully this will complement, rather than conflict with, [proposal][wip] Add event trace configuration to mlir-aie #2705.
  • Generally, take some small steps towards making python/utils a separate package (including runtime support) with dependencies separate from python/iron. This process is started but not complete.
  • Truncate some output to show only 100 errors, to avoid overloading the CI runners with output.
  • Reduce lit test concurrency by introducing a new concurrency group for tests that run on the NPU using XRT (test/npu-xrt and test/python/npu-xrt).
  • If XRT is not present, you can still generate MLIR; a warning is printed to stderr. This addresses: Python bindings silently fail if XRT is not found. #2606
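As an illustration of the TraceConfig direction mentioned above, here is a rough sketch of the kind of object tracing information could be consolidated into. The field names are guesses for illustration only, not the fields this PR actually defines.

from dataclasses import dataclass

@dataclass
class TraceConfig:
    # Hypothetical fields; the real TraceConfig in this PR may differ.
    enabled: bool = False
    trace_size: int = 8192           # bytes reserved for trace output
    output_file: str = "trace.json"  # where parsed trace events land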

What this PR does NOT do:

  • The XRT runtime is not particularly optimized. Correctness was emphasized first in this PR. Later work towards optimization should analyze the syncing behavior of the runtime + tensors to ensure no extra overheads are incurred.
  • Make changes to compilation, configuration, etc. But, hopefully, with one runtime entrypoint, these changes will be easier to make in future PRs. The same applies to refactoring towards a more cohesive tracing API.
  • Fully integrate tracing utilities into a library format with transparent support for different architectures.
  • Add a good test suite for tracing utilities.

hunhoffe changed the title from "[WIP] IRON library runtime object" to "[WIP] IRON host runtime abstraction" on Nov 25, 2025
fifield (Collaborator) commented Nov 25, 2025

This is great. I'm wondering about the layering. Could non-IRON mlir-aie users like mlir-air use this without importing all of iron?

hunhoffe (Collaborator, Author) commented

@fifield good question. I think it's tied to IRON unless we moved some of this logic to pyxrt. However, installing the mlir-aie wheels is maybe not too bad anymore?

hunhoffe (Collaborator, Author) commented Jan 8, 2026

@ypapadop-amd Thanks for the review! My plan had been to do a separate PR for documentation, since docs were not built at all in utils, but there's no time like the present, so I went ahead, fixed up and added docstrings, and added the hostruntime-related files in python/utils to the generated API documentation.

hunhoffe requested a review from ypapadop-amd on January 8, 2026
@@ -7,8 +7,9 @@
# (c) Copyright 2024 Advanced Micro Devices, Inc. or its affiliates
Collaborator commented on the diff:

Copyright needs to be updated to a range in those.

ypapadop-amd (Collaborator) commented

I saw that the set_current_device test was removed. Is the function removed too? For some kernels, that's the only way to set the device.

hunhoffe (Collaborator, Author) commented Jan 8, 2026

@ypapadop-amd my intent is that:

  • For compilation, a device must be given (e.g., in Program; see the sketch after this list). This is important for cross-compilation.
  • If you are using JIT, then you need a runtime anyway. You can influence the device by choosing a default runtime that matches your system. Cross-device JIT doesn't really make sense (at least in my mind).
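As a sketch of the first point, here is roughly what giving the device at compile time looks like, using the Program pattern quoted later in this thread. The specific device class and placer are just examples from existing IRON code, not requirements of this PR.

from aie.iron import Program, Runtime
from aie.iron.device import NPU1Col1
from aie.iron.placers import SequentialPlacer

rt = Runtime()
# ... workers, object FIFOs, and data movement would be declared on rt here ...

# The target device is passed explicitly, so the same source can be
# cross-compiled for an NPU other than the one in the host machine.
module = Program(NPU1Col1(), rt).resolve_program(SequentialPlacer())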

The full realization of this plan would only happen after further improvements to the JIT/compilation code, which this PR does not do. So my questions are:

  • Does this intermediate step break your current setup in a way you can't work around?
  • Does the long-term plan make sense?

ypapadop-amd (Collaborator) commented

> @ypapadop-amd my intent is that:
>
>   • For compilation, a device must be given (e.g., in Program). This is important for cross-compilation.

Is it now the default to pass the device type in the kernel arguments? Would it work, for example, with ...?

> The full realization of this plan would only happen after further improvements to the JIT/compilation code, which this PR does not do. So my questions are:
>
>   • Does this intermediate step break your current setup in a way you can't work around?

Probably; I use set_current_device / get_current_device to pass that information.

>   • Does the long-term plan make sense?

I am not sure what the long-term plan is. I thought that for JIT there's always an implicit device (makes sense) and an aversion to passing it as an argument (thus the set_current_device / get_current_device). Are we going to separate the JIT kernel entrypoint from the implementation? I.e., something like:

def kernel_impl(device):
    ...
    return Program(device, rt).resolve_program(SequentialPlacer())

@iron.jit
def kernel(...):
    return kernel_impl(get_current_device())

hunhoffe (Collaborator, Author) commented Jan 8, 2026

Yes, the eventual goal is to separate compilation from the JIT-running of the module. So even something that was jit-decorated could be compiled with an explicitly set device.

For cross-compilation, there SHOULD be an argument which is the device type. So I would not say I have an aversion to passing the device as an argument.

With the current PR, you should be able to customize the behavior of get_current_device() by creating a default runtime that returns the current device, e.g.,

iron.DefaultNPURuntime = <some custom python class that returns your device of choice>
iron.get_current_device() # this should now return your device of choice

This is admittedly clunky, but this PR is already huge, and I don't want to make more changes to JIT in it than I need to.
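To make the pseudocode above concrete, here is a rough, untested sketch of such a class. The class shape, the device class, and the assignment to iron.DefaultNPURuntime follow the lines above and are assumptions, not this PR's final API.

import aie.iron as iron
from aie.iron.device import NPU1Col1  # example device class

class FixedDeviceRuntime:
    """Hypothetical runtime stub that always reports one device."""

    def get_current_device(self):
        # Assumed hook: iron.get_current_device() would delegate here.
        return NPU1Col1()

# Per the pseudocode above, point the default runtime at the stub.
iron.DefaultNPURuntime = FixedDeviceRuntime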

ypapadop-amd (Collaborator) commented

I think that's undesirable for user code. Can we have a dummy runtime checked in as part of this PR that does that?

hunhoffe (Collaborator, Author) commented Jan 8, 2026

It is undesirable. If you want to create a dummy host runtime or other host-runtime code to meet additional needs beyond what is used/tested in this repo, I think that would be an excellent follow-on PR for you to make :)

jgmelber (Collaborator) commented Jan 8, 2026

@ypapadop-amd I think the originally requested changes are satisfied; can you remove the "request changes" review, please?

hunhoffe (Collaborator, Author) commented Jan 9, 2026

@ypapadop-amd I re-added set_current_device. Let me know if you have any problems with the changes after this PR is merged in!
