
Conversation

hunhoffe (Collaborator) commented Nov 25, 2025

Summary

This PR begins the process of consolidating runtime support code into one implementation that supports tracing, JIT, programming examples, and tests.

Apologies that the PR is so large, but many components of the repo were good candidates to use shared runtime library functions!

Details

The primary goals of this work are to:

  1. Deduplicate logic between the JIT runtime code (e.g., the NPUKernel class in IRON), the XRT helper code (e.g., the AIE_Application class in the aie.utils.xrt module), and the amd/IRON runtime code (e.g., the AIEDeviceManager class in https://github.com/amd/IRON/blob/devel/applications/llama_3.2_1b/src/aie_device_manager.py). This helps maintainability and ensures that efficiency improvements (e.g., in pre-loading, buffer handling, etc.) are used consistently (see the illustrative sketch after this list).
  2. Continue abstracting specifics of XRT away from the conceptual use of a runtime. This is a follow-on to one or two previous PRs on iron.Tensors.
  3. Bring tracing capabilities into JIT.
  4. Add caching for insts buffer objects and XRT contexts. This is not fully optimized yet.
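For orientation, here is a minimal sketch of the IRON JIT flow that the consolidated runtime sits underneath. The decorator and tensor helpers below follow existing IRON programming examples as I understand them; exact signatures may differ, and the kernel body is elided.

import numpy as np
import aie.iron as iron

@iron.jit(is_placed=False)
def passthrough(inp, out):
    # The decorated function builds and returns the IRON program; the
    # shared host runtime then compiles, caches, and launches it.
    ...

# Tensors are allocated through iron so the runtime, not the caller,
# owns the underlying XRT buffer objects.
inp = iron.randint(0, 100, (1024,), dtype=np.int32, device="npu")
out = iron.zeros_like(inp)

# passthrough(inp, out)  # compiled on first call, cached afterwards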

Other miscellaneous things included in this PR:

  • Rename some pytest files in test/python to follow the traditional test_*.py naming convention. Make sure lit test files are not named with this convention. This allows one to run pytest test/python and successfully run all the pytests in that directory. This runs locally much faster than trying to use lit.
  • Some small steps towards consolidating tracing information into a TraceConfig object (an illustrative sketch follows this list). Hopefully this will complement, rather than conflict with, [proposal][wip] Add event trace configuration to mlir-aie #2705.
  • Generally, take some small steps towards making python/utils a separate package (including runtime support) with dependencies separate from python/iron. This process is started but not complete.
  • Truncate some output to show only 100 errors, to avoid overloading the CI runners with output.
  • Reduce lit test concurrency by introducing a new concurrency group for tests that run on the NPU using XRT (test/npu-xrt and test/python/npu-xrt).
  • If XRT is not present, you can still generate MLIR; a warning is printed to stderr. This addresses: Python bindings silently fail if XRT is not found. #2606
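As an illustration of the TraceConfig direction mentioned above, here is a rough sketch of the kind of object tracing information could be consolidated into. The field names are guesses for illustration only, not the fields this PR actually defines.

from dataclasses import dataclass

@dataclass
class TraceConfig:
    # Hypothetical fields; the real TraceConfig in this PR may differ.
    enabled: bool = False
    trace_size: int = 8192           # bytes reserved for trace output
    output_file: str = "trace.json"  # where parsed trace events land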

What this PR does NOT do:

  • The XRT runtime is not particularly optimized. Correctness was emphasized first in this PR. Later work towards optimization should analyze the syncing behavior of the runtime + tensors to ensure no extra overheads are incurred.
  • Make changes to compilation, configuration, etc. But, hopefully, with one runtime entrypoint, these changes will be easier to make in future PRs. The same applies to refactoring towards a more cohesive tracing API.
  • Fully integrate tracing utilities into a library format with transparent support for different architectures.
  • Add a good test suite for tracing utilities.

hunhoffe changed the title from "[WIP] IRON library runtime object" to "[WIP] IRON host runtime abstraction" on Nov 25, 2025
fifield (Collaborator) commented Nov 25, 2025

This is great. I'm wondering about the layering. Could non-IRON mlir-aie users like mlir-air use this without importing all of iron?

hunhoffe (Collaborator, Author) commented

@fifield good question. I think it's tied to IRON unless we moved some of this logic to pyxrt. However, installing the mlir-aie wheels is maybe not too bad anymore?

hunhoffe (Collaborator, Author) commented Jan 8, 2026

@ypapadop-amd Thanks for the review! My plan had been to do a separate PR for documentation, since docs were not built at all in utils, but there's no time like the present, so I went ahead, fixed up and added docstrings, and added the hostruntime-related files in python/utils to the generated API documentation.

hunhoffe requested a review from ypapadop-amd on January 8, 2026
@@ -7,8 +7,9 @@
# (c) Copyright 2024 Advanced Micro Devices, Inc. or its affiliates
Collaborator commented on the diff:

Copyright needs to be updated to a range in those.

ypapadop-amd (Collaborator) commented

I saw that the set_current_device test was removed. Is the function removed too? For some kernels, that's the only way to set the device.

hunhoffe (Collaborator, Author) commented Jan 8, 2026

@ypapadop-amd my intent is that:

  • For compilation, a device must be given (e.g., in Program; see the sketch after this list). This is important for cross-compilation.
  • If you are using JIT, then you need a runtime anyway. You can influence the device by choosing a default runtime that matches your system. Cross-device JIT doesn't really make sense (at least in my mind).
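As a sketch of the first point, here is roughly what giving the device at compile time looks like, using the Program pattern quoted later in this thread. The specific device class and placer are just examples from existing IRON code, not requirements of this PR.

from aie.iron import Program, Runtime
from aie.iron.device import NPU1Col1
from aie.iron.placers import SequentialPlacer

rt = Runtime()
# ... workers, object FIFOs, and data movement would be declared on rt here ...

# The target device is passed explicitly, so the same source can be
# cross-compiled for an NPU other than the one in the host machine.
module = Program(NPU1Col1(), rt).resolve_program(SequentialPlacer())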

The full realization of this plan would only happen after further improvements to the JIT/compilation code, which this PR does not do. So my questions are:

  • Does this intermediate step break your current setup in a way you can't work around?
  • Does the long-term plan make sense?

ypapadop-amd (Collaborator) commented

> @ypapadop-amd my intent is that:
>
>   • For compilation, a device must be given (e.g., in Program). This is important for cross-compilation.

Is it now the default to pass the device type in the kernel arguments? Would it work, for example, with ...?

> The full realization of this plan would only happen after further improvements to the JIT/compilation code, which this PR does not do. So my questions are:
>
>   • Does this intermediate step break your current setup in a way you can't work around?

Probably; I use set_current_device / get_current_device to pass that information.

>   • Does the long-term plan make sense?

I am not sure what the long-term plan is. I thought that for JIT there's always an implicit device (makes sense) and an aversion to passing it as an argument (thus the set_current_device / get_current_device). Are we going to separate the JIT kernel entrypoint from the implementation? I.e., something like:

def kernel_impl(device):
    ...
    return Program(device, rt).resolve_program(SequentialPlacer())

@iron.jit
def kernel(...):
    return kernel_impl(get_current_device())

hunhoffe (Collaborator, Author) commented Jan 8, 2026

Yes, the eventual goal is to separate compilation from the JIT-running of the module. So even something that was jit-decorated could be compiled with an explicitly set device.

For cross-compilation, there SHOULD be an argument which is the device type. So I would not say I have an aversion to passing the device as an argument.

With the current PR, you should be able to customize the behavior of get_current_device() by creating a default runtime that returns the current device, e.g.,

iron.DefaultNPURuntime = <some custom python class that returns your device of choice>
iron.get_current_device() # this should now return your device of choice

This is admittedly clunky, but this PR is already huge, and I don't want to make more changes to JIT in it than I need to.
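To make the pseudocode above concrete, here is a rough, untested sketch of such a class. The class shape, the device class, and the assignment to iron.DefaultNPURuntime follow the lines above and are assumptions, not this PR's final API.

import aie.iron as iron
from aie.iron.device import NPU1Col1  # example device class

class FixedDeviceRuntime:
    """Hypothetical runtime stub that always reports one device."""

    def get_current_device(self):
        # Assumed hook: iron.get_current_device() would delegate here.
        return NPU1Col1()

# Per the pseudocode above, point the default runtime at the stub.
iron.DefaultNPURuntime = FixedDeviceRuntime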

ypapadop-amd (Collaborator) commented

I think that's undesirable for user code. Can we have a dummy runtime checked in as part of this PR that does that?

hunhoffe (Collaborator, Author) commented Jan 8, 2026

It is undesirable. If you want to create a dummy host runtime or other host-runtime code to meet additional needs beyond what is used/tested in this repo, I think that would be an excellent follow-on PR for you to make :)

jgmelber (Collaborator) commented Jan 8, 2026

@ypapadop-amd I think the originally requested changes are satisfied; can you remove the "request changes" review, please?

hunhoffe (Collaborator, Author) commented Jan 9, 2026

@ypapadop-amd I re-added set_current_device. Let me know if you have any problems with the changes after this PR is merged in!
