
Windows: Initial Bring-Up #2861

Open
thomthehound wants to merge 14 commits into Xilinx:main from thomthehound:windows-bring-up

Conversation

@thomthehound
Contributor

End-to-End Native Windows Support for IRON/MLIR-AIE

This PR adds a native Windows (non-WSL) path for building and running IRON projects:

  • The environment may now be set up on Windows, using all-Windows assets
  • The mlir-aie wheels may now be built directly on Windows, without WSL
  • GitHub Workflows have been updated to build and distribute Windows-native wheels
  • A (very basic) Makefile parser is included to test against programming examples
  • IRON experiments can now both run and build fully inside of Windows
  • Windows-specific code is, in all cases, gated behind if(WIN32), platform.system() == "Windows", os.name == "nt", etc.
  • All existing Linux/WSL behavior is preserved as closely as possible (minus a couple of specific cases below)

Provenance

Last summer, I began a private project -- which I called "IRONWINDOW" -- to port MLIR-AIE over to Windows. As I was doing it for myself, and for my own purposes ("for fun"), I made no effort to preserve the existing workflows and project layout. I had also assumed there would be little professional interest in a full Windows port. Ultimately, the project stalled as I ran into numerous problems with llvm-aie and MLIR. Several months later, I came across PR #2677 ("[TEST] Build wheels for Windows"), which proved my assumption incorrect (as I should have known, given how well-appointed for Windows the codebase already was). Taking up the project again, I was able to track down the specific code in LLVM-AIE and MLIR that was breaking Windows support and patch it. After that, MLIR-AIE "just worked".

At the time, I had hoped it would be a simple task to then translate my "IRONWINDOW" project back into something that fit within the MLIR-AIE codebase from which it had diverged. That task proved greater than I had anticipated. At times I felt like Ken Mattingly in the simulator, going through iteration after iteration of checklists to get the Apollo 13 command module back to Earth. However, after approximately a month and a half of working on this, I believe I finally have something worth presenting.


Steak and Potatoes

Nobody likes an outsider coming in and messing with their codebase, so I feel obligated to explain every single change I've made and why. I'll try to keep this brief, as the code should speak for itself.

1) Cross-platform entrypoints for setup + example execution

New: utils/iron_setup.py

  • One-stop IRON environment setup/update/clean across Linux/WSL/Windows
  • Installs the needed wheels and Python requirements
  • Automatically patches the Windows LLVM-AIE wheel
  • Emits shell-appropriate environment blocks (bash, pwsh, cmd)
  • Functionally similar to sourcing quick_setup.sh, env_setup.sh, and /opt/xilinx/xrt/setup.sh back-to-back

New: utils/run_example.py

  • Cross-platform runner for programming_examples/*, simulating the Makefile in that it:
    • Builds AIE artifacts
    • Builds the host app
    • Packages runtime artifacts (xclbin/insts/etc.)
    • Executes on XRT-backed Ryzen AI NPU
  • Primarily intended as "proof-of-life" on Windows, but it should also run on at least WSL
  • Currently, fewer than ~1/2 of the examples will run (I could probably bring this up to over ~2/3, but I do not see the purpose at this time)
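For a sense of what this flow involves, here is a minimal sketch of the Makefile-like sequence run_example.py simulates; the function names, the CMake-based host build, and the binary layout are assumptions for illustration, not the script's actual API:

import os
import subprocess
from pathlib import Path

def run_step(cmd, cwd):
    # Argv lists (no shell=True) sidestep cmd.exe vs. bash quoting differences.
    print(">", " ".join(map(str, cmd)))
    subprocess.run(list(map(str, cmd)), cwd=str(cwd), check=True)

def run_example(example_dir, aie_build_cmd, host_exe_name="test"):
    # Hypothetical outline: build AIE artifacts, build the host app, run it.
    example_dir = Path(example_dir)
    build_dir = example_dir / "build"
    build_dir.mkdir(exist_ok=True)
    # 1) Build AIE artifacts (xclbin / instruction stream); the exact command is
    #    whatever the example's Makefile would have run, supplied by the caller.
    run_step(aie_build_cmd, cwd=example_dir)
    # 2) Build the host app; CMake abstracts over MSVC vs. GCC/Clang.
    run_step(["cmake", "-S", ".", "-B", "build"], cwd=example_dir)
    run_step(["cmake", "--build", "build", "--config", "Release"], cwd=example_dir)
    # 3) Locate and run the host binary; MSVC multi-config builds nest it under Release/.
    exe = host_exe_name + (".exe" if os.name == "nt" else "")
    host = next(p for p in (build_dir / "Release" / exe, build_dir / exe) if p.exists())
    run_step([host], cwd=example_dir)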

These are additive utilities designed to be drop-in usable without breaking existing scripts. If we do start shipping Windows wheels, it is my hope that we unify all environments on something like iron_setup.py, as using a single cross-platform script will make drift less likely going forward. Right now the env emission is somewhat awkward because, obviously, Python is not designed to mess with a user's shell environment. However, I have written the file in such a way that it would be trivial to modify it to output platform-specific environment setup scripts to disk.
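As a rough illustration of that shell-specific emission, a minimal sketch (the function name and the example variable value are made up; the real script's interface may differ):

def emit_env(env_vars, shell):
    """Return environment assignments in the requested shell's syntax.

    env_vars: mapping of variable name to value, e.g. {"XRT_ROOT": "C:/xrt"}
    shell:    one of "bash", "pwsh", "cmd"
    """
    lines = []
    for name, value in env_vars.items():
        if shell == "bash":
            lines.append(f'export {name}="{value}"')
        elif shell == "pwsh":
            lines.append(f'$env:{name} = "{value}"')
        elif shell == "cmd":
            lines.append(f'set "{name}={value}"')
        else:
            raise ValueError(f"unsupported shell: {shell}")
    return "\n".join(lines)

# Piping the emitted block into the shell applies it, e.g. in PowerShell:
#   python utils/iron_setup.py env --shell pwsh | iex
print(emit_env({"XRT_ROOT": "C:/xrt"}, shell="pwsh"))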

I make no attempt to hide the fact that I used a programming AI to "vibe code" utils/run_example.py. That file exists only to prove the Windows wheels actually work. Programming it by hand would have been more work than it was worth, given the fact that I expect it to be temporary. I did look it over to manually organize and clean up a few things, but it is not exactly craft-brewed.


2) Windows wheel-building tools

The two additional Python scripts here are intended to allow Windows users to build their own wheels in case the GitHub Workflows don't pan out. I realize that most of what they do could have been accomplished with Git Bash, but the Windows setup is already frustrating enough as-is. Also, what is the point of going Windows-native if we're going to use POSIX as a crutch? Regardless, I needed a place to patch the Windows MLIR wheel as it is downloaded, and I saw no reason to stage it separately when it is only needed to build the wheel.

New: utils/mlir_aie_wheels/scripts/download_mlir.py

  • Python replacement for bash-only MLIR distro fetching
  • Parses the existing utils/clone-llvm.sh version source to stay aligned with upstream versioning
  • Automatically patches the MLIR wheel's LLVM CMake exports to remove hard-pathing of diaguids.lib
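To make the diaguids.lib fix concrete, here is a hedged sketch of the kind of rewrite the patch performs on the extracted wheel's CMake export files; the regex, the helper name, and the assumption that the reference lives in *.cmake files under the wheel are mine, not a copy of the script:

import os
import re
from pathlib import Path

# Any absolute Windows path ending in diaguids.lib, with forward or back slashes.
DIAGUIDS_RE = re.compile(r'[A-Za-z]:[^;"\s)]*[/\\]diaguids\.lib')

def patch_diaguids(extracted_wheel_dir):
    """Rewrite machine-specific diaguids.lib references in the LLVM CMake exports."""
    vs = os.environ.get("VSINSTALLDIR", "")
    local = Path(vs) / "DIA SDK" / "lib" / "amd64" / "diaguids.lib" if vs else None
    fallback = local.as_posix() if local and local.exists() else "diaguids.lib"

    def fix(match):
        # Leave references that resolve on this machine alone; only dangling
        # absolute paths get replaced.
        return match.group(0) if Path(match.group(0)).exists() else fallback

    for exports in Path(extracted_wheel_dir).rglob("*.cmake"):
        text = exports.read_text(encoding="utf-8", errors="ignore")
        patched = DIAGUIDS_RE.sub(fix, text)
        if patched != text:
            exports.write_text(patched, encoding="utf-8")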

New: utils/mlir_aie_wheels/scripts/build_local.py

  • Python replacement for local wheel-building helper scripts
  • This is a straight port of build_local.sh that just removes POSIX assumptions

Updated: utils/mlir_aie_wheels/pyproject.toml and python_bindings/pyproject.toml

  • Windows before-build now uses download_mlir.py (no bash required)
  • Keeps Linux/macOS behavior unchanged

Updated: utils/mlir_aie_wheels/setup.py

  • Normalizes paths for CMake (forward slashes) and uses argv-list CMake invocation to avoid quoting/escaping edge cases
  • Keeps behavior identical on Linux/macOS
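For example, a minimal sketch of what forward-slash normalization plus an argv-list CMake invocation looks like; the XRT_ROOT cache variable is purely illustrative:

import subprocess
from pathlib import PurePath

def cmake_configure(source_dir, build_dir, xrt_dir):
    """Configure a build with normalized paths and an argv-list invocation."""
    def posix(p):
        # CMake accepts forward slashes on every platform, which avoids
        # backslash-escaping surprises inside generated build files.
        return PurePath(p).as_posix()

    args = [
        "cmake",
        "-S", posix(source_dir),
        "-B", posix(build_dir),
        f"-DXRT_ROOT={posix(xrt_dir)}",  # illustrative cache variable
    ]
    # Passing an argv list (no shell=True) sidesteps cmd.exe vs. bash
    # quoting and escaping edge cases entirely.
    subprocess.run(args, check=True)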

Updated: utils/mlir_aie_wheels/python_bindings/{pyproject.toml, setup.py, CMakeLists.txt}

  • Fixes the python_bindings wheel so it actually builds (adds a missing build-system block to pull in nanobind)
  • Windows before-build now uses download_mlir.py
  • On Windows, avoids packaging signed XRT binaries (see note below)

3) CMake fixes for Windows

Updated: top-level CMakeLists.txt + tools/CMakeLists.txt

  • Adds AIE_BUILD_CHESS_CLANG (default ON elsewhere, default OFF on Windows)
  • Prevents Windows builds from attempting to use proprietary/non-existent Vitis/xchesscc (can be forced on if that ever exists in the future)
  • Fixes a few small Windows-specific CMake annoyances (module path, UUID handling, imported XRT DLL location)

Updated: cmake/modulesXilinx/FindXRT.cmake

  • Addresses XRT directory layout differences between Windows and Linux
  • Keeps Linux discovery behavior intact

Updated: tools/bootgen/CMakeLists.txt

  • Adds MSVC-friendly compile options (exceptions + warning suppression) so bootgen builds without spam

4) Small runtime/test infrastructure updates

Updated: python/utils/compile/cache/utils.py

  • Adds Windows-compatible file locking for the compile cache (msvcrt), retaining fcntl behavior on POSIX
  • Locking works differently on Windows vs. Linux, but this implementation will prevent race conditions (which I think was the point)
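A minimal sketch of that platform split, assuming hypothetical helper names rather than the cache module's real ones:

import os

if os.name == "nt":
    import msvcrt

    def lock_file(f, exclusive=True):
        # msvcrt offers byte-range locks; LK_LOCK retries for up to ~10 seconds
        # before raising OSError. Only exclusive locks are modeled here.
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)

    def unlock_file(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def lock_file(f, exclusive=True):
        fcntl.flock(f.fileno(), fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)

    def unlock_file(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)

# Usage: hold the lock around any read-modify-write of a shared cache entry.
# with open(index_path, "r+") as f:
#     lock_file(f)
#     try:
#         ...  # update the cache index
#     finally:
#         unlock_file(f)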

Updated: python/aie_lit_utils/lit_config_helpers.py

  • Adjusts XRT link flags for Windows (no -luuid)
  • Ensures runtime search path handling is correct across platforms
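As a sketch of the intent (names are illustrative, not the helper's actual ones):

import platform

def xrt_link_libs():
    # libuuid backs XRT's uuid handling on Linux; there is no -luuid on Windows.
    libs = ["xrt_coreutil"]
    if platform.system() != "Windows":
        libs.append("uuid")
    return libs

def runtime_search_path_var():
    # Windows resolves DLLs via PATH; ELF platforms rely on rpath or LD_LIBRARY_PATH.
    return "PATH" if platform.system() == "Windows" else "LD_LIBRARY_PATH"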

Updated: python/CMakeLists.txt

  • Drops uuid linking on Windows (not applicable)

Updated: python/requirements_dev.txt + python/requirements_ml.txt

  • requirements_dev.txt: adds missing build tooling (ninja and cibuildwheel, needed for Windows wheel builds) and bumps the CMake floor to match what we actually use
  • requirements_ml.txt: switches to --extra-index-url for torch so it doesn't clobber PyPI for anyone mixing requirement sets

5) Documentation for Windows-native bring-up

New: docs/buildHostWinNative.md

  • End-to-end Windows-native bring-up guide:
    • Installing Windows-native tools
    • Building XRT
    • Setting up IRON env
    • Building wheels
    • Running examples

This one was a real doozy. I did my best to trim it down to the essentials, but the XRT setup is going to be very annoying for the average user. I had to submit a separate PR to the XRT project just to get it to work at all. And, from my conversation with one of the code owners, it turns out that they build it on their end by pre-populating certain directories with tools from a private GitHub package. The average user has no such luxury. I offered to write a script to improve the situation, but I was told that any Python scripts would be rejected outright. I do not think a .bat file is well suited for the kind of dependency management that XRT requires (anyone who has tried to nest if/else statements in cmd, not to mention debug using only the error "'.' was unexpected at this time.", knows what I am talking about here), and, even if I did put the time into it, I have no way of testing against their currently private build system, in which case I would be wasting my time. So... we have this. I understand where they are coming from over there, so I'm not upset about it. I'm just explaining why this is such a PITA. Perhaps in future I can come up with something better.

As an aside: I noticed that the original docs/buildHostWin.md (for WSL) made a point of only applying the trademark symbol to AMD™-owned properties... and not those of other companies. So, I had some fun with that to keep myself entertained as I worked. My compliments to the chef on that.


CI / workflows

These are the only things I have not fully tested on my end, and therefore are the ones most likely to break:

  • mlirAIEDistro.yml: adds Windows wheel build coverage alongside existing platforms
  • buildRyzenWheels.yml: adds a Windows job that builds and stages artifacts consistently with existing jobs

The Windows jobs are designed to minimize special-casing. I largely copied them from the existing Linux jobs, so they should follow the same flow, aside from a few OS-specific fixes: most notably the need to use vcpkg to compile OpenSSL (so bootgen builds).
This is also one area where I did deliberately touch existing code, but for good reason. Assuming that we will now be publishing twice as many wheels (Linux + Windows), and we are building them in parallel, the possibility of a race condition arises when we push them out. Therefore, I moved publishing into a publish job that runs after both build jobs complete, and then updates releases sequentially.

Results should be transparent relative to the existing behavior: version-tag runs still publish to the version tag (e.g. vX.Y.Z), and scheduled/manual runs still update the same baseline tags (latest-wheels-3 and latest-wheels-no-rtti). The only difference is that Windows wheels are now included alongside the Linux ones, and the release update shouldn't Kentucky Derby. On the other hand, and speaking frankly, I am not exactly the world's foremost expert in GitHub actions and runners. So I would be surprised if this works the first time.


Windows-specific patches

The patches for the MLIR and llvm-aie wheels are clearly delineated with banner-headers in utils/mlir_aie_wheels/scripts/download_mlir.py and utils/iron_setup.py respectively. But here is a brief outline:

  • MLIR wheel (download_mlir.py)

    • The downloaded MLIR wheel's LLVM CMake exports embed an absolute path to VS's diaguids.lib (from the DIA SDK).
    • That path is machine-specific, and it breaks linking on any runner/machine that doesn't have that exact Visual Studio install path.
    • The patch replaces any absolute .../diaguids.lib reference that does not resolve on the local machine with either:
      • the detected DIA SDK path from VSINSTALLDIR / vswhere, or
      • a plain diaguids.lib (which should show up in a normal library search path on any properly set up Windows machine)
    • It is easy to see how this oversight made it to production: the windows-2022 GitHub runner defaults to VS 2022 Enterprise, which is exactly the hard-coded path that was embedded in the wheel. However, it still prevents local builds and therefore needed to be fixed.
    • Seeing that we build our own MLIR wheel, I considered just patching it on our end with this PR, but I felt it was safer to do it this way first. I had a hard enough time staying in scope as it is. I'll follow up on this later.
  • llvm-aie wheel (iron_setup.py)

    • The Windows llvm-aie wheel ships c.lib / m.lib, but a lot of the build logic assumes Unix-style libc.a / libm.a.
    • The patch creates libc.a and libm.a aliases (copies) so that existing Makefiles/CMake logic doesn't have to special-case Windows.
    • It also strips the .deplibs section from crt1.o via llvm-objcopy, which prevents bogus MSVC runtime deps from leaking into the aie2p link line. This is the primary bug that stopped me in my tracks months ago.
      • This patch is why I require Windows users to have LLVM installed. It needs to happen even if MLIR isn't staged, so I couldn't use the copy of llvm-objcopy packaged in that wheel.
      • I don't understand how this ever could have worked. I can only assume that there are few-to-zero Windows consumers of llvm-aie. Until now. In any case, I'll see about submitting a patch upstream to remove the .deplibs section from crt1.o in llvm-aie, as that is the real fix.
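A condensed sketch of those two llvm-aie fix-ups, applied per toolchain lib directory; the helper name and the llvm-objcopy lookup are assumptions (the real, banner-delimited code lives in iron_setup.py):

import shutil
import subprocess
from pathlib import Path

def patch_llvm_aie_toolchain(toolchain_lib_dir, llvm_objcopy="llvm-objcopy"):
    lib = Path(toolchain_lib_dir)
    # 1) Alias the MSVC-style archive names to the Unix names the build logic expects.
    for win_name, unix_name in (("c.lib", "libc.a"), ("m.lib", "libm.a")):
        src, dst = lib / win_name, lib / unix_name
        if src.exists() and not dst.exists():
            shutil.copy2(src, dst)
    # 2) Strip the .deplibs section from crt1.o so MSVC runtime autolink hints
    #    cannot leak into the aie2p link line.
    crt1 = lib / "crt1.o"
    if crt1.exists():
        subprocess.run([llvm_objcopy, "--remove-section=.deplibs", str(crt1)], check=True)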

Errata:

  • The "python bindings" wheel appeared to be broken the way that I found it (I cannot see how it could have built, even in Linux). For completeness, I fixed it (this is the second place I touched Linux code). But I honestly don't know what purpose it is supposed to serve. Perhaps it is a vestige from before pyxrt existed?
  • On the topic of the bindings wheel, for Windows only, I deliberately disabled packaging xrt_coreutil.dll. That is a signed binary, and I won't involve myself in packaging those. It is on the system PATH for all Windows users, anyway.
  • In terms of NPU access, Windows is, obviously, going to be host-only as a result of Microsoft's policies, which are enforced through the exports of the signed driver/binaries. Perhaps over-zealously. I anticipate that at least some documentation updates will be required to align expectations with Windows realities.
  • Additionally, the NPU is supposed to have access to a reserved pool of system memory using flags.carvout, but, after failing with some test code, I dumped the exports of pyxrt.pyd, and we simply don't have it. I'd like to say I can address that with another XRT patch, but... at some point I need to realize that I'm not getting paid to do this, nor has anybody asked me.
  • The existing top-level CMakeLists.txt contains the following:
# If we need runtime libs, then statically link them.
set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>")

... among other "MultiThreaded" (/MT) MSVC settings.
I didn't touch that, but I can imagine it being a problem on Windows because, again, we are using signed binaries without corresponding static import libs, and the XRT-built import libs (produced in both static and dynamic flavors) are almost certainly going to hit an ABI mismatch or access fault against the drivers somewhere. I have not seen any problems with this yet, but that doesn't mean it won't happen.

  • The utils/iron_setup.py script automatically sets environment variables on Windows that are equivalent to sourcing /opt/xilinx/xrt/setup.sh, except that it puts the XRT directory in XRT_ROOT rather than XILINX_XRT (which it force-unsets in case it accidentally appears). Setting XILINX_XRT decouples XRT discovery from the system binaries on Windows, meaning the signed drivers are inaccessible and the NPU becomes unreachable. A minimal sketch of this behavior follows after this list.
  • I made some trouble for myself with the NPU detection: AIE2PS/NPU3 is now provisioned for future support, but is currently treated as a superset of AIE2P/NPU2.
  • I exclusively use PowerShell on Windows. That is why the documentation spends so much time harping about PowerShell, and also why PowerShell appears first in the scripts.
  • Please excuse the crudity of this model. I didn't have time to build it to scale or paint it.
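Referring back to the XRT_ROOT point above, a minimal sketch of the Windows environment handling; the xrt_dir argument and the bin/ layout are assumptions:

import os
from pathlib import Path

def apply_windows_xrt_env(xrt_dir):
    xrt_dir = Path(xrt_dir)
    # Point builds at the local XRT tree without detaching from the signed system driver stack.
    os.environ["XRT_ROOT"] = str(xrt_dir)
    # XILINX_XRT would redirect XRT discovery away from the signed system binaries,
    # which makes the NPU unreachable on Windows, so force-unset it.
    os.environ.pop("XILINX_XRT", None)
    # Make XRT tools and DLLs resolvable for child processes (layout assumption).
    os.environ["PATH"] = str(xrt_dir / "bin") + os.pathsep + os.environ.get("PATH", "")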

How to test

Windows (native)

  • Spend an hour building XRT (see: docs/buildHostWinNative.md)
  • Bring up IRON env:
    • python utils/iron_setup.py --all
    • python utils/iron_setup.py env --shell pwsh | iex (or --shell cmd)
  • Build and install the Windows mlir-aie wheels:
    • $env:CIBW_BUILD = "cp312-*"
    • $env:OPENSSL_ROOT_DIR="$env:VCPKG_ROOT/installed/x64-windows" # Not needed if OpenSSL is installed standalone AND the libraries are in system directories
    • python utils/mlir_aie_wheels/scripts/build_local.py # This thing trashes the disk almost as hard as a stress-test would. I'd like to streamline it more in the future.
    • python utils/iron_setup.py install --mlir-aie wheelhouse
    • python utils/iron_setup.py env --shell pwsh | iex # Again
  • Run a programming example end-to-end:
    • python utils/run_example.py run --example-dir programming_examples/basic/vector_scalar_mul

Linux/WSL

  • Existing flows are unchanged; new Python scripts should work equivalently when desired.
  • However, I encourage you to experiment with the new iron_setup.py script, as it is deliberately cross-platform. It would be nice if we could standardize on a single script to avoid drift in the future.

@jgmelber
Collaborator

Thanks @thomthehound for all the hard work you have put into this. I plan on testing this myself locally in addition to reviewing this PR. I would also be very interested in a chat if you are, feel free to reach out via email or IRON Discord.

@jgmelber
Collaborator

Updated: cmake/modulesXilinx/FindXRT.cmake

These changes would need to be merged into: https://github.com/Xilinx/cmakeModules before bumping the submodule

@thomthehound
Contributor Author

These changes would need to be merged into: https://github.com/Xilinx/cmakeModules before bumping the submodule

Oops. I knew I was forgetting something. Thank you, PR submitted.

@jgmelber
Collaborator

These changes would need to be merged into: https://github.com/Xilinx/cmakeModules before bumping the submodule

Oops. I knew I was forgetting something. Thank you, PR submitted.

Merged. You can now bump the submodule.


@jgmelber left a comment


I need to look through a few things a bit more; a lot of great work here! Here are a few initial comments.

Collaborator


Is this necessary to check in? Or perhaps a user specific preference?

Contributor Author


It's only there as a quick test that everything works. I intend to remove it later. The problem is that the makefiles for the examples have some bash-isms in them, and I didn't want to direct users to use Git Bash (honestly, I'm not even entirely sure that would work anyway). So I would have had to give bespoke manual commands for every example on Windows. Having an AI generate that just seemed faster.

Contributor Author

@thomthehound commented Feb 11, 2026


I thought about this some more, and I believe I actually did test the makefiles with Git Bash and MSYS2, and they failed because of the existing WSL support. So I still would have needed to re-write everything, even if I did direct users to POSIX to run the examples.

Until I get more clever, I really do think this is the best way to demonstrate the example programs can build in Windows.

For the record: its existence offends me, too.

Collaborator


I'm OK with checking this in for now as a step to test the programming_examples on Windows natively. I think there are ways to improve the Makefiles to better support Windows. @jackl-xilinx started the initial work on supporting both Linux and WSL in them.

Maybe we should just use cmake...

Contributor Author


My thought exactly. I already have something in mind, but it is going to take a while.

@thomthehound
Contributor Author

Alright, those were great suggestions. Thank you! It will take me a few hours to revise and re-test.

@thomthehound
Contributor Author

thomthehound commented Feb 12, 2026

Updated utils/iron_setup.py:

  • Windows llvm-aie patches apply to all *-none-unknown-elf toolchain directories under <peano>/lib.
  • NPU detection output and commenting are now more conservative.

Updated python/aie_lit_utils/lit_config_helpers.py:

  • Regex now takes the correct form to span spaces.

Notes:

  • If xrt-smi reports anything greater than RyzenAI-npu3, or any "NPU " at all, we treat it as NPU1 if we don't know what else it is (future-proof fallback).
  • We detect NPU3 (treating it as NPU2), but we do not report that to the end user.
  • We can still remove mention of NPU3 entirely, if desired.

@jgmelber
Collaborator

  • If xrt-smi reports anything greater than RyzenAI-npu3, or any "NPU " at all, we treat it as NPU1 if we don't know what else it is (future-proof fallback).
  • We detect NPU3 (treating it as NPU2), but we do not report that to the end user.
  • We can still remove mention of NPU3 entirely, if desired.

For NPU3 it might be best to detect it but not treat it as something it is not, I am thinking this might cause a bug that's hard to track down in the future.

@thomthehound
Contributor Author

thomthehound commented Feb 12, 2026

  • If xrt-smi reports anything greater than RyzenAI-npu3, or any "NPU " at all, we treat it as NPU1 if we don't know what else it is (future-proof fallback).
  • We detect NPU3 (treating it as NPU2), but we do not report that to the end user.
  • We can still remove mention of NPU3 entirely, if desired.

For NPU3 it might be best to detect it but not treat it as something it is not, I am thinking this might cause a bug that's hard to track down in the future.

Nuked.


@jgmelber left a comment


I believe this matches what we do/will see from xrt-smi

# If we have anything that reports like an NPU, even if unknown type, accept it as at least XDNA.
NPU_REGEX = re.compile(r"(RyzenAI-npu(?![0-3]\b)\d+|\bNPU\b)", re.IGNORECASE)
NPU2_REGEX = re.compile(r"NPU Strix|NPU Strix Halo|NPU Krackan|NPU Gorgon|RyzenAI-npu[4567]", re.IGNORECASE)
NPU3_REGEX = re.compile(r"NPU Medusa|RyzenAI-npu[8]", re.IGNORECASE)
Collaborator


Suggested change
NPU3_REGEX = re.compile(r"NPU Medusa|RyzenAI-npu[8]", re.IGNORECASE)
NPU3_REGEX = re.compile(r"NPU Medusa|RyzenAI-npu[3]", re.IGNORECASE)


# If we have anything that reports like an NPU, even if unknown type, accept it as at least XDNA.
NPU_REGEX = re.compile(r"(RyzenAI-npu(?![0-3]\b)\d+|\bNPU\b)", re.IGNORECASE)
NPU2_REGEX = re.compile(r"NPU Strix|NPU Strix Halo|NPU Krackan|NPU Gorgon|RyzenAI-npu[4567]", re.IGNORECASE)
Collaborator


Suggested change
NPU2_REGEX = re.compile(r"NPU Strix|NPU Strix Halo|NPU Krackan|NPU Gorgon|RyzenAI-npu[4567]", re.IGNORECASE)
NPU2_REGEX = re.compile(r"NPU Strix|NPU Strix Halo|NPU Krackan|RyzenAI-npu[456]", re.IGNORECASE)



# If we have anything that reports like an NPU, even if unknown type, accept it as at least XDNA.
NPU_REGEX = re.compile(r"(RyzenAI-npu(?![0-3]\b)\d+|\bNPU\b)", re.IGNORECASE)
Collaborator


Suggested change
NPU_REGEX = re.compile(r"(RyzenAI-npu(?![0-3]\b)\d+|\bNPU\b)", re.IGNORECASE)
NPU_REGEX = re.compile(r"NPU Phoenix|RyzenAI-npu[1]", re.IGNORECASE)

@jgmelber
Collaborator

5) Documentation for Windows-native bring-up

New: docs/buildHostWinNative.md
...

I think this is missing in the PR

@thomthehound
Contributor Author

5) Documentation for Windows-native bring-up

New: docs/buildHostWinNative.md
...

I think this is missing in the PR

Ah. The docs were in the .gitignore list. I was pushing so much that I missed it.

@thomthehound
Contributor Author

thomthehound commented Feb 13, 2026

I believe there may be a bug with the GitHub actions that interferes with retrieving branches from forks (or, rather, it assumes all calls are coming from inside the house). This appears to be the proximal cause of the remaining failures. I suspect that, if there had happened to be a pre-existing branch named windows-bring-up in this repo, the "MLIR AIE Distro" tests would have run against whatever was in that branch instead of mine.

I think I see where the problem is, so I can prepare a patch for it if you like. However, I do not think repairing it is in the scope of this PR.

On the other hand: this does provide the perfect opportunity to test such a patch.
