[examples][xegpu-matmul] Add XeGPU matrix multiplication example #28
Conversation
Force-pushed from 2ba2713 to 58e3f28.
rolfmorel left a comment:
Nice! Left some comments inline.
For the failing CI, have a look at adding both a
For the non-runnable files, add a file
one = arith.constant(index_t, 1)
nwarmup_cst = arith.constant(index_t, nwarmup)
for i in scf.for_(zero, nwarmup_cst, one):
    # FIXME(upstream): func.call is broken for this use case?
Can confirm that CamelCaseOp is subclassed (and that this subclass shadows the autogen-ed CamelCaseOp version) while the autogen-ed snake_case wrapper is not shadowed. Hence using func.call returns the autogen-ed CallOp and not its subclass.
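As a hedged sketch of a workaround (assuming the upstream MLIR Python bindings; the toy `callee`/`caller` functions below are only for illustration, not code from this PR), constructing the subclassed `func.CallOp` directly sidesteps the unshadowed snake_case wrapper:

```python
# Hedged sketch, not the PR's code: call via the hand-written func.CallOp
# subclass instead of the autogen-ed snake_case func.call wrapper.
from mlir import ir
from mlir.dialects import func

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    with ir.InsertionPoint(module.body):
        f32 = ir.F32Type.get()

        @func.FuncOp.from_py_func(f32)
        def callee(x):
            return x

        @func.FuncOp.from_py_func(f32)
        def caller(x):
            # The subclassed CallOp accepts a FuncOp plus the operand list;
            # func.call would return the plain autogen-ed CallOp instead.
            call = func.CallOp(callee.func_op, [x])
            return call.results[0]

    print(module)
```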
fargs.append(memref_c_t)


@func.func(*fargs, name=func_name)
def payload(*args):
To demonstrate the full end-to-end flow, we should look at how easy it is to automatically convert IR in the following form into a form that works with your schedule:
module {
func.func @main(%arg0: tensor<2048x8192xf32>, %arg1: tensor<8192x4096xf32>) -> tensor<2048x4096xf32> {
%cst = arith.constant 0.000000e+00 : f32
%0 = tensor.empty() : tensor<2048x4096xf32>
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<2048x4096xf32>) -> tensor<2048x4096xf32>
%2 = linalg.matmul ins(%arg0, %arg1 : tensor<2048x8192xf32>, tensor<8192x4096xf32>) outs(%1 : tensor<2048x4096xf32>) -> tensor<2048x4096xf32>
return %2 : tensor<2048x4096xf32>
}
}
That is, the above is the IR we get from torch-mlir from a basic matmul in Torch: https://github.com/ScalingIntelligence/KernelBench/blob/5c88b2319076e8d44b9901914de7b45d220944e9/KernelBench/level1/2_Standard_matrix_multiplication_.py
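For reference, a hedged sketch of how such IR can be produced (the exact entry point depends on the torch-mlir version; older releases expose `torch_mlir.compile`, newer ones use `torch_mlir.fx.export_and_import`):

```python
# Hedged sketch: lower a plain torch.matmul to linalg-on-tensors with the
# classic torch-mlir API; newer releases use torch_mlir.fx.export_and_import.
import torch
import torch_mlir


class Matmul(torch.nn.Module):
    def forward(self, a, b):
        return torch.matmul(a, b)


module = torch_mlir.compile(
    Matmul(),
    [torch.ones(2048, 8192), torch.ones(8192, 4096)],
    output_type="linalg-on-tensors",
)
print(module)  # should resemble the fill + matmul module above
```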
That would make the demonstration "fully upstream" and "fully end-to-end."
Yes. The above kernel bufferizes to a kernel that allocates the output memref. In this case, we'd need a mechanism to convert the alloc, e.g., to a gpu alloc if necessary. And we'd need to track the allocated buffers and deallocate them later.
If the kernel updates a tensor in-place, it's a little trickier. At the tensor level, the function must return the updated tensor. This return value becomes redundant after bufferization. In fact, the return value can cause a copy, so the input and output memrefs end up being different, breaking the semantics. We could drop the return value after bufferization, but that changes the function signature, which is often undesirable.
This matmul example demonstrates an update-in-place kernel. In this case it's easiest if we define the function boundary with memrefs and keep it fixed.
For autotuning, we cannot use kernels that allocate the output buffer. So yes, we'd need to find a way to convert, say, a torch-mlir kernel to in-place-update semantics. Should not be too hard actually, as the input/output roles of the arguments are clear.
For simple kernels like this, there are bufferization passes that can help with that. However, we might have to also explore more robust solutions long-term.
I'd keep the current example as is, and iterate later.
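As a hedged illustration of that kind of pass (upstream One-Shot Bufferize; whether it suffices for this pipeline is an assumption), bufferizing the tensor-level matmul across function boundaries looks roughly like this:

```python
# Hedged sketch: bufferize the tensor-level kernel from above across
# function boundaries with upstream One-Shot Bufferize. Deallocation and
# any gpu alloc conversion would still have to be handled separately.
from mlir.ir import Context, Module
from mlir.passmanager import PassManager

ASM = """
func.func @main(%arg0: tensor<2048x8192xf32>, %arg1: tensor<8192x4096xf32>) -> tensor<2048x4096xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %0 = tensor.empty() : tensor<2048x4096xf32>
  %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<2048x4096xf32>) -> tensor<2048x4096xf32>
  %2 = linalg.matmul ins(%arg0, %arg1 : tensor<2048x8192xf32>, tensor<8192x4096xf32>) outs(%1 : tensor<2048x4096xf32>) -> tensor<2048x4096xf32>
  return %2 : tensor<2048x4096xf32>
}
"""

with Context():
    module = Module.parse(ASM)
    pm = PassManager.parse(
        "builtin.module(one-shot-bufferize{bufferize-function-boundaries})"
    )
    pm.run(module.operation)
    # The function now takes and returns memrefs; note it still *returns*
    # the result buffer, i.e. this alone does not give in-place-update
    # semantics at the function boundary.
    print(module)
```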
Co-authored-by: Adam Siemieniuk <[email protected]>
Force-pushed from 2c0acab to 163e4a7, then from 163e4a7 to fbcc453.
I've set up CI such that it just dumps the IR at the XeGPU WG level. This does not require a custom LLVM build and can thus be executed with the standard install.
Adds an XeGPU matrix multiplication example that runs the payload, checks correctness, and measures performance. matmul.py is the main script with a CLI. README.md has installation instructions and usage examples.