[mlir][python] Add Pythonic wrappers for gpu ops #163883
Conversation
These are the tests I wish I could have referred to during development. I also corrected some small documentation mistakes.
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-gpu

Author: Asher Mancinelli (ashermancinelli)

Changes:

Add some tests for launching GPU kernels and regions and correct some small documentation mistakes. I wonder if we could add builders that take 3-tuples for the dim3 launch parameters, and let the async tokens default to None/an empty list, essentially supporting the use cases provided by the builders on the C++ side, like:

    OpBuilder<(ins "GPUFuncOp":$kernelFunc, "KernelDim3":$gridSize,
        "KernelDim3":$blockSize, "Value":$dynamicSharedMemorySize,
        "ValueRange":$kernelOperands,
        CArg<"Type", "nullptr">:$asyncTokenType,
        CArg<"ValueRange", "{}">:$asyncDependencies,
        CArg<"std::optional<KernelDim3>", "std::nullopt">:$clusterSize)>,

This PR is only to test what's currently there, but if folks support a builder that mirrors the C++ builders while preserving the current use case, I'll add that next. Thanks in advance!

Full diff: https://github.com/llvm/llvm-project/pull/163883.diff

3 Files Affected:
diff --git a/mlir/docs/Dialects/GPU.md b/mlir/docs/Dialects/GPU.md
index 8d4d2ca3e5743..c16ed57737e5b 100644
--- a/mlir/docs/Dialects/GPU.md
+++ b/mlir/docs/Dialects/GPU.md
@@ -121,7 +121,7 @@ func.func @main() {
gpu.launch
blocks(%0, %1, %2) in (%3 = %c1, %4 = %c1, %5 = %c1)
threads(%6, %7, %8) in (%9 = %c2, %10 = %c1, %11 = %c1) {
- gpu.printf "Hello from %d\n" %6 : index
+ gpu.printf "Hello from %d\n", %6 : index
gpu.terminator
}
return
diff --git a/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td b/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
index 987fc13e0508d..a6c6038e1e224 100644
--- a/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
+++ b/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
@@ -584,7 +584,7 @@ def GPU_DynamicSharedMemoryOp : GPU_Op<"dynamic_shared_memory", [Pure]>
This operation provides a memref pointer to the start of dynamic shared
memory, often referred to as workgroup memory. It's important to note that
this dynamic shared memory needs to be allocated at kernel launch. One can
- conveniently utilize `the dynamic_shared_memory_size` parameter of
+ conveniently utilize the `dynamic_shared_memory_size` parameter of
`gpu.launch` for this purpose.
Examples:
diff --git a/mlir/test/python/dialects/gpu/dialect.py b/mlir/test/python/dialects/gpu/dialect.py
index 66c401886804c..24f20d109b3d0 100644
--- a/mlir/test/python/dialects/gpu/dialect.py
+++ b/mlir/test/python/dialects/gpu/dialect.py
@@ -2,7 +2,8 @@
from mlir.ir import *
import mlir.ir as ir
-import mlir.dialects.gpu as gpu
+from mlir.dialects import gpu, func, arith, math
+from mlir.extras import types as T
import mlir.dialects.gpu.passes
from mlir.passmanager import *
@@ -157,3 +158,99 @@ def builder(func: gpu.GPUFuncOp) -> None:
# CHECK: %[[VAL_0:.*]] = gpu.global_id x
# CHECK: gpu.return
# CHECK: }
+
+# CHECK-LABEL: testGPULaunchFuncOp
+@run
+def testGPULaunchFuncOp():
+ module = Module.create()
+
+ module.operation.attributes["gpu.container_module"] = UnitAttr.get()
+ with InsertionPoint(module.body):
+ gpu_module = gpu.GPUModuleOp("gpu_module")
+ block = gpu_module.bodyRegion.blocks.append()
+
+ with InsertionPoint(block):
+ gpu_func = gpu.GPUFuncOp(
+ FunctionType.get([], []),
+ "kernel",
+ body_builder=lambda func: gpu.return_([]),
+ kernel=True,
+ )
+
+ with InsertionPoint(module.body):
+ host = func.FuncOp(type=FunctionType.get([], []), name="host")
+
+ with InsertionPoint(host.add_entry_block()):
+ c1 = arith.constant(T.index(), 1)
+ grid_sizes = [c1] * 3
+ block_sizes = [c1] * 3
+ sym_ref = SymbolRefAttr.get([gpu_module.sym_name.value, gpu_func.name.value])
+ token_type = Type.parse("!gpu.async.token")
+ token = gpu.wait(async_token=token_type, async_dependencies=[])
+ token = gpu.launch_func(
+ async_token=token_type,
+ async_dependencies=[token],
+ kernel=sym_ref,
+ grid_size_x=grid_sizes[0],
+ grid_size_y=grid_sizes[1],
+ grid_size_z=grid_sizes[2],
+ block_size_x=block_sizes[0],
+ block_size_y=block_sizes[1],
+ block_size_z=block_sizes[2],
+ kernel_operands=[],
+ )
+ gpu.wait(async_token=None, async_dependencies=[token])
+ func.ReturnOp([])
+
+ print(module)
+
+ # CHECK-LABEL: gpu.module @gpu_module {
+ # CHECK: gpu.func @kernel() kernel {
+ # CHECK: gpu.return
+ # CHECK: }
+ # CHECK: }
+
+ # CHECK-LABEL: func.func @host() {
+ # CHECK: %[[CONSTANT_0:.*]] = arith.constant 1 : index
+ # CHECK: %[[WAIT_0:.*]] = gpu.wait async
+ # CHECK: %[[LAUNCH_FUNC_0:.*]] = gpu.launch_func async {{\[}}%[[WAIT_0]]] @gpu_module::@kernel blocks in (%[[CONSTANT_0]], %[[CONSTANT_0]], %[[CONSTANT_0]]) threads in (%[[CONSTANT_0]], %[[CONSTANT_0]], %[[CONSTANT_0]])
+ # CHECK: gpu.wait {{\[}}%[[LAUNCH_FUNC_0]]]
+ # CHECK: return
+ # CHECK: }
+
+
+# CHECK-LABEL: testGPULaunchOp
+@run
+def testGPULaunchOp():
+ module = Module.create()
+
+ with InsertionPoint(module.body):
+ host = func.FuncOp(type=FunctionType.get([T.f32()], []), name="gpu_printf")
+
+ entry_block = host.add_entry_block()
+ with InsertionPoint(entry_block):
+ c1 = arith.constant(T.index(), 1)
+
+ launch = gpu.launch(None, [], c1, c1, c1, c1, c1, c1)
+ launch_block = launch.regions[0].blocks.append()
+ for _ in range(12):
+ launch_block.add_argument(T.index(), Location.unknown())
+
+ with InsertionPoint(launch_block):
+ gpu.printf("%f", [entry_block.arguments[0]])
+ gpu.terminator()
+
+ with InsertionPoint(entry_block):
+ func.ReturnOp([])
+
+ print(module)
+
+ # CHECK-LABEL: func.func @gpu_printf(
+ # CHECK-SAME: %[[ARG0:.*]]: f32) {
+ # CHECK: %[[CONSTANT_0:.*]] = arith.constant 1 : index
+ # CHECK: gpu.launch blocks(%[[VAL_0:.*]], %[[VAL_1:.*]], %[[VAL_2:.*]]) in (%[[VAL_3:.*]] = %[[CONSTANT_0]], %[[VAL_4:.*]] = %[[CONSTANT_0]], %[[VAL_5:.*]] = %[[CONSTANT_0]]) threads(%[[VAL_6:.*]], %[[VAL_7:.*]], %[[VAL_8:.*]]) in (%[[VAL_9:.*]] = %[[CONSTANT_0]], %[[VAL_10:.*]] = %[[CONSTANT_0]], %[[VAL_11:.*]] = %[[CONSTANT_0]]) {
+ # CHECK: gpu.printf "%[[VAL_12:.*]]", %[[ARG0]] : f32
+ # CHECK: gpu.terminator
+ # CHECK: }
+ # CHECK: return
+ # CHECK: }
✅ With the latest revision this PR passed the Python code formatter.
    sym_ref = SymbolRefAttr.get([gpu_module.sym_name.value, gpu_func.name.value])
    token_type = Type.parse("!gpu.async.token")
    token = gpu.wait(async_token=token_type, async_dependencies=[])
    token = gpu.launch_func(
you can just grab this https://github.com/makslevental/mlir-python-extras/blob/main/mlir/extras/dialects/ext/gpu.py#L339-L379 (which supports exactly what you're saying - 3-tuples), put it in mlir/dialects/gpu.py (along with the standard `register_operation` thing), and then just below wrap it in `launch_func` (thereby shadowing the auto-generated `launch_func`).
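A minimal sketch of that shadowing pattern, for concreteness: the `register_operation(..., replace=True)` decorator is the mechanism the hand-written dialect modules already use, and the keyword names match the generated builder exercised by the tests above, but the argument handling and defaults are illustrative, not the actual mlir-python-extras code:

```python
# Sketch for mlir/dialects/gpu.py, NOT the actual extras implementation:
# a dim3-friendly builder that shadows the generated LaunchFuncOp.
from ._gpu_ops_gen import *
from ._gpu_ops_gen import _Dialect
from ._ods_common import _cext as _ods_cext


@_ods_cext.register_operation(_Dialect, replace=True)
class LaunchFuncOp(LaunchFuncOp):
    def __init__(
        self,
        kernel,                  # SymbolRefAttr naming @module::@kernel
        grid_size,               # 3-tuple of index Values
        block_size,              # 3-tuple of index Values
        kernel_operands=None,
        async_dependencies=None,
        async_token=None,        # pass a !gpu.async.token type for async launch
        *,
        loc=None,
        ip=None,
    ):
        gx, gy, gz = grid_size
        bx, by, bz = block_size
        # Forward to the generated builder using the same keyword names
        # the tests above use.
        super().__init__(
            async_token=async_token,
            async_dependencies=async_dependencies or [],
            kernel=kernel,
            grid_size_x=gx,
            grid_size_y=gy,
            grid_size_z=gz,
            block_size_x=bx,
            block_size_y=by,
            block_size_z=bz,
            kernel_operands=kernel_operands or [],
            loc=loc,
            ip=ip,
        )


def launch_func(*args, **kwargs):
    # Shadow the auto-generated launch_func; return the async token when
    # one was requested so call sites can chain dependencies.
    op = LaunchFuncOp(*args, **kwargs)
    return op.async_token if op.async_token is not None else op
```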
That all looks great to me, but are we worried about forcing folks to update their code if it depends on the Python bindings?
what do you mean? oh, you're saying since this meaningfully changes the signature of both of these existing APIs (`LaunchFuncOp` and `launch_func`) relative to the auto-generated ones?
Right - anyone using either API will need to update their code. That's annoying! But maybe worth it. Just making sure 😄
The Python APIs aren't stable (i.e., we make no stability guarantees), so basically this same "breakage" occurs whenever we add one of these nicer builders. Also, there's a simple migration path: people can just import the original generated APIs directly from `_gpu_ops_gen` if they really want to keep using the old (auto-generated) APIs.
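For example, opting out would amount to a one-line import change (assuming the generated module keeps its current `_gpu_ops_gen` name):

```python
# Bypass the hand-written wrappers and use the auto-generated builders.
from mlir.dialects._gpu_ops_gen import LaunchFuncOp, launch_func
```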
Since users always have an easy way to get their old bindings back, I feel better about adding the Python-extras builders. I'll pull in the builder you linked, and I'll be ready to merge if I get an OK from Mehdi or Guray. Thanks!
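For reference, a call site using such a builder might then look roughly like this (a hypothetical API mirroring the C++ overload quoted in the description, with values reused from the test above):

```python
# Hypothetical call site once the dim3-style builder lands; the 3-tuple
# grid/block arguments and the defaulted async parameters are the
# proposed additions, not the current generated API.
token = gpu.wait(async_token=token_type, async_dependencies=[])
token = gpu.launch_func(
    kernel=sym_ref,
    grid_size=(c1, c1, c1),
    block_size=(c1, c1, c1),
    kernel_operands=[],
    async_dependencies=[token],
    async_token=token_type,
)
gpu.wait(async_token=None, async_dependencies=[token])
```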
    def printf(format, *args, loc=None, ip=None):
        return _printf(format=format, args=args, loc=loc, ip=ip)
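To illustrate, the wrapper lets call sites pass the format arguments variadically instead of as an explicit list (`thread_id` is a placeholder Value):

```python
# With the *args wrapper, arguments follow the format string directly:
gpu.printf("Hello from %d\n", thread_id)
# which forwards to the generated builder's list-taking form:
# _printf(format="Hello from %d\n", args=(thread_id,), loc=None, ip=None)
```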
ok lol this is a good use of *args
LGTM! Thanks for upstreaming this.