CoTinker (Contributor) commented Aug 4, 2025

This PR adds a verification check in LaunchOp::verify() to disallow nested gpu.launch operations. Nested gpu.launch is currently unsupported and can lead to undefined or unintended behavior during lowering. This change ensures that such cases are caught early during IR verification. Fixes #149318.

llvmbot (Member) commented Aug 4, 2025

@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: Longsheng Mou (CoTinker)

Changes

This PR adds a verification check in LaunchOp::verify() to disallow nested gpu.launch operations. Nested gpu.launch is currently unsupported and can lead to undefined or unintended behavior during lowering. This change ensures that such cases are caught early during IR verification. Fixes #149318.


Full diff: https://github.com/llvm/llvm-project/pull/151968.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/GPU/IR/GPUDialect.cpp (+3)
  • (modified) mlir/test/Dialect/GPU/invalid.mlir (+15)
diff --git a/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp b/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
index 5a72ef17db7f0..d6438d355fec1 100644
--- a/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+++ b/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
@@ -866,6 +866,9 @@ LogicalResult LaunchOp::verify() {
   if (!(hasClusterSize()) &&
       (getClusterSizeX() || getClusterSizeY() || getClusterSizeZ()))
     return emitOpError() << "cluster size must be all present";
+
+  if (getOperation()->getParentOfType<LaunchOp>())
+    return emitOpError() << "not support nested launches";
   return success();
 }
 
diff --git a/mlir/test/Dialect/GPU/invalid.mlir b/mlir/test/Dialect/GPU/invalid.mlir
index 35381dab7b200..4606dabb59cbe 100644
--- a/mlir/test/Dialect/GPU/invalid.mlir
+++ b/mlir/test/Dialect/GPU/invalid.mlir
@@ -35,6 +35,21 @@ func.func @launch_requires_gpu_return(%sz : index) {
 
 // -----
 
+func.func @nested_launches(%sz : index) {
+  gpu.launch blocks(%bx, %by, %bz) in (%sbx = %sz, %sby = %sz, %sbz = %sz)
+             threads(%tx, %ty, %tz) in (%stx = %sz, %sty = %sz, %stz = %sz) {
+    // expected-error@+1 {{'gpu.launch' op not support nested launches}}
+    gpu.launch blocks(%bx1, %by1, %bz1) in (%sbx1 = %sz, %sby1 = %sz, %sbz1 = %sz)
+               threads(%tx1, %ty1, %tz1) in (%stx1 = %sz, %sty1 = %sz, %stz1 = %sz) {
+      gpu.terminator
+    }
+    gpu.terminator
+  }
+  return
+}
+
+// -----
+
 func.func @launch_func_too_few_operands(%sz : index) {
   // expected-error@+1 {{expected 6 or more operands}}
   "gpu.launch_func"(%sz, %sz, %sz, %sz, %sz)

joker-eph (Collaborator) commented Aug 4, 2025

I don't believe this should be a verifier error, because that does not compose with inlining. Making this part of the verifier would mean that the inliner could create invalid IR with no way to prevent it.

Instead we should catch this in gpu-kernel-outlining and error out appropriately.
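The composition problem described above can be illustrated with a toy model in plain C++. This is only a sketch and does not use the MLIR API: `Op`, `makeOp`, `clone`, `inlineCalls`, `hasNestedLaunch`, and `buildExample` are hypothetical names invented for the illustration. Two functions each pass a "no nested launch" check on their own, yet inlining one into the other produces a nested launch, which is why a pass-level diagnostic may be a better fit than a verifier rule.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an IR operation; this is NOT the MLIR API.
struct Op {
  std::string name;                       // e.g. "func.func", "gpu.launch", "func.call"
  std::string symbol;                     // function name, or call target for "func.call"
  std::vector<std::unique_ptr<Op>> body;  // nested operations
};

std::unique_ptr<Op> makeOp(std::string name, std::string symbol = "") {
  auto op = std::make_unique<Op>();
  op->name = std::move(name);
  op->symbol = std::move(symbol);
  return op;
}

// Deep-copies an operation together with everything nested inside it.
std::unique_ptr<Op> clone(const Op &op) {
  auto copy = makeOp(op.name, op.symbol);
  for (const auto &child : op.body)
    copy->body.push_back(clone(*child));
  return copy;
}

// Returns true if some "gpu.launch" has a "gpu.launch" ancestor.
bool hasNestedLaunch(const Op &op, bool insideLaunch = false) {
  const bool isLaunch = op.name == "gpu.launch";
  if (isLaunch && insideLaunch)
    return true;
  for (const auto &child : op.body)
    if (hasNestedLaunch(*child, insideLaunch || isLaunch))
      return true;
  return false;
}

// Replaces every call to `callee` found under `root` with a copy of the callee body.
void inlineCalls(Op &root, const Op &callee) {
  std::vector<std::unique_ptr<Op>> rewritten;
  for (auto &child : root.body) {
    if (child->name == "func.call" && child->symbol == callee.symbol) {
      for (const auto &inner : callee.body)
        rewritten.push_back(clone(*inner));
    } else {
      inlineCalls(*child, callee);
      rewritten.push_back(std::move(child));
    }
  }
  root.body = std::move(rewritten);
}

struct Example {
  std::unique_ptr<Op> caller;
  std::unique_ptr<Op> helper;
};

// caller: func.func @caller { gpu.launch { func.call @helper } }
// helper: func.func @helper { gpu.launch { } }
Example buildExample() {
  auto helper = makeOp("func.func", "helper");
  helper->body.push_back(makeOp("gpu.launch"));

  auto caller = makeOp("func.func", "caller");
  auto launch = makeOp("gpu.launch");
  launch->body.push_back(makeOp("func.call", "helper"));
  caller->body.push_back(std::move(launch));

  return {std::move(caller), std::move(helper)};
}
```

In this model, `hasNestedLaunch` is false for both functions returned by `buildExample()`, and becomes true for the caller once `inlineCalls(*caller, *helper)` has run. An inliner that knows nothing about launches has no opportunity to refuse the transformation, whereas a check placed in a later pass can simply report the nesting when it encounters it.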

CoTinker (Contributor, Author) commented Aug 4, 2025

> I don't believe this should be a verifier error, because that does not compose with inlining. Making this part of the verifier would mean that the inliner could create invalid IR with no way to prevent it.
>
> Instead we should catch this in gpu-kernel-outlining and error out appropriately.

Okay, I will submit a new PR to fix this issue.

CoTinker closed this Aug 4, 2025

grypp (Member) commented Aug 4, 2025

Nested gpu.launch is valid IR, and we should not report an error in the target‑independent gpu dialect or during gpu-kernel-outlining.

For example, CUDA supports nested kernel launches through a feature called dynamic parallelism, which allows launching a kernel from within another kernel. A lowering for this could be implemented today.

However, since we currently don’t have such a lowering, we could emit an error in the gpu-to-nvvm pass (NVIDIA‑specific). Other vendors can add similar lowering or diagnostics in their respective passes.

CoTinker (Contributor, Author) commented Aug 4, 2025

> For example, CUDA supports nested kernel launches through a feature called dynamic parallelism, which allows launching a kernel from within another kernel. A lowering for this could be implemented today.

Thanks for your reply. That means we should support lowering nested gpu.launch in gpu-kernel-outlining. I'll implement it.

Development

Successfully merging this pull request may close these issues.

[MLIR] crashed in -gpu-kernel-outlining pass with error message: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed