Conversation

@yangtetris (Contributor) commented Oct 30, 2025

Summary

This PR improves the convert-gpu-to-nvvm pass to provide a more comprehensive LLVM conversion: math operations that are not supported by libdevice now fall back to the generic LLVM lowering patterns instead of being left unconverted. With this change, there is no need to append a convert-math-to-llvm pass after a convert-gpu-to-nvvm pass.

Since "[mlir][GPUToNVVM] Add benefit to populate functions" landed, we no longer need to skip the math dialect to prioritize gpu-to-nvvm patterns over the generic LLVM patterns. In fact, arith operations such as arith.remf and arith.maxnumf already use benefits to control pattern priority.
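
To make the priority mechanism concrete, here is a minimal C++ sketch (the helper function name is illustrative; the populate functions and the benefit parameter are the existing upstream ones shown in the diff later in this thread):

#include "mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h"
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/Conversion/MathToLLVM/MathToLLVM.h"
#include "mlir/IR/PatternMatch.h"

// Patterns registered with a higher PatternBenefit are tried first when
// several patterns match the same op, so the NVVM-specific lowerings take
// precedence and the generic math-to-LLVM lowerings act only as a fallback
// for ops libdevice does not cover (e.g. math.absi on integers).
static void populateNVVMWithGenericFallback(mlir::LLVMTypeConverter &converter,
                                            mlir::RewritePatternSet &patterns) {
  mlir::populateGpuToNVVMConversionPatterns(converter, patterns,
                                            /*benefit=*/10);
  // Default benefit of 1: applied only when no higher-benefit pattern fired.
  mlir::populateMathToLLVMConversionPatterns(converter, patterns);
}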

Example

// Input Module
gpu.module @test {
  func.func @math_abs(%arg0: i16) -> i16 {
    %res = math.absi %arg0 : i16
    return %res : i16
  }
}

// Before the change: unconverted
gpu.module @test {
  func.func @math_abs(%arg0: i16) -> i16 {
    %res = math.absi %arg0 : i16
    return %res : i16
  }
}

// After the change:
gpu.module @test {
  func.func @math_abs(%arg0: i16) -> i16 {
    %res = "llvm.intr.abs"(%arg0) <{is_int_min_poison = false}> : (i16) -> i16
    return %res : i16
  }
}

@llvmbot (Member) commented Oct 30, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-gpu

Author: Yang Bai (yangtetris)


Full diff: https://github.com/llvm/llvm-project/pull/165728.diff

2 Files Affected:

  • (modified) mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp (+2-4)
  • (added) mlir/test/Conversion/GPUToNVVM/gpu-to-generic-llvm.mlir (+29)
diff --git a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
index d64c4d64cad84..70c97b3566662 100644
--- a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+++ b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -383,16 +383,14 @@ struct LowerGpuOpsToNVVMOpsPass final
     LLVMConversionTarget target(getContext());
 
     // Set higher benefit, so patterns will run before generic LLVM lowering.
+    // Make sure the benefit here is higher than ArithToLLVMDialectInterface and
+    // MathToLLVMDialectInterface.
     populateGpuToNVVMConversionPatterns(converter, llvmPatterns,
                                         /*benefit=*/10);
 
     llvm::SmallDenseSet<StringRef> allowedDialectsSet(allowedDialects.begin(),
                                                       allowedDialects.end());
     for (Dialect *dialect : getContext().getLoadedDialects()) {
-      // Skip math patterns as nvvm needs custom math lowering.
-      if (isa<math::MathDialect>(dialect))
-        continue;
-
       bool allowed = allowedDialectsSet.contains(dialect->getNamespace());
       // Empty `allowedDialectsSet` means all dialects are allowed.
       if (!allowedDialectsSet.empty() && !allowed)
diff --git a/mlir/test/Conversion/GPUToNVVM/gpu-to-generic-llvm.mlir b/mlir/test/Conversion/GPUToNVVM/gpu-to-generic-llvm.mlir
new file mode 100644
index 0000000000000..5be7938aae8ef
--- /dev/null
+++ b/mlir/test/Conversion/GPUToNVVM/gpu-to-generic-llvm.mlir
@@ -0,0 +1,29 @@
+// RUN: mlir-opt %s -convert-gpu-to-nvvm -split-input-file | FileCheck %s
+
+/// Math/arith ops that are not supported by libdevice
+/// should be converted by generic LLVM lowering patterns.
+
+gpu.module @generic_llvm_test_module_0 {
+  // CHECK-LABEL: @arith_add
+  func.func @arith_add(%left: i64, %right: i64) -> i64 {
+    // CHECK: llvm.add {{.*}}, {{.*}} : i64
+    %result = arith.addi %left, %right : i64
+    return %result : i64
+  }
+}
+
+gpu.module @generic_llvm_test_module_1 {
+  // CHECK-LABEL: @math_abs_non_i32
+  func.func @math_abs_non_i32(%arg_i64: i64, %arg_i16: i16, %arg_i8: i8, %arg_i1: i1) 
+      -> (i64, i16, i8, i1) {
+    // CHECK: "llvm.intr.abs"{{.*}} : (i64) -> i64
+    %abs_i64 = math.absi %arg_i64 : i64
+    // CHECK: "llvm.intr.abs"{{.*}} : (i16) -> i16
+    %abs_i16 = math.absi %arg_i16 : i16
+    // CHECK: "llvm.intr.abs"{{.*}} : (i8) -> i8
+    %abs_i8 = math.absi %arg_i8 : i8
+    // CHECK: "llvm.intr.abs"{{.*}} : (i1) -> i1
+    %abs_i1 = math.absi %arg_i1 : i1
+    return %abs_i64, %abs_i16, %abs_i8, %abs_i1 : i64, i16, i8, i1
+  }
+}

@yangtetris changed the title from "[mlir] enable fallback to generic LLVM lowering for math dialect in convert-gpu-to-nvvm pass" to "[mlir][GPUToNVVM] enable fallback to generic LLVM lowering for math dialect in convert-gpu-to-nvvm pass" on Oct 30, 2025
@fabianmcg (Contributor) left a comment


In general the change seems OK. However, it's important to note that convert-gpu-to-nvvm is not really meant to be used in production; instead, it's expected that people use convert-to-llvm.

@yangtetris (Contributor, Author) commented

> In general the change seems OK. However, it's important to note that convert-gpu-to-nvvm is not really meant to be used in production; instead, it's expected that people use convert-to-llvm.

Thanks! Could you please elaborate a bit more on why it's not recommended for production use? In fact, I'm seeing this pass in some projects' production code.

@fabianmcg (Contributor) commented Oct 30, 2025

> Thanks! Could you please elaborate a bit more on why it's not recommended for production use? In fact, I'm seeing this pass in some projects' production code.

Sure. First, some context: many of the conversion passes (I'm not talking about the patterns, just the passes) were created to help people downstream understand how to create their own pipelines and passes (i.e., they are not meant to be used directly). Testing upstream was also an important reason for their creation.

The difference with convert-to-llvm is that it was specifically introduced to be a production-level conversion pass that users could rely on.

Now, let me give more rationale for why they're not meant for production. Let's assume a pipeline contains several convert-*-to-llvm passes; in that case you're invoking the dialect conversion infrastructure (which is heavy) many times, often with almost the same patterns, creating a large compile-time overhead for the pipeline. Moreover, in many cases the conversion passes are opinionated about what to run (for example, I might want to error out if a math op couldn't be converted to an NVVM intrinsic, instead of falling back to LLVM).

Consequently, the recommended approach for production has been "build your own conversion pass", which avoids the drawbacks mentioned above because you decide which patterns to include, all in one go (convert-to-llvm can do this in many cases).
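
As a rough illustration of that advice, a downstream project could gather all the patterns it needs into a single dialect-conversion run. The sketch below is hypothetical (the pass name and the exact set of populate calls are only an example), not an upstream pass:

#include "mlir/Conversion/ArithToLLVM/ArithToLLVM.h"
#include "mlir/Conversion/FuncToLLVM/ConvertFuncToLLVM.h"
#include "mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h"
#include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
#include "mlir/Conversion/LLVMCommon/LoweringOptions.h"
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/Conversion/MathToLLVM/MathToLLVM.h"
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Dialect/LLVMIR/NVVMDialect.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Hypothetical downstream pass: a single dialect-conversion run that lowers
// gpu, math, arith and func ops inside a gpu.module, instead of chaining
// several convert-*-to-llvm passes.
struct MyGpuModuleToLLVMPass
    : PassWrapper<MyGpuModuleToLLVMPass, OperationPass<gpu::GPUModuleOp>> {
  void runOnOperation() override {
    MLIRContext *ctx = &getContext();
    LowerToLLVMOptions options(ctx);
    LLVMTypeConverter converter(ctx, options);

    RewritePatternSet patterns(ctx);
    // Target-specific patterns first, with a higher benefit so they win over
    // the generic lowerings wherever both match.
    populateGpuToNVVMConversionPatterns(converter, patterns, /*benefit=*/10);
    // Generic fallbacks at the default benefit.
    arith::populateArithToLLVMConversionPatterns(converter, patterns);
    populateMathToLLVMConversionPatterns(converter, patterns);
    populateFuncToLLVMConversionPatterns(converter, patterns);

    LLVMConversionTarget target(getContext());
    target.addLegalDialect<NVVM::NVVMDialect>();
    if (failed(applyPartialConversion(getOperation(), target,
                                      std::move(patterns))))
      signalPassFailure();
  }
};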

@fabianmcg requested a review from grypp on October 30, 2025 at 16:18
@joker-eph (Collaborator) commented

Do we have the infra in convert-to-llvm to inject target-specific patterns (like NVVM) right now?

@fabianmcg (Contributor) commented

> Do we have the infra in convert-to-llvm to inject target-specific patterns (like NVVM) right now?

Yes: https://github.com/llvm/llvm-project/blob/main/mlir/test/Conversion/GPUToNVVM/gpu-to-nvvm-target-attr.mlir

Though it could use a small cleanup (make the dynamic lowering the default and potentially remove the static version).

@joker-eph (Collaborator) commented

Thanks @fabianmcg; so are we ready to actually migrate to convert-to-llvm and remove this pass entirely?

@fabianmcg (Contributor) commented Oct 31, 2025

> Thanks @fabianmcg; so are we ready to actually migrate to convert-to-llvm and remove this pass entirely?

I'd need to check; I know there are some things that still have to be improved, but I think for the gpu-to-nvvm case almost everything is there.

@yangtetris (Contributor, Author) commented

> I'd need to check; I know there are some things that still have to be improved, but I think for the gpu-to-nvvm case almost everything is there.

Hi @fabianmcg, I'm trying to replace the usage of convert-gpu-to-nvvm with the dynamic convert-to-llvm + ConvertToLLVMAttrInterface in my project. However, I can't find any way to set the LowerToLLVMOptions::useBarePtrCallConv field, which is needed for creating the type converter. Could we enhance the convert-to-llvm pass to support useBarePtrCallConv?
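
For context, the field does exist on the options struct; below is a minimal sketch of setting it in a downstream pass, per the "build your own conversion pass" suggestion above (the helper is hypothetical, and per this discussion there is no corresponding convert-to-llvm pass option):

#include "mlir/Conversion/LLVMCommon/LoweringOptions.h"
#include "mlir/IR/MLIRContext.h"

// Sketch: a downstream pass can opt into the bare-pointer calling convention
// when it constructs its own LLVMTypeConverter; convert-to-llvm currently
// exposes no pass option for this, which is the gap raised above.
static mlir::LowerToLLVMOptions makeBarePtrOptions(mlir::MLIRContext *ctx) {
  mlir::LowerToLLVMOptions options(ctx);
  options.useBarePtrCallConv = true;
  return options;
}
// In the pass body: LLVMTypeConverter converter(ctx, makeBarePtrOptions(ctx));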
