[mlir] Added `Convergent` trait that matches LLVM's semantics #152358

npanchen · 2025-08-06T18:20:17Z

LLVM provides convergent function attribute that says call to it must not be made control-dependent on any new condition. For example, that attribute disables jump threading that otherwise can lead to runtime errors or dead lock.
See https://llvm.org/docs/ConvergentOperations.html for more details.

It appears that MLIR does not provide a trait for this even though some operations, such as nvvm.barrier0 is convergent due it lowering to llvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all.

The patch adds Convergent trait to ControlFlowInterface (IMO, that's appropriate place for this trait) and adds that trait to some NVVM operations that are lowered to convergent LLVM Intrinsic.

LLVM provides `convergent` function attribute that says call to it must not be made control-dependent on any new condition. For example, that attribute disables jump threading that otherwise can lead to runtime errors or dead lock. See https://llvm.org/docs/ConvergentOperations.html for more details. It appears that MLIR does not provide a trait for this even though some operations, such as `nvvm.barrier0` is convergent due it lowering to `llvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all`. The patch adds `Convergent` trait to `ControlFlowInterface` (IMO, that's appropriate place for this trait) and adds that trait to some NVVM operations that are lowered to convergent LLVM Intrinsic.

llvmbot · 2025-08-06T18:20:49Z

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-llvm

Author: Nikolay Panchenko (npanchen)

Changes

LLVM provides convergent function attribute that says call to it must not be made control-dependent on any new condition. For example, that attribute disables jump threading that otherwise can lead to runtime errors or dead lock.
See https://llvm.org/docs/ConvergentOperations.html for more details.

It appears that MLIR does not provide a trait for this even though some operations, such as nvvm.barrier0 is convergent due it lowering to llvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all.

The patch adds Convergent trait to ControlFlowInterface (IMO, that's appropriate place for this trait) and adds that trait to some NVVM operations that are lowered to convergent LLVM Intrinsic.

Full diff: https://github.com/llvm/llvm-project/pull/152358.diff

4 Files Affected:

(modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+23-14)
(modified) mlir/include/mlir/Interfaces/ControlFlowInterfaces.h (+5-1)
(modified) mlir/include/mlir/Interfaces/ControlFlowInterfaces.td (+4)
(modified) mlir/test/lib/Dialect/Test/TestOps.td (+5)

diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 30df3b739e5ca..e95328398fe0c 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -14,12 +14,13 @@
 #define NVVMIR_OPS
 
 include "mlir/IR/EnumAttr.td"
+include "mlir/Interfaces/ControlFlowInterfaces.td"
+include "mlir/Interfaces/InferIntRangeInterface.td"
+include "mlir/Interfaces/SideEffectInterfaces.td"
 include "mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td"
 include "mlir/Dialect/LLVMIR/LLVMOpBase.td"
 include "mlir/Dialect/LLVMIR/NVVMRequiresSMTraits.td"
-include "mlir/Interfaces/SideEffectInterfaces.td"
 include "mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td"
-include "mlir/Interfaces/InferIntRangeInterface.td"
 include "mlir/Dialect/LLVMIR/LLVMTypes.td"
 
 def LLVM_PointerGeneric : LLVM_PointerInAddressSpace<0>;
@@ -561,7 +562,7 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">,
 // NVVM synchronization op definitions
 //===----------------------------------------------------------------------===//
 
-def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {
+def NVVM_Barrier0Op : NVVM_Op<"barrier0", [Convergent]> {
   let assemblyFormat = "attr-dict";
   string llvmBuilder = [{
       createIntrinsicCall(
@@ -570,8 +571,9 @@ def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {
   }];
 }
 
-def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> {
-  let arguments = (ins     
+def NVVM_BarrierOp : NVVM_Op<"barrier",
+  [Convergent, AttrSizedOperandSegments]> {
+  let arguments = (ins
     Optional<I32>:$barrierId,
     Optional<I32>:$numberOfThreads);
   string llvmBuilder = [{
@@ -598,7 +600,7 @@ def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> {
   ];
 }
 
-def NVVM_BarrierArriveOp : NVVM_PTXBuilder_Op<"barrier.arrive"> 
+def NVVM_BarrierArriveOp : NVVM_PTXBuilder_Op<"barrier.arrive", [Convergent]>
 {
   let arguments = (ins Optional<I32>:$barrierId, I32:$numberOfThreads);
 
@@ -624,7 +626,7 @@ def NVVM_BarrierArriveOp : NVVM_PTXBuilder_Op<"barrier.arrive">
   }];
 }
 
-def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive"> {
+def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive", [Convergent]> {
   let arguments = (ins OptionalAttr<UnitAttr>:$aligned);
 
   let summary = "Cluster Barrier Arrive Op";
@@ -647,7 +649,8 @@ def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive"> {
   let assemblyFormat = "attr-dict";
 }
 
-def NVVM_ClusterArriveRelaxedOp : NVVM_Op<"cluster.arrive.relaxed", [NVVMRequiresSM<90>]> {
+def NVVM_ClusterArriveRelaxedOp : NVVM_Op<"cluster.arrive.relaxed",
+  [Convergent, NVVMRequiresSM<90>]> {
   let arguments = (ins OptionalAttr<UnitAttr>:$aligned);
 
   let summary = "Cluster Barrier Relaxed Arrive Op";
@@ -673,7 +676,8 @@ def NVVM_ClusterArriveRelaxedOp : NVVM_Op<"cluster.arrive.relaxed", [NVVMRequire
   let assemblyFormat = "attr-dict";
 }
 
-def NVVM_ClusterWaitOp : NVVM_Op<"cluster.wait", [NVVMRequiresSM<90>]> {
+def NVVM_ClusterWaitOp : NVVM_Op<"cluster.wait",
+  [Convergent, NVVMRequiresSM<90>]> {
   let arguments = (ins OptionalAttr<UnitAttr>:$aligned);
 
   let summary = "Cluster Barrier Wait Op";
@@ -1054,7 +1058,8 @@ def NVVM_CpAsyncWaitGroupOp : NVVM_Op<"cp.async.wait.group">,
   let assemblyFormat = "$n attr-dict";
 }
 
-def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive"> {
+def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive",
+  [Convergent]> {
   let summary = "NVVM Dialect Op for cp.async.mbarrier.arrive";
   let description = [{
     The `cp.async.mbarrier.arrive` Op makes the mbarrier object track
@@ -1079,7 +1084,8 @@ def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive"> {
   }];
 }
 
-def NVVM_CpAsyncMBarrierArriveSharedOp : NVVM_Op<"cp.async.mbarrier.arrive.shared"> {
+def NVVM_CpAsyncMBarrierArriveSharedOp : NVVM_Op<"cp.async.mbarrier.arrive.shared",
+  [Convergent]> {
   let summary = "NVVM Dialect Op for cp.async.mbarrier.arrive.shared";
   let description = [{
     The `cp.async.mbarrier.arrive.shared` Op makes the mbarrier object
@@ -2806,7 +2812,8 @@ def NVVM_CpAsyncBulkSharedCTAToGlobalOp :
 // NVVM Wgmma Ops
 //===----------------------------------------------------------------------===//
 
-def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned", [NVVMRequiresSMa<[90]>]> {
+def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned",
+  [Convergent, NVVMRequiresSMa<[90]>]> {
   let arguments = (ins);
   let description = [{
     Enforce an ordering of register accesses between warpgroup level matrix 
@@ -2820,7 +2827,8 @@ def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned", [NVVMRequiresSMa<[
   }];
 }
 
-def NVVM_WgmmaGroupSyncAlignedOp : NVVM_Op<"wgmma.commit.group.sync.aligned", [NVVMRequiresSMa<[90]>]> {
+def NVVM_WgmmaGroupSyncAlignedOp : NVVM_Op<"wgmma.commit.group.sync.aligned",
+  [Convergent, NVVMRequiresSMa<[90]>]> {
   let assemblyFormat = "attr-dict";
   let description = [{
     Commits all prior uncommitted warpgroup level matrix multiplication operations.
@@ -2832,7 +2840,8 @@ def NVVM_WgmmaGroupSyncAlignedOp : NVVM_Op<"wgmma.commit.group.sync.aligned", [N
   }];
 }
 
-def NVVM_WgmmaWaitGroupSyncOp : NVVM_Op<"wgmma.wait.group.sync.aligned", [NVVMRequiresSMa<[90]>]> {
+def NVVM_WgmmaWaitGroupSyncOp : NVVM_Op<"wgmma.wait.group.sync.aligned",
+  [Convergent, NVVMRequiresSMa<[90]>]> {
   let arguments = (ins I64Attr:$group);
   let assemblyFormat = "attr-dict $group";
   let description = [{
diff --git a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h
index d63800c12d132..750a9f86e49d7 100644
--- a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h
+++ b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h
@@ -337,8 +337,12 @@ struct ReturnLike : public TraitBase<ConcreteType, ReturnLike> {
     return success();
   }
 };
-} // namespace OpTrait
 
+// The Operation may not be made control-dependent on any additional values.
+// See https://llvm.org/docs/ConvergentOperations.html for more details.
+template <typename ConcreteType>
+struct Convergent : public TraitBase<ConcreteType, Convergent> {};
+} // namespace OpTrait
 } // namespace mlir
 
 //===----------------------------------------------------------------------===//
diff --git a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td
index b8d08cc553caa..6eb2f9002d7cf 100644
--- a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td
+++ b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td
@@ -511,4 +511,8 @@ def ReturnLike : TraitList<[
     >
 ]>;
 
+// Use to inject an implementation of getSpeculatability.  Users should not use
+// this directly.
+def Convergent : NativeOpTrait<"Convergent">;
+
 #endif // MLIR_INTERFACES_CONTROLFLOWINTERFACES
diff --git a/mlir/test/lib/Dialect/Test/TestOps.td b/mlir/test/lib/Dialect/Test/TestOps.td
index 2eaad552a7a3a..ad14666a1a2cc 100644
--- a/mlir/test/lib/Dialect/Test/TestOps.td
+++ b/mlir/test/lib/Dialect/Test/TestOps.td
@@ -2113,6 +2113,11 @@ def TestTypeChangerOp : TEST_Op<"type_changer">,
 def TestValidOp : TEST_Op<"valid", [Terminator]>,
   Arguments<(ins Variadic<AnyType>)>;
 
+def TestConvergentOp : TEST_Op<"convergent", [Convergent]> {
+  let arguments = (ins AnyType);
+  let results = (outs AnyType);
+}
+
 def TestMergeBlocksOp : TEST_Op<"merge_blocks"> {
   let summary = "merge_blocks operation";
   let description = [{

mlir/include/mlir/Interfaces/ControlFlowInterfaces.td

joker-eph · 2025-08-06T19:08:25Z

mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td

@@ -1054,7 +1058,8 @@ def NVVM_CpAsyncWaitGroupOp : NVVM_Op<"cp.async.wait.group">,
  let assemblyFormat = "$n attr-dict";
 }

-def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive"> {
+def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive",
+  [Convergent]> {


I'm not sure that these are actually convergent?
Actually it's not clear to me that any of the post-volta intrinsics needs to be.

See this comment: https://discourse.llvm.org/t/llvm-convergence-semantics/77642/12

Not sure I follow your comment. Are you referring to cp.async.mbarrier.arrive ? If so, I don't see it has been discussed in that comment/thread.
As for cp.async.mbarrier.arrive, it's marked in LLVM with IntrConvergent.

If so, I don't see it has been discussed in that comment/thread.

The comment is a general one, so it potentially applies to all the intrinsics :)

As for cp.async.mbarrier.arrive, it's marked in LLVM with IntrConvergent.

I do have low confidence for LLVM current annotation as a reliable source of documentation right now.

I do have low confidence for LLVM current annotation as a reliable source of documentation right now.

That's valid point. At the same time having different traits set on LLVM intrinsic and MLIR operation, that is lowered to that intrinsic, looks not only confusing, but stinky.
Ideally, for low-level dialects, such as NVVM, it will be great just to query that property from the intrinsic it's lowered to, but that does require to have LLVMContext.

Specifically about convergent, my understanding, if intrinsics is marked with it, but in reality it's not convergent, it's only bad for performance, but stability-wise it's still correct.
But yeah, it will be great to go through all LLVM intrinsics and make sure proper traits are used.

@AlexMaclean @Prince781 that will be great to revisit nvvm's intrinsic properties. I know PTX doc definitely describes nvvm_barrier_cta_sync_aligned_all as convergent, but I couldn't find such statement for others

Yes the ".aligned" specifier in PTX is what brings the convergence requirement.

But I'm not sure why cp.async.mbarrier.arrive for example would need to care about convergence?

That's valid point. At the same time having different traits set on LLVM intrinsic and MLIR operation, that is lowered to that intrinsic, looks not only confusing, but stinky.

Absolutely: we should fix LLVM ;)
The problem is that NVVM non-aligned intrinsics should be modeled with some sort of write effects to a private memory: that could be pessimizing transformations (because special casing this everywhere isn't something LLVM has been optimizing for).

joker-eph · 2025-08-06T19:14:26Z

mlir/include/mlir/Interfaces/ControlFlowInterfaces.h

+// The Operation may not be made control-dependent on any additional values.
+// See https://llvm.org/docs/ConvergentOperations.html for more details.
+template <typename ConcreteType>
+struct Convergent : public TraitBase<ConcreteType, Convergent> {};


Is there any impact that MLIR is using structured control-flow in the support for convergence? In particular aren't token inserted during lowering to CFG to preserve loop structures for example?

There's no direct impact on absence of this in MLIR, but this trait is a prerequisite to properly annotate functions that invoke such operations. OtherwiseLLVM will apply optimizations incorrectly.
Specifically, the use case I saw was:

fn callee() { nvvm.barrier0 } fn caller() { if (condition) { call callee() } }

callee has to be marked with convergent attribute, otherwise if jump threading is applied, the code will become invalid.

OtherwiseLLVM will apply optimizations incorrectly.

For each Op that MLIR generates, LLVM sets convergency. So LLVM won't do incorrect optimization for MLIR.

For each Op that MLIR generates, LLVM sets convergency.

Are you sure we would do it inter-procedurally and annotate the actual call here?
Or are we relying on some attribute propagation later?

@npanchen : when mapping a structured control-flow to a CFG, just annotating things as "convergent" isn't enough I believe. See this example: https://youtu.be/_Z5DuiVCFAw?t=434 ; basically structured control-flow needs to say "something" about the reconvergence property and the management of anchors (the video is amazing for explaining all this I think!)

the convergent property is only set on the intrinsic, not on arbitrary function that calls to that intrinsic. Since LLVM does not infer that property automatically, that can result to undesired optimization.
Just to highlight that JT works differently for the case I described above: https://godbolt.org/z/9T4d7MM58. To note, the IR there looks ok, but the problem I observed is similar, but just has more complex CF.

Also, the MLIR trait seems generally useful to prevent, say, loop multiversioning by trip count if loop contains such operation.

@npanchen : when mapping a structured control-flow to a CFG, just annotating things as "convergent" isn't enough I believe. See this example: https://youtu.be/_Z5DuiVCFAw?t=434 ; basically structured control-flow needs to say "something" about the reconvergence property and the management of anchors (the video is amazing for explaining all this I think!)

Thanks, that's quite interesting video. I actually was unable to find when convergence.loop/convergence.entry/convergence.anchor are generated. Are they even still in use ?
If they are, yeah, that might be more interesting from design point of view. As of know I can only think about adding RecursivelyConvergent trait. But not sure that's a good idea for functions.

completely side note: there's also noduplicate property, which seems not to be used a lot, but it's not supported by MLIR and it does directly impact inliner.

They are emitted by clang, search for shouldEmitConvergenceTokens() there, you'll see the kind of things I have in mind for our structured-to-cfg needs (and actually: a semantics definition of convergence guarantee on our structured control-flow)

I see. I was not searching right.

grypp · 2025-08-07T13:02:29Z

mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td

-def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> {
-  let arguments = (ins     
+def NVVM_BarrierOp : NVVM_Op<"barrier",
+  [Convergent, AttrSizedOperandSegments]> {


This instruction can synchronize different thread counts. Could you clarify what convergent refers to in this context? I may be missing the nuance—does it mean a convergent warp or a convergent CTA?

nvvm.barrier 32 nvvm.barrier 128

That's is based on LLVM's assumption that intrinsics are convergent.
See my other reply about consistency of convergent properties.

grypp · 2025-08-07T13:25:08Z

mlir/include/mlir/Interfaces/ControlFlowInterfaces.h

+// The Operation may not be made control-dependent on any additional values.
+// See https://llvm.org/docs/ConvergentOperations.html for more details.
+template <typename ConcreteType>
+struct Convergent : public TraitBase<ConcreteType, Convergent> {};


OtherwiseLLVM will apply optimizations incorrectly.

For each Op that MLIR generates, LLVM sets convergency. So LLVM won't do incorrect optimization for MLIR.

npanchen · 2025-08-07T21:41:56Z

@joker-eph @grypp is there anything I need to address in this PR ?

joker-eph · 2025-08-07T23:13:15Z

All the discussions we're having above looks like things to close on.

Right now you're adding convergent on a bunch of operations which are not convergent as far as I know.

Also adding a trait on something for which we don't have a good definition or an idea of how it'll work with our structured control-flow does not seem like something I'm comfortable to do: I think the plan needs to be more fleshed out.

What concrete problem is this patch solving for you right now?

npanchen · 2025-08-08T17:03:22Z

All the discussions we're having above looks like things to close on.

Right now you're adding convergent on a bunch of operations which are not convergent as far as I know.
What concrete problem is this patch solving for you right now?

The exact issue is

fn callee() {
  nvvm.barrier0
}

fn caller() {
  if (condition) {
    call callee()
  }
}

I'm ok-ish to add new trait to nvvm.barrier0. But it will be great if you can poke your colleagues to revisit remaining intrinsics so that remaining MLIR ops can be updated properly.

Also adding a trait on something for which we don't have a good definition or an idea of how it'll work with our structured control-flow does not seem like something I'm comfortable to do: I think the plan needs to be more fleshed out.

On one hand I agree with you that adding something that might change in future isn't good. On the other that's hard to reason about proper design without having convergent CF examples to support.
Since the change is quite minimal I don't think it's a big problem to be replaced with a proper design later (that assumes RecursivelyConvergent won't be right approach and trait I'm adding won't fit in that right approach too)

grypp · 2025-08-09T13:39:51Z

@npanchen Do you need this trait only for nvvm.barrier0? This is a special op that every thread in a CTA must run, so it’s convergent.
Do you have another op that you think also requires it in your case?

joker-eph · 2025-08-09T19:27:44Z

The exact issue is
...

You're showing me some pseudo source program, but I'm not sure what is the actual issue in MLIR and how this patch solves it right now.

On one hand I agree with you that adding something that might change in future isn't good.

This isn't exactly what I am saying I believe: I'm saying that it isn't good to add something that isn't well defined and documented, for which we can't tell users about how to use it.

On the other that's hard to reason about proper design without having convergent CF examples to support.

Aren't the example from the presentation and the Clang codegen a good reference about the examples to support?

Of course we could work around it by saying that "Convergent" is defined exactly as LLVM, and actually restrict it to LLVM function with flat CFGs only. But then that becomes a property of the LLVM layer and not something central in MLIR.

npanchen · 2025-08-11T16:28:55Z

The exact issue is
...

You're showing me some pseudo source program, but I'm not sure what is the actual issue in MLIR and how this patch solves it right now.

Ok, let me reiterate previous comments: the MLIR pseudo-code I provided as-is leads to invalid optimization done by jump threading, because callee is not marked with convergent attribute. The goal of that PR is to add special trait that some special pass can easily rely on to propagate convergency through a call graph. That is, to do something like that:

if (op->hasTrait<Convergent>() || /*if calls llvm.func that is convergent*/) {
  op->getParentOfType<LLVMFuncOp>()->setConvergent();
}

Intentionally use LLVMFuncOp as it does have convergent attr that can be used in "the special pass".

This isn't exactly what I am saying I believe: I'm saying that it isn't good to add something that isn't well defined and documented, for which we can't tell users about how to use it.
Aren't the example from the presentation and the Clang codegen a good reference about the examples to support?

I'm not sure I follow you. What exactly is not defined in added trait ? The added trait is defined in LLVM's term with a LLVM's link to convergent semantics. Is there anything else you want me to clarify/add to the doc ?

Of course we could work around it by saying that "Convergent" is defined exactly as LLVM, and actually restrict it to LLVM function with flat CFGs only. But then that becomes a property of the LLVM layer and not something central in MLIR.

IMO defining it "as LLVM" is totally fine first step. IIRC, SideEffect trait was decomposed into several other traits, so if convergent trait is "too generic" that can be split into other traits.
But if you do have some other ideas in mind that will be more generic and still cover LLVM's convergent, I'll be glad to hear them.

joker-eph · 2025-08-11T16:39:57Z

mlir/test/lib/Dialect/Test/TestOps.td

+def TestConvergentOp : TEST_Op<"convergent", [Convergent]> {
+  let arguments = (ins AnyType);
+  let results = (outs AnyType);
+}


What is this doing right now? Without an actual test exercising this op, adding an op isn't useful.

joker-eph · 2025-08-11T16:40:06Z

Ok, let me reiterate previous comments: the MLIR pseudo-code I provided as-is leads to invalid optimization done by jump threading, because callee is not marked with convergent attribute

There is no jump threading in MLIR and your patch does not show any tests: so that's not a clear answer to "I'm not sure what is the actual issue in MLIR and how this patch solves it right now".

I'm not sure I follow you. What exactly is not defined in added trait ? The added trait is defined in LLVM's term with a LLVM's link to convergent semantics. Is there anything else you want me to clarify/add to the doc ?

As mentioned before: LLVM does not have structured control-flow, so all the definitions from LLVM need to be revisited.

But if you do have some other ideas in mind that will be more generic and still cover LLVM's convergent, I'll be glad to hear them.

As mentioned before when I pointed what clang does: making it more generic requires new definitions and looking into the semantics we want for structured control-flow structure.

IMO defining it "as LLVM" is totally fine first step.

That means you need to move all this away from mlir/include/mlir/Interfaces/ControlFlowInterfaces.h into the LLVM dialect.

npanchen · 2025-08-11T16:42:39Z

@npanchen Do you need this trait only for nvvm.barrier0? This is a special op that every thread in a CTA must run, so it’s convergent. Do you have another op that you think also requires it in your case?

I cannot definitively answer yes or no. At least my current workaround, to propagate convergency through a call graph, only relies on nvvm.barrier0. We do use some other operations I've previously marked as convergent, but I don't see any runtime issues so far.
fyi: nvvm.barrier0 problem was uncovered recently due to recent changes in JT

npanchen · 2025-08-11T16:54:08Z

There is no jump threading in MLIR and your patch does not show any tests: so that's not a clear answer to "I'm not sure what is the actual issue in MLIR and how this patch solves it right now".

Yes, that's what I said before too: there's no direct impact on MLIR, but it does impact optimizations done later on generated LLVM IR.

As mentioned before: LLVM does not have structured control-flow, so all the definitions from LLVM need to be revisited.
As mentioned before when I pointed what clang does: making it more generic requires new definitions and looking into the semantics we want for structured control-flow structure.

Ok, not to go over same things multiple times: do I get you right that for you such trait has to be defined from all possible angles (structured cf, functions, simple op) in a single PR ?

That means you need to move all this away from mlir/include/mlir/Interfaces/ControlFlowInterfaces.h into the LLVM dialect.

Thanks, that looks reasonable to move it there.

joker-eph · 2025-08-11T18:28:04Z

Ok, not to go over same things multiple times: do I get you right that for you such trait has to be defined from all possible angles (structured cf, functions, simple op) in a single PR ?

To be added to the system: yes.
To just mirror what LLVM does, restricted to flat-CFG on the LLVM dialect, it can be OK otherwise.

Yes, that's what I said #152358 (comment): there's no direct impact on MLIR, but it does impact optimizations done later on generated LLVM IR.

Do you intend to write a MLIR pass that infer a convergent attribute on llvm.call operation by traversing the IR inteprocedurally? Isn't just the LLVM Attributor pass already doing this on LLVM IR and so you can do this after translation to LLVM IR today instead?

npanchen · 2025-08-11T18:41:34Z

To be added to the system: yes. To just mirror what LLVM does, restricted to flat-CFG on the LLVM dialect, it can be OK otherwise.

Sounds good.

Do you intend to write a MLIR pass that infer a convergent attribute on llvm.call operation by traversing the IR inteprocedurally?

I definitely can add to upstream for completeness even though internally we won't use it. However, if you or someone else is planning to work on convergency for higher level dialects, this pass will be redundant.

Isn't just the LLVM Attributor pass already doing this on LLVM IR and so you can do this after translation to LLVM IR today instead?

That's what I was unable to find. I do see it can promote convergent functions to non-convergent, but not other way around.
IMO, that does make sense as IR should be legal from the beginning, i.e. expecting some pass to add it isn't correct.

joker-eph · 2025-08-11T18:45:56Z

I definitely can add to upstream for completeness even though internally we won't use it.

I was more interested in understanding if this is how you plan to use this?

It's really difficult to review a feature without a use-case, and most of the time use-cases are "here is the transformation that will use it, and how it will be used". You haven't really been explaining this clearly to me, and your answer quoted above is keeping this a bit fuzzy to me.

joker-eph · 2025-08-11T18:47:43Z

IMO, that does make sense as IR should be legal from the beginning, i.e. expecting some pass to add it isn't correct.

Right, so in MLIR the safe thing to do should be to add the convergent attributes to every call/function unless we know the function does not need it (which can be an information provided as a flag, as target information, etc.).
The attributor should remove it correctly later.

npanchen · 2025-08-11T19:31:58Z

I was more interested in understanding if this is how you plan to use this?
It's really difficult to review a feature without a use-case, and most of the time use-cases are "here is the transformation that will use it, and how it will be used". You haven't really been explaining this clearly to me, and your answer quoted above is keeping this a bit fuzzy to me.

Need to differentiate plan before and after feedback to move to LLVM dialect.
"Before" suggestion, as I mentioned, that's prerequisite to annotate functions with this property too. I would definitely work on this propagation.
"After" suggestion, the pass that only works on LLVM Dialect makes less sense for internal needs. Internally we still have to propagate this early, so the only benefit from that patch is convenience of querying that property from the operation.

As for the use-case point, I don't really understand what you would like to see. Are you asking for -> LLVM Dialect -> LLVM IR -> LLVM opt example ?

ftynse · 2025-08-12T15:48:01Z

I agree with @joker-eph that if the convergent property were to be added to core IR of MLIR (lib/Interfaces is part of that), it needs to be designed with the full generality of MLIR in mind including structured control flow (which is not limited to the SCF dialect and is expressed using interfaces) and non-Nvidia targets. We actually had discussions about modeling this via side effects on terminators if I'm not mistaken, but I don't think it led to any particular implementation.

Having a trait on the LLVM and derived dialects, such as NVVM, also makes sense because LLVM IR has such a concept.

npanchen requested review from ftynse and joker-eph August 6, 2025 18:20

npanchen requested a review from grypp as a code owner August 6, 2025 18:20

llvmbot added mlir:llvm mlir labels Aug 6, 2025

joker-eph reviewed Aug 6, 2025

View reviewed changes

mlir/include/mlir/Interfaces/ControlFlowInterfaces.td Outdated Show resolved Hide resolved

joker-eph reviewed Aug 6, 2025

View reviewed changes

Fixed NVVM_PTXBuilder_Op and removed bad comment

058f9ae

grypp requested a review from durga4github August 7, 2025 13:08

grypp reviewed Aug 7, 2025

View reviewed changes

Only mark nvvm.barrier0 as convergent

cb8f0a2

joker-eph reviewed Aug 11, 2025

View reviewed changes

[mlir] Added Convergent trait that matches LLVM's semantics #152358

Are you sure you want to change the base?

[mlir] Added Convergent trait that matches LLVM's semantics #152358

Uh oh!

Conversation

npanchen commented Aug 6, 2025

Uh oh!

llvmbot commented Aug 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

joker-eph Aug 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

npanchen commented Aug 7, 2025

Uh oh!

joker-eph commented Aug 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

npanchen commented Aug 8, 2025

Uh oh!

grypp commented Aug 9, 2025

Uh oh!

joker-eph commented Aug 9, 2025

Uh oh!

npanchen commented Aug 11, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

joker-eph commented Aug 11, 2025

Uh oh!

npanchen commented Aug 11, 2025

Uh oh!

npanchen commented Aug 11, 2025

Uh oh!

joker-eph commented Aug 11, 2025

Uh oh!

npanchen commented Aug 11, 2025

Uh oh!

joker-eph commented Aug 11, 2025

Uh oh!

joker-eph commented Aug 11, 2025

Uh oh!

npanchen commented Aug 11, 2025

Uh oh!

[mlir] Added `Convergent` trait that matches LLVM's semantics #152358

[mlir] Added `Convergent` trait that matches LLVM's semantics #152358

llvmbot commented Aug 6, 2025 •

edited

Loading

joker-eph Aug 7, 2025 •

edited

Loading

joker-eph commented Aug 7, 2025 •

edited

Loading