
[mlir] Added Convergent trait that matches LLVM's semantics #152358


Open
npanchen wants to merge 3 commits into main

Conversation

npanchen (Contributor) commented Aug 6, 2025

LLVM provides a convergent function attribute which says that a call to such a function must not be made control-dependent on any additional condition. For example, the attribute disables jump threading that could otherwise lead to runtime errors or deadlocks.
See https://llvm.org/docs/ConvergentOperations.html for more details.

It appears that MLIR does not provide a trait for this, even though some operations, such as nvvm.barrier0, are convergent because they lower to llvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all.

The patch adds a Convergent trait to ControlFlowInterfaces (IMO, that's the appropriate place for this trait) and adds that trait to some NVVM operations that lower to convergent LLVM intrinsics.

llvmbot (Member) commented Aug 6, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-llvm

Author: Nikolay Panchenko (npanchen)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/152358.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+23-14)
  • (modified) mlir/include/mlir/Interfaces/ControlFlowInterfaces.h (+5-1)
  • (modified) mlir/include/mlir/Interfaces/ControlFlowInterfaces.td (+4)
  • (modified) mlir/test/lib/Dialect/Test/TestOps.td (+5)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 30df3b739e5ca..e95328398fe0c 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -14,12 +14,13 @@
 #define NVVMIR_OPS
 
 include "mlir/IR/EnumAttr.td"
+include "mlir/Interfaces/ControlFlowInterfaces.td"
+include "mlir/Interfaces/InferIntRangeInterface.td"
+include "mlir/Interfaces/SideEffectInterfaces.td"
 include "mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td"
 include "mlir/Dialect/LLVMIR/LLVMOpBase.td"
 include "mlir/Dialect/LLVMIR/NVVMRequiresSMTraits.td"
-include "mlir/Interfaces/SideEffectInterfaces.td"
 include "mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td"
-include "mlir/Interfaces/InferIntRangeInterface.td"
 include "mlir/Dialect/LLVMIR/LLVMTypes.td"
 
 def LLVM_PointerGeneric : LLVM_PointerInAddressSpace<0>;
@@ -561,7 +562,7 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">,
 // NVVM synchronization op definitions
 //===----------------------------------------------------------------------===//
 
-def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {
+def NVVM_Barrier0Op : NVVM_Op<"barrier0", [Convergent]> {
   let assemblyFormat = "attr-dict";
   string llvmBuilder = [{
       createIntrinsicCall(
@@ -570,8 +571,9 @@ def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {
   }];
 }
 
-def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> {
-  let arguments = (ins     
+def NVVM_BarrierOp : NVVM_Op<"barrier",
+  [Convergent, AttrSizedOperandSegments]> {
+  let arguments = (ins
     Optional<I32>:$barrierId,
     Optional<I32>:$numberOfThreads);
   string llvmBuilder = [{
@@ -598,7 +600,7 @@ def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> {
   ];
 }
 
-def NVVM_BarrierArriveOp : NVVM_PTXBuilder_Op<"barrier.arrive"> 
+def NVVM_BarrierArriveOp : NVVM_PTXBuilder_Op<"barrier.arrive", [Convergent]>
 {
   let arguments = (ins Optional<I32>:$barrierId, I32:$numberOfThreads);
 
@@ -624,7 +626,7 @@ def NVVM_BarrierArriveOp : NVVM_PTXBuilder_Op<"barrier.arrive">
   }];
 }
 
-def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive"> {
+def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive", [Convergent]> {
   let arguments = (ins OptionalAttr<UnitAttr>:$aligned);
 
   let summary = "Cluster Barrier Arrive Op";
@@ -647,7 +649,8 @@ def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive"> {
   let assemblyFormat = "attr-dict";
 }
 
-def NVVM_ClusterArriveRelaxedOp : NVVM_Op<"cluster.arrive.relaxed", [NVVMRequiresSM<90>]> {
+def NVVM_ClusterArriveRelaxedOp : NVVM_Op<"cluster.arrive.relaxed",
+  [Convergent, NVVMRequiresSM<90>]> {
   let arguments = (ins OptionalAttr<UnitAttr>:$aligned);
 
   let summary = "Cluster Barrier Relaxed Arrive Op";
@@ -673,7 +676,8 @@ def NVVM_ClusterArriveRelaxedOp : NVVM_Op<"cluster.arrive.relaxed", [NVVMRequire
   let assemblyFormat = "attr-dict";
 }
 
-def NVVM_ClusterWaitOp : NVVM_Op<"cluster.wait", [NVVMRequiresSM<90>]> {
+def NVVM_ClusterWaitOp : NVVM_Op<"cluster.wait",
+  [Convergent, NVVMRequiresSM<90>]> {
   let arguments = (ins OptionalAttr<UnitAttr>:$aligned);
 
   let summary = "Cluster Barrier Wait Op";
@@ -1054,7 +1058,8 @@ def NVVM_CpAsyncWaitGroupOp : NVVM_Op<"cp.async.wait.group">,
   let assemblyFormat = "$n attr-dict";
 }
 
-def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive"> {
+def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive",
+  [Convergent]> {
   let summary = "NVVM Dialect Op for cp.async.mbarrier.arrive";
   let description = [{
     The `cp.async.mbarrier.arrive` Op makes the mbarrier object track
@@ -1079,7 +1084,8 @@ def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive"> {
   }];
 }
 
-def NVVM_CpAsyncMBarrierArriveSharedOp : NVVM_Op<"cp.async.mbarrier.arrive.shared"> {
+def NVVM_CpAsyncMBarrierArriveSharedOp : NVVM_Op<"cp.async.mbarrier.arrive.shared",
+  [Convergent]> {
   let summary = "NVVM Dialect Op for cp.async.mbarrier.arrive.shared";
   let description = [{
     The `cp.async.mbarrier.arrive.shared` Op makes the mbarrier object
@@ -2806,7 +2812,8 @@ def NVVM_CpAsyncBulkSharedCTAToGlobalOp :
 // NVVM Wgmma Ops
 //===----------------------------------------------------------------------===//
 
-def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned", [NVVMRequiresSMa<[90]>]> {
+def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned",
+  [Convergent, NVVMRequiresSMa<[90]>]> {
   let arguments = (ins);
   let description = [{
     Enforce an ordering of register accesses between warpgroup level matrix 
@@ -2820,7 +2827,8 @@ def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned", [NVVMRequiresSMa<[
   }];
 }
 
-def NVVM_WgmmaGroupSyncAlignedOp : NVVM_Op<"wgmma.commit.group.sync.aligned", [NVVMRequiresSMa<[90]>]> {
+def NVVM_WgmmaGroupSyncAlignedOp : NVVM_Op<"wgmma.commit.group.sync.aligned",
+  [Convergent, NVVMRequiresSMa<[90]>]> {
   let assemblyFormat = "attr-dict";
   let description = [{
     Commits all prior uncommitted warpgroup level matrix multiplication operations.
@@ -2832,7 +2840,8 @@ def NVVM_WgmmaGroupSyncAlignedOp : NVVM_Op<"wgmma.commit.group.sync.aligned", [N
   }];
 }
 
-def NVVM_WgmmaWaitGroupSyncOp : NVVM_Op<"wgmma.wait.group.sync.aligned", [NVVMRequiresSMa<[90]>]> {
+def NVVM_WgmmaWaitGroupSyncOp : NVVM_Op<"wgmma.wait.group.sync.aligned",
+  [Convergent, NVVMRequiresSMa<[90]>]> {
   let arguments = (ins I64Attr:$group);
   let assemblyFormat = "attr-dict $group";
   let description = [{
diff --git a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h
index d63800c12d132..750a9f86e49d7 100644
--- a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h
+++ b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h
@@ -337,8 +337,12 @@ struct ReturnLike : public TraitBase<ConcreteType, ReturnLike> {
     return success();
   }
 };
-} // namespace OpTrait
 
+// The Operation may not be made control-dependent on any additional values.
+// See https://llvm.org/docs/ConvergentOperations.html for more details.
+template <typename ConcreteType>
+struct Convergent : public TraitBase<ConcreteType, Convergent> {};
+} // namespace OpTrait
 } // namespace mlir
 
 //===----------------------------------------------------------------------===//
diff --git a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td
index b8d08cc553caa..6eb2f9002d7cf 100644
--- a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td
+++ b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td
@@ -511,4 +511,8 @@ def ReturnLike : TraitList<[
     >
 ]>;
 
+// The operation may not be made control-dependent on any additional values.
+// See https://llvm.org/docs/ConvergentOperations.html for more details.
+def Convergent : NativeOpTrait<"Convergent">;
+
 #endif // MLIR_INTERFACES_CONTROLFLOWINTERFACES
diff --git a/mlir/test/lib/Dialect/Test/TestOps.td b/mlir/test/lib/Dialect/Test/TestOps.td
index 2eaad552a7a3a..ad14666a1a2cc 100644
--- a/mlir/test/lib/Dialect/Test/TestOps.td
+++ b/mlir/test/lib/Dialect/Test/TestOps.td
@@ -2113,6 +2113,11 @@ def TestTypeChangerOp : TEST_Op<"type_changer">,
 def TestValidOp : TEST_Op<"valid", [Terminator]>,
   Arguments<(ins Variadic<AnyType>)>;
 
+def TestConvergentOp : TEST_Op<"convergent", [Convergent]> {
+  let arguments = (ins AnyType);
+  let results = (outs AnyType);
+}
+
 def TestMergeBlocksOp : TEST_Op<"merge_blocks"> {
   let summary = "merge_blocks operation";
   let description = [{

@@ -1054,7 +1058,8 @@ def NVVM_CpAsyncWaitGroupOp : NVVM_Op<"cp.async.wait.group">,
   let assemblyFormat = "$n attr-dict";
 }
 
-def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive"> {
+def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive",
+  [Convergent]> {
Collaborator:

I'm not sure that these are actually convergent?
Actually, it's not clear to me that any of the post-Volta intrinsics need to be.

See this comment: https://discourse.llvm.org/t/llvm-convergence-semantics/77642/12

Contributor Author:

Not sure I follow your comment. Are you referring to cp.async.mbarrier.arrive? If so, I don't see that it has been discussed in that comment/thread.
As for cp.async.mbarrier.arrive, it's marked in LLVM with IntrConvergent.

Collaborator:

If so, I don't see that it has been discussed in that comment/thread.

The comment is a general one, so it potentially applies to all the intrinsics :)

As for cp.async.mbarrier.arrive, it's marked in LLVM with IntrConvergent.

I have low confidence in LLVM's current annotations as a reliable source of documentation right now.

Contributor Author:

I have low confidence in LLVM's current annotations as a reliable source of documentation right now.

That's a valid point. At the same time, having different traits set on an LLVM intrinsic and on the MLIR operation that lowers to it looks not only confusing, but stinky.
Ideally, for low-level dialects such as NVVM, it would be great to simply query that property from the intrinsic the operation lowers to, but that requires an LLVMContext.

Specifically about convergent: my understanding is that if an intrinsic is marked with it but is not actually convergent, that is only bad for performance; stability-wise it's still correct.
But yes, it would be great to go through all the LLVM intrinsics and make sure the proper attributes are used.

Contributor Author:

@AlexMaclean @Prince781 it would be great to revisit NVVM's intrinsic properties. I know the PTX documentation definitely describes nvvm_barrier_cta_sync_aligned_all as convergent, but I couldn't find such a statement for the others.

joker-eph (Collaborator) commented Aug 7, 2025:

Yes, the ".aligned" specifier in PTX is what brings the convergence requirement.

But I'm not sure why cp.async.mbarrier.arrive, for example, would need to care about convergence?

Collaborator:

That's a valid point. At the same time, having different traits set on an LLVM intrinsic and on the MLIR operation that lowers to it looks not only confusing, but stinky.

Absolutely: we should fix LLVM ;)
The problem is that the NVVM non-aligned intrinsics should be modeled with some sort of write effect to private memory: that could pessimize transformations (because special-casing this everywhere isn't something LLVM has been optimized for).

// The Operation may not be made control-dependent on any additional values.
// See https://llvm.org/docs/ConvergentOperations.html for more details.
template <typename ConcreteType>
struct Convergent : public TraitBase<ConcreteType, Convergent> {};
Collaborator:

Is there any impact from MLIR using structured control flow on the support for convergence? In particular, aren't tokens inserted during lowering to CFG to preserve loop structures, for example?

Contributor Author:

The absence of this in MLIR has no direct impact, but this trait is a prerequisite for properly annotating functions that invoke such operations. Otherwise LLVM will apply optimizations incorrectly.
Specifically, the use case I saw was:

fn callee() {
  nvvm.barrier0
}

fn caller() {
  if (condition) {
    call callee()
  }
}

callee has to be marked with the convergent attribute; otherwise, if jump threading is applied, the code becomes invalid.
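
For illustration only, a hedged LLVM IR sketch of roughly what the lowering looks like (function names are hypothetical; @barrier merely stands in for the convergent barrier intrinsic). With the convergent attribute in place, jump threading is not allowed to make the call control-dependent on additional conditions:

declare void @barrier() convergent   ; stand-in for the convergent barrier intrinsic

define void @callee() convergent {
  call void @barrier()
  ret void
}

define void @caller(i1 %cond) {
  br i1 %cond, label %then, label %exit

then:
  call void @callee()
  br label %exit

exit:
  ret void
}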

Member:

Otherwise LLVM will apply optimizations incorrectly.

For each op that MLIR generates, LLVM sets convergence. So LLVM won't do incorrect optimizations for MLIR.

Collaborator:

For each op that MLIR generates, LLVM sets convergence.

Are you sure we would do it inter-procedurally and annotate the actual call here?
Or are we relying on some attribute propagation later?

@npanchen : when mapping a structured control-flow to a CFG, just annotating things as "convergent" isn't enough I believe. See this example: https://youtu.be/_Z5DuiVCFAw?t=434 ; basically structured control-flow needs to say "something" about the reconvergence property and the management of anchors (the video is amazing for explaining all this I think!)

Contributor Author:

The convergent property is only set on the intrinsic, not on an arbitrary function that calls that intrinsic. Since LLVM does not infer that property automatically, this can result in undesired optimizations.
Just to highlight that jump threading works differently for the case I described above: https://godbolt.org/z/9T4d7MM58. To note, the IR there looks OK, but the problem I observed is similar, just with more complex control flow.

Also, the MLIR trait seems generally useful, e.g. to prevent loop multiversioning by trip count if the loop contains such an operation.

Contributor Author:

@npanchen : when mapping a structured control-flow to a CFG, just annotating things as "convergent" isn't enough I believe. See this example: https://youtu.be/_Z5DuiVCFAw?t=434 ; basically structured control-flow needs to say "something" about the reconvergence property and the management of anchors (the video is amazing for explaining all this I think!)

Thanks, that's quite an interesting video. I was actually unable to find where convergence.loop/convergence.entry/convergence.anchor are generated. Are they even still in use?
If they are, then yes, this might be more interesting from a design point of view. As of now I can only think of adding a RecursivelyConvergent trait, but I'm not sure that's a good idea for functions.

Completely side note: there's also the noduplicate property, which seems not to be used a lot, but it's not supported by MLIR and it does directly impact the inliner.

Collaborator:

They are emitted by Clang; search for shouldEmitConvergenceTokens() there and you'll see the kind of things I have in mind for our structured-to-CFG needs (and actually: a definition of the convergence guarantees our structured control flow provides).

Contributor Author:

I see. I wasn't searching for the right thing.

@grypp grypp requested a review from durga4github August 7, 2025 13:08
-def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> {
-  let arguments = (ins
+def NVVM_BarrierOp : NVVM_Op<"barrier",
+  [Convergent, AttrSizedOperandSegments]> {
Member:

This instruction can synchronize different thread counts. Could you clarify what convergent refers to in this context? I may be missing the nuance—does it mean a convergent warp or a convergent CTA?

nvvm.barrier 32
nvvm.barrier 128

Contributor Author:

That is based on LLVM's assumption that these intrinsics are convergent.
See my other reply about the consistency of convergent properties.


npanchen (Contributor Author) commented Aug 7, 2025

@joker-eph @grypp is there anything I need to address in this PR?

joker-eph (Collaborator) commented Aug 7, 2025

All the discussions we're having above look like things to close on.

Right now you're adding convergent on a bunch of operations which are not convergent as far as I know.

Also, adding a trait for which we don't have a good definition or an idea of how it'll work with our structured control flow does not seem like something I'm comfortable doing: I think the plan needs to be more fleshed out.

What concrete problem is this patch solving for you right now?

npanchen (Contributor Author) commented Aug 8, 2025

All the discussions we're having above look like things to close on.

Right now you're adding convergent on a bunch of operations which are not convergent as far as I know.
What concrete problem is this patch solving for you right now?

The exact issue is

fn callee() {
  nvvm.barrier0
}

fn caller() {
  if (condition) {
    call callee()
  }
}

I'm OK-ish with adding the new trait to nvvm.barrier0. But it would be great if you could poke your colleagues to revisit the remaining intrinsics so that the remaining MLIR ops can be updated properly.

Also, adding a trait for which we don't have a good definition or an idea of how it'll work with our structured control flow does not seem like something I'm comfortable doing: I think the plan needs to be more fleshed out.

On one hand I agree with you that adding something that might change in the future isn't good. On the other hand, it's hard to reason about a proper design without convergent control-flow examples to support.
Since the change is quite minimal, I don't think it would be a big problem to replace it with a proper design later (assuming RecursivelyConvergent turns out not to be the right approach and the trait I'm adding doesn't fit into that right approach either).

grypp (Member) commented Aug 9, 2025

@npanchen Do you need this trait only for nvvm.barrier0? This is a special op that every thread in a CTA must run, so it’s convergent.
Do you have another op that you think also requires it in your case?

joker-eph (Collaborator):

The exact issue is
...

You're showing me some pseudo source program, but I'm not sure what is the actual issue in MLIR and how this patch solves it right now.

On one hand I agree with you that adding something that might change in future isn't good.

This isn't exactly what I am saying I believe: I'm saying that it isn't good to add something that isn't well defined and documented, for which we can't tell users about how to use it.

On the other that's hard to reason about proper design without having convergent CF examples to support.

Aren't the example from the presentation and the Clang codegen a good reference for the examples to support?

Of course we could work around it by saying that "Convergent" is defined exactly as in LLVM, and actually restrict it to LLVM functions with flat CFGs only. But then that becomes a property of the LLVM layer and not something central in MLIR.

npanchen (Contributor Author):

The exact issue is
...

You're showing me some pseudo source program, but I'm not sure what is the actual issue in MLIR and how this patch solves it right now.

Ok, let me reiterate previous comments: the MLIR pseudo-code I provided, as-is, leads to an invalid optimization by jump threading, because callee is not marked with the convergent attribute. The goal of this PR is to add a trait that a dedicated pass can easily rely on to propagate convergence through a call graph. That is, to do something like this:

if (op->hasTrait<Convergent>() || /*if calls llvm.func that is convergent*/) {
  op->getParentOfType<LLVMFuncOp>()->setConvergent();
}

I intentionally use LLVMFuncOp, as it does have a convergent attribute that can be used in "the special pass".
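
For concreteness, a minimal sketch of what such a pass could look like (not part of this patch; it assumes the Convergent trait proposed here, the convergent unit attribute on LLVMFuncOp mentioned above, and the generated setConvergentAttr accessor; handling of calls to already-convergent callees is omitted):

// Minimal sketch: mark an LLVM function convergent if any nested op
// carries the (proposed) Convergent trait. Accessor names are assumptions.
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinOps.h"

using namespace mlir;

static void propagateConvergence(ModuleOp module) {
  module.walk([](LLVM::LLVMFuncOp func) {
    func.walk([&](Operation *op) {
      if (op->hasTrait<OpTrait::Convergent>())
        func.setConvergentAttr(UnitAttr::get(func.getContext()));
    });
  });
}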

This isn't exactly what I am saying I believe: I'm saying that it isn't good to add something that isn't well defined and documented, for which we can't tell users about how to use it.
Aren't the example from the presentation and the Clang codegen a good reference for the examples to support?

I'm not sure I follow you. What exactly is not defined in the added trait? The added trait is defined in LLVM's terms, with a link to LLVM's convergent semantics. Is there anything else you want me to clarify/add to the doc?

Of course we could work around it by saying that "Convergent" is defined exactly as in LLVM, and actually restrict it to LLVM functions with flat CFGs only. But then that becomes a property of the LLVM layer and not something central in MLIR.

IMO defining it "as LLVM does" is a totally fine first step. IIRC, the SideEffect trait was decomposed into several other traits, so if the convergent trait is "too generic" it can be split into other traits later.
But if you do have some other ideas in mind that would be more generic and still cover LLVM's convergent, I'll be glad to hear them.

def TestConvergentOp : TEST_Op<"convergent", [Convergent]> {
let arguments = (ins AnyType);
let results = (outs AnyType);
}
Collaborator:

What is this doing right now? Without an actual test exercising this op, adding an op isn't useful.

joker-eph (Collaborator):

Ok, let me reiterate previous comments: the MLIR pseudo-code I provided, as-is, leads to an invalid optimization by jump threading, because callee is not marked with the convergent attribute

There is no jump threading in MLIR and your patch does not show any tests: so that's not a clear answer to "I'm not sure what is the actual issue in MLIR and how this patch solves it right now".

I'm not sure I follow you. What exactly is not defined in the added trait? The added trait is defined in LLVM's terms, with a link to LLVM's convergent semantics. Is there anything else you want me to clarify/add to the doc?

As mentioned before: LLVM does not have structured control-flow, so all the definitions from LLVM need to be revisited.

But if you do have some other ideas in mind that would be more generic and still cover LLVM's convergent, I'll be glad to hear them.

As mentioned before when I pointed at what Clang does: making it more generic requires new definitions and looking into the semantics we want for structured control flow.

IMO defining it "as LLVM does" is a totally fine first step.

That means you need to move all this away from mlir/include/mlir/Interfaces/ControlFlowInterfaces.h into the LLVM dialect.

npanchen (Contributor Author):

@npanchen Do you need this trait only for nvvm.barrier0? This is a special op that every thread in a CTA must run, so it’s convergent. Do you have another op that you think also requires it in your case?

I cannot definitively answer yes or no. At least my current workaround to propagate convergence through a call graph only relies on nvvm.barrier0. We do use some other operations I previously marked as convergent, but I don't see any runtime issues so far.
FYI: the nvvm.barrier0 problem was uncovered recently due to recent changes in jump threading.

npanchen (Contributor Author):

There is no jump threading in MLIR and your patch does not show any tests: so that's not a clear answer to "I'm not sure what is the actual issue in MLIR and how this patch solves it right now".

Yes, that's what I said before too: there's no direct impact on MLIR, but it does impact optimizations done later on generated LLVM IR.

As mentioned before: LLVM does not have structured control-flow, so all the definitions from LLVM need to be revisited.
As mentioned before when I pointed at what Clang does: making it more generic requires new definitions and looking into the semantics we want for structured control flow.

Ok, not to go over the same things multiple times: do I get you right that, for you, such a trait has to be defined from all possible angles (structured CF, functions, simple ops) in a single PR?

That means you need to move all this away from mlir/include/mlir/Interfaces/ControlFlowInterfaces.h into the LLVM dialect.

Thanks, it looks reasonable to move it there.

joker-eph (Collaborator):

Ok, not to go over the same things multiple times: do I get you right that, for you, such a trait has to be defined from all possible angles (structured CF, functions, simple ops) in a single PR?

To be added to the system: yes.
To just mirror what LLVM does, restricted to flat-CFG on the LLVM dialect, it can be OK otherwise.

Yes, that's what I said #152358 (comment): there's no direct impact on MLIR, but it does impact optimizations done later on generated LLVM IR.

Do you intend to write an MLIR pass that infers a convergent attribute on llvm.call operations by traversing the IR interprocedurally? Isn't the LLVM Attributor pass already doing this on LLVM IR, so that you could do this after translation to LLVM IR today instead?

npanchen (Contributor Author):

To be added to the system: yes. To just mirror what LLVM does, restricted to flat-CFG on the LLVM dialect, it can be OK otherwise.

Sounds good.

Do you intend to write an MLIR pass that infers a convergent attribute on llvm.call operations by traversing the IR interprocedurally?

I can definitely add it upstream for completeness, even though internally we won't use it. However, if you or someone else is planning to work on convergence for higher-level dialects, this pass will be redundant.

Isn't the LLVM Attributor pass already doing this on LLVM IR, so that you could do this after translation to LLVM IR today instead?

That's what I was unable to find. I do see it can promote convergent functions to non-convergent, but not the other way around.
IMO, that does make sense, as the IR should be legal from the beginning, i.e. expecting some pass to add it isn't correct.

joker-eph (Collaborator):

I can definitely add it upstream for completeness, even though internally we won't use it.

I was more interested in understanding if this is how you plan to use this?

It's really difficult to review a feature without a use-case, and most of the time use-cases are "here is the transformation that will use it, and how it will be used". You haven't really been explaining this clearly to me, and your answer quoted above is keeping this a bit fuzzy to me.

joker-eph (Collaborator):

IMO, that does make sense, as the IR should be legal from the beginning, i.e. expecting some pass to add it isn't correct.

Right, so in MLIR the safe thing to do should be to add the convergent attribute to every call/function unless we know the function does not need it (which can be information provided as a flag, as target information, etc.).
The Attributor should remove it correctly later.

npanchen (Contributor Author):

I was more interested in understanding if this is how you plan to use this?
It's really difficult to review a feature without a use-case, and most of the time use-cases are "here is the transformation that will use it, and how it will be used". You haven't really been explaining this clearly to me, and your answer quoted above is keeping this a bit fuzzy to me.

We need to differentiate the plan before and after the feedback to move this to the LLVM dialect.
Before the suggestion, as I mentioned, this trait is a prerequisite for annotating functions with this property too; I would definitely work on this propagation.
After the suggestion, a pass that only works on the LLVM dialect makes less sense for our internal needs. Internally we still have to propagate this early, so the only benefit from this patch is the convenience of querying that property from the operation.

As for the use-case point, I don't really understand what you would like to see. Are you asking for an "-> LLVM Dialect -> LLVM IR -> LLVM opt" example?

ftynse (Member) commented Aug 12, 2025

I agree with @joker-eph that if the convergent property were to be added to the core IR of MLIR (lib/Interfaces is part of that), it needs to be designed with the full generality of MLIR in mind, including structured control flow (which is not limited to the SCF dialect and is expressed using interfaces) and non-Nvidia targets. We actually had discussions about modeling this via side effects on terminators, if I'm not mistaken, but I don't think it led to any particular implementation.

Having a trait on the LLVM and derived dialects, such as NVVM, also makes sense because LLVM IR has such a concept.
