-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[mlir] Added Convergent
trait that matches LLVM's semantics
#152358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
LLVM provides `convergent` function attribute that says call to it must not be made control-dependent on any new condition. For example, that attribute disables jump threading that otherwise can lead to runtime errors or dead lock. See https://llvm.org/docs/ConvergentOperations.html for more details. It appears that MLIR does not provide a trait for this even though some operations, such as `nvvm.barrier0` is convergent due it lowering to `llvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all`. The patch adds `Convergent` trait to `ControlFlowInterface` (IMO, that's appropriate place for this trait) and adds that trait to some NVVM operations that are lowered to convergent LLVM Intrinsic.
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-llvm Author: Nikolay Panchenko (npanchen) ChangesLLVM provides It appears that MLIR does not provide a trait for this even though some operations, such as The patch adds Full diff: https://github.com/llvm/llvm-project/pull/152358.diff 4 Files Affected:
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 30df3b739e5ca..e95328398fe0c 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -14,12 +14,13 @@
#define NVVMIR_OPS
include "mlir/IR/EnumAttr.td"
+include "mlir/Interfaces/ControlFlowInterfaces.td"
+include "mlir/Interfaces/InferIntRangeInterface.td"
+include "mlir/Interfaces/SideEffectInterfaces.td"
include "mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td"
include "mlir/Dialect/LLVMIR/LLVMOpBase.td"
include "mlir/Dialect/LLVMIR/NVVMRequiresSMTraits.td"
-include "mlir/Interfaces/SideEffectInterfaces.td"
include "mlir/Dialect/LLVMIR/BasicPtxBuilderInterface.td"
-include "mlir/Interfaces/InferIntRangeInterface.td"
include "mlir/Dialect/LLVMIR/LLVMTypes.td"
def LLVM_PointerGeneric : LLVM_PointerInAddressSpace<0>;
@@ -561,7 +562,7 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">,
// NVVM synchronization op definitions
//===----------------------------------------------------------------------===//
-def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {
+def NVVM_Barrier0Op : NVVM_Op<"barrier0", [Convergent]> {
let assemblyFormat = "attr-dict";
string llvmBuilder = [{
createIntrinsicCall(
@@ -570,8 +571,9 @@ def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {
}];
}
-def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> {
- let arguments = (ins
+def NVVM_BarrierOp : NVVM_Op<"barrier",
+ [Convergent, AttrSizedOperandSegments]> {
+ let arguments = (ins
Optional<I32>:$barrierId,
Optional<I32>:$numberOfThreads);
string llvmBuilder = [{
@@ -598,7 +600,7 @@ def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> {
];
}
-def NVVM_BarrierArriveOp : NVVM_PTXBuilder_Op<"barrier.arrive">
+def NVVM_BarrierArriveOp : NVVM_PTXBuilder_Op<"barrier.arrive", [Convergent]>
{
let arguments = (ins Optional<I32>:$barrierId, I32:$numberOfThreads);
@@ -624,7 +626,7 @@ def NVVM_BarrierArriveOp : NVVM_PTXBuilder_Op<"barrier.arrive">
}];
}
-def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive"> {
+def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive", [Convergent]> {
let arguments = (ins OptionalAttr<UnitAttr>:$aligned);
let summary = "Cluster Barrier Arrive Op";
@@ -647,7 +649,8 @@ def NVVM_ClusterArriveOp : NVVM_Op<"cluster.arrive"> {
let assemblyFormat = "attr-dict";
}
-def NVVM_ClusterArriveRelaxedOp : NVVM_Op<"cluster.arrive.relaxed", [NVVMRequiresSM<90>]> {
+def NVVM_ClusterArriveRelaxedOp : NVVM_Op<"cluster.arrive.relaxed",
+ [Convergent, NVVMRequiresSM<90>]> {
let arguments = (ins OptionalAttr<UnitAttr>:$aligned);
let summary = "Cluster Barrier Relaxed Arrive Op";
@@ -673,7 +676,8 @@ def NVVM_ClusterArriveRelaxedOp : NVVM_Op<"cluster.arrive.relaxed", [NVVMRequire
let assemblyFormat = "attr-dict";
}
-def NVVM_ClusterWaitOp : NVVM_Op<"cluster.wait", [NVVMRequiresSM<90>]> {
+def NVVM_ClusterWaitOp : NVVM_Op<"cluster.wait",
+ [Convergent, NVVMRequiresSM<90>]> {
let arguments = (ins OptionalAttr<UnitAttr>:$aligned);
let summary = "Cluster Barrier Wait Op";
@@ -1054,7 +1058,8 @@ def NVVM_CpAsyncWaitGroupOp : NVVM_Op<"cp.async.wait.group">,
let assemblyFormat = "$n attr-dict";
}
-def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive"> {
+def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive",
+ [Convergent]> {
let summary = "NVVM Dialect Op for cp.async.mbarrier.arrive";
let description = [{
The `cp.async.mbarrier.arrive` Op makes the mbarrier object track
@@ -1079,7 +1084,8 @@ def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive"> {
}];
}
-def NVVM_CpAsyncMBarrierArriveSharedOp : NVVM_Op<"cp.async.mbarrier.arrive.shared"> {
+def NVVM_CpAsyncMBarrierArriveSharedOp : NVVM_Op<"cp.async.mbarrier.arrive.shared",
+ [Convergent]> {
let summary = "NVVM Dialect Op for cp.async.mbarrier.arrive.shared";
let description = [{
The `cp.async.mbarrier.arrive.shared` Op makes the mbarrier object
@@ -2806,7 +2812,8 @@ def NVVM_CpAsyncBulkSharedCTAToGlobalOp :
// NVVM Wgmma Ops
//===----------------------------------------------------------------------===//
-def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned", [NVVMRequiresSMa<[90]>]> {
+def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned",
+ [Convergent, NVVMRequiresSMa<[90]>]> {
let arguments = (ins);
let description = [{
Enforce an ordering of register accesses between warpgroup level matrix
@@ -2820,7 +2827,8 @@ def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned", [NVVMRequiresSMa<[
}];
}
-def NVVM_WgmmaGroupSyncAlignedOp : NVVM_Op<"wgmma.commit.group.sync.aligned", [NVVMRequiresSMa<[90]>]> {
+def NVVM_WgmmaGroupSyncAlignedOp : NVVM_Op<"wgmma.commit.group.sync.aligned",
+ [Convergent, NVVMRequiresSMa<[90]>]> {
let assemblyFormat = "attr-dict";
let description = [{
Commits all prior uncommitted warpgroup level matrix multiplication operations.
@@ -2832,7 +2840,8 @@ def NVVM_WgmmaGroupSyncAlignedOp : NVVM_Op<"wgmma.commit.group.sync.aligned", [N
}];
}
-def NVVM_WgmmaWaitGroupSyncOp : NVVM_Op<"wgmma.wait.group.sync.aligned", [NVVMRequiresSMa<[90]>]> {
+def NVVM_WgmmaWaitGroupSyncOp : NVVM_Op<"wgmma.wait.group.sync.aligned",
+ [Convergent, NVVMRequiresSMa<[90]>]> {
let arguments = (ins I64Attr:$group);
let assemblyFormat = "attr-dict $group";
let description = [{
diff --git a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h
index d63800c12d132..750a9f86e49d7 100644
--- a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h
+++ b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.h
@@ -337,8 +337,12 @@ struct ReturnLike : public TraitBase<ConcreteType, ReturnLike> {
return success();
}
};
-} // namespace OpTrait
+// The Operation may not be made control-dependent on any additional values.
+// See https://llvm.org/docs/ConvergentOperations.html for more details.
+template <typename ConcreteType>
+struct Convergent : public TraitBase<ConcreteType, Convergent> {};
+} // namespace OpTrait
} // namespace mlir
//===----------------------------------------------------------------------===//
diff --git a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td
index b8d08cc553caa..6eb2f9002d7cf 100644
--- a/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td
+++ b/mlir/include/mlir/Interfaces/ControlFlowInterfaces.td
@@ -511,4 +511,8 @@ def ReturnLike : TraitList<[
>
]>;
+// Use to inject an implementation of getSpeculatability. Users should not use
+// this directly.
+def Convergent : NativeOpTrait<"Convergent">;
+
#endif // MLIR_INTERFACES_CONTROLFLOWINTERFACES
diff --git a/mlir/test/lib/Dialect/Test/TestOps.td b/mlir/test/lib/Dialect/Test/TestOps.td
index 2eaad552a7a3a..ad14666a1a2cc 100644
--- a/mlir/test/lib/Dialect/Test/TestOps.td
+++ b/mlir/test/lib/Dialect/Test/TestOps.td
@@ -2113,6 +2113,11 @@ def TestTypeChangerOp : TEST_Op<"type_changer">,
def TestValidOp : TEST_Op<"valid", [Terminator]>,
Arguments<(ins Variadic<AnyType>)>;
+def TestConvergentOp : TEST_Op<"convergent", [Convergent]> {
+ let arguments = (ins AnyType);
+ let results = (outs AnyType);
+}
+
def TestMergeBlocksOp : TEST_Op<"merge_blocks"> {
let summary = "merge_blocks operation";
let description = [{
|
@@ -1054,7 +1058,8 @@ def NVVM_CpAsyncWaitGroupOp : NVVM_Op<"cp.async.wait.group">, | |||
let assemblyFormat = "$n attr-dict"; | |||
} | |||
|
|||
def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive"> { | |||
def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive", | |||
[Convergent]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that these are actually convergent?
Actually it's not clear to me that any of the post-volta intrinsics needs to be.
See this comment: https://discourse.llvm.org/t/llvm-convergence-semantics/77642/12
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow your comment. Are you referring to cp.async.mbarrier.arrive
? If so, I don't see it has been discussed in that comment/thread.
As for cp.async.mbarrier.arrive
, it's marked in LLVM with IntrConvergent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, I don't see it has been discussed in that comment/thread.
The comment is a general one, so it potentially applies to all the intrinsics :)
As for cp.async.mbarrier.arrive, it's marked in LLVM with IntrConvergent.
I do have low confidence for LLVM current annotation as a reliable source of documentation right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do have low confidence for LLVM current annotation as a reliable source of documentation right now.
That's valid point. At the same time having different traits set on LLVM intrinsic and MLIR operation, that is lowered to that intrinsic, looks not only confusing, but stinky.
Ideally, for low-level dialects, such as NVVM, it will be great just to query that property from the intrinsic it's lowered to, but that does require to have LLVMContext.
Specifically about convergent
, my understanding, if intrinsics is marked with it, but in reality it's not convergent, it's only bad for performance, but stability-wise it's still correct.
But yeah, it will be great to go through all LLVM intrinsics and make sure proper traits are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexMaclean @Prince781 that will be great to revisit nvvm's intrinsic properties. I know PTX doc definitely describes nvvm_barrier_cta_sync_aligned_all
as convergent, but I couldn't find such statement for others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the ".aligned" specifier in PTX is what brings the convergence requirement.
But I'm not sure why cp.async.mbarrier.arrive
for example would need to care about convergence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's valid point. At the same time having different traits set on LLVM intrinsic and MLIR operation, that is lowered to that intrinsic, looks not only confusing, but stinky.
Absolutely: we should fix LLVM ;)
The problem is that NVVM non-aligned intrinsics should be modeled with some sort of write effects to a private memory: that could be pessimizing transformations (because special casing this everywhere isn't something LLVM has been optimizing for).
// The Operation may not be made control-dependent on any additional values. | ||
// See https://llvm.org/docs/ConvergentOperations.html for more details. | ||
template <typename ConcreteType> | ||
struct Convergent : public TraitBase<ConcreteType, Convergent> {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any impact that MLIR is using structured control-flow in the support for convergence? In particular aren't token inserted during lowering to CFG to preserve loop structures for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no direct impact on absence of this in MLIR, but this trait is a prerequisite to properly annotate functions that invoke such operations. OtherwiseLLVM will apply optimizations incorrectly.
Specifically, the use case I saw was:
fn callee() {
nvvm.barrier0
}
fn caller() {
if (condition) {
call callee()
}
}
callee
has to be marked with convergent
attribute, otherwise if jump threading is applied, the code will become invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OtherwiseLLVM will apply optimizations incorrectly.
For each Op that MLIR generates, LLVM sets convergency
. So LLVM won't do incorrect optimization for MLIR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For each Op that MLIR generates, LLVM sets convergency.
Are you sure we would do it inter-procedurally and annotate the actual call here?
Or are we relying on some attribute propagation later?
@npanchen : when mapping a structured control-flow to a CFG, just annotating things as "convergent" isn't enough I believe. See this example: https://youtu.be/_Z5DuiVCFAw?t=434 ; basically structured control-flow needs to say "something" about the reconvergence property and the management of anchors (the video is amazing for explaining all this I think!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the convergent
property is only set on the intrinsic, not on arbitrary function that calls to that intrinsic. Since LLVM does not infer that property automatically, that can result to undesired optimization.
Just to highlight that JT works differently for the case I described above: https://godbolt.org/z/9T4d7MM58. To note, the IR there looks ok, but the problem I observed is similar, but just has more complex CF.
Also, the MLIR trait seems generally useful to prevent, say, loop multiversioning by trip count if loop contains such operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@npanchen : when mapping a structured control-flow to a CFG, just annotating things as "convergent" isn't enough I believe. See this example: https://youtu.be/_Z5DuiVCFAw?t=434 ; basically structured control-flow needs to say "something" about the reconvergence property and the management of anchors (the video is amazing for explaining all this I think!)
Thanks, that's quite interesting video. I actually was unable to find when convergence.loop
/convergence.entry
/convergence.anchor
are generated. Are they even still in use ?
If they are, yeah, that might be more interesting from design point of view. As of know I can only think about adding RecursivelyConvergent
trait. But not sure that's a good idea for functions.
completely side note: there's also noduplicate
property, which seems not to be used a lot, but it's not supported by MLIR and it does directly impact inliner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are emitted by clang, search for shouldEmitConvergenceTokens()
there, you'll see the kind of things I have in mind for our structured-to-cfg needs (and actually: a semantics definition of convergence guarantee on our structured control-flow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I was not searching right.
def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> { | ||
let arguments = (ins | ||
def NVVM_BarrierOp : NVVM_Op<"barrier", | ||
[Convergent, AttrSizedOperandSegments]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This instruction can synchronize different thread counts. Could you clarify what convergent
refers to in this context? I may be missing the nuance—does it mean a convergent warp or a convergent CTA?
nvvm.barrier 32
nvvm.barrier 128
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's is based on LLVM's assumption that intrinsics are convergent.
See my other reply about consistency of convergent properties.
// The Operation may not be made control-dependent on any additional values. | ||
// See https://llvm.org/docs/ConvergentOperations.html for more details. | ||
template <typename ConcreteType> | ||
struct Convergent : public TraitBase<ConcreteType, Convergent> {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OtherwiseLLVM will apply optimizations incorrectly.
For each Op that MLIR generates, LLVM sets convergency
. So LLVM won't do incorrect optimization for MLIR.
@joker-eph @grypp is there anything I need to address in this PR ? |
All the discussions we're having above looks like things to close on. Right now you're adding convergent on a bunch of operations which are not convergent as far as I know. Also adding a trait on something for which we don't have a good definition or an idea of how it'll work with our structured control-flow does not seem like something I'm comfortable to do: I think the plan needs to be more fleshed out. What concrete problem is this patch solving for you right now? |
The exact issue is
I'm ok-ish to add new trait to
On one hand I agree with you that adding something that might change in future isn't good. On the other that's hard to reason about proper design without having convergent CF examples to support. |
@npanchen Do you need this trait only for nvvm.barrier0? This is a special op that every thread in a CTA must run, so it’s convergent. |
You're showing me some pseudo source program, but I'm not sure what is the actual issue in MLIR and how this patch solves it right now.
This isn't exactly what I am saying I believe: I'm saying that it isn't good to add something that isn't well defined and documented, for which we can't tell users about how to use it.
Aren't the example from the presentation and the Clang codegen a good reference about the examples to support? Of course we could work around it by saying that "Convergent" is defined exactly as LLVM, and actually restrict it to LLVM function with flat CFGs only. But then that becomes a property of the LLVM layer and not something central in MLIR. |
Ok, let me reiterate previous comments: the MLIR pseudo-code I provided as-is leads to invalid optimization done by jump threading, because
Intentionally use LLVMFuncOp as it does have
I'm not sure I follow you. What exactly is not defined in added trait ? The added trait is defined in LLVM's term with a LLVM's link to convergent semantics. Is there anything else you want me to clarify/add to the doc ?
IMO defining it "as LLVM" is totally fine first step. IIRC, |
def TestConvergentOp : TEST_Op<"convergent", [Convergent]> { | ||
let arguments = (ins AnyType); | ||
let results = (outs AnyType); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this doing right now? Without an actual test exercising this op, adding an op isn't useful.
There is no jump threading in MLIR and your patch does not show any tests: so that's not a clear answer to "I'm not sure what is the actual issue in MLIR and how this patch solves it right now".
As mentioned before: LLVM does not have structured control-flow, so all the definitions from LLVM need to be revisited.
As mentioned before when I pointed what clang does: making it more generic requires new definitions and looking into the semantics we want for structured control-flow structure.
That means you need to move all this away from |
I cannot definitively answer yes or no. At least my current workaround, to propagate convergency through a call graph, only relies on |
Yes, that's what I said before too: there's no direct impact on MLIR, but it does impact optimizations done later on generated LLVM IR.
Ok, not to go over same things multiple times: do I get you right that for you such trait has to be defined from all possible angles (structured cf, functions, simple op) in a single PR ?
Thanks, that looks reasonable to move it there. |
To be added to the system: yes.
Do you intend to write a MLIR pass that infer a convergent attribute on |
Sounds good.
I definitely can add to upstream for completeness even though internally we won't use it. However, if you or someone else is planning to work on convergency for higher level dialects, this pass will be redundant.
That's what I was unable to find. I do see it can promote convergent functions to non-convergent, but not other way around. |
I was more interested in understanding if this is how you plan to use this? It's really difficult to review a feature without a use-case, and most of the time use-cases are "here is the transformation that will use it, and how it will be used". You haven't really been explaining this clearly to me, and your answer quoted above is keeping this a bit fuzzy to me. |
Right, so in MLIR the safe thing to do should be to add the convergent attributes to every call/function unless we know the function does not need it (which can be an information provided as a flag, as target information, etc.). |
Need to differentiate plan before and after feedback to move to LLVM dialect. As for the use-case point, I don't really understand what you would like to see. Are you asking for -> LLVM Dialect -> LLVM IR -> LLVM opt example ? |
I agree with @joker-eph that if the convergent property were to be added to core IR of MLIR (lib/Interfaces is part of that), it needs to be designed with the full generality of MLIR in mind including structured control flow (which is not limited to the SCF dialect and is expressed using interfaces) and non-Nvidia targets. We actually had discussions about modeling this via side effects on terminators if I'm not mistaken, but I don't think it led to any particular implementation. Having a trait on the LLVM and derived dialects, such as NVVM, also makes sense because LLVM IR has such a concept. |
LLVM provides
convergent
function attribute that says call to it must not be made control-dependent on any new condition. For example, that attribute disables jump threading that otherwise can lead to runtime errors or dead lock.See https://llvm.org/docs/ConvergentOperations.html for more details.
It appears that MLIR does not provide a trait for this even though some operations, such as
nvvm.barrier0
is convergent due it lowering tollvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all
.The patch adds
Convergent
trait toControlFlowInterface
(IMO, that's appropriate place for this trait) and adds that trait to some NVVM operations that are lowered to convergent LLVM Intrinsic.