Merged

48 commits
f9e5a7c
[Intrinsics][AArch64] Add intrinsic to mask off aliasing vector lanes
SamTebbs33 Nov 15, 2024
071728f
Rework lowering location
SamTebbs33 Jan 10, 2025
80a72ca
Fix ISD node name string and remove shouldExpand function
SamTebbs33 Jan 15, 2025
daa2ac4
Format
SamTebbs33 Jan 16, 2025
3fcb9e8
Move promote case
SamTebbs33 Jan 27, 2025
6628a98
Fix tablegen comment
SamTebbs33 Jan 27, 2025
0644542
Remove DAGTypeLegalizer::
SamTebbs33 Jan 27, 2025
75af361
Use getConstantOperandVal
SamTebbs33 Jan 27, 2025
5f563d9
Remove isPredicateCCSettingOp case
SamTebbs33 Jan 29, 2025
24df6bf
Remove overloads for pointer and element size parameters
SamTebbs33 Jan 30, 2025
ec37dfa
Clarify elementSize and writeAfterRead = 0
SamTebbs33 Jan 30, 2025
8d81955
Add i=0 to VF-1
SamTebbs33 Jan 30, 2025
8a09412
Rename to get.nonalias.lane.mask
SamTebbs33 Jan 30, 2025
45cbaff
Fix pointer types in example
SamTebbs33 Jan 30, 2025
1b7b0da
Remove shouldExpandGetAliasLaneMask
SamTebbs33 Jan 30, 2025
0a0de88
Lower to ISD node rather than intrinsic
SamTebbs33 Jan 30, 2025
54d32ad
Rename to noalias
SamTebbs33 Jan 31, 2025
2066929
Rename to loop.dependence.raw/war.mask
SamTebbs33 Feb 26, 2025
9b3a71a
Rename in langref
SamTebbs33 Mar 10, 2025
215d2e7
Reword argument description
SamTebbs33 Mar 21, 2025
ec2bfed
Fixup langref
SamTebbs33 May 20, 2025
9f5f91a
IsWriteAfterRead -> IsReadAfterWrite and avoid using ops vector
SamTebbs33 May 20, 2025
eb8d5af
Extend vXi1 setcc to account for intrinsic VT promotion
SamTebbs33 May 20, 2025
c3d6fc8
Remove experimental from intrinsic name
SamTebbs33 May 21, 2025
9c5631d
Clean up vector type creation
SamTebbs33 May 21, 2025
52fca12
Address review
SamTebbs33 Aug 5, 2025
9a985ab
Remove experimental from comment
SamTebbs33 Aug 7, 2025
b09d354
Add splitting
SamTebbs33 Aug 7, 2025
56f9a6b
Add widening
SamTebbs33 Aug 7, 2025
26bf362
Remove assertions and expand invalid immediates
SamTebbs33 Aug 11, 2025
a84e5e2
Remove comment about mismatched type and immediate
SamTebbs33 Aug 11, 2025
054f859
Improve lowering and splitting code a bit
SamTebbs33 Aug 12, 2025
970e7f9
Remove splitting from lowering
SamTebbs33 Aug 12, 2025
fddda14
Improve wording in lang ref
SamTebbs33 Aug 12, 2025
36be558
Rebase
SamTebbs33 Aug 12, 2025
c3d2acf
Remove backend promotion
SamTebbs33 Aug 13, 2025
8af5019
Don't create StoreVT
SamTebbs33 Aug 13, 2025
558bc3e
Use ternary for Addend
SamTebbs33 Aug 13, 2025
32e0192
Stop adding to PtrB
SamTebbs33 Aug 13, 2025
3d7c2da
Move nosve/nosve2 tests to separate files
SamTebbs33 Aug 13, 2025
5402e27
Rebase
SamTebbs33 Aug 15, 2025
5075b5f
Remove unneeded lowering cases
SamTebbs33 Aug 18, 2025
d85d375
Simplify lang ref again
SamTebbs33 Aug 19, 2025
4dedf42
More langref re-wording
SamTebbs33 Aug 27, 2025
33be150
Define a store-to-load forwarding hazard
SamTebbs33 Aug 28, 2025
587a25c
Scalarize <1 x Y> intrinsic calls
SamTebbs33 Aug 31, 2025
3abc7ba
Address review
SamTebbs33 Sep 1, 2025
8eb12a0
Address review
SamTebbs33 Sep 2, 2025
124 changes: 124 additions & 0 deletions llvm/docs/LangRef.rst
@@ -24105,6 +24105,130 @@ Examples:
%wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)


.. _int_loop_dependence_war_mask:

'``llvm.loop.dependence.war.mask.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

declare <4 x i1> @llvm.loop.dependence.war.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <8 x i1> @llvm.loop.dependence.war.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <16 x i1> @llvm.loop.dependence.war.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <vscale x 16 x i1> @llvm.loop.dependence.war.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)


Overview:
"""""""""

Given a vector load from %ptrA followed by a vector store to %ptrB, this
intrinsic generates a mask where an active lane indicates that the
write-after-read sequence can be performed safely for that lane, without a
write-after-read hazard occurring.

A write-after-read hazard occurs when a write-after-read sequence for a given
lane in a vector ends up being executed as a read-after-write sequence due to
the aliasing of pointers.

Arguments:
""""""""""

The first two arguments are pointers and the last argument is an immediate.
The result is a vector with the i1 element type.

Semantics:
""""""""""

``%elementSize`` is the size of the accessed elements in bytes.
The intrinsic returns ``poison`` if the distance between ``%ptrA`` and ``%ptrB``
is smaller than ``VF * %elementSize`` and either ``%ptrA + VF * %elementSize``
or ``%ptrB + VF * %elementSize`` wraps.
An element of the result mask is active when loading from ``%ptrA`` then
storing to ``%ptrB`` is safe and doesn't result in a write-after-read hazard,
meaning that:
[Review, Collaborator]
nit: Suggested change:
- %ptrB is safe and doesn't result in a write-after-read hazard:
+ %ptrB is safe and doesn't result in a write-after-read hazard, meaning that:

[Reply, Author] Done, thank you.


* (ptrB - ptrA) <= 0 (guarantees that all lanes are loaded before any stores), or
* (ptrB - ptrA) >= elementSize * lane (guarantees that this lane is loaded
before the store to the same address)
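
As a scalar illustration of the two conditions above (a hedged sketch; the
function name, parameters, and ``VF`` are illustrative and not part of the
patch):

.. code-block:: cpp

   #include <cstdint>
   #include <vector>

   // Lane i of the war.mask is active when every load happens before any
   // store (PtrB - PtrA <= 0), or when lane i's load happens before the
   // store to the same address (PtrB - PtrA >= ElementSize * i).
   std::vector<bool> warMask(int64_t PtrA, int64_t PtrB, int64_t ElementSize,
                             unsigned VF) {
     std::vector<bool> Mask(VF);
     int64_t Diff = PtrB - PtrA;
     for (unsigned I = 0; I < VF; ++I)
       Mask[I] = Diff <= 0 || Diff >= ElementSize * static_cast<int64_t>(I);
     return Mask;
   }

For example, with ``%ptrB`` 8 bytes above ``%ptrA`` and an element size of 4,
lanes 0 through 2 satisfy ``8 >= 4 * lane`` and lane 3 is masked off.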
[Review, Contributor]
I don't think "committed" is a term that is defined/used in LangRef. Would be
good to reframe this in general terms as well.

[Reply, Author] Done.


Examples:
"""""""""

.. code-block:: llvm

%loop.dependence.mask = call <4 x i1> @llvm.loop.dependence.war.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4)
%vecA = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrA, i32 4, <4 x i1> %loop.dependence.mask, <4 x i32> poison)
[...]
call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrB, i32 4, <4 x i1> %loop.dependence.mask)

.. _int_loop_dependence_raw_mask:

'``llvm.loop.dependence.raw.mask.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

declare <4 x i1> @llvm.loop.dependence.raw.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <8 x i1> @llvm.loop.dependence.raw.mask.v8i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <16 x i1> @llvm.loop.dependence.raw.mask.v16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)
declare <vscale x 16 x i1> @llvm.loop.dependence.raw.mask.nxv16i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize)


Overview:
"""""""""

Given a vector store to %ptrA followed by a vector load from %ptrB, this
intrinsic generates a mask where an active lane indicates that the
read-after-write sequence can be performed safely for that lane, without a
read-after-write hazard or a store-to-load forwarding hazard being introduced.
[Review, Contributor]
"Store-to-load forwarding hazard" is not defined. Do we need this wording
here?

[Reply, Author] Removed.

[Review, Collaborator]
The wording for the store-to-load forwarding (hazard) behaviour cannot be
removed, because it is the only distinction between this intrinsic and the
.war intrinsic, i.e. the "safe" requirement is not the only behaviour that
this intrinsic implements.

[Reply, Author] I've re-added the hazard wording, thanks.


A read-after-write hazard occurs when a read-after-write sequence for a given
lane in a vector ends up being executed as a write-after-read sequence due to
the aliasing of pointers.
[Review, Contributor, on lines +24192 to +24194]
Can we just explain this generally (hazard language is not used in LangRef)?
Does this simply say that instead of first reading and then storing a lane,
it is stored first?

[Reply, Author] Done.

[Review, Collaborator]
The reason I specifically suggested using the "introduces a hazard"
terminology (along with a subsequent definition of what a hazard is) is that
"safe" and "no alias" do not cover the semantics. "No alias" can't be used
because both intrinsics still return an all-active mask when their pointers
fully alias. "Safe" doesn't cover the .raw intrinsic because it sets lanes to
inactive when they are safe but would otherwise introduce a store-to-load
forwarding hazard.


A store-to-load forwarding hazard occurs when a vector store writes to an
address that partially overlaps with the address of a subsequent vector load,
meaning that the vector load can't be performed until the vector store is
complete. For example, a 16-byte vector store to ``%ptrA`` followed by a
16-byte vector load from ``%ptrA + 4`` overlaps it in 12 bytes, so the loaded
data cannot simply be forwarded from the store.

Arguments:
""""""""""

The first two arguments are pointers and the last argument is an immediate.
The result is a vector with the i1 element type.

Semantics:
""""""""""

[Review, Collaborator]
The case for ptrA == ptrB needs to be explicitly called out as 'safe' here,
because it doesn't introduce any new hazards.

[Reply, Author] Done.

``%elementSize`` is the size of the accessed elements in bytes.
The intrinsic returns ``poison`` if the distance between ``%ptrA`` and ``%ptrB``
is smaller than ``VF * %elementSize`` and either ``%ptrA + VF * %elementSize``
or ``%ptrB + VF * %elementSize`` wraps.
An element of the result mask is active when storing to ``%ptrA`` then loading
from ``%ptrB`` is safe and doesn't result in aliasing, meaning that:

* abs(ptrB - ptrA) >= elementSize * lane (guarantees that the store of this lane
occurs before loading from this address), or
* ptrA == ptrB (doesn't introduce any new hazards that weren't in the scalar
code)
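
A matching scalar sketch of these conditions (hedged; illustrative names
only, not part of the patch):

.. code-block:: cpp

   #include <cstdint>
   #include <cstdlib>
   #include <vector>

   // Lane i of the raw.mask is active when the pointers are identical, or
   // when the distance between them covers at least i elements
   // (abs(PtrB - PtrA) >= ElementSize * i).
   std::vector<bool> rawMask(int64_t PtrA, int64_t PtrB, int64_t ElementSize,
                             unsigned VF) {
     std::vector<bool> Mask(VF);
     int64_t Dist = std::abs(PtrB - PtrA);
     for (unsigned I = 0; I < VF; ++I)
       Mask[I] = PtrA == PtrB || Dist >= ElementSize * static_cast<int64_t>(I);
     return Mask;
   }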
[Review, Collaborator]
Suggested change:
- * ptrA == ptrB, doesn't introduce any new hazards
+ * ptrA == ptrB (doesn't introduce any new hazards that weren't present in
    scalar code)

[Reply, Author] Done.


Examples:
"""""""""

.. code-block:: llvm

%loop.dependence.mask = call <4 x i1> @llvm.loop.dependence.raw.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 4)
call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %vecA, ptr %ptrA, i32 4, <4 x i1> %loop.dependence.mask)
[...]
%vecB = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(ptr %ptrB, i32 4, <4 x i1> %loop.dependence.mask, <4 x i32> poison)

.. _int_experimental_vp_splice:

'``llvm.experimental.vp.splice``' Intrinsic
6 changes: 6 additions & 0 deletions llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1558,6 +1558,12 @@ enum NodeType {
// bits conform to getBooleanContents similar to the SETCC operator.
GET_ACTIVE_LANE_MASK,

// The `llvm.loop.dependence.{war, raw}.mask` intrinsics
// Operands: Load pointer, Store pointer, Element size
// Output: Mask
LOOP_DEPENDENCE_WAR_MASK,
LOOP_DEPENDENCE_RAW_MASK,

// llvm.clear_cache intrinsic
// Operands: Input Chain, Start Address, End Address
// Outputs: Output Chain
10 changes: 10 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
@@ -2420,6 +2420,16 @@ let IntrProperties = [IntrNoMem, ImmArg<ArgIndex<1>>] in {
llvm_i32_ty]>;
}

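// Both variants are overloaded on the result vector type only; the
// element-size argument must be a constant (ImmArg) and the intrinsics do
// not touch memory.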
def int_loop_dependence_raw_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_ptr_ty, llvm_ptr_ty, llvm_i64_ty],
[IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>]>;

def int_loop_dependence_war_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_ptr_ty, llvm_ptr_ty, llvm_i64_ty],
[IntrNoMem, IntrNoSync, IntrWillReturn, ImmArg<ArgIndex<2>>]>;

def int_get_active_lane_mask:
DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyint_ty, LLVMMatchType<1>],
8 changes: 8 additions & 0 deletions llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -833,6 +833,14 @@ def step_vector : SDNode<"ISD::STEP_VECTOR", SDTypeProfile<1, 1,
def scalar_to_vector : SDNode<"ISD::SCALAR_TO_VECTOR", SDTypeProfile<1, 1, []>,
[]>;

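// One result and three operands: the result is a vector of i1, operands 1
// and 2 are same-typed integers (the two pointers) and operand 3 is an
// integer (the element size).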
def SDTLoopDepMask : SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisInt<1>,
SDTCisSameAs<2, 1>, SDTCisInt<3>,
SDTCVecEltisVT<0,i1>]>;
def loop_dependence_war_mask : SDNode<"ISD::LOOP_DEPENDENCE_WAR_MASK",
SDTLoopDepMask, []>;
def loop_dependence_raw_mask : SDNode<"ISD::LOOP_DEPENDENCE_RAW_MASK",
SDTLoopDepMask, []>;

// vector_extract/vector_insert are similar to extractelt/insertelt but allow
// types that require promotion (a 16i8 extract where i8 is not a legal type so
// uses i32 for example). extractelt/insertelt are preferred where the element
11 changes: 11 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -324,6 +324,11 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
Res = PromoteIntRes_VP_REDUCE(N);
break;

case ISD::LOOP_DEPENDENCE_WAR_MASK:
case ISD::LOOP_DEPENDENCE_RAW_MASK:
Res = PromoteIntRes_LOOP_DEPENDENCE_MASK(N);
break;

case ISD::FREEZE:
Res = PromoteIntRes_FREEZE(N);
break;
@@ -374,6 +379,12 @@ SDValue DAGTypeLegalizer::PromoteIntRes_MERGE_VALUES(SDNode *N,
return GetPromotedInteger(Op);
}

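// Promote the mask result by rebuilding the node with the type the
// legalizer selected; the pointer and element-size operands are legal
// already and are reused unchanged.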
SDValue DAGTypeLegalizer::PromoteIntRes_LOOP_DEPENDENCE_MASK(SDNode *N) {
EVT VT = N->getValueType(0);
EVT NewVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
return DAG.getNode(N->getOpcode(), SDLoc(N), NewVT, N->ops());
}

SDValue DAGTypeLegalizer::PromoteIntRes_AssertSext(SDNode *N) {
// Sign-extend the new bits, and continue the assertion.
SDValue Op = SExtPromotedInteger(N->getOperand(0));
5 changes: 5 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -382,6 +382,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntRes_VECTOR_FIND_LAST_ACTIVE(SDNode *N);
SDValue PromoteIntRes_GET_ACTIVE_LANE_MASK(SDNode *N);
SDValue PromoteIntRes_PARTIAL_REDUCE_MLA(SDNode *N);
SDValue PromoteIntRes_LOOP_DEPENDENCE_MASK(SDNode *N);

// Integer Operand Promotion.
bool PromoteIntegerOperand(SDNode *N, unsigned OpNo);
@@ -436,6 +437,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntOp_VECTOR_FIND_LAST_ACTIVE(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_GET_ACTIVE_LANE_MASK(SDNode *N);
SDValue PromoteIntOp_PARTIAL_REDUCE_MLA(SDNode *N);
SDValue PromoteIntOp_LOOP_DEPENDENCE_MASK(SDNode *N, unsigned OpNo);

void SExtOrZExtPromotedOperands(SDValue &LHS, SDValue &RHS);
void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);
@@ -868,6 +870,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
// Vector Result Scalarization: <1 x ty> -> ty.
void ScalarizeVectorResult(SDNode *N, unsigned ResNo);
SDValue ScalarizeVecRes_MERGE_VALUES(SDNode *N, unsigned ResNo);
SDValue ScalarizeVecRes_LOOP_DEPENDENCE_MASK(SDNode *N);
SDValue ScalarizeVecRes_BinOp(SDNode *N);
SDValue ScalarizeVecRes_CMP(SDNode *N);
SDValue ScalarizeVecRes_TernaryOp(SDNode *N);
@@ -963,6 +966,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
void SplitVecRes_FIX(SDNode *N, SDValue &Lo, SDValue &Hi);

void SplitVecRes_BITCAST(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_LOOP_DEPENDENCE_MASK(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_BUILD_VECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_CONCAT_VECTORS(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_EXTRACT_SUBVECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);
@@ -1069,6 +1073,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue WidenVecRes_ADDRSPACECAST(SDNode *N);
SDValue WidenVecRes_AssertZext(SDNode* N);
SDValue WidenVecRes_BITCAST(SDNode* N);
SDValue WidenVecRes_LOOP_DEPENDENCE_MASK(SDNode *N);
SDValue WidenVecRes_BUILD_VECTOR(SDNode* N);
SDValue WidenVecRes_CONCAT_VECTORS(SDNode* N);
SDValue WidenVecRes_EXTEND_VECTOR_INREG(SDNode* N);
51 changes: 51 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -138,6 +138,7 @@ class VectorLegalizer {
SDValue ExpandVP_FNEG(SDNode *Node);
SDValue ExpandVP_FABS(SDNode *Node);
SDValue ExpandVP_FCOPYSIGN(SDNode *Node);
SDValue ExpandLOOP_DEPENDENCE_MASK(SDNode *N);
SDValue ExpandSELECT(SDNode *Node);
std::pair<SDValue, SDValue> ExpandLoad(SDNode *N);
SDValue ExpandStore(SDNode *N);
@@ -475,6 +476,8 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
case ISD::VECTOR_COMPRESS:
case ISD::SCMP:
case ISD::UCMP:
case ISD::LOOP_DEPENDENCE_WAR_MASK:
case ISD::LOOP_DEPENDENCE_RAW_MASK:
Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
break;
case ISD::SMULFIX:
@@ -1291,6 +1294,10 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
case ISD::UCMP:
Results.push_back(TLI.expandCMP(Node, DAG));
return;
case ISD::LOOP_DEPENDENCE_WAR_MASK:
case ISD::LOOP_DEPENDENCE_RAW_MASK:
Results.push_back(ExpandLOOP_DEPENDENCE_MASK(Node));
return;

case ISD::FADD:
case ISD::FMUL:
@@ -1796,6 +1803,50 @@ SDValue VectorLegalizer::ExpandVP_FCOPYSIGN(SDNode *Node) {
return DAG.getNode(ISD::BITCAST, DL, VT, CopiedSign);
}

SDValue VectorLegalizer::ExpandLOOP_DEPENDENCE_MASK(SDNode *N) {
SDLoc DL(N);
SDValue SourceValue = N->getOperand(0);
SDValue SinkValue = N->getOperand(1);
SDValue EltSize = N->getOperand(2);

bool IsReadAfterWrite = N->getOpcode() == ISD::LOOP_DEPENDENCE_RAW_MASK;
EVT VT = N->getValueType(0);
EVT PtrVT = SourceValue->getValueType(0);

SDValue Diff = DAG.getNode(ISD::SUB, DL, PtrVT, SinkValue, SourceValue);
if (IsReadAfterWrite)
Diff = DAG.getNode(ISD::ABS, DL, PtrVT, Diff);

Diff = DAG.getNode(ISD::SDIV, DL, PtrVT, Diff, EltSize);

// If the difference is positive then some elements may alias
EVT CmpVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
Diff.getValueType());
SDValue Zero = DAG.getTargetConstant(0, DL, PtrVT);
SDValue Cmp = DAG.getSetCC(DL, CmpVT, Diff, Zero,
IsReadAfterWrite ? ISD::SETEQ : ISD::SETLE);
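// For WAR, a non-positive element difference means every load precedes the
// conflicting stores, so all lanes are safe; for RAW, only a zero element
// distance is unconditionally safe. The splat of this compare is OR'd into
// the lane mask below.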

// Create the lane mask
EVT SplatVT = VT.changeElementType(PtrVT);
SDValue DiffSplat = DAG.getSplat(SplatVT, DL, Diff);
SDValue VectorStep = DAG.getStepVector(DL, SplatVT);
EVT MaskVT = VT.changeElementType(MVT::i1);
SDValue DiffMask =
DAG.getSetCC(DL, MaskVT, VectorStep, DiffSplat, ISD::CondCode::SETULT);

EVT EltVT = VT.getVectorElementType();
// Extend the diff setcc in case the intrinsic has been promoted to a vector
// type with elements larger than i1
if (EltVT.getScalarSizeInBits() > MaskVT.getScalarSizeInBits())
DiffMask = DAG.getNode(ISD::ANY_EXTEND, DL, VT, DiffMask);

// Splat the compare result then OR it with the lane mask
if (CmpVT.getScalarSizeInBits() < EltVT.getScalarSizeInBits())
Cmp = DAG.getNode(ISD::ZERO_EXTEND, DL, EltVT, Cmp);
SDValue Splat = DAG.getSplat(VT, DL, Cmp);
return DAG.getNode(ISD::OR, DL, VT, DiffMask, Splat);
}

void VectorLegalizer::ExpandFP_TO_UINT(SDNode *Node,
SmallVectorImpl<SDValue> &Results) {
// Attempt to expand using TargetLowering.