[LLVM][CodeGen][SVE] Improve lowering of fixed length masked mem ops. #134402
```diff
@@ -20190,6 +20190,12 @@ performInsertSubvectorCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
   EVT VecVT = Vec.getValueType();
   EVT SubVT = SubVec.getValueType();
 
+  // Promote fixed length vector zeros.
+  if (VecVT.isScalableVector() && SubVT.isFixedLengthVector() &&
+      Vec.isUndef() && isZerosVector(SubVec.getNode()))
+    return VecVT.isInteger() ? DAG.getConstant(0, DL, VecVT)
+                             : DAG.getConstantFP(0, DL, VecVT);
+
   // Only do this for legal fixed vector types.
   if (!VecVT.isFixedLengthVector() ||
       !DAG.getTargetLoweringInfo().isTypeLegal(VecVT) ||
```
```diff
@@ -28697,17 +28703,36 @@ static SDValue convertFixedMaskToScalableVector(SDValue Mask,
   SDLoc DL(Mask);
   EVT InVT = Mask.getValueType();
   EVT ContainerVT = getContainerForFixedLengthVector(DAG, InVT);
 
-  auto Pg = getPredicateForFixedLengthVector(DAG, DL, InVT);
+  SDValue Pg = getPredicateForFixedLengthVector(DAG, DL, InVT);
 
   if (ISD::isBuildVectorAllOnes(Mask.getNode()))
     return Pg;
 
-  auto Op1 = convertToScalableVector(DAG, ContainerVT, Mask);
-  auto Op2 = DAG.getConstant(0, DL, ContainerVT);
+  bool InvertCond = false;
+  if (isBitwiseNot(Mask)) {
+    InvertCond = true;
+    Mask = Mask.getOperand(0);
+  }
+
+  SDValue Op1, Op2;
+  ISD::CondCode CC;
+
+  // When Mask is the result of a SETCC, it's better to regenerate the compare.
+  if (Mask.getOpcode() == ISD::SETCC) {
```
**Contributor:** Nice! Could this be extended to peek through `ISD::SIGN_EXTEND` too? I'm thinking of cases such as:

```llvm
define i64 @cmpeq_i8(<16 x i8> %a, <16 x i8> %b) {
  %cmp = icmp eq <16 x i8> %a, %b
  %ctz = tail call i64 @llvm.experimental.cttz.elts(<16 x i1> %cmp, i1 1)
  ret i64 %ctz
}
```

I've seen this come up when using `llvm.experimental.cttz.elts`. Otherwise I'll have a look into it once this PR lands. :)

**Collaborator (Author):** For operation legalisation I would not expect to see such code because …

**Contributor:** Sounds good, I'll have a look later then. Thanks :)
```diff
+    Op1 = convertToScalableVector(DAG, ContainerVT, Mask.getOperand(0));
```
**Contributor:** Is `v4i32 = SETCC NE v4i16 %a, v4i16 %b` possible here, since `v4i16` is also a legal type? I'm just a bit worried that we're effectively promoting a type here and, if so, is that a problem?

**Collaborator (Author):** I don't believe mixing element sizes like this is legal for NEON. The expectation is that all vector types will have the same element count and bit length. You can see this today (albeit slightly less so since I've refactored the integer side): for NEON we simply lower SETCC operations onto `AArch64ISD::FCM##` operations, whose definitions have the same requirement.

**Contributor:** Fair enough! Just wanted to make sure.
```diff
+    Op2 = convertToScalableVector(DAG, ContainerVT, Mask.getOperand(1));
+    CC = cast<CondCodeSDNode>(Mask.getOperand(2))->get();
+  } else {
+    Op1 = convertToScalableVector(DAG, ContainerVT, Mask);
+    Op2 = DAG.getConstant(0, DL, ContainerVT);
+    CC = ISD::SETNE;
+  }
+
+  if (InvertCond)
+    CC = getSetCCInverse(CC, Op1.getValueType());
+
   return DAG.getNode(AArch64ISD::SETCC_MERGE_ZERO, DL, Pg.getValueType(),
-                     {Pg, Op1, Op2, DAG.getCondCode(ISD::SETNE)});
+                     {Pg, Op1, Op2, DAG.getCondCode(CC)});
 }
 
 // Convert all fixed length vector loads larger than NEON to masked_loads.
```
**Collaborator (Author):** I was forced to write this to maintain existing code quality. There is no specific reason to limit the combine to zeros, but I figured any expansion was best done in a separate PR?

**Contributor:** Yeah, this does look like a useful combine when applied to other constants, but like you say, best for another PR. Not sure in practice whether we'll actually end up with different assembly or not?

**Collaborator (Author):** We will, because NEON only has reg-reg and reg-zero compare instructions, whereas SVE has reg-imm forms as well. You can see this today by changing the existing SVE VLS tests to use non-zero immediates: the generated code emits an unnecessary splat.