-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[PowerPC] Replace vspltisw+vadduwm instructions with xxleqv+vsubuwm for adding the vector {1, 1, 1, 1} #160882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-backend-powerpc Author: None (Himadhith) ChangesThis patch leverages generation of vector of -1s to be cheaper than vector of 1s to optimize the current implementation for In this optimized version we replace Full diff: https://github.com/llvm/llvm-project/pull/160882.diff 1 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
index 4e5165bfcda55..dc850d2470cfd 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
+++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
@@ -3627,6 +3627,10 @@ def : Pat<(v4i32 (build_vector immSExt5NonZero:$A, immSExt5NonZero:$A,
immSExt5NonZero:$A, immSExt5NonZero:$A)),
(v4i32 (VSPLTISW imm:$A))>;
+// Optimise for vector of 1s addition operation
+def : Pat<(add v4i32:$A, (build_vector (i32 1), (i32 1), (i32 1), (i32 1))),
+ (VSUBUWM $A, (v4i32 (COPY_TO_REGCLASS (XXLEQVOnes), VSRC)))>;
+
// Splat loads.
def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
(v8i16 (VSPLTHs 3, (MTVSRWZ (LHZX ForceXForm:$A))))>;
|
6018f73 to
40edcce
Compare
8079d5c to
da6de91
Compare
|
I'm guessing this is not ready to be reviewed as it need https://github.com/llvm/llvm-project/pull/160476/files to be in first enable to show the difference. |
Yes as soon as the NFC patch gets merged I will rebase and the file should reflect the changes. Should I keep this as a draft till then? |
da6de91 to
5de66e2
Compare
| (v4i32 (VSPLTISW imm:$A))>; | ||
|
|
||
| // Optimize for vector of 1s addition operation | ||
| def : Pat<(add v4i32:$A, (build_vector (i32 1), (i32 1), (i32 1), (i32 1))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work only for v4i32 vector types? Why not v2i64, v8i16 and v16i8 types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to add Patterns for the other 3 types which are not present, I noticed that for v2i64 type the tablegen pattern matching was not working as it is generating the following ISAs:
vspltisw 3, 1
vupklsw 3, 3
vaddudm 2, 2, 3
Which is difficult to replace gracefully using tablegen method. Instead, opting for DAG combiner method to handle this case in the backend.
| ; This pattern is expected to be optimized in a future patch by using `xxleqv` to generate vector of -1s | ||
| ; followed by subtraction operation. | ||
| ; Optimized version of vector addition with {1,1,1,1} by replacing `vspltisw + vadduwm` with 'xxleqv + vsubuwm' | ||
| define dso_local noundef <4 x i32> @test1(<4 x i32> %a) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above comment. Support v2i64, v8i16 and v16i8 types as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add a NFC patch shortly to address the other 3 types.
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
1221560 to
c27a492
Compare
…ector of -1s is cheaper than vector of 1s
2619e1d to
d74869b
Compare
tonykuttai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please modify the description to reflect that
ADDoperation substituted withSUBBuild vector of all 1sin RHS getting replaced withBuild vector of all -1s
| ; NOVSX-NEXT: addi 3, 3, .LCPI1_0@toc@l | ||
| ; NOVSX-NEXT: lvx 3, 0, 3 | ||
| ; NOVSX-NEXT: vaddudm 2, 2, 3 | ||
| ; NOVSX-NEXT: vsubudm 2, 2, 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please investigate why this got affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was because the code did not check for VSX attribute. The hasVSX() check fixed this.
| // Check if RHS is BUILD_VECTOR | ||
| // To satisfy commutative property a+b = b+a | ||
| if (RHS.getOpcode() != ISD::BUILD_VECTOR) | ||
| std::swap(LHS, RHS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BUILD_VECTOR have to be on the RHS. We don't need the swap here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
e50a0d6 to
67a8060
Compare
tonykuttai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the comments. LGTM
6f3cb1d to
432d6e0
Compare
|
This patch does not handle |
This patch optimizes vector addition operations involving
all-onesvectors by leveraging the generation of vectors of -1s(usingxxleqv, which is cheaper than generating vectors of 1s(vspltisw). These are the respective vector types.v2i64:A + vector {1, 1}v4i32:A + vector {1, 1, 1, 1}v8i16:A + vector {1, 1, 1, 1, 1, 1, 1, 1}v16i8:A + vector {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}The optimized version replaces
vspltisw (4 cycles)withxxleqv (2 cycles)using the following identity:A - (-1) = A + 1.