[NFC][PowerPC] Lockdown instructions of vspltisw for addition of vector of 1s #160476
@llvm/pr-subscribers-backend-powerpc

Author: None (Himadhith)

Changes

This NFC patch looks to lock down the instruction generated for the operation of `A + vector {1, 1, 1, 1}`.

Full diff: https://github.com/llvm/llvm-project/pull/160476.diff

1 file affected:
diff --git a/llvm/test/CodeGen/PowerPC/vector-all-ones.ll b/llvm/test/CodeGen/PowerPC/vector-all-ones.ll
new file mode 100644
index 0000000000000..7ad41482ffe81
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/vector-all-ones.ll
@@ -0,0 +1,47 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc64le-unknown-linux-gnu \
+; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_64LE
+
+; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc64-ibm-aix \
+; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_64
+
+; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc-ibm-aix \
+; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_32
+
+; Currently the generated code uses `vspltisw` to generate a vector of 1s followed by an add operation.
+; This pattern is expected to be optimized in a future patch by using `xxleqv` to generate a vector of -1s
+; followed by a subtraction operation.
+define dso_local <4 x i32> @test1(<4 x i32> %a) {
+; POWERPC_64LE-LABEL: test1:
+; POWERPC_64LE: # %bb.0: # %entry
+; POWERPC_64LE-NEXT: vspltisw v3, 1
+; POWERPC_64LE-NEXT: stxv v2, -16(r1)
+; POWERPC_64LE-NEXT: vadduwm v2, v2, v3
+; POWERPC_64LE-NEXT: stxv v3, -32(r1)
+; POWERPC_64LE-NEXT: blr
+;
+; POWERPC_64-LABEL: test1:
+; POWERPC_64: # %bb.0: # %entry
+; POWERPC_64-NEXT: vspltisw v3, 1
+; POWERPC_64-NEXT: stxv v2, -16(r1)
+; POWERPC_64-NEXT: vadduwm v2, v2, v3
+; POWERPC_64-NEXT: stxv v3, -32(r1)
+; POWERPC_64-NEXT: blr
+;
+; POWERPC_32-LABEL: test1:
+; POWERPC_32: # %bb.0: # %entry
+; POWERPC_32-NEXT: vspltisw v3, 1
+; POWERPC_32-NEXT: stxv v2, -16(r1)
+; POWERPC_32-NEXT: vadduwm v2, v2, v3
+; POWERPC_32-NEXT: stxv v3, -32(r1)
+; POWERPC_32-NEXT: blr
+entry:
+ %a.addr = alloca <4 x i32>, align 16
+ %b = alloca <4 x i32>, align 16
+ store <4 x i32> %a, ptr %a.addr, align 16
+ store <4 x i32> splat (i32 1), ptr %b, align 16
+ %0 = load <4 x i32>, ptr %a.addr, align 16
+ %1 = load <4 x i32>, ptr %b, align 16
+ %add = add <4 x i32> %0, %1
+ ret <4 x i32> %add
+}
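The comment at the top of this test says the `vspltisw` + add sequence should later become an all-ones vector from `xxleqv` followed by a subtract. The rewrite is valid because word lanes wrap modulo 2^32, so `A + 1` equals `A - (-1)` in every lane, including at overflow boundaries. Below is a minimal Python sketch of that identity for `v4i32`, not LLVM code. The functions are hypothetical models named after the instructions. They capture only the lane-wise arithmetic, not encodings, latencies, or register constraints.

```python
WORD_MASK = 0xFFFFFFFF  # each v4i32 lane is a 32-bit word; arithmetic wraps modulo 2**32


def vspltisw(imm):
    """Hypothetical model of vspltisw: splat a signed 5-bit immediate into all four words."""
    if not -16 <= imm <= 15:
        raise ValueError("vspltisw takes a 5-bit signed immediate")
    return [imm & WORD_MASK] * 4


def vadduwm(va, vb):
    """Hypothetical model of vadduwm: lane-wise add, modulo 2**32."""
    return [(a + b) & WORD_MASK for a, b in zip(va, vb)]


def vsubuwm(va, vb):
    """Hypothetical model of vsubuwm: lane-wise subtract, modulo 2**32."""
    return [(a - b) & WORD_MASK for a, b in zip(va, vb)]


# Inputs include the signed (0x7FFFFFFF) and unsigned (0xFFFFFFFF) overflow edges.
a = [0, 1, 0x7FFFFFFF, 0xFFFFFFFF]

current = vadduwm(a, vspltisw(1))  # current codegen: splat of 1, then add
all_ones = [WORD_MASK] * 4         # -1 in every word, the vector xxleqv is expected to produce
proposed = vsubuwm(a, all_ones)    # proposed codegen: subtract the all-ones vector

assert current == proposed
print(current)  # [1, 2, 2147483648, 0]
```

The last lane shows why the subtract form is safe: `0xFFFFFFFF + 1` and `0xFFFFFFFF - 0xFFFFFFFF` both wrap to 0.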
tonykuttai left a comment
LGTM
Is there an existing PR that will optimize this code gen?
Branch force-pushed from 75f2131 to f154ac2.
[PowerPC] Replace vspltisw+vadduwm instructions with xxleqv+vsubuwm for adding the vector {1, 1, 1, 1}
; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_64

; RUN: llc -verify-machineinstrs -O3 -mtriple=powerpc-ibm-aix \
; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_32
All 3 sets of checks are the same AFAICT... is there a reason we need all 3 to be explicit vs just using the default CHECK?
Oh yeah, good point, thanks. There's no explicit reason for needing all 3 checks; I will change it to use the default CHECK.
Branch force-pushed from f154ac2 to 908c6b8.
lei137 left a comment
I updated your PR title since you repeated NFC in it.
Thank you!
AditiRM left a comment
LGTM
LLVM Buildbot has detected a new failure on a builder. Full details are available at https://lab.llvm.org/buildbot/#/builders/72/builds/15238
…lvm#160476)

This NFC patch looks to lock down the instruction generated for the operation of `A + vector {1, 1, 1, 1}`, for which the current code emits `vspltisw`. It can be made better by using the 2-cycle instruction `xxleqv` instead of the current 4-cycle `vspltisw`.

---------

Co-authored-by: himadhith <[email protected]>
…8 into existing testfile (#163201)

The previous NFC patch (#160476) addressed only the vector type `v4i32`; this patch continues it by adding the remaining three vector types that were left out. It should include the following operands:

- `v2i64`: `A + vector {1, 1}`
- `v8i16`: `A + vector {1, 1, 1, 1, 1, 1, 1, 1}`
- `v16i8`: `A + vector {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}`

---------

Co-authored-by: himadhith <[email protected]>
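The arithmetic behind the `v4i32` case carries over to the other three types. A 128-bit register with every bit set reads as -1 in each lane, whether the lanes are 64, 32, 16, or 8 bits wide, so `A + 1` and `A - (-1)` agree lane-wise for `v2i64`, `v4i32`, `v8i16`, and `v16i8`. Below is a hypothetical Python model that checks this at boundary values for each width; the helper names are illustrative. It says nothing about which instructions the backend actually selects for the non-word types.

```python
REG_BITS = 128
ALL_ONES_REG = (1 << REG_BITS) - 1  # every bit set, as xxleqv of a register with itself would leave


def split_lanes(reg, lane_bits):
    """Reinterpret a 128-bit register value as a list of unsigned lanes of lane_bits each."""
    mask = (1 << lane_bits) - 1
    return [(reg >> (i * lane_bits)) & mask for i in range(REG_BITS // lane_bits)]


def check_width(lane_bits):
    """Check A + 1 == A - (all-ones lane), modulo 2**lane_bits, at boundary values."""
    mask = (1 << lane_bits) - 1
    minus_one = split_lanes(ALL_ONES_REG, lane_bits)
    # Every lane of the all-ones register is -1 (two's complement) at this width.
    assert minus_one == [mask] * (REG_BITS // lane_bits)
    # 0, 1, signed max, signed min, unsigned max
    for x in (0, 1, mask >> 1, (mask >> 1) + 1, mask):
        assert (x + 1) & mask == (x - minus_one[0]) & mask
    return len(minus_one)


# Lane widths for v2i64, v4i32, v8i16, v16i8.
lane_counts = {bits: check_width(bits) for bits in (64, 32, 16, 8)}
print(lane_counts)  # {64: 2, 32: 4, 16: 8, 8: 16}
```

Because the all-ones pattern does not depend on the element type, the same splat works for all four tests. Only the width of the subtract has to change.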
This NFC patch looks to lock down the instruction generated for the operation of `A + vector {1, 1, 1, 1}`, for which the current code emits `vspltisw`. It can be made better by using the 2-cycle instruction `xxleqv` instead of the current 4-cycle `vspltisw`.
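`xxleqv` can stand in for a splat immediate because it computes bitwise equivalence (XNOR). Any value XNORed with itself has every bit set, so the result does not depend on what the source register holds. The Python sketch below models only that bitwise behavior on a 128-bit value. The `xxleqv` function is a hypothetical stand-in, and the cycle counts quoted above are not modeled.

```python
import random

REG_MASK = (1 << 128) - 1  # a 128-bit VSX register


def xxleqv(xa, xb):
    """Hypothetical model of bitwise equivalence (XNOR) on 128-bit values: ~(xa ^ xb)."""
    return ~(xa ^ xb) & REG_MASK


# Whatever the source register holds, XNOR with itself sets every bit.
rng = random.Random(0)
samples = [0, REG_MASK, 0x0123456789ABCDEF0123456789ABCDEF]
samples += [rng.getrandbits(128) for _ in range(100)]
assert all(xxleqv(x, x) == REG_MASK for x in samples)

# Read as four 32-bit words, each word is 0xFFFFFFFF, i.e. -1 in two's complement.
words = [(REG_MASK >> (32 * i)) & 0xFFFFFFFF for i in range(4)]
print([w - (1 << 32) if w >> 31 else w for w in words])  # [-1, -1, -1, -1]
```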