Commit 0e72c3d

Himadhith authored
[NFC] Lockdown instructions of vspltisw for addition of vector of 1s (#160476)
This NFC patch locks down the instructions currently generated for the operation `A + vector {1, 1, 1, 1}`, for which the current code emits `vspltisw`. The sequence can be improved by using the 2-cycle instruction `xxleqv` in place of the 4-cycle `vspltisw`.
---------
Co-authored-by: himadhith <[email protected]>
1 parent 7ff6973 commit 0e72c3d
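
The planned rewrite relies on the identity `A + 1 == A - (-1)` under 32-bit wraparound arithmetic, together with the fact that a bitwise equivalence (XNOR) of a register with itself sets every bit, which reads as -1 in each lane. A minimal sketch of that reasoning, modelling the four word lanes as Python integers; the function names are hypothetical and are not part of the patch or of LLVM:

```python
MASK32 = 0xFFFFFFFF  # each lane is a 32-bit word with wraparound arithmetic


def xnor_lanes(x, y):
    """Lane-wise bitwise equivalence (XNOR), the logical operation xxleqv performs."""
    return [~(a ^ b) & MASK32 for a, b in zip(x, y)]


def add_splat_one(a):
    """Models the current lowering: splat 1 into every lane, then modular add."""
    ones = [1] * len(a)
    return [(x + y) & MASK32 for x, y in zip(a, ones)]


def sub_splat_minus_one(a):
    """Models the planned lowering: all-ones lanes (-1) via XNOR, then modular subtract."""
    minus_ones = xnor_lanes(a, a)  # any value XNOR'd with itself is all ones
    return [(x - y) & MASK32 for x, y in zip(a, minus_ones)]


if __name__ == "__main__":
    a = [0, 41, 0x7FFFFFFF, 0xFFFFFFFF]
    # Both lowerings agree on every lane, including the wraparound cases.
    print(add_splat_one(a) == sub_splat_minus_one(a))
```

The model only checks that the two forms compute the same value. It says nothing about instruction latency, which is the motivation given in the commit message.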

1 file changed, 23 additions and 0 deletions
@@ -0,0 +1,23 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -verify-machineinstrs -O3 -mcpu=pwr9 -mtriple=powerpc64le-unknown-linux-gnu \
+; RUN:   -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s
+
+; RUN: llc -verify-machineinstrs -O3 -mcpu=pwr9 -mtriple=powerpc64-ibm-aix \
+; RUN:   -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s
+
+; RUN: llc -verify-machineinstrs -O3 -mcpu=pwr9 -mtriple=powerpc-ibm-aix \
+; RUN:   -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s
+
+; Currently the generated code uses `vspltisw` to generate vector of 1s followed by add operation.
+; This pattern is expected to be optimized in a future patch by using `xxleqv` to generate vector of -1s
+; followed by subtraction operation.
+define dso_local noundef <4 x i32> @test1(<4 x i32> %a) {
+; CHECK-LABEL: test1:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vspltisw v3, 1
+; CHECK-NEXT:    vadduwm v2, v2, v3
+; CHECK-NEXT:    blr
+entry:
+  %add = add <4 x i32> %a, splat (i32 1)
+  ret <4 x i32> %add
+}
