@Himadhith
Contributor

@Himadhith Himadhith commented Sep 26, 2025

This patch optimizes vector additions whose second operand is a vector of 1s by generating a vector of -1s (using xxleqv), which is cheaper than generating a vector of 1s (using vspltisw), and subtracting it instead. The affected vector types are:
v2i64: A + vector {1, 1}
v4i32: A + vector {1, 1, 1, 1}
v8i16: A + vector {1, 1, 1, 1, 1, 1, 1, 1}
v16i8: A + vector {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}

The optimized version replaces vspltisw (4 cycles) with xxleqv (2 cycles) using the following identity:
A - (-1) = A + 1.
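
The identity holds lane by lane under the modulo arithmetic that the vector add and subtract instructions use, including when a lane wraps around. A minimal C++ sketch of that check follows; the helper names `addSplatOne` and `subSplatMinusOne` are hypothetical and are not part of the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: lane-wise A + {1, 1, ...}, the original form.
template <typename T, std::size_t N>
std::array<T, N> addSplatOne(const std::array<T, N> &a) {
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = static_cast<T>(a[i] + T{1}); // wraps modulo 2^bits
  return r;
}

// Hypothetical helper: lane-wise A - {-1, -1, ...}, the rewritten form.
// In an unsigned lane, -1 is the value with every bit set.
template <typename T, std::size_t N>
std::array<T, N> subSplatMinusOne(const std::array<T, N> &a) {
  const T minusOne = static_cast<T>(~T{0});
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = static_cast<T>(a[i] - minusOne); // wraps modulo 2^bits
  return r;
}
```

Instantiating the helpers with `std::uint8_t`, `std::uint16_t`, `std::uint32_t` and `std::uint64_t` lanes corresponds to the v16i8, v8i16, v4i32 and v2i64 cases listed above.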

@llvmbot
Member

llvmbot commented Sep 26, 2025

@llvm/pr-subscribers-backend-powerpc

Author: None (Himadhith)

Changes

This patch exploits the fact that generating a vector of -1s is cheaper than generating a vector of 1s to optimize the current implementation of A + vector {1, 1, 1, 1}.

In this optimized version we replace vspltisw (4 cycles) with xxleqv (2 cycles) using the following identity:
A - (-1) = A + 1.


Full diff: https://github.com/llvm/llvm-project/pull/160882.diff

1 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCInstrVSX.td (+4)
diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
index 4e5165bfcda55..dc850d2470cfd 100644
--- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
+++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
@@ -3627,6 +3627,10 @@ def : Pat<(v4i32 (build_vector immSExt5NonZero:$A, immSExt5NonZero:$A,
                                immSExt5NonZero:$A, immSExt5NonZero:$A)),
           (v4i32 (VSPLTISW imm:$A))>;
 
+// Optimise for vector of 1s addition operation
+def : Pat<(add v4i32:$A, (build_vector (i32 1), (i32 1), (i32 1), (i32 1))),
+          (VSUBUWM $A, (v4i32 (COPY_TO_REGCLASS (XXLEQVOnes), VSRC)))>;
+
 // Splat loads.
 def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
           (v8i16 (VSPLTHs 3, (MTVSRWZ (LHZX ForceXForm:$A))))>;

@lei137
Contributor

lei137 commented Sep 26, 2025

I'm guessing this is not ready to be reviewed, as it needs https://github.com/llvm/llvm-project/pull/160476/files to land first in order to show the difference.

@Himadhith
Contributor Author

Himadhith commented Sep 26, 2025

I'm guessing this is not ready to be reviewed, as it needs https://github.com/llvm/llvm-project/pull/160476/files to land first in order to show the difference.

Yes, as soon as the NFC patch is merged I will rebase, and the file should then reflect the changes. Should I keep this as a draft until then?

(v4i32 (VSPLTISW imm:$A))>;

// Optimize for vector of 1s addition operation
def : Pat<(add v4i32:$A, (build_vector (i32 1), (i32 1), (i32 1), (i32 1))),
Contributor

Does this work only for v4i32 vector types? Why not v2i64, v8i16 and v16i8 types?

Contributor Author

I tried adding patterns for the other three types, which are not covered yet. For the v2i64 type the TableGen pattern did not match, because the generated code is the following instruction sequence:

	vspltisw 3, 1
	vupklsw 3, 3
	vaddudm 2, 2, 3

This sequence is difficult to replace cleanly with a TableGen pattern, so I am instead handling this case with a DAG combine in the backend.
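
To make the three-instruction sequence above easier to follow, here is a simplified C++ model of what each instruction computes for this input. It is only a sketch: the function names are hypothetical, and the choice of which two words count as "low" is an assumption that does not matter here because the input is a splat.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<std::int32_t, 4>;
using V2i64 = std::array<std::int64_t, 2>;

// Model of `vspltisw`: splat a small signed immediate into four words.
V4i32 splatImmWord(std::int32_t imm) { return {imm, imm, imm, imm}; }

// Model of `vupklsw`: sign-extend two of the words to doublewords.
// Assumption: elements 2 and 3 stand in for the "low" words.
V2i64 unpackLowSignedWords(const V4i32 &v) {
  return {std::int64_t{v[2]}, std::int64_t{v[3]}};
}

// Model of `vaddudm`: doubleword add with modulo (wrapping) semantics.
V2i64 addDoublewords(const V2i64 &a, const V2i64 &b) {
  V2i64 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::int64_t>(static_cast<std::uint64_t>(a[i]) +
                                     static_cast<std::uint64_t>(b[i]));
  return r;
}

// The whole sequence: A + {1, 1} built from a word splat plus an unpack.
V2i64 incrementV2i64(const V2i64 &a) {
  return addDoublewords(a, unpackLowSignedWords(splatImmWord(1)));
}
```

In this model the {1, 1} operand only appears after two instructions, which is one way to see why a single pattern on the add is awkward for v2i64.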

; This pattern is expected to be optimized in a future patch by using `xxleqv` to generate vector of -1s
; followed by subtraction operation.
; Optimized version of vector addition with {1,1,1,1} by replacing `vspltisw + vadduwm` with 'xxleqv + vsubuwm'
define dso_local noundef <4 x i32> @test1(<4 x i32> %a) {
Contributor

Same as the comment above: can this support the v2i64, v8i16 and v16i8 types as well?

Contributor Author

I will add an NFC patch shortly to cover the other three types.

@github-actions

github-actions bot commented Oct 13, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Himadhith Himadhith force-pushed the himadhith/xxleqv_vec branch 5 times, most recently from 1221560 to c27a492 Compare October 16, 2025 05:42
@Himadhith Himadhith force-pushed the himadhith/xxleqv_vec branch 3 times, most recently from 2619e1d to d74869b Compare October 16, 2025 18:17
Contributor

@tonykuttai tonykuttai left a comment

Please update the description to reflect that:

  • the ADD operation is replaced with a SUB
  • the build vector of all 1s on the RHS is replaced with a build vector of all -1s

; NOVSX-NEXT: addi 3, 3, .LCPI1_0@toc@l
; NOVSX-NEXT: lvx 3, 0, 3
; NOVSX-NEXT: vaddudm 2, 2, 3
; NOVSX-NEXT: vsubudm 2, 2, 3
Contributor

Please investigate why this got affected.

Contributor Author

This happened because the code did not check for the VSX attribute; adding the hasVSX() check fixed it.
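
As a rough illustration of why the guard matters: xxleqv is a VSX instruction, so the rewrite only pays off when VSX is available. The following self-contained toy model is not LLVM code; `VecBinOp`, `isSplatOf` and `combineAddOfOnes` are hypothetical names, and the `hasVSX` parameter stands in for the subtarget query mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-ins for DAG nodes; none of these types exist in LLVM.
enum class Op { Add, Sub };

struct VecBinOp {
  Op op;
  std::vector<std::int64_t> rhs; // constant build-vector operand on the RHS
};

// True when every element of a non-empty vector equals `value`.
bool isSplatOf(const std::vector<std::int64_t> &v, std::int64_t value) {
  if (v.empty())
    return false;
  for (std::int64_t e : v)
    if (e != value)
      return false;
  return true;
}

// Rewrite A + splat(1) into A - splat(-1), but only when VSX is available.
// Without the hasVSX guard the non-VSX path would be rewritten as well.
VecBinOp combineAddOfOnes(const VecBinOp &n, bool hasVSX) {
  if (!hasVSX || n.op != Op::Add || !isSplatOf(n.rhs, 1))
    return n; // leave the node untouched
  return VecBinOp{Op::Sub, std::vector<std::int64_t>(n.rhs.size(), -1)};
}
```

With `hasVSX` set to false the node comes back unchanged, which corresponds to the NOVSX check lines keeping `vaddudm`.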

// Check if RHS is BUILD_VECTOR
// To satisfy commutative property a+b = b+a
if (RHS.getOpcode() != ISD::BUILD_VECTOR)
  std::swap(LHS, RHS);
Contributor

The BUILD_VECTOR has to be on the RHS, so we don't need the swap here.

Contributor Author

Done.

@Himadhith Himadhith force-pushed the himadhith/xxleqv_vec branch from e50a0d6 to 67a8060 Compare October 17, 2025 05:41
Contributor

@tonykuttai tonykuttai left a comment

Thanks for addressing the comments. LGTM

@Himadhith Himadhith force-pushed the himadhith/xxleqv_vec branch 2 times, most recently from 6f3cb1d to 432d6e0 Compare October 17, 2025 05:46
@Himadhith
Contributor Author

This patch does not handle the v1i128 vector type, because the code generated for it does not use the vspltisw instruction:

# %bb.0:                                # %entry
        addis 3, 2, .LCPI4_0@toc@ha
        addi 3, 3, .LCPI4_0@toc@l
        lxvd2x 0, 0, 3
        xxswapd 35, 0
        vadduqm 2, 2, 3
        blr
        .long   0
        .quad   0
