Conversation

@andcarminati (Collaborator) commented Nov 6, 2025:

In this case, we can load the scalar value directly instead of building a full vector just to extract one element from it (the legalizer will scalarize this load anyway).

This combiner was motivated by the following real case (there are more pathological cases, related to <32 x s6> for example):

name:            test
alignment:       16
legalized:       false
tracksRegLiveness: true
body:             |
  bb.1:
  liveins: $p0

    %0:_(p0) = COPY $p0
    %14:_(<8 x s16>) = G_LOAD %0(p0) :: (dereferenceable load (<8 x s16>), align 2)
    %15:_(<16 x s8>) = G_BITCAST %14(<8 x s16>)
    %17:_(s32) = G_CONSTANT i32 4
    %16:_(s8) = G_EXTRACT_VECTOR_ELT %15(<16 x s8>), %17(s32)
    %18:_(s32) = G_ZEXT %16(s8)
    %158:_(<64 x s8>) = G_AIE_PAD_VECTOR_UNDEF %15(<16 x s8>)
    %23:_(s32) = G_CONSTANT i32 2
    %156:_(s32) = G_AIE_ZEXT_EXTRACT_VECTOR_ELT %158(<64 x s8>), %23(s32)
    PseudoRET implicit $lr, implicit %18, implicit %156

...

Peano's current final code for this case is:

test:                                   // @test
// %bb.0:
	lda.s16	 r0, [p0], #2
	lda.s16	 r1, [p0], #2
	lda.s16	 r2, [p0], #2
	lda.s16	 r3, [p0], #2
	lda.s16	 r4, [p0], #2
	lda.s16	 r5, [p0], #2
	lda.s16	 r6, [p0, #0]
	lda.s16	 r7, [p0, #2];		vpush.hi.16	 x0, x0, r0
	vpush.hi.16	 x0, x0, r1
	vpush.hi.16	 x0, x0, r2
	vpush.hi.16	 x0, x0, r3
	vpush.hi.16	 x0, x0, r4
	vpush.hi.16	 x0, x0, r5
	ret	lr;		vpush.hi.16	 x0, x0, r6
	mova	r0, #48;		vpush.hi.16	 x0, x0, r7 //  Delay Slot 5
	vshift	x0, x0, x0, r0                  //  Delay Slot 4
	vextract.8	 r0, x0, #4, vaddsign0  //  Delay Slot 3
	vextract.8	 r1, x0, #2, vaddsign0  //  Delay Slot 2
	nop	                                //  Delay Slot 1

With this PR:

test:                                   // @test
// %bb.0:
	lda.u8	 r0, [p0, #4];		nopb	;		nopxm	;		nops	
	lda.u8	 r1, [p0, #2]
	ret	lr
	nop	                                //  Delay Slot 5
	nop	                                //  Delay Slot 4
	nop	                                //  Delay Slot 3
	nop	                                //  Delay Slot 2
	nop	                                //  Delay Slot 1
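
At the MIR level, the combiner rewrites the extract into roughly the following (a sketch assembled from the pattern comment quoted further down in this review; register names are invented, the s20 offset type matches the S20 scalar used in the patch, and the alignment follows the generic pattern comment):

  %0:_(p0) = COPY $p0
  %off:_(s20) = G_CONSTANT i20 4
  %new_ptr:_(p0) = G_PTR_ADD %0(p0), %off(s20)
  %elt:_(s8) = G_LOAD %new_ptr(p0) :: (dereferenceable load (s8), align 1)
  %res:_(s32) = G_ZEXT %elt(s8)

The selector can then turn this into the single lda.u8 seen above.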

@andcarminati (Collaborator, Author) commented:
QoR results:
Core_Insn_Count
Core_PMSize_absolute
Core_StackSize_absolute

Regressions: HardswishAsHardsigmoid_aie2_0 and Hardswish_aie2_0. These come from a regalloc side effect: one spill is killing the SWP (software pipelining).

const bool IsSExtExtract = (Opcode == SExtExtractOpcode);
const bool IsPlainExtract = (Opcode == TargetOpcode::G_EXTRACT_VECTOR_ELT);

if (!IsZExtExtract && !IsSExtExtract && !IsPlainExtract)

Collaborator: It looks as if the pattern's opcode check precludes this case?


// Get the index operand
const Register IdxReg = MI.getOperand(2).getReg();
auto IdxCst = getIConstantVRegValWithLookThrough(IdxReg, MRI);

Collaborator: const?

if (MMO->getAlign().value() >= LoadVecSizeInBytes)
  return false;

const unsigned ElemSizeInBytes = ElemSize / 8;

Collaborator: Declare closer to its first use.

const Register PtrReg = LoadMI->getOperand(1).getReg();
const LLT S20 = LLT::scalar(20);

// Calculate byte offset: Index * ElemSizeInBytes

Collaborator: I think this comment is redundant.

/// %new_ptr:_(p0) = G_PTR_ADD %ptr, %offset
/// %elt:_(sX) = G_LOAD %new_ptr :: (align 1)
/// %result:_(s32) = G_[Z/S]EXT %elt
bool llvm::matchUnalignedExtractLoad(MachineInstr &MI, MachineRegisterInfo &MRI,

Collaborator: Can we have a more descriptive name for MI? ExtractMI?


// Set insertion point right after the original vector load
if (LoadMI->getNextNode())
  B.setInstr(*LoadMI->getNextNode());

Collaborator: Wouldn't std::next(iterator(LoadMI)) always be valid for setInsertPt()?
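
Presumably something like this (a sketch of the suggestion, assuming LoadMI is the MachineInstr pointer from the surrounding code; MachineIRBuilder::setInsertPt takes a block and an iterator):

  // std::next of the load's iterator is always valid (end() at worst),
  // so the null check on getNextNode() becomes unnecessary.
  B.setInsertPt(*LoadMI->getParent(),
                std::next(MachineBasicBlock::iterator(LoadMI)));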

// alignment using GCD to find the maximum provable alignment
const unsigned OrigAlign = MMO->getAlign().value();
const unsigned ScalarAlign =
    ByteOffset == 0 ? OrigAlign : std::gcd(OrigAlign, (unsigned)ByteOffset);

Collaborator: I think this could be written as std::gcd(OrigAlign, OrigAlign + ByteOffset);
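
Both spellings agree, since gcd(a, a + b) = gcd(a, b), and because std::gcd(a, 0) == a the ByteOffset == 0 special case folds away in either form. An illustrative compile-time check (not part of the patch):

  #include <numeric>

  static_assert(std::gcd(16u, 16u + 6u) == std::gcd(16u, 6u)); // both are 2
  static_assert(std::gcd(16u, 0u) == 16u); // offset 0 keeps the original align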

// Handle the result based on the original opcode
if (IsZExtExtract || IsSExtExtract) {
  // Need to extend to s32
  const Register DstReg = MI.getOperand(0).getReg();

Collaborator: This can be hoisted; it's necessary for all cases. Then we can do

  if (IsZExtExtract) {
  } else if (IsSExtExtract) {
  } else {
  }
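
Spelled out, the suggestion might look like this (a sketch; NarrowLoad is an assumed name for the new scalar load's result register):

  const Register DstReg = MI.getOperand(0).getReg(); // hoisted: needed by all cases
  if (IsZExtExtract) {
    B.buildZExt(DstReg, NarrowLoad);
  } else if (IsSExtExtract) {
    B.buildSExt(DstReg, NarrowLoad);
  } else {
    B.buildCopy(DstReg, NarrowLoad);
  }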

@andcarminati (Author): The second set of extracts can actually be removed.

@andcarminati force-pushed the andreu.unaligned.load.extract branch from 4838126 to e684035 on November 13, 2025 10:51
@andcarminati changed the title from "[AIEX] Add a combiner to handle extract from unaligned vector load" to "[AIEX] Add combiners to optimize unaligned vector loads" on Nov 13, 2025
@andcarminati (Author) commented:

Another combiner was included to improve scalarization of unaligned vector loads.

unsigned NewElemSize = 0;
if (Alignment >= 8 && ElemSize < 64) {
  NewElemSize = 64;
} else if (Alignment >= 4 && ElemSize < 32) {

Collaborator: We will only arrive here with ElemSize 8 or 16, both smaller than 32 and 64.

@andcarminati (Author): Sure, this will be simplified.
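
The simplification could look like this (a sketch under the reviewer's observation that ElemSize is only ever 8 or 16 here, which makes the ElemSize guards vacuous; per the thread below, the 64-bit case was later dropped entirely):

  unsigned NewElemSize = 0;
  if (Alignment >= 8)
    NewElemSize = 64;
  else if (Alignment >= 4)
    NewElemSize = 32;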

@andcarminati force-pushed the andreu.unaligned.load.extract branch from e684035 to 20e7808 on November 13, 2025 12:37

// Check if the vector size is compatible with the new element size
const unsigned VecSizeInBits = DstTy.getSizeInBits();
if (VecSizeInBits % NewElemSize != 0)

Collaborator: We may be discarding 64 while 32 or 16 might still work.

@andcarminati (Author): We are not handling 64 anymore.
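
For illustration, the fallback the reviewer describes could be sketched as follows (assumed code, not the PR's; with 64 dropped, only 32 and 16 remain as candidates):

  // Try wider element sizes first; fall back to a narrower one that still
  // divides the vector size evenly and is covered by the alignment.
  for (unsigned Cand : {32u, 16u}) {
    if (Cand > ElemSize && Alignment >= Cand / 8 && VecSizeInBits % Cand == 0) {
      NewElemSize = Cand;
      break;
    }
  }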

const unsigned ElemSize = ElemTy.getSizeInBits();

// Skip if the load is only used for extracts - let matchUnalignedExtractLoad
// handle it This prevents the two combiners from competing for the same

@andcarminati (Author): nit: missing "."

return false;

// Skip if the load has a single user that is a G_STORE with the same
// alignment This case can be perfectly scalarized during legalization

@andcarminati (Author): nit: missing "."

return false;

// Calculate new number of elements
const unsigned NewNumElems = VecSizeInBits / NewElemSize;

@martien-de-jong (Collaborator) commented Nov 13, 2025: I think it's confusing that both of these variables are in bits, but not both are called InBits.

const unsigned NewNumElems = VecSizeInBits / NewElemSize;

// Capture the pointer register before creating the lambda
const Register PtrReg = LoadMI.getOperand(1).getReg();

Collaborator: You can write PtrReg = LoadMI.getOperand(1).getReg() in the capture list, giving it local scope.
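
That is, roughly (a sketch of the init-capture suggestion; the lambda shape is assumed from the usual combiner build-function pattern):

  // Init-capture scopes PtrReg to the lambda; other captures elided.
  auto BuildFn = [PtrReg = LoadMI.getOperand(1).getReg()](MachineIRBuilder &B) {
    // ... use PtrReg here; it lives only inside the lambda.
  };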

MachineFunction &MF = B.getMF();

// Create the new vector type with better-aligned elements
const LLT NewVecTy = LLT::fixed_vector(NewNumElems, NewElemSize);

Collaborator: Capture NewVecTy, dropping the two constructor parameters?

@andcarminati (Author): I moved NewNumElems to the lambda.

…gnment

In this case, we can improve the legalized code.
@andcarminati force-pushed the andreu.unaligned.load.extract branch from 20e7808 to ec55fce on November 13, 2025 14:35