@ValentijnvdBeek
Contributor

@ValentijnvdBeek ValentijnvdBeek commented May 17, 2024

This merge request implements a stream-based infrastructure for creating shuffle vector pattern masks, which can be used to replace inefficient scalarizations with faster generic opcodes. It implements five patterns that are useful for the AIE going forward, namely one that matches the concatenation of vectors, the same in reverse, the insertion of a vector and the merging of two vectors.

Some todos in the merge request that can be done after the rest has been OK'ed

  • The assertions that check the initial state are a bit ad hoc; it might be nice to pass them as function pointers as well
  • Matching is inefficient for larger groups of patterns. The patterns currently match the minimum of what they can; for example, only the first two parts of vectors are transformed. Most of these can be extended, but that quickly leads to a very large set of patterns
  • There are two matching functions, since the concat version does a lot of additional work whose purpose I do not yet understand. I should probably figure out whether they can be fully merged or whether only the simple variant ought to survive
  • Use llvm::SmallVector rather than std::vector

@ValentijnvdBeek ValentijnvdBeek self-assigned this May 17, 2024
@ValentijnvdBeek ValentijnvdBeek added llvm:instcombine Code that modifies the combiner llvm:core Modifies non-AIE specific code backend:aie Code that modifies AIE code labels May 17, 2024
@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 10708e3 to 0417b1c Compare May 22, 2024 09:15
@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 0417b1c to 429acc5 Compare May 31, 2024 09:59
@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.basic branch from 2f752d1 to 3e692b4 Compare June 11, 2024 09:13
@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 429acc5 to 72a0a03 Compare June 11, 2024 09:47
@ValentijnvdBeek ValentijnvdBeek changed the base branch from vvandebe.shufflevector.basic to aie-public June 11, 2024 09:49
@ValentijnvdBeek ValentijnvdBeek changed the base branch from aie-public to vvandebe.shufflevector.basic June 11, 2024 09:54
@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 72a0a03 to 228e499 Compare June 14, 2024 12:31
@ValentijnvdBeek ValentijnvdBeek marked this pull request as ready for review June 14, 2024 12:34
@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 228e499 to 0e664aa Compare June 14, 2024 12:56
; CHECK-GI-DOT-NEXT: addv s0, v0.4s
; CHECK-GI-DOT-NEXT: fmov w0, s0
; CHECK-GI-DOT-NEXT: ret
; CHECK-LABEL: add_pair_v8i16_v4i32_double_sext_zext_shuffle:
Collaborator

Yay!

; CHECK-NEXT: mova r16, #13
; CHECK-NEXT: vextract.s32 r5, x2, r16
; CHECK-NEXT: j #.LBB0_3
; CHECK-NEXT: nopb ; nopa ; nops ; nopx ; vmov wl0, wh0; nopv
Collaborator

🚀

@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.basic branch from 3e692b4 to 79d7e96 Compare June 24, 2024 10:08
@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch 2 times, most recently from f872cf4 to 73a92c2 Compare June 25, 2024 09:36
@github-actions
github-actions bot commented Jun 25, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff ea89ed6223aa93251b9708c16b9d605ab30dccdb 07244e73f9104198bb9cb52273afc033367e0429 -- llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp llvm/lib/Target/AIE/AIE2PreLegalizerCombiner.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 70297e7d43..b42b0b7c29 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -352,32 +352,31 @@ Register CombinerHelper::createUnmergeValue(
     TargetReg = MRI.createGenericVirtualRegister(HalfSizeTy);
   }
 
-    // Each destination fits n times into the source and each iteration we
-    // exactly half the source. Therefore we need to pick on which side we want
-    // to iterate on.
-    const uint32_t DstNumElements =
-        DstTy.isVector() ? DstTy.getNumElements() : 1;
-    const uint32_t HalfWay = Start + ((End - Start) / 2);
-    const uint32_t Position = DestinationIndex * DstNumElements;
-
-    uint32_t NextStart, NextEnd;
-    if (Position < HalfWay) {
-      Builder.buildInstr(TargetOpcode::G_UNMERGE_VALUES, {TargetReg, TmpReg},
-                         {SrcReg});
-      NextStart = Start;
-      NextEnd = HalfWay;
-    } else {
-      Builder.buildInstr(TargetOpcode::G_UNMERGE_VALUES, {TmpReg, TargetReg},
-                         {SrcReg});
-      NextStart = HalfWay;
-      NextEnd = End;
-    }
+  // Each destination fits n times into the source and each iteration we
+  // exactly half the source. Therefore we need to pick on which side we want
+  // to iterate on.
+  const uint32_t DstNumElements = DstTy.isVector() ? DstTy.getNumElements() : 1;
+  const uint32_t HalfWay = Start + ((End - Start) / 2);
+  const uint32_t Position = DestinationIndex * DstNumElements;
+
+  uint32_t NextStart, NextEnd;
+  if (Position < HalfWay) {
+    Builder.buildInstr(TargetOpcode::G_UNMERGE_VALUES, {TargetReg, TmpReg},
+                       {SrcReg});
+    NextStart = Start;
+    NextEnd = HalfWay;
+  } else {
+    Builder.buildInstr(TargetOpcode::G_UNMERGE_VALUES, {TmpReg, TargetReg},
+                       {SrcReg});
+    NextStart = HalfWay;
+    NextEnd = End;
+  }
 
-    if (HalfSizeTy.isVector() && DstTy != HalfSizeTy)
-      return createUnmergeValue(MI, TargetReg, DstReg, DestinationIndex,
-                                NextStart, NextEnd);
+  if (HalfSizeTy.isVector() && DstTy != HalfSizeTy)
+    return createUnmergeValue(MI, TargetReg, DstReg, DestinationIndex,
+                              NextStart, NextEnd);
 
-    return DstReg;
+  return DstReg;
 }
 
 bool CombinerHelper::tryCombineShuffleVector(MachineInstr &MI) {

Collaborator

@konstantinschwarz konstantinschwarz left a comment

There are plenty of upstream tests failing. Could you please take a look?

@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch 2 times, most recently from b49d34c to f855c29 Compare August 12, 2024 12:27
@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from f855c29 to 4d6af83 Compare August 13, 2024 14:46
Collaborator

Curious: Maybe IRTranslator already has a similar piece of code that we can re-use?

Contributor Author

Not from what I can see. G_UNMERGE_VALUES isn't meant for vectors to begin with; according to the documentation, it only supports scalars. Most backends just support the vector case as an extension since there is no other opcode to do it. In theory, this will be ripped out in favour of a simple G_SUBREGISTER_EXTRACT in the near future if that opcode gets proper support. Utils.cpp does have something a bit similar, called llvm::extractParts, but that splits into two parts, while this code walks the source like a binary tree.

In any case, I would keep it like this since this code should be replaced before upstreaming and that is easier if we are the only ones depending on it.
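
To make the "binary tree" walk concrete, here is a minimal standalone sketch of the index arithmetic, modelled on the `Start`/`End`/`HalfWay`/`Position` variables in the clang-format diff above. It only simulates which half would be kept at each step and builds no instructions; the function name `halvingPath` is hypothetical, and the power-of-two restriction is an assumption of the sketch rather than something the PR states.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: records which half of the source is kept at each
// halving step (false = low half, true = high half) until the remaining
// range is exactly as wide as the destination.
std::vector<bool> halvingPath(uint32_t SrcNumElements, uint32_t DstNumElements,
                              uint32_t DestinationIndex) {
  // Assumption of this sketch: the source is a power-of-two multiple of the
  // destination, so repeated halving lands exactly on the destination width.
  assert(DstNumElements > 0 && SrcNumElements % DstNumElements == 0);
  const uint32_t Ratio = SrcNumElements / DstNumElements;
  assert(Ratio != 0 && (Ratio & (Ratio - 1)) == 0);

  std::vector<bool> Path;
  uint32_t Start = 0, End = SrcNumElements;
  // First element of the requested destination inside the source.
  const uint32_t Position = DestinationIndex * DstNumElements;
  while (End - Start > DstNumElements) {
    const uint32_t HalfWay = Start + ((End - Start) / 2);
    if (Position < HalfWay) {
      Path.push_back(false); // keep the low half
      End = HalfWay;
    } else {
      Path.push_back(true); // keep the high half
      Start = HalfWay;
    }
  }
  return Path;
}
```

For example, asking for the third `<4 x s8>` piece of a `<16 x s8>` source (`halvingPath(16, 4, 2)`) first keeps the high half and then the low half of that, i.e. two chained unmerges.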

@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 4d6af83 to 13cc82a Compare August 15, 2024 08:52
Comment on lines +272 to +275
Register createUnmergeValue(MachineInstr &MI, const Register SrcReg,
const Register DstReg, uint8_t DestinationIndex,
const uint32_t Start, const uint32_t End);

Contributor Author

G_UNMERGE_VALUES isn't really meant to be used for vectors like this; it is actually defined as a scalar opcode. There is a nice drop-in replacement opcode called G_EXTRACT_SUBVECTOR, which landed in our tree in early August, but there is no support for it in other backends yet. So this implementation is a necessary evil until the other opcode is able to replace it.

Comment on lines -45 to 47
; CHECK-GISEL: // %bb.0: // %entry
; CHECK-GISEL-NEXT: ext v0.16b, v0.16b, v1.16b, #8
; CHECK-GISEL-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-GISEL-NEXT: mov d0, v0.d[1]
; CHECK-GISEL-NEXT: ret
Contributor Author

This test is:

define <8 x i8> @i8_off8(<16 x i8> %arg1, <16 x i8> %arg2) {
entry:
  %shuffle = shufflevector <16 x i8> %arg1, <16 x i8> %arg2, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  ret <8 x i8> %shuffle
}

This turns into:

bb.1.entry:
  liveins: $q0, $q1
  %0:_(<16 x s8>) = COPY $q0
  %3:_(<8 x s8>), %2:_(<8 x s8>) = G_UNMERGE_VALUES %0:_(<16 x s8>)
  $d0 = COPY %2:_(<8 x s8>)
  RET_ReallyLR implicit $d0

This is expected

Comment on lines -257 to 260
; CHECK-GISEL: // %bb.0: // %entry
; CHECK-GISEL-NEXT: movi v1.2d, #0000000000000000
; CHECK-GISEL-NEXT: ext v0.16b, v0.16b, v1.16b, #8
; CHECK-GISEL-NEXT: // kill: def $d0 killed $d0 killed $q0
; CHECK-GISEL-NEXT: mov d0, v0.d[1]
; CHECK-GISEL-NEXT: ret
entry:
%shuffle = shufflevector <16 x i8> %arg1, <16 x i8> zeroinitializer, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
Contributor Author

This is:

define <8 x i8> @i8_zero_off8(<16 x i8> %arg1) {
entry:
  %shuffle = shufflevector <16 x i8> %arg1, <16 x i8> zeroinitializer, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  ret <8 x i8> %shuffle
}

And turns into:

bb.1.entry:
  liveins: $q0, $q1
  %0:_(<16 x s8>) = COPY $q0
  %5:_(<8 x s8>), %4:_(<8 x s8>) = G_UNMERGE_VALUES %0:_(<16 x s8>)
  $d0 = COPY %4:_(<8 x s8>)
  RET_ReallyLR implicit $d0

this is expected

Comment on lines -3747 to 3757
; CHECK-GI-NEXT: ushll v4.4s, v0.4h, #0
; CHECK-GI-NEXT: ushll2 v0.4s, v0.8h, #0
; CHECK-GI-NEXT: ushll v5.4s, v1.4h, #0
; CHECK-GI-NEXT: ushll2 v1.4s, v1.8h, #0
; CHECK-GI-NEXT: ushll v6.4s, v2.4h, #0
; CHECK-GI-NEXT: ushll2 v2.4s, v2.8h, #0
; CHECK-GI-NEXT: ushll v7.4s, v3.4h, #0
; CHECK-GI-NEXT: ushll2 v3.4s, v3.8h, #0
; CHECK-GI-NEXT: add v0.4s, v4.4s, v0.4s
; CHECK-GI-NEXT: add v1.4s, v5.4s, v1.4s
; CHECK-GI-NEXT: add v2.4s, v6.4s, v2.4s
; CHECK-GI-NEXT: add v3.4s, v7.4s, v3.4s
; CHECK-GI-NEXT: uaddw2 v0.4s, v4.4s, v0.8h
; CHECK-GI-NEXT: uaddw2 v1.4s, v5.4s, v1.8h
; CHECK-GI-NEXT: uaddw2 v2.4s, v6.4s, v2.8h
; CHECK-GI-NEXT: uaddw2 v3.4s, v7.4s, v3.8h
; CHECK-GI-NEXT: add v0.4s, v0.4s, v1.4s
; CHECK-GI-NEXT: add v1.4s, v2.4s, v3.4s
; CHECK-GI-NEXT: add v0.4s, v0.4s, v1.4s
Contributor Author

For me, this is the most surprising test result and the hardest to parse. The input LLVM IR looks like:

define i32 @add_pair_v8i16_v4i32_double_sext_zext_shuffle(<8 x i16> %ax, <8 x i16> %ay, <8 x i16> %bx, <8 x i16> %by) {
entry:
  %axx = zext <8 x i16> %ax to <8 x i32>
  %s1h = shufflevector <8 x i32> %axx, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %s1l = shufflevector <8 x i32> %axx, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %axs = add <4 x i32> %s1h, %s1l
  %ayy = zext <8 x i16> %ay to <8 x i32>
  %s2h = shufflevector <8 x i32> %ayy, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %s2l = shufflevector <8 x i32> %ayy, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %ays = add <4 x i32> %s2h, %s2l
  %az = add <4 x i32> %axs, %ays
  %bxx = zext <8 x i16> %bx to <8 x i32>
  %s3h = shufflevector <8 x i32> %bxx, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %s3l = shufflevector <8 x i32> %bxx, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %bxs = add <4 x i32> %s3h, %s3l
  %byy = zext <8 x i16> %by to <8 x i32>
  %s4h = shufflevector <8 x i32> %byy, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %s4l = shufflevector <8 x i32> %byy, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %bys = add <4 x i32> %s4h, %s4l
  %bz = add <4 x i32> %bxs, %bys
  %z = add <4 x i32> %az, %bz
  %z2 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %z)
  ret i32 %z2
}

This is translated into the following global isel IR:


bb.1.entry:
  liveins: $q0, $q1, $q2, $q3
  %0:_(<8 x s16>) = COPY $q0
  %1:_(<8 x s16>) = COPY $q1
  %2:_(<8 x s16>) = COPY $q2
  %3:_(<8 x s16>) = COPY $q3
  %33:_(<4 x s16>), %34:_(<4 x s16>) = G_UNMERGE_VALUES %0:_(<8 x s16>)
  %35:_(<4 x s32>) = G_ZEXT %33:_(<4 x s16>)
  %36:_(<4 x s32>) = G_ZEXT %34:_(<4 x s16>)
  %8:_(<4 x s32>) = G_ADD %35:_, %36:_
  %37:_(<4 x s16>), %38:_(<4 x s16>) = G_UNMERGE_VALUES %1:_(<8 x s16>)
  %39:_(<4 x s32>) = G_ZEXT %37:_(<4 x s16>)
  %40:_(<4 x s32>) = G_ZEXT %38:_(<4 x s16>)
  %12:_(<4 x s32>) = G_ADD %39:_, %40:_
  %13:_(<4 x s32>) = G_ADD %8:_, %12:_
  %41:_(<4 x s16>), %42:_(<4 x s16>) = G_UNMERGE_VALUES %2:_(<8 x s16>)
  %43:_(<4 x s32>) = G_ZEXT %41:_(<4 x s16>)
  %44:_(<4 x s32>) = G_ZEXT %42:_(<4 x s16>)
  %17:_(<4 x s32>) = G_ADD %43:_, %44:_
  %45:_(<4 x s16>), %46:_(<4 x s16>) = G_UNMERGE_VALUES %3:_(<8 x s16>)
  %47:_(<4 x s32>) = G_ZEXT %45:_(<4 x s16>)
  %48:_(<4 x s32>) = G_ZEXT %46:_(<4 x s16>)
  %21:_(<4 x s32>) = G_ADD %47:_, %48:_
  %22:_(<4 x s32>) = G_ADD %17:_, %21:_
  %23:_(<4 x s32>) = G_ADD %13:_, %22:_
  %24:_(s32) = G_VECREDUCE_ADD %23:_(<4 x s32>)
  $w0 = COPY %24:_(s32)
  RET_ReallyLR implicit $w0
                                                                                                                                               

This corresponds to the shufflevectors perfectly and actually shows that they simplify the unmerges as well, which is a welcome bonus.

if (DstNumElts <= 2)
return false;
Contributor Author

Maybe do these tests in a callback? As it stands, if one of them fails, the combiner exits completely rather than continuing on to the other patterns. For now this is fine, but once there are more patterns that are close to each other, this might become a problem.
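
A rough sketch of what the callback idea could look like, written as standalone C++ rather than against the real CombinerHelper API. All names here (`ShuffleInfo`, `MaskPattern`, `firstMatch`) are hypothetical; the point is only that a failing precondition skips one pattern instead of returning from the whole combine.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical summary of the shuffle being combined.
struct ShuffleInfo {
  uint32_t DstNumElts;
  uint32_t SrcNumElts;
  std::vector<int32_t> Mask;
};

// Hypothetical pattern record: each pattern carries its own precondition.
struct MaskPattern {
  std::string Name;
  std::function<bool(const ShuffleInfo &)> Precondition;
  std::function<bool(const ShuffleInfo &)> Match;
};

// Returns the name of the first pattern whose precondition and matcher both
// accept the shuffle, or an empty string if none does.
std::string firstMatch(const std::vector<MaskPattern> &Patterns,
                       const ShuffleInfo &Info) {
  for (const MaskPattern &P : Patterns) {
    if (!P.Precondition(Info))
      continue; // Only this pattern is skipped; later ones are still tried.
    if (P.Match(Info))
      return P.Name;
  }
  return "";
}
```

A check such as `DstNumElts <= 2` would then live in the `Precondition` of the patterns that need it, so a two-element shuffle can still reach patterns without that restriction.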

; CHECK-GI-NEXT: mov v0.b[6], v7.b[0]
; CHECK-GI-NEXT: mov v0.b[7], v16.b[0]
; CHECK-GI-NEXT: tbl v0.16b, { v0.16b, v1.16b }, v2.16b
; CHECK-GI-NEXT: mov v0.d[1], v1.d[0]
Contributor Author

The input LLVM IR is:

define <16 x i8> @test_concat_v16i8_v8i8_v16i8(<8 x i8> %x, <16 x i8> %y) #0 {
entry:
  %vecext = extractelement <8 x i8> %x, i32 0
  %vecinit = insertelement <16 x i8> undef, i8 %vecext, i32 0
  %vecext1 = extractelement <8 x i8> %x, i32 1
  %vecinit2 = insertelement <16 x i8> %vecinit, i8 %vecext1, i32 1
  %vecext3 = extractelement <8 x i8> %x, i32 2
  %vecinit4 = insertelement <16 x i8> %vecinit2, i8 %vecext3, i32 2
  %vecext5 = extractelement <8 x i8> %x, i32 3
  %vecinit6 = insertelement <16 x i8> %vecinit4, i8 %vecext5, i32 3
  %vecext7 = extractelement <8 x i8> %x, i32 4
  %vecinit8 = insertelement <16 x i8> %vecinit6, i8 %vecext7, i32 4
  %vecext9 = extractelement <8 x i8> %x, i32 5
  %vecinit10 = insertelement <16 x i8> %vecinit8, i8 %vecext9, i32 5
  %vecext11 = extractelement <8 x i8> %x, i32 6
  %vecinit12 = insertelement <16 x i8> %vecinit10, i8 %vecext11, i32 6
  %vecext13 = extractelement <8 x i8> %x, i32 7
  %vecinit14 = insertelement <16 x i8> %vecinit12, i8 %vecext13, i32 7
  %vecinit30 = shufflevector <16 x i8> %vecinit14, <16 x i8> %y, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 2>
  ret <16 x i8> %vecinit30
}

Our outputted post-combiner IR is:

bb.1.entry:
  liveins: $d0, $q1
  %0:_(<8 x s8>) = COPY $d0
  %1:_(<16 x s8>) = COPY $q1
  %3:_(s64) = G_CONSTANT i64 0
  %2:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %3:_(s64)
  %7:_(s64) = G_CONSTANT i64 1
  %6:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %7:_(s64)
  %10:_(s64) = G_CONSTANT i64 2
  %9:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %10:_(s64)
  %13:_(s64) = G_CONSTANT i64 3
  %12:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %13:_(s64)
  %16:_(s64) = G_CONSTANT i64 4
  %15:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %16:_(s64)
  %19:_(s64) = G_CONSTANT i64 5
  %18:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %19:_(s64)
  %22:_(s64) = G_CONSTANT i64 6
  %21:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %22:_(s64)
  %25:_(s64) = G_CONSTANT i64 7
  %24:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %25:_(s64)
  %34:_(<8 x s8>) = G_BUILD_VECTOR %2:_(s8), %6:_(s8), %9:_(s8), %12:_(s8), %15:_(s8), %18:_(s8), %21:_(s8), %24:_(s8)
  %31:_(<8 x s8>), %32:_(<8 x s8>) = G_UNMERGE_VALUES %1:_(<16 x s8>)
  %27:_(<16 x s8>) = G_CONCAT_VECTORS %34:_(<8 x s8>), %31:_(<8 x s8>)
  $q0 = COPY %27:_(<16 x s8>)
  RET_ReallyLR implicit $q0

The IR is alright: it correctly puts the vector together. However, a combiner is missing that takes the extractions, turns them into a shufflevector and then optimizes it out.

Comment on lines +2140 to 2144
; CHECK-GI-NEXT: // kill: def $d0 killed $d0 def $q0
; CHECK-GI-NEXT: mov s2, v0.s[1]
; CHECK-GI-NEXT: mov v0.s[1], v2.s[0]
; CHECK-GI-NEXT: ldr q2, [x8, :lo12:.LCPI135_0]
; CHECK-GI-NEXT: tbl v0.16b, { v0.16b, v1.16b }, v2.16b
; CHECK-GI-NEXT: mov v0.d[1], v1.d[0]
; CHECK-GI-NEXT: ret
Contributor Author

Same as above

@ValentijnvdBeek
Contributor Author

To explain the lateness of the reviews: ASUS managed to lose my Hawk Point laptop and has been promising for the past three weeks that it will ship soon (tm). I will see if I can make some progress on my old 4-core machine.

ValentijnvdBeek and others added 2 commits September 23, 2024 23:53
These generators are used to match shuffle masks for optimizations.
The idea is that each shuffle mask essentially encodes a function that
turns one vector into another. Generators are those functions and
allow us to match shufflevectors by generating masks. Since masks
are frequently very similar, this allows many masks to be defined in
relatively few lines.
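
As an illustration of the generator idea (not the code in this commit), a minimal standalone sketch: a generator is a callable that yields the next expected mask index, and small generators can be chained to describe a concatenation. The names `MaskGenerator`, `sequentialFrom`, `concatGenerators` and `matchesMask` are hypothetical; the only borrowed convention is that `-1` marks an undef mask element, which the sketch lets match anything.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical type: each call yields the next expected mask index.
using MaskGenerator = std::function<int32_t()>;

// Yields Start, Start + 1, Start + 2, ... (a contiguous run).
MaskGenerator sequentialFrom(int32_t Start) {
  return [Next = Start]() mutable { return Next++; };
}

// Yields the first FirstCount values from A and everything after that from B.
MaskGenerator concatGenerators(MaskGenerator A, unsigned FirstCount,
                               MaskGenerator B) {
  return [A = std::move(A), B = std::move(B), FirstCount,
          Emitted = 0u]() mutable {
    return Emitted++ < FirstCount ? A() : B();
  };
}

// Returns true if every defined mask element equals the generated value.
// An element of -1 (undef) is accepted regardless of the generated value.
bool matchesMask(const std::vector<int32_t> &Mask, MaskGenerator Gen) {
  for (int32_t Elt : Mask) {
    const int32_t Expected = Gen();
    if (Elt != -1 && Elt != Expected)
      return false;
  }
  return true;
}
```

For instance, `concatGenerators(sequentialFrom(0), 8, sequentialFrom(16))` describes "the first eight elements of the first source followed by the start of the second 16-element source", which is the shape of the concat masks discussed in the review comments.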
@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 13cc82a to c38562a Compare September 23, 2024 22:30
@ValentijnvdBeek ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from c38562a to ebe6489 Compare September 23, 2024 22:36
@ValentijnvdBeek
Contributor Author

@konstantinschwarz @gbossu friendly review bump. Hopefully the tests don't fail (I forgot where the git-clang-format hook was again) :)

Changes from the last time:

  • Split some changes up into different commits
  • Fixed a few nits (the ones recently in a nit commit as I am bettering my life)
  • Explained the assembly changes outside AIE2
  • Rebased onto current Main

It is a bit late, so I hope I haven't missed anything. Good luck today!

@ValentijnvdBeek
Contributor Author

FYI: there will be a conflict issue whenever LLVM main is merged into this branch due to CombinerHelper::tryCombineShuffleVector being split up into two parts. To incorporate it into this branch, you simply put the stream matching code in the new matchVectorMaskSequence function and it should be fine.

The relevant PR is: llvm/llvm-project#110545

@jsetoain the above PR is interesting for you since it sidesteps the need to convert into shufflevector explicitly. It will optimize sequences of inserts/extracts/buildvector automatically. You may want to cherry-pick the original version which turns into a G_SHUFFLE_VECTOR and then optimizes that rather than the current one that runs the optimizer directly.

Cheers!

@konstantinschwarz
Collaborator

Hi @ValentijnvdBeek, I finally came back to this, sorry for the long delay.
In #224 I implemented an alternative combine for matching the G_SHUFFLE_VECTOR -> G_EXTRACT_SUBVECTOR (still G_UNMERGE_VALUES until the next upstream sync).

It uses the already available helper functions for creating contiguous shuffle masks. WDYT?

@ValentijnvdBeek
Contributor Author

Hi @ValentijnvdBeek, I finally came back to this, sorry for the long delay. In #224 I implemented an alternative combine for matching the G_SHUFFLE_VECTOR -> G_EXTRACT_SUBVECTOR (still G_UNMERGE_VALUES until the next upstream sync).

It uses the already available helper functions for creating contiguous shuffle masks. WDYT?

It is fine, I am just happy if the work doesn't totally bit-rot. I like the linked code for the most part; it takes my approach and makes it less of a tech demo (which this sort of is). You only cover part of the approach, though admittedly it is the part that this PR implements, but I have some worries about the general cost of this combiner in the future. It is pretty expensive, and putting it in many different combiners adds a lot of code (and causes it to check the same values repeatedly). But as an approach within AIE, that is probably fine.

As for this PR, one idea is that I close it and you pick it over for those combiners. I will then take it back, be inspired by your approach and rewrite it with some of my crazier ideas to reduce the complexity, so that I can push it upstream directly. Together with maybe #129, that also gets it out of your hair and off your review todo list. Is that an idea, @konstantinschwarz?

isoard-amd pushed a commit that referenced this pull request Jul 24, 2025
…arallel fusion #94391 (#97607)"

This reverts commit edbc0e3.

Reason for rollback. ASAN complains about this PR:

==4320==ERROR: AddressSanitizer: heap-use-after-free on address 0x502000006cd8 at pc 0x55e2978d63cf bp 0x7ffe6431c2b0 sp 0x7ffe6431c2a8
READ of size 8 at 0x502000006cd8 thread T0
    #0 0x55e2978d63ce in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr> mlir/include/mlir/IR/IRMapping.h:40:11
    #1 0x55e2978d63ce in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:156:11
    #2 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43
    #3 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17
    #4 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #5 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #6 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #7 0x55e294646a8d in applySequenceBlock(mlir::Block&, mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15
    #8 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10
    #9 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #10 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #11 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #12 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation*, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16
    #13 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10
    #14 0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16
    #15 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17
    #16 0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #17 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #18 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7
    #19 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21
    #20 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16
    #21 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10
    #22 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation*) mlir/lib/Pass/Pass.cpp:884:60
    #23 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17
    #24 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
    #25 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
    #26 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #27 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #28 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
    #29 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
    #30 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
    #31 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
    #32 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
    #33 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
    #34 0x55e291eb15f8 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10
    #35 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33
    #36 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94)
    #37 0x55e2912365a9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120

0x502000006cd8 is located 8 bytes inside of 16-byte region [0x502000006cd0,0x502000006ce0)
freed by thread T0 here:
    #0 0x55e29130b7e2 in operator delete(void*, unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:155:3
    #1 0x55e2979eb657 in __libcpp_operator_delete<void *, unsigned long>
    #2 0x55e2979eb657 in __do_deallocate_handle_size<>
    #3 0x55e2979eb657 in __libcpp_deallocate
    #4 0x55e2979eb657 in deallocate
    #5 0x55e2979eb657 in deallocate
    #6 0x55e2979eb657 in operator()
    #7 0x55e2979eb657 in ~vector
    #8 0x55e2979eb657 in mlir::Block::~Block() mlir/lib/IR/Block.cpp:24:1
    #9 0x55e2979ebc17 in deleteNode llvm/include/llvm/ADT/ilist.h:42:39
    #10 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:205:5
    #11 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:209:39
    #12 0x55e2979ebc17 in mlir::Block::erase() mlir/lib/IR/Block.cpp:67:28
    #13 0x55e297aef978 in mlir::RewriterBase::eraseBlock(mlir::Block*) mlir/lib/IR/PatternMatch.cpp:245:10
    #14 0x55e297af0563 in mlir::RewriterBase::inlineBlockBefore(mlir::Block*, mlir::Block*, llvm::ilist_iterator<llvm::ilist_detail::node_options<mlir::Operation, false, false, void, false, void>, false, false>, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:331:3
    #15 0x55e297af06d8 in mlir::RewriterBase::mergeBlocks(mlir::Block*, mlir::Block*, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:341:3
    #16 0x55e297036608 in mlir::scf::ForOp::replaceWithAdditionalYields(mlir::RewriterBase&, mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) mlir/lib/Dialect/SCF/IR/SCF.cpp:575:12
    #17 0x55e2970673ca in mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Model<mlir::scf::ForOp>::replaceWithAdditionalYields(mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::RewriterBase&, mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.h.inc:658:56
    #18 0x55e2978d5feb in replaceWithAdditionalYields blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.cpp.inc:105:14
    #19 0x55e2978d5feb in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:135:14
    #20 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43
    #21 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17
    #22 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #23 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #24 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #25 0x55e294646a8d in applySequenceBlock(mlir::Block&, mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15
    #26 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10
    #27 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #28 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #29 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #30 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation*, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16
    #31 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10
    #32 0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16
    #33 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17
    #34 0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #35 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #36 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7
    #37 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21
    #38 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16
    #39 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10
    #40 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation*) mlir/lib/Pass/Pass.cpp:884:60
    #41 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17
    #42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
    #43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
    #44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #46 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
    #47 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
    #48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
    #49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
    #50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
    #51 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14

previously allocated by thread T0 here:
    #0 0x55e29130ab5d in operator new(unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:86:3
    #1 0x55e2979ed5d4 in __libcpp_operator_new<unsigned long>
    #2 0x55e2979ed5d4 in __libcpp_allocate
    #3 0x55e2979ed5d4 in allocate
    #4 0x55e2979ed5d4 in __allocate_at_least<std::__u::allocator<mlir::BlockArgument> >
    #5 0x55e2979ed5d4 in __split_buffer
    #6 0x55e2979ed5d4 in mlir::BlockArgument* std::__u::vector<mlir::BlockArgument, std::__u::allocator<mlir::BlockArgument>>::__push_back_slow_path<mlir::BlockArgument const&>(mlir::BlockArgument const&)
    #7 0x55e2979ec0f2 in push_back
    #8 0x55e2979ec0f2 in mlir::Block::addArgument(mlir::Type, mlir::Location) mlir/lib/IR/Block.cpp:154:13
    #9 0x55e29796e457 in parseRegionBody mlir/lib/AsmParser/Parser.cpp:2172:34
    #10 0x55e29796e457 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7
    #11 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16
    #12 0x55e297035742 in mlir::scf::ForOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/SCF/IR/SCF.cpp:521:14
    #13 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult (*)(mlir::OpAsmParser&, mlir::OperationState&)>(void*, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12
    #14 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12
    #15 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #16 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #17 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9
    #18 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19
    #19 0x55e29795bea3 in (anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10
    #20 0x55e297971d20 in parseBlockBody mlir/lib/AsmParser/Parser.cpp:2296:9
    #21 0x55e297971d20 in (anonymous namespace)::OperationParser::parseBlock(mlir::Block*&) mlir/lib/AsmParser/Parser.cpp:2226:12
    #22 0x55e29796e4f5 in parseRegionBody mlir/lib/AsmParser/Parser.cpp:2184:7
    #23 0x55e29796e4f5 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7
    #24 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16
    #25 0x55e29796b2cf in (anonymous namespace)::CustomOpAsmParser::parseOptionalRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1796:12
    #26 0x55e2978d89ff in mlir::function_interface_impl::parseFunctionOp(mlir::OpAsmParser&, mlir::OperationState&, bool, mlir::StringAttr, llvm::function_ref<mlir::Type (mlir::Builder&, llvm::ArrayRef<mlir::Type>, llvm::ArrayRef<mlir::Type>, mlir::function_interface_impl::VariadicFlag, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>&)>, mlir::StringAttr, mlir::StringAttr) mlir/lib/Interfaces/FunctionImplementation.cpp:232:14
    #27 0x55e2969ba41d in mlir::func::FuncOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/Func/IR/FuncOps.cpp:203:10
    #28 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult (*)(mlir::OpAsmParser&, mlir::OperationState&)>(void*, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12
    #29 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12
    #30 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #31 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #32 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9
    #33 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19
    #34 0x55e29795bea3 in (anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10
    #35 0x55e297959b78 in parse mlir/lib/AsmParser/Parser.cpp:2725:20
    #36 0x55e297959b78 in mlir::parseAsmSourceFile(llvm::SourceMgr const&, mlir::Block*, mlir::ParserConfig const&, mlir::AsmParserState*, mlir::AsmParserCodeCompleteContext*) mlir/lib/AsmParser/Parser.cpp:2785:41
    #37 0x55e29790d5c2 in mlir::parseSourceFile(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::Block*, mlir::ParserConfig const&, mlir::LocationAttr*) mlir/lib/Parser/Parser.cpp:46:10
    #38 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp, const std::__u::shared_ptr<llvm::SourceMgr> &> mlir/include/mlir/Parser/Parser.h:159:14
    #39 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp> mlir/include/mlir/Parser/Parser.h:189:10
    #40 0x55e291ebbfe2 in mlir::parseSourceFileForTool(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::ParserConfig const&, bool) mlir/include/mlir/Tools/ParseUtilities.h:31:12
    #41 0x55e291ebb263 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:383:33
    #42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
    #43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
    #44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #46 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
    #47 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
    #48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
    #49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
    #50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
    #51 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
    #52 0x55e291eb15f8 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10
    #53 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33
    #54 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94)
    #55 0x55e2912365a9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120

SUMMARY: AddressSanitizer: heap-use-after-free mlir/include/mlir/IR/IRMapping.h:40:11 in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr>
Shadow bytes around the buggy address:
  0x502000006a00: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa
  0x502000006a80: fa fa 00 fa fa fa 00 00 fa fa 00 00 fa fa 00 00
  0x502000006b00: fa fa 00 00 fa fa 00 00 fa fa 00 fa fa fa 00 fa
  0x502000006b80: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 00
  0x502000006c00: fa fa 00 00 fa fa 00 00 fa fa 00 00 fa fa fd fa
=>0x502000006c80: fa fa fd fa fa fa fd fd fa fa fd[fd]fa fa fd fd
  0x502000006d00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
  0x502000006d80: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
  0x502000006e00: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 fa
  0x502000006e80: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa
  0x502000006f00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==4320==ABORTING
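
As a reading aid for the shadow dump above, here is a minimal sketch of how the bracketed `[fd]` byte lines up with the faulting address. It assumes the layout the report itself shows: each row starts with an application address and holds 16 shadow bytes, and (per the legend) one shadow byte covers 8 application bytes. The helper name `shadowIndexInRow` and the constants are hypothetical and are not part of ASan or MLIR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for reading the dump by hand; this is not an ASan API.
// Per the legend, one shadow byte represents 8 application bytes.
constexpr std::uint64_t kAppBytesPerShadowByte = 8;

// Index (0-based) of the shadow byte within a printed row that covers `addr`.
// `rowBase` is the application address printed at the start of that row.
constexpr std::uint64_t shadowIndexInRow(std::uint64_t addr,
                                         std::uint64_t rowBase) {
  return (addr - rowBase) / kAppBytesPerShadowByte;
}

// Values copied from the report above.
constexpr std::uint64_t kFaultAddr = 0x502000006cd8ULL;   // reported address
constexpr std::uint64_t kRegionBegin = 0x502000006cd0ULL; // freed 16-byte region
constexpr std::uint64_t kRowBase = 0x502000006c80ULL;     // row marked with "=>"

// The faulting address maps to index 11 of the marked row: the "[fd]" byte.
static_assert(shadowIndexInRow(kFaultAddr, kRowBase) == 11,
              "fault lands on the bracketed shadow byte");
// The freed region starts one shadow byte earlier (index 10), also "fd".
static_assert(shadowIndexInRow(kRegionBegin, kRowBase) == 10,
              "region start is the preceding shadow byte");
// Matches "located 8 bytes inside of 16-byte region".
static_assert(kFaultAddr - kRegionBegin == 8, "offset inside the region");
```

Both shadow bytes covering the 16-byte region read `fd` (freed heap region), which is consistent with the `heap-use-after-free` summary: the storage was released in `mlir::Block::~Block()` and read afterwards.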

