Skip to content

Conversation

@alexey-bataev
Copy link
Member

@alexey-bataev alexey-bataev commented May 16, 2025

Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add , 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Created using spr 1.3.5
@llvmbot
Copy link
Member

llvmbot commented May 16, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Alexey Bataev (alexey-bataev)

Changes

Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0.
Only support for elements, which do not require scheduling is added to
reduce size of the patch.


Patch is 37.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/140279.diff

6 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+351-44)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/buildvector-schedule-for-subvector.ll (+1-4)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/full-match-with-poison-scalar.ll (+4-10)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/non-schedulable-instructions-become-schedulable.ll (+7-9)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/pr47642.ll (+2-7)
  • (modified) llvm/test/Transforms/SLPVectorizer/alternate-non-profitable.ll (+3-3)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index c63f80675fef4..97d6068571918 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -206,6 +206,12 @@ static cl::opt<bool> VectorizeNonPowerOf2(
     "slp-vectorize-non-power-of-2", cl::init(false), cl::Hidden,
     cl::desc("Try to vectorize with non-power-of-2 number of elements."));
 
+/// Enables vectorization of copyable elements.
+static cl::opt<bool> VectorizeCopyableElements(
+    "slp-copyable-elements", cl::init(true), cl::Hidden,
+    cl::desc("Try to replace values with the idempotent instructions for "
+             "better vectorization."));
+
 // Limit the number of alias checks. The limit is chosen so that
 // it has no negative effect on the llvm benchmarks.
 static const unsigned AliasedCheckLimit = 10;
@@ -835,6 +841,13 @@ static std::optional<unsigned> getExtractIndex(const Instruction *E) {
   return *EI->idx_begin();
 }
 
+namespace llvm {
+/// Checks if the specified value does not require scheduling. It does not
+/// require scheduling if all operands and all users do not need to be scheduled
+/// in the current basic block.
+static bool doesNotNeedToBeScheduled(Value *V);
+} // namespace llvm
+
 namespace {
 /// \returns true if \p Opcode is allowed as part of the main/alternate
 /// instruction for SLP vectorization.
@@ -1170,9 +1183,11 @@ class InstructionsState {
     if (!I->isBinaryOp())
       return nullptr;
     BinOpSameOpcodeHelper Converter(MainOp);
-    if (Converter.add(I) && Converter.add(MainOp) && !Converter.hasAltOp())
-      return MainOp;
-    return AltOp;
+    if (!Converter.add(I) || !Converter.add(MainOp))
+      return nullptr;
+    if (Converter.hasAltOp() && !isAltShuffle())
+      return nullptr;
+    return Converter.hasAltOp() ? AltOp : MainOp;
   }
 
   /// Checks if main/alt instructions are shift operations.
@@ -1220,6 +1235,48 @@ class InstructionsState {
   InstructionsState(Instruction *MainOp, Instruction *AltOp)
       : MainOp(MainOp), AltOp(AltOp) {}
   static InstructionsState invalid() { return {nullptr, nullptr}; }
+
+  bool isCopyableElement(Value *V) const {
+    assert(valid() && "InstructionsState is invalid.");
+    if (isAltShuffle() || getOpcode() == Instruction::GetElementPtr)
+      return false;
+    auto *I = dyn_cast<Instruction>(V);
+    if (!I && isa<PoisonValue>(V))
+      return false;
+    // FIXME: remove doesNotNeedToBeScheduled() and isa<PHINode>() check once
+    // scheduling is supported.
+    return !I ||
+           (I->getParent() != MainOp->getParent() &&
+            (!isVectorLikeInstWithConstOps(I) ||
+             !isVectorLikeInstWithConstOps(MainOp))) ||
+           (I->getOpcode() != MainOp->getOpcode() &&
+            (isa<PHINode>(I) || doesNotNeedToBeScheduled(I)) &&
+            (!I->isBinaryOp() || getMatchingMainOpOrAltOp(I) != MainOp));
+  }
+
+  bool areInstructionsWithCopyableElements(ArrayRef<Value *> VL) const {
+    assert(valid() && "InstructionsState is invalid.");
+    bool HasAtLeastOneCopyableElement = false;
+    auto IsCopyableElement = [&](Value *V) {
+      bool IsCopyable = isCopyableElement(V);
+      HasAtLeastOneCopyableElement |= IsCopyable;
+      return IsCopyable;
+    };
+    return !isAltShuffle() && all_of(VL, [&](Value *V) {
+      if (V == MainOp || isa<PoisonValue>(V))
+        return true;
+      if (IsCopyableElement(V))
+        return true;
+      auto *I = dyn_cast<Instruction>(V);
+      if (getOpcode() == Instruction::GetElementPtr && !I)
+        return true;
+      return I->getType() == MainOp->getType() &&
+             (I->getParent() == MainOp->getParent() ||
+              (isVectorLikeInstWithConstOps(I) &&
+               isVectorLikeInstWithConstOps(MainOp))) &&
+             getMatchingMainOpOrAltOp(cast<Instruction>(V)) == MainOp;
+    }) && HasAtLeastOneCopyableElement;
+  }
 };
 
 std::pair<Instruction *, SmallVector<Value *>>
@@ -2878,9 +2935,6 @@ class BoUpSLP {
       for (OperandDataVec &Ops : OpsVec)
         Ops.resize(NumLanes);
       for (unsigned Lane : seq<unsigned>(NumLanes)) {
-        Value *V = VL[Lane];
-        assert((isa<Instruction>(V) || isa<PoisonValue>(V)) &&
-               "Expected instruction or poison value");
         // Our tree has just 3 nodes: the root and two operands.
         // It is therefore trivial to get the APO. We only need to check the
         // opcode of V and whether the operand at OpIdx is the LHS or RHS
@@ -2891,13 +2945,20 @@ class BoUpSLP {
         // Since operand reordering is performed on groups of commutative
         // operations or alternating sequences (e.g., +, -), we can safely tell
         // the inverse operations by checking commutativity.
-        if (isa<PoisonValue>(V)) {
+        auto *I = dyn_cast<Instruction>(VL[Lane]);
+        if (!I && isa<PoisonValue>(VL[Lane])) {
           for (unsigned OpIdx : seq<unsigned>(NumOperands))
             OpsVec[OpIdx][Lane] = {Operands[OpIdx][Lane], true, false};
           continue;
         }
-        auto [SelectedOp, Ops] = convertTo(cast<Instruction>(V), S);
-        bool IsInverseOperation = !isCommutative(SelectedOp);
+        bool IsInverseOperation = false;
+        if (S.isCopyableElement(VL[Lane])) {
+          // The value is a copyable element.
+          IsInverseOperation = !isCommutative(MainOp);
+        } else {
+          auto [SelectedOp, Ops] = convertTo(I, S);
+          IsInverseOperation = !isCommutative(SelectedOp);
+        }
         for (unsigned OpIdx : seq<unsigned>(ArgSize)) {
           bool APO = (OpIdx == 0) ? false : IsInverseOperation;
           OpsVec[OpIdx][Lane] = {Operands[OpIdx][Lane], APO, false};
@@ -3905,6 +3966,14 @@ class BoUpSLP {
 
     bool hasState() const { return S.valid(); }
 
+    /// Returns true if \p V is a copyable element.
+    bool isCopyableElement(Value *V) const { return S.isCopyableElement(V); }
+
+    /// Returns true if any scalar in the list is a copyable element.
+    bool hasCopyableElements() const {
+      return S.areInstructionsWithCopyableElements(Scalars);
+    }
+
     /// When ReuseReorderShuffleIndices is empty it just returns position of \p
     /// V within vector of Scalars. Otherwise, try to remap on its reuse index.
     int findLaneForValue(Value *V) const {
@@ -4153,7 +4222,7 @@ class BoUpSLP {
     } else if (!Last->isGather()) {
       SmallPtrSet<Value *, 4> Processed;
       for (Value *V : VL) {
-        if (isa<PoisonValue>(V))
+        if (isa<PoisonValue>(V) || S.isCopyableElement(V))
           continue;
         auto It = ScalarToTreeEntries.find(V);
         if (It == ScalarToTreeEntries.end()) {
@@ -4168,14 +4237,20 @@ class BoUpSLP {
       // Update the scheduler bundle to point to this TreeEntry.
       assert((!Bundle.getBundle().empty() || isa<PHINode>(S.getMainOp()) ||
               isVectorLikeInstWithConstOps(S.getMainOp()) ||
-              doesNotNeedToSchedule(VL)) &&
+              doesNotNeedToSchedule(VL) ||
+              all_of(VL,
+                     [&](Value *V) {
+                       return S.isCopyableElement(V) ||
+                              doesNotNeedToBeScheduled(V);
+                     })) &&
              "Bundle and VL out of sync");
       if (!Bundle.getBundle().empty()) {
 #if !defined(NDEBUG) || defined(EXPENSIVE_CHECKS)
         auto *BundleMember = Bundle.getBundle().begin();
         SmallPtrSet<Value *, 4> Processed;
         for (Value *V : VL) {
-          if (doesNotNeedToBeScheduled(V) || !Processed.insert(V).second)
+          if (doesNotNeedToBeScheduled(V) || S.isCopyableElement(V) ||
+              !Processed.insert(V).second)
             continue;
           ++BundleMember;
         }
@@ -4284,7 +4359,8 @@ class BoUpSLP {
   /// in general.
   ScalarsVectorizationLegality
   getScalarsVectorizationLegality(ArrayRef<Value *> VL, unsigned Depth,
-                                  const EdgeInfo &UserTreeIdx) const;
+                                  const EdgeInfo &UserTreeIdx,
+                                  bool TryCopyableElementsVectorization) const;
 
   /// Checks if the specified list of the instructions/values can be vectorized
   /// and fills required data before actual scheduling of the instructions.
@@ -4996,7 +5072,8 @@ class BoUpSLP {
 
     /// Build a bundle from the ScheduleData nodes corresponding to the
     /// scalar instruction for each lane.
-    ScheduleBundle &buildBundle(ArrayRef<Value *> VL);
+    ScheduleBundle &buildBundle(ArrayRef<Value *> VL,
+                                const InstructionsState &S);
 
     /// Checks if a bundle of instructions can be scheduled, i.e. has no
     /// cyclic dependencies. This is only a dry-run, no instructions are
@@ -7893,7 +7970,7 @@ void BoUpSLP::buildExternalUses(
     // For each lane:
     for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
       Value *Scalar = Entry->Scalars[Lane];
-      if (!isa<Instruction>(Scalar))
+      if (!isa<Instruction>(Scalar) || Entry->isCopyableElement(Scalar))
         continue;
       // All uses must be replaced already? No need to do it again.
       auto It = ScalarToExtUses.find(Scalar);
@@ -9617,7 +9694,8 @@ static bool tryToFindDuplicates(SmallVectorImpl<Value *> &VL,
             PoisonValue::get(UniqueValues.front()->getType()));
         // Check that extended with poisons operations are still valid for
         // vectorization (div/rem are not allowed).
-        if (!getSameOpcode(PaddedUniqueValues, TLI).valid()) {
+        if (!S.areInstructionsWithCopyableElements(PaddedUniqueValues) &&
+            !getSameOpcode(PaddedUniqueValues, TLI).valid()) {
           LLVM_DEBUG(dbgs() << "SLP: Scalar used twice in bundle.\n");
           ReuseShuffleIndices.clear();
           return false;
@@ -9766,13 +9844,112 @@ bool BoUpSLP::canBuildSplitNode(ArrayRef<Value *> VL,
 }
 
 namespace {
-/// Class accepts incoming list of values and generates the list of values
-/// for scheduling and list of operands for the new nodes.
+/// Class accepts incoming list of values, checks if it is able to model
+/// "copyable" values as compatible operations, and generates the list of values
+/// for scheduling and list of operands doe the new nodes.
 class InstructionsCompatibilityAnalysis {
   DominatorTree &DT;
   const DataLayout &DL;
   const TargetTransformInfo &TTI;
   const TargetLibraryInfo &TLI;
+  unsigned MainOpcode = 0;
+  Instruction *MainOp = nullptr;
+
+  /// Identifies the best candidate value, which represents main opcode
+  /// operation.
+  /// Currently the best candidate is the Add instruction with the parent
+  /// block with the highest DFS incoming number (block, that dominates other).
+  void findAndSetMainInstruction(ArrayRef<Value *> VL) {
+    BasicBlock *Parent = nullptr;
+    // Checks if the instruction has supported opcode.
+    auto IsSupportedOpcode = [](Instruction *I) {
+      return I && I->getOpcode() == Instruction::Add;
+    };
+    for (Value *V : VL) {
+      auto *I = dyn_cast<Instruction>(V);
+      if (!I)
+        continue;
+      if (!DT.isReachableFromEntry(I->getParent()))
+        continue;
+      if (!MainOp) {
+        MainOp = I;
+        Parent = I->getParent();
+        continue;
+      }
+      if (Parent == I->getParent()) {
+        if (!IsSupportedOpcode(MainOp))
+          MainOp = I;
+        if (MainOp->getOpcode() == I->getOpcode() &&
+            doesNotNeedToBeScheduled(MainOp) && !doesNotNeedToBeScheduled(I))
+          MainOp = I;
+        continue;
+      }
+      auto *NodeA = DT.getNode(Parent);
+      auto *NodeB = DT.getNode(I->getParent());
+      assert(NodeA && "Should only process reachable instructions");
+      assert(NodeB && "Should only process reachable instructions");
+      assert((NodeA == NodeB) ==
+                 (NodeA->getDFSNumIn() == NodeB->getDFSNumIn()) &&
+             "Different nodes should have different DFS numbers");
+      if (NodeA->getDFSNumIn() < NodeB->getDFSNumIn()) {
+        MainOp = I;
+        Parent = I->getParent();
+      }
+    }
+    // FIXME: remove second part of the check, once the scheduling support
+    // for copyable instructions is landed.
+    if (!IsSupportedOpcode(MainOp) || any_of(VL, [&](Value *V) {
+          auto *I = dyn_cast<Instruction>(V);
+          return I && I->getOpcode() != MainOp->getOpcode() &&
+                 I->getParent() == MainOp->getParent() && !isa<PHINode>(I) &&
+                 !doesNotNeedToBeScheduled(I);
+        })) {
+      MainOp = nullptr;
+      return;
+    }
+    MainOpcode = MainOp->getOpcode();
+  }
+
+  /// Returns the idempotent value for the \p MainOp with the detected \p
+  /// MainOpcode. For Add, returns 0. For Or, it should choose between false and
+  /// the operand itself, since V or V == V.
+  Value *selectBestIdempotentValue() const {
+    switch (MainOpcode) {
+    case Instruction::Add:
+      return ConstantInt::getNullValue(MainOp->getType());
+    default:
+      break;
+    }
+    llvm_unreachable("Unsupported opcode");
+  }
+
+  unsigned getNumberOfOperands() const {
+    switch (MainOpcode) {
+    case Instruction::Add:
+      return 2;
+    default:
+      break;
+    }
+    llvm_unreachable("Unsupported opcode");
+  }
+
+  /// Returns the value and operands for the \p V, considering if it is original
+  /// instruction and its actual operands should be returned, or it is a
+  /// copyable element and its should be represented as idempotent instruction.
+  SmallVector<Value *> getOperands(const InstructionsState &S, Value *V) const {
+    bool MatchesMainOp = !S.isCopyableElement(V);
+    switch (MainOpcode) {
+    case Instruction::Add:
+      if (isa<PoisonValue>(V))
+        return {V, V};
+      if (MatchesMainOp)
+        return SmallVector<Value *>(cast<Instruction>(V)->operands());
+      return {V, selectBestIdempotentValue()};
+    default:
+      break;
+    }
+    llvm_unreachable("Unsupported opcode");
+  }
 
   /// Builds operands for the original instructions.
   void
@@ -9933,22 +10110,122 @@ class InstructionsCompatibilityAnalysis {
                                     const TargetLibraryInfo &TLI)
       : DT(DT), DL(DL), TTI(TTI), TLI(TLI) {}
 
+  InstructionsState
+  buildInstructionsState(ArrayRef<Value *> VL, const BoUpSLP &R,
+                         bool TryCopyableElementsVectorization,
+                         bool WithProfitabilityCheck = false) {
+    InstructionsState S = getSameOpcode(VL, TLI);
+    if (S)
+      return S;
+    if (!VectorizeCopyableElements || !TryCopyableElementsVectorization)
+      return S;
+    findAndSetMainInstruction(VL);
+    if (!MainOp)
+      return InstructionsState::invalid();
+    S = InstructionsState(MainOp, MainOp);
+    if (!WithProfitabilityCheck)
+      return S;
+    // Check if it is profitable to vectorize the instruction.
+    SmallVector<BoUpSLP::ValueList> Operands = buildOperands(S, VL);
+    if (VL.size() == 2) {
+      // Check if the operands allow better vectorization.
+      SmallVector<std::pair<Value *, Value *>, 4> Candidates;
+      Candidates.emplace_back(Operands[0][0], Operands[0][1]);
+      Candidates.emplace_back(Operands[1][0], Operands[1][1]);
+      if (isCommutative(MainOp)) {
+        Candidates.emplace_back(Operands[0][0], Operands[1][1]);
+        Candidates.emplace_back(Operands[1][0], Operands[0][1]);
+      }
+      // No good candidates - not profitable.
+      if (!R.findBestRootPair(Candidates,
+                              BoUpSLP::LookAheadHeuristics::ScoreSplat)) {
+        // Deeper analysis for 2 splats/constants.
+        SmallVector<std::pair<Value *, Value *>, 4> Candidates1, Candidates2;
+        Candidates1.emplace_back(Operands[0][0], Operands[0][1]);
+        Candidates2.emplace_back(Operands[1][0], Operands[1][1]);
+        bool Res = R.findBestRootPair(Candidates1) &&
+                   R.findBestRootPair(Candidates2);
+        if (!Res && isCommutative(MainOp)) {
+          Candidates1.clear();
+          Candidates2.clear();
+          Candidates1.emplace_back(Operands[0][0], Operands[1][1]);
+          Candidates2.emplace_back(Operands[1][0], Operands[0][1]);
+          Res = R.findBestRootPair(Candidates1) &&
+                R.findBestRootPair(Candidates2);
+        }
+        if (!Res)
+          return InstructionsState::invalid();
+      }
+    }
+    assert(Operands.size() == 2 && "Unexpected number of operands!");
+    unsigned CopyableNum =
+        count_if(VL, [&](Value *V) { return S.isCopyableElement(V); });
+    if (CopyableNum <= VL.size() / 2)
+      return S;
+    // Check profitability if number of copyables > VL.size() / 2.
+    // 1. Reorder operands for better matching.
+    if (isCommutative(MainOp)) {
+      for (auto &Ops : Operands) {
+        // Make instructions the first operands.
+        if (isa<Instruction>(Ops.back())) {
+          std::swap(Ops.front(), Ops.back());
+          continue;
+        }
+        // Make constants the second operands.
+        if (isa<Constant>(Ops.front())) {
+          std::swap(Ops.front(), Ops.back());
+          continue;
+        }
+      }
+    }
+    // 2. Check, if operands can be vectorized.
+    if (!allConstant(Operands.back()))
+      return InstructionsState::invalid();
+    bool Res = allConstant(Operands.front()) || isSplat(Operands.front());
+    if (!Res) {
+      // First operand not a constant or splat? Last attempt - check for
+      // potential vectorization.
+      InstructionsCompatibilityAnalysis Analysis(DT, DL, TTI, TLI);
+      if (!Analysis.buildInstructionsState(
+              Operands.front(), R,
+              /*TryCopyableElementsVectorization=*/true))
+        return InstructionsState::invalid();
+    }
+
+    return S;
+  }
+
   SmallVector<BoUpSLP::ValueList> buildOperands(const InstructionsState &S,
                                                 ArrayRef<Value *> VL) {
     assert(S && "Invalid state!");
     SmallVector<BoUpSLP::ValueList> Operands;
-    buildOriginalOperands(S, VL, Operands);
+    if (S.areInstructionsWithCopyableElements(VL)) {
+      MainOp = S.getMainOp();
+      MainOpcode = S.getOpcode();
+      Operands.assign(getNumberOfOperands(),
+                      BoUpSLP::ValueList(VL.size(), nullptr));
+      for (auto [Idx, V] : enumerate(VL)) {
+        SmallVector<Value *> OperandsForValue = getOperands(S, V);
+        for (auto [OperandIdx, Operand] : enumerate(OperandsForValue))
+          Operands[OperandIdx][Idx] = Operand;
+      }
+    } else {
+      buildOriginalOperands(S, VL, Operands);
+    }
     return Operands;
   }
 };
 } // namespace
 
-BoUpSLP::ScalarsVectorizationLegality
-BoUpSLP::getScalarsVectorizationLegality(ArrayRef<Value *> VL, unsigned Depth,
-                                         const EdgeInfo &UserTreeIdx) const {
+BoUpSLP::ScalarsVectorizationLegality BoUpSLP::getScalarsVectorizationLegality(
+    ArrayRef<Value *> VL, unsigned Depth, const EdgeInfo &UserTreeIdx,
+    bool TryCopyableElementsVectorization) const {
   assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");
 
-  InstructionsState S = getSameOpcode(VL, *TLI);
+  InstructionsCompatibilityAnalysis Analysis(*DT, *DL, *TTI, *TLI);
+  InstructionsState S = Analysis.buildInstructionsState(
+      VL, *this, TryCopyableElementsVectorization,
+      /*WithProfitabilityCheck=*/true);
 
   // Don't go into catchswitch blocks, which can happen with PHIs.
   // Such blocks can only have PHIs and the catchswitch.  There is no
@@ -10247,9 +10524,9 @@ void BoUpSLP::buildTreeRec(ArrayRef<Value *> VLRef, unsigned Depth,
     return true;
   };
 
-  ScalarsVectorizationLegality Legality =
-      getScalarsVectorizationLegality(VL, Depth, UserTreeIdx);
-  const InstructionsState &S = Legality.getInstructionsState();
+  ScalarsVectorizationLegality Legality = getScalarsVectorizationLegality(
+      VL, Depth, UserTreeIdx, /*TryCopyableElementsVectorization=*/false);
+  InstructionsState S = Legality.getInstructionsState();
   if (!Legality.isLegal()) {
     if (Legality.trySplitVectorize()) {
       auto [MainOp, AltOp] = getMainAltOpsNoStateVL(VL);
@@ -10257,11 +10534,18 @@ void BoUpSLP::buildTreeRec(ArrayRef<Value *> VLRef, unsigned Depth,
       if (MainOp && AltOp && TrySplitNode(InstructionsState(MainOp, AltOp)))
         return;
     }
-    if (Legality.tryToFindD...
[truncated]

@alexey-bataev alexey-bataev requested a review from hiraditya May 16, 2025 16:39
@github-actions
Copy link

github-actions bot commented May 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Created using spr 1.3.5
Created using spr 1.3.5
@hiraditya
Copy link
Collaborator

@preames @david-greene-cb

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm\test\Transforms\SLPVectorizer\X86\vect_copyable_in_binops.ll has a number of add tests - how come none of them matched?

@alexey-bataev
Copy link
Member Author

llvm\test\Transforms\SLPVectorizer\X86\vect_copyable_in_binops.ll has a number of add tests - how come none of them matched?

These tests require support for schedulable instructions. This is planned for the next patch. Splitting the patches to limit the number of changes

Created using spr 1.3.5
@hassnaaHamdi
Copy link
Member

hassnaaHamdi commented Jun 10, 2025

Hi @alexey-bataev
I have looked at the patch, but to be honest it's not easy to fully understand it specially that I'm a beginner at the SLP.
Last month I was working on something similar but I didn't have the opportunity to publish it and then I saw your patch. Your patch seems much deeper :'D
My patch is handling all cases where an identity operation can be created for the isomorphic instructions so it works for add/lshr and similar instructions.
But I'm not pretty sure about the correctness of the idea.
I have created a draft patch here: #143583
If you have time, could you please give it a look ?

@alexey-bataev
Copy link
Member Author

Hi @alexey-bataev I have looked at the patch, but to be honest it's not easy to fully understand it specially that I'm a beginner at the SLP. Last month I was working on something similar but I didn't have the opportunity to publish it and then I saw your patch. Your patch seems much deeper :'D My patch is handling all cases where an identity operation can be created for the isomorphic instructions so it works for add/lshr and similar instructions. But I'm not pretty sure about the correctness of the idea. I have created a draft patch here: #143583 If you have time, could you please give it a look ?

Your patch solves the problem by inserting new instruction. It breaks tje design of the vectorizer - “model first, then modify”. No need to insert new instructions, it leads to issues with compile time, analysis state, etc. Instead, need to model such values as vslues, which actually represent “virtual” identity instructions without emitting such instructions.

@hassnaaHamdi
Copy link
Member

Hi @alexey-bataev I have looked at the patch, but to be honest it's not easy to fully understand it specially that I'm a beginner at the SLP. Last month I was working on something similar but I didn't have the opportunity to publish it and then I saw your patch. Your patch seems much deeper :'D My patch is handling all cases where an identity operation can be created for the isomorphic instructions so it works for add/lshr and similar instructions. But I'm not pretty sure about the correctness of the idea. I have created a draft patch here: #143583 If you have time, could you please give it a look ?

Your patch solves the problem by inserting new instruction. It breaks tje design of the vectorizer - “model first, then modify”. No need to insert new instructions, it leads to issues with compile time, analysis state, etc. Instead, need to model such values as vslues, which actually represent “virtual” identity instructions without emitting such instructions.

Thanks for the feedback.
About your patch, there will be further work to support other copyable elements like sub/shift, right?

@alexey-bataev
Copy link
Member Author

Yes, this is just the initial patch, the first in a serie

@preames
Copy link
Collaborator

preames commented Jun 12, 2025

Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add , 0.

Terminology wise, this would be much clearer if you described this as using identity constants for missing lanes. The key bit in the above is the 0 is the identity value for add.

Created using spr 1.3.5
Created using spr 1.3.5
Created using spr 1.3.5
Created using spr 1.3.5
Created using spr 1.3.5
@github-actions
Copy link

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/X86/buildvector-schedule-for-subvector.ll llvm/test/Transforms/SLPVectorizer/X86/full-match-with-poison-scalar.ll llvm/test/Transforms/SLPVectorizer/X86/node-outside-used-only.ll llvm/test/Transforms/SLPVectorizer/X86/non-schedulable-instructions-become-schedulable.ll llvm/test/Transforms/SLPVectorizer/X86/pr47642.ll llvm/test/Transforms/SLPVectorizer/alternate-non-profitable.ll

The following files introduce new uses of undef:

  • llvm/test/Transforms/SLPVectorizer/X86/buildvector-schedule-for-subvector.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

Created using spr 1.3.5
Created using spr 1.3.5
@alexey-bataev
Copy link
Member Author

Ping!

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - cheers

Created using spr 1.3.5
@alexey-bataev alexey-bataev merged commit e202dba into main Jul 21, 2025
6 of 9 checks passed
@alexey-bataev alexey-bataev deleted the users/alexey-bataev/spr/slpinitial-support-for-copyable-elements-non-schedulable-only branch July 21, 2025 18:07
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 21, 2025
…e only)

Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: llvm/llvm-project#140279
@nikic
Copy link
Contributor

nikic commented Jul 21, 2025

FYI this has some compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=36089e5d983fe9ae00f497c2d500f37227f82db1&to=e202dba288edd47f1b370cc43aa8cd36a924e7c1&stat=instructions:u Some files with large impact are libclamav_nsis_LZMADecode.c (+8%) and zlib_inflate.c (+4%).

@alexey-bataev
Copy link
Member Author

FYI this has some compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=36089e5d983fe9ae00f497c2d500f37227f82db1&to=e202dba288edd47f1b370cc43aa8cd36a924e7c1&stat=instructions:u Some files with large impact are libclamav_nsis_LZMADecode.c (+8%) and zlib_inflate.c (+4%).

Reverted for now in a415d68, will try to fix

alexey-bataev added a commit that referenced this pull request Jul 23, 2025
Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Fixed compile time regressions, updated release notes

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: #140279
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 23, 2025
…e only)

Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Fixed compile time regressions, updated release notes

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: llvm/llvm-project#140279
MarkMurrayARM pushed a commit to MarkMurrayARM/arm-toolchain that referenced this pull request Jul 24, 2025
Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Fixed compile time regressions, updated release notes

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: llvm/llvm-project#140279
@mstorsjo
Copy link
Member

This change makes building LLVM with frontend instrumentation to hang on compiling llvm/lib/Analysis/InstCount.cpp.

To reproduce this hang, download https://martin.st/temp/instcount-preproc.cpp and compile it with clang++ --target=aarch64-linux-gnu -fprofile-instr-generate="/tmp/llvm-profile/%4m.profraw" -O3 -std=c++17 -c instcount-preproc.cpp -w. Before this change, this compilation completes in ~26 seconds, after this change, this it hangs indefinitely.

@ZequanWu
Copy link
Contributor

This caused clang to crash on our Mac build:

clang++: /usr/local/google/home/zequanwu/work/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:1204: Instruction *(anonymous namespace)::InstructionsState::getMainOp() const: Assertion `valid() && "InstructionsState is invalid."' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang++ -cc1 -triple arm64-apple-macosx12.0.0 -O2 -Wundef-prefix=TARGET_OS_ -Werror=undef-prefix -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -emit-obj -disable-free -clear-ast-before-backend -main-file-name BC_Decoder.cpp -mrelocation-model pic -pic-level 2 -fmerge-all-constants -fno-delete-null-pointer-checks -mframe-pointer=non-leaf -relaxed-aliasing -ffp-contract=off -fno-rounding-math -target-sdk-version=15.5 -fcompatibility-qualified-id-block-type-checking -fvisibility-inlines-hidden-static-local-var -fdefine-target-os-macros -fno-modulemap-allow-subdirectory-search -enable-tlsdesc -target-cpu apple-m1 -target-feature +v8.4a -target-feature +aes -target-feature +altnzcv -target-feature +ccdp -target-feature +ccpp -target-feature +complxnum -target-feature +crc -target-feature +dotprod -target-feature +flagm -target-feature +fp-armv8 -target-feature +fp16fml -target-feature +fptoint -target-feature +fullfp16 -target-feature +jsconv -target-feature +lse -target-feature +neon -target-feature +pauth -target-feature +perfmon -target-feature +predres -target-feature +ras -target-feature +rcpc -target-feature +rdm -target-feature +sb -target-feature +sha2 -target-feature +sha3 -target-feature +specrestrict -target-feature +ssbs -target-abi darwinpcs -debug-info-kind=standalone -dwarf-version=4 -debugger-tuning=lldb -mllvm -generate-arange-section -fdebug-compilation-dir=. -target-linker-version 820.1 -mllvm -crash-diagnostics-dir=../../tools/clang/crashreports -fcoverage-compilation-dir=. -nostdinc++ -D SWIFTSHADER_ENABLE_ASTC -D SWIFTSHADER_LEGACY_PRECISION=true -D __ARM_NEON__=1 -D CR_XCODE_VERSION=1640 -D CR_CLANG_REVISION=\"llvmorg-22-init-1245-gd9952a7a-0\" -D _LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_EXTENSIVE -D COMPONENT_BUILD -D _LIBCPP_INSTRUMENTED_WITH_ASAN=0 -D CR_LIBCXX_REVISION=ed0f32ee7a8d9481bfd26cfa6f5940b9f296f371 -D DCHECK_ALWAYS_ON=1 -D _DEBUG -D DYNAMIC_ANNOTATIONS_ENABLED=1 -D MARL_USE_PTHREAD_THREAD_LOCAL=1 -D __DATE__= -D __TIME__= -D __TIMESTAMP__= -Wno-builtin-macro-redefined -Wno-uninitialized-const-pointer -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -Wall -Wno-unused-variable -Wno-c++11-narrowing -Wno-unused-but-set-variable -Wunguarded-availability -Wno-missing-field-initializers -Wno-unused-parameter -Wno-psabi -Wloop-analysis -Wno-unneeded-internal-declaration -Wno-cast-function-type -Wno-thread-safety-reference-return -Wno-nontrivial-memcall -Wno-exit-time-destructors -Wno-shadow -Wno-trigraphs -Wno-invalid-offsetof -Wenum-compare-conditional -Wno-nullability-completeness -Wunique-object-duplication -std=c++20 -fdeprecated-macro -ferror-limit 19 -fvisibility=hidden -fvisibility-inlines-hidden -fwrapv -fwrapv-pointer -stack-protector 2 -ftrivial-auto-var-init=pattern -fblocks -fencode-extended-block-signature -fno-rtti -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fno-implicit-modules -fskip-odr-check-in-gmf -fno-sized-deallocation -fmax-type-align=16 -Qn -fcolor-diagnostics -vectorize-loops -vectorize-slp -debug-info-kind=limited -D__GCC_HAVE_DWARF2_CFI_ASM=1 -x c++ BC_Decoder-7b03ca.cpp
1.      <eof> parser at end of file
2.      Optimizer
3.      Running pass "function<eager-inv>(float2int,lower-constant-intrinsics,loop(loop-rotate<header-duplication;no-prepare-for-lto>,loop-deletion),loop-distribute,inject-tli-mappings,loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>,infer-alignment,loop-load-elim,instcombine<max-iterations=1;no-verify-fixpoint>,simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,slp-vectorizer,vector-combine,instcombine<max-iterations=1;no-verify-fixpoint>,loop-unroll<O2>,transform-warning,sroa<preserve-cfg>,infer-alignment,instcombine<max-iterations=1;no-verify-fixpoint>,loop-mssa(licm<allowspeculation>),alignment-from-assumptions,loop-sink,instsimplify,div-rem-pairs,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;speculate-unpredictables>)" on module "BC_Decoder-7b03ca.cpp"
4.      Running pass "slp-vectorizer" on function "_ZN10BC_Decoder6DecodeEPKhPhiiiiib"
 #0 0x000055587d1faa38 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x8779a38)
 #1 0x000055587d1f8145 llvm::sys::RunSignalHandlers() (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x8777145)
 #2 0x000055587d1fb801 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f3a9cc49df0 (/lib/x86_64-linux-gnu/libc.so.6+0x3fdf0)
 #4 0x00007f3a9cc9e95c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007f3a9cc49cc2 raise ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007f3a9cc324ac abort ./stdlib/abort.c:81:3
 #7 0x00007f3a9cc32420 __assert_perror_fail ./assert/assert-perr.c:31:1
 #8 0x000055587e7d4067 llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>, llvm::InstructionCost) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x9d53067)
 #9 0x000055587e842b93 (anonymous namespace)::HorizontalReduction::tryToReduce(llvm::slpvectorizer::BoUpSLP&, llvm::DataLayout const&, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo const&, llvm::AssumptionCache*) SLPVectorizer.cpp:0:0
#10 0x000055587e80a98b llvm::SLPVectorizerPass::vectorizeHorReduction(llvm::PHINode*, llvm::Instruction*, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&, llvm::SmallVectorImpl<llvm::WeakTrackingVH>&) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x9d8998b)
#11 0x000055587e80be87 llvm::SLPVectorizerPass::vectorizeRootInstruction(llvm::PHINode*, llvm::Instruction*, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x9d8ae87)
#12 0x000055587e800881 llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x9d7f881)
#13 0x000055587e7fd53e llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x9d7c53e)
#14 0x000055587e7fc9c6 llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x9d7b9c6)
#15 0x000055587e396d35 llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#16 0x000055587cbf5bd7 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x8174bd7)
#17 0x000055587aac0e79 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) AMDGPUTargetMachine.cpp:0:0
#18 0x000055587cbf9c41 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x8178c41)
#19 0x000055587aac1539 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) AMDGPUTargetMachine.cpp:0:0
#20 0x000055587cbf4c77 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x8173c77)
#21 0x000055587d9580df (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&, clang::BackendConsumer*) BackendUtil.cpp:0:0
#22 0x000055587d94f5a6 clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x8ece5a6)
#23 0x000055587d961353 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x8ee0353)
#24 0x000055587f38ac29 clang::ParseAST(clang::Sema&, bool, bool) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0xa909c29)
#25 0x000055587deee44a clang::FrontendAction::Execute() (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x946d44a)
#26 0x000055587de5ebcd clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x93ddbcd)
#27 0x000055587dfd7552 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x9556552)
#28 0x000055587a6de361 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x5c5d361)
#29 0x000055587a6da21f ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#30 0x000055587a6d92b2 clang_main(int, char**, llvm::ToolContext const&) (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x5c582b2)
#31 0x000055587a6ea547 main (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x5c69547)
#32 0x00007f3a9cc33ca8 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#33 0x00007f3a9cc33d65 call_init ./csu/../csu/libc-start.c:128:20
#34 0x00007f3a9cc33d65 __libc_start_main ./csu/../csu/libc-start.c:347:5
#35 0x000055587a6d7ca1 _start (/usr/local/google/home/zequanwu/work/llvm-project/out/cmake/bin/clang-21+0x5c56ca1)
./BC_Decoder-7b03ca.sh: line 1: 772522 Aborted                 (core dumped) "clang++" "-cc1" "-triple" "arm64-apple-macosx12.0.0" "-O2" "-Wundef-prefix=TARGET_OS_" "-Werror=undef-prefix" "-Wdeprecated-objc-isa-usage" "-Werror=deprecated-objc-isa-usage" "-emit-obj" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "BC_Decoder.cpp" "-mrelocation-model" "pic" "-pic-level" "2" "-fmerge-all-constants" "-fno-delete-null-pointer-checks" "-mframe-pointer=non-leaf" "-relaxed-aliasing" "-ffp-contract=off" "-fno-rounding-math" "-target-sdk-version=15.5" "-fcompatibility-qualified-id-block-type-checking" "-fvisibility-inlines-hidden-static-local-var" "-fdefine-target-os-macros" "-fno-modulemap-allow-subdirectory-search" "-enable-tlsdesc" "-target-cpu" "apple-m1" "-target-feature" "+v8.4a" "-target-feature" "+aes" "-target-feature" "+altnzcv" "-target-feature" "+ccdp" "-target-feature" "+ccpp" "-target-feature" "+complxnum" "-target-feature" "+crc" "-target-feature" "+dotprod" "-target-feature" "+flagm" "-target-feature" "+fp-armv8" "-target-feature" "+fp16fml" "-target-feature" "+fptoint" "-target-feature" "+fullfp16" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+neon" "-target-feature" "+pauth" "-target-feature" "+perfmon" "-target-feature" "+predres" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sb" "-target-feature" "+sha2" "-target-feature" "+sha3" "-target-feature" "+specrestrict" "-target-feature" "+ssbs" "-target-abi" "darwinpcs" "-debug-info-kind=standalone" "-dwarf-version=4" "-debugger-tuning=lldb" "-mllvm" "-generate-arange-section" "-fdebug-compilation-dir=." "-target-linker-version" "820.1" "-mllvm" "-crash-diagnostics-dir=../../tools/clang/crashreports" "-fcoverage-compilation-dir=." "-nostdinc++" "-D" "SWIFTSHADER_ENABLE_ASTC" "-D" "SWIFTSHADER_LEGACY_PRECISION=true" "-D" "__ARM_NEON__=1" "-D" "CR_XCODE_VERSION=1640" "-D" "CR_CLANG_REVISION=\"llvmorg-22-init-1245-gd9952a7a-0\"" "-D" "_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_EXTENSIVE" "-D" "COMPONENT_BUILD" "-D" "_LIBCPP_INSTRUMENTED_WITH_ASAN=0" "-D" "CR_LIBCXX_REVISION=ed0f32ee7a8d9481bfd26cfa6f5940b9f296f371" "-D" "DCHECK_ALWAYS_ON=1" "-D" "_DEBUG" "-D" "DYNAMIC_ANNOTATIONS_ENABLED=1" "-D" "MARL_USE_PTHREAD_THREAD_LOCAL=1" "-D" "__DATE__=" "-D" "__TIME__=" "-D" "__TIMESTAMP__=" "-Wno-builtin-macro-redefined" "-Wno-uninitialized-const-pointer" "-Wheader-hygiene" "-Wstring-conversion" "-Wtautological-overlap-compare" "-Wall" "-Wno-unused-variable" "-Wno-c++11-narrowing" "-Wno-unused-but-set-variable" "-Wunguarded-availability" "-Wno-missing-field-initializers" "-Wno-unused-parameter" "-Wno-psabi" "-Wloop-analysis" "-Wno-unneeded-internal-declaration" "-Wno-cast-function-type" "-Wno-thread-safety-reference-return" "-Wno-nontrivial-memcall" "-Wno-exit-time-destructors" "-Wno-shadow" "-Wno-trigraphs" "-Wno-invalid-offsetof" "-Wenum-compare-conditional" "-Wno-nullability-completeness" "-Wunique-object-duplication" "-std=c++20" "-fdeprecated-macro" "-ferror-limit" "19" "-fvisibility=hidden" "-fvisibility-inlines-hidden" "-fwrapv" "-fwrapv-pointer" "-stack-protector" "2" "-ftrivial-auto-var-init=pattern" "-fblocks" "-fencode-extended-block-signature" "-fno-rtti" "-fregister-global-dtors-with-atexit" "-fgnuc-version=4.2.1" "-fno-implicit-modules" "-fskip-odr-check-in-gmf" "-fno-sized-deallocation" "-fmax-type-align=16" "-Qn" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-debug-info-kind=limited" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-x" "c++" "BC_Decoder-7b03ca.cpp"

Attached repro. Please revert if it takes a while to fix it.
BC_Decoder-7b03ca.cpp.txt
BC_Decoder-7b03ca.sh.txt

mstorsjo added a commit that referenced this pull request Jul 24, 2025
…nly)"

This reverts commit 898bba3.

This change caused hangs and crashes, see
#140279 (comment).
@mstorsjo
Copy link
Member

I pushed a revert now. I had to revert c9cea24 as well as the problematic commit didn't revert cleanly initially.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 24, 2025
…hedulable only)"

This reverts commit 898bba3.

This change caused hangs and crashes, see
llvm/llvm-project#140279 (comment).
alexey-bataev added a commit that referenced this pull request Jul 25, 2025
Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Fixed compile time regressions, reported crashes, updated release notes

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: #140279
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 25, 2025
…e only)

Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Fixed compile time regressions, reported crashes, updated release notes

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: llvm/llvm-project#140279
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: llvm#140279
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Fixed compile time regressions, updated release notes

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: llvm#140279
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
…nly)"

This reverts commit 898bba3.

This change caused hangs and crashes, see
llvm#140279 (comment).
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
Adds initial support for copyable elements. This patch only models adds
and model copyable elements as add <element>, 0, i.e. uses identity
constants for missing lanes.
Only support for elements, which do not require scheduling, is added to
reduce size of the patch.

Fixed compile time regressions, reported crashes, updated release notes

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: llvm#140279
@zmodem
Copy link
Collaborator

zmodem commented Sep 17, 2025

This change makes building LLVM with frontend instrumentation to hang on compiling llvm/lib/Analysis/InstCount.cpp

We're hitting a similar issue which bisects to the re-land (ef98e24). Compiling the attached source with frontend instrumentation hangs -- or hits an asserts when enabled:
version_edit.ii.gz

$ build/bin/clang.bad -cc1 -triple thumbv7-unknown-linux-android29 -Os -emit-obj -fprofile-instrument=llvm -std=c++20 -vectorize-slp /tmp/version_edit.ii -o /dev
/null -w
clang.bad: /work/llvm-project/llvm/lib/CodeGen/VirtRegMap.cpp:733: void (anonymous namespace)::VirtRegRewriter::rewrite(): Assertion `PhysReg.isValid() && "Invalid SubReg for physical register"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: build/bin/clang.bad -cc1 -triple thumbv7-unknown-linux-android29 -Os -emit-obj -fprofile-instrument=llvm -std=c++20 -vectorize-slp /tmp/version_edit.ii -o /dev/null -w
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module '../../third_party/leveldatabase/src/db/version_edit.cc'.
4.      Running pass 'Virtual Register Rewriter' on function '@_ZN7leveldb11VersionEdit10DecodeFromERKNS_5SliceE'
 #0 0x000055f15b2b1f38 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build/bin/clang.bad+0x2560f38)
 #1 0x000055f15b2af665 llvm::sys::RunSignalHandlers() (build/bin/clang.bad+0x255e665)
 #2 0x000055f15b2b2d51 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007fd68d649df0 (/lib/x86_64-linux-gnu/libc.so.6+0x3fdf0)
 #4 0x00007fd68d69e95c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007fd68d649cc2 raise ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007fd68d6324ac abort ./stdlib/abort.c:81:3
 #7 0x00007fd68d632420 __assert_perror_fail ./assert/assert-perr.c:31:1
 #8 0x000055f15aafede4 (anonymous namespace)::VirtRegRewriter::run(llvm::MachineFunction&) VirtRegMap.cpp:0:0
 #9 0x000055f15aaff8ae (anonymous namespace)::VirtRegRewriterLegacy::runOnMachineFunction(llvm::MachineFunction&) VirtRegMap.cpp:0:0
#10 0x000055f15a834903 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (build/bin/clang.bad+0x1ae3903)
#11 0x000055f15adc8835 llvm::FPPassManager::runOnFunction(llvm::Function&) (build/bin/clang.bad+0x2077835)
#12 0x000055f15add0b32 llvm::FPPassManager::runOnModule(llvm::Module&) (build/bin/clang.bad+0x207fb32)
#13 0x000055f15adc9286 llvm::legacy::PassManagerImpl::run(llvm::Module&) (build/bin/clang.bad+0x2078286)
#14 0x000055f15b4bf239 clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (build/bin/clang.bad+0x276e239)
#15 0x000055f15baa02e3 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (build/bin/clang.bad+0x2d4f2e3)
#16 0x000055f15d1f0029 clang::ParseAST(clang::Sema&, bool, bool) (build/bin/clang.bad+0x449f029)
#17 0x000055f15bd2da16 clang::FrontendAction::Execute() (build/bin/clang.bad+0x2fdca16)
#18 0x000055f15bc9b36d clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (build/bin/clang.bad+0x2f4a36d)
#19 0x000055f15be0fab2 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (build/bin/clang.bad+0x30beab2)
#20 0x000055f159a094e6 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (build/bin/clang.bad+0xcb84e6)
#21 0x000055f159a054e9 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#22 0x000055f159a045ca clang_main(int, char**, llvm::ToolContext const&) (build/bin/clang.bad+0xcb35ca)
#23 0x000055f159a14c77 main (build/bin/clang.bad+0xcc3c77)
#24 0x00007fd68d633ca8 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#25 0x00007fd68d633d65 call_init ./csu/../csu/libc-start.c:128:20
#26 0x00007fd68d633d65 __libc_start_main ./csu/../csu/libc-start.c:347:5
#27 0x000055f159a02f71 _start (build/bin/clang.bad+0xcb1f71)
Aborted (core dumped)

I'll work on getting a smaller reproducer as well.

@zmodem
Copy link
Collaborator

zmodem commented Sep 17, 2025

A reduced C++ repro: reduced.ii.gz

$ build/bin/clang -cc1 -triple thumbv7-unknown-linux-android29 -Os -emit-obj -fprofile-instrument=llvm -vectorize-slp /tmp/reduced.ii

An IR reproducer based on the above: reduced.ll.gz

$ build/bin/opt -passes='default<O3>,slp-vectorizer' /tmp/reduced.ll -o - | build/bin/llc -filetype=asm -o -

Could you take a look?

@alexey-bataev
Copy link
Member Author

A reduced C++ repro: reduced.ii.gz

$ build/bin/clang -cc1 -triple thumbv7-unknown-linux-android29 -Os -emit-obj -fprofile-instrument=llvm -vectorize-slp /tmp/reduced.ii

An IR reproducer based on the above: reduced.ll.gz

$ build/bin/opt -passes='default<O3>,slp-vectorizer' /tmp/reduced.ll -o - | build/bin/llc -filetype=asm -o -

Could you take a look?

I checked the original reproducer and the reduced one, used the debug version of the compiler. It does not look like SLP vectorizer is an issue here, even with the debug version of the compiler it passed fast. Looks like a backend issue, llc crashes after peephole-opt pass.

@zmodem
Copy link
Collaborator

zmodem commented Sep 17, 2025

even with the debug version of the compiler it passed fast.

But it asserts in the backend, right?

"Invalid SubReg for physical register" does sound like a backend problem. Could it be caused by the vectorizer somehow generating bad IR, or do you think it just uncovered a pre-existing backend bug?

@alexey-bataev
Copy link
Member Author

Yes, looks like you run into a new bug in the backend (codegen), revealed by the changes in SLP Vectorizer (middle-end optimization).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants