[VPlan] Add ExtractLane VPInst to extract across multiple parts. #148817

fhahn · 2025-07-15T09:53:32Z

This patch adds a new ExtractLane VPInstruction which extracts across multiple parts using a wide index, to be used in combination with FirstActiveLane.

The patch updates early-exit codegen to use it instead ExtractElement, which is only per-part. With this change, interleaving should work correctly with early-exit loops.

The patch removes the restrictions added in 6f43754 (#145877), but does not yet automatically select interleave counts > 1 for early-exit loops.

I'll share a patch as follow-up. The cost of extracting a lane adds non-trivial overhead in the exit block, so that should be considered when picking the interleave count.

llvmbot · 2025-07-15T09:54:05Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Changes

This patch adds a new ExtractLane VPInstruction which extracts across multiple parts using a wide index, to be used in combination with FirstActiveLane.

The patch updates early-exit codegen to use it instead ExtractElement, which is only per-part. With this change, interleaving should work correctly with early-exit loops.

The patch removes the restrictions added in 6f43754 (#145877), but does not yet automatically select interleave counts > 1 for early-exit loops.

I'll share a patch as follow-up. The cost of extracting a lane adds non-trivial overhead in the exit block, so that should be considered when picking the interleave count.

Patch is 66.75 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148817.diff

10 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (-6)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+4)
(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+2)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+33-1)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+4-4)
(modified) llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp (+7)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll (+56-6)
(modified) llvm/test/Transforms/LoopVectorize/single-early-exit-interleave-hint.ll (+15-8)
(modified) llvm/test/Transforms/LoopVectorize/single-early-exit-interleave.ll (+334-32)
(modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-early-exit.ll (+25-22)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 06db89a89bc38..f4e1ddf58a9b3 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -10069,12 +10069,6 @@ bool LoopVectorizePass::processLoop(Loop *L) {
   // Get user vectorization factor and interleave count.
   ElementCount UserVF = Hints.getWidth();
   unsigned UserIC = Hints.getInterleave();
-  if (LVL.hasUncountableEarlyExit() && UserIC != 1) {
-    UserIC = 1;
-    reportVectorizationInfo("Interleaving not supported for loops "
-                            "with uncountable early exits",
-                            "InterleaveEarlyExitDisabled", ORE, L);
-  }
 
   // Plan how to best vectorize.
   LVP.plan(UserVF, UserIC);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 703cfe969577d..a81dc0bb0bef6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1012,6 +1012,10 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
     ReductionStartVector,
     // Creates a step vector starting from 0 to VF with a step of 1.
     StepVector,
+    /// Extracts a single lane (first operand) from a set of vector operands.
+    /// The lane specifies an index into a vector formed by combining all vector
+    /// operands (all operands after the first one).
+    ExtractLane,
 
   };
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index b27a7ffeed208..a0f5f10beb9fa 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -109,6 +109,8 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
   case VPInstruction::BuildStructVector:
   case VPInstruction::BuildVector:
     return SetResultTyFromOp();
+  case VPInstruction::ExtractLane:
+    return inferScalarType(R->getOperand(1));
   case VPInstruction::FirstActiveLane:
     return Type::getIntNTy(Ctx, 64);
   case VPInstruction::ExtractLastElement:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 7dd844aec1242..967adbac56c87 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -860,6 +860,31 @@ Value *VPInstruction::generate(VPTransformState &State) {
       Res = Builder.CreateOr(Res, State.get(Op));
     return Builder.CreateOrReduce(Res);
   }
+  case VPInstruction::ExtractLane: {
+    Value *LaneToExtract = State.get(getOperand(0), true);
+    Type *IdxTy = State.TypeAnalysis.inferScalarType(getOperand(0));
+    Value *Res = nullptr;
+    Value *RuntimeVF = getRuntimeVF(State.Builder, IdxTy, State.VF);
+
+    for (unsigned Idx = 1; Idx != getNumOperands(); ++Idx) {
+      Value *VectorStart =
+          Builder.CreateMul(RuntimeVF, ConstantInt::get(IdxTy, Idx - 1));
+      Value *VectorIdx = Idx == 1
+                             ? LaneToExtract
+                             : Builder.CreateSub(LaneToExtract, VectorStart);
+      Value *Ext = State.VF.isScalar()
+                       ? State.get(getOperand(Idx))
+                       : Builder.CreateExtractElement(
+                             State.get(getOperand(Idx)), VectorIdx);
+      if (Res) {
+        Value *Cmp = Builder.CreateICmpUGE(LaneToExtract, VectorStart);
+        Res = Builder.CreateSelect(Cmp, Ext, Res);
+      } else {
+        Res = Ext;
+      }
+    }
+    return Res;
+  }
   case VPInstruction::FirstActiveLane: {
     if (getNumOperands() == 1) {
       Value *Mask = State.get(getOperand(0));
@@ -918,7 +943,8 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
   }
 
   switch (getOpcode()) {
-  case Instruction::ExtractElement: {
+  case Instruction::ExtractElement:
+  case VPInstruction::ExtractLane: {
     // Add on the cost of extracting the element.
     auto *VecTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF);
     return Ctx.TTI.getVectorInstrCost(Instruction::ExtractElement, VecTy,
@@ -980,6 +1006,7 @@ bool VPInstruction::isVectorToScalar() const {
   return getOpcode() == VPInstruction::ExtractLastElement ||
          getOpcode() == VPInstruction::ExtractPenultimateElement ||
          getOpcode() == Instruction::ExtractElement ||
+         getOpcode() == VPInstruction::ExtractLane ||
          getOpcode() == VPInstruction::FirstActiveLane ||
          getOpcode() == VPInstruction::ComputeAnyOfResult ||
          getOpcode() == VPInstruction::ComputeFindIVResult ||
@@ -1038,6 +1065,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
   case VPInstruction::BuildVector:
   case VPInstruction::CalculateTripCountMinusVF:
   case VPInstruction::CanonicalIVIncrementForPart:
+  case VPInstruction::ExtractLane:
   case VPInstruction::ExtractLastElement:
   case VPInstruction::ExtractPenultimateElement:
   case VPInstruction::FirstActiveLane:
@@ -1063,6 +1091,7 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
   default:
     return false;
   case Instruction::ExtractElement:
+  case VPInstruction::ExtractLane:
     return Op == getOperand(1);
   case Instruction::PHI:
     return true;
@@ -1164,6 +1193,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
   case VPInstruction::BuildVector:
     O << "buildvector";
     break;
+  case VPInstruction::ExtractLane:
+    O << "extract-lane";
+    break;
   case VPInstruction::ExtractLastElement:
     O << "extract-last-element";
     break;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 6a3b3e6e41955..338001820d593 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -774,10 +774,10 @@ static VPValue *optimizeEarlyExitInductionUser(VPlan &Plan,
   using namespace VPlanPatternMatch;
 
   VPValue *Incoming, *Mask;
-  if (!match(Op, m_VPInstruction<Instruction::ExtractElement>(
-                     m_VPValue(Incoming),
+  if (!match(Op, m_VPInstruction<VPInstruction::ExtractLane>(
                      m_VPInstruction<VPInstruction::FirstActiveLane>(
-                         m_VPValue(Mask)))))
+                         m_VPValue(Mask)),
+                     m_VPValue(Incoming))))
     return nullptr;
 
   auto *WideIV = getOptimizableIVOf(Incoming);
@@ -2831,7 +2831,7 @@ void VPlanTransforms::handleUncountableEarlyExit(
           VPInstruction::FirstActiveLane, {CondToEarlyExit}, nullptr,
           "first.active.lane");
       IncomingFromEarlyExit = EarlyExitB.createNaryOp(
-          Instruction::ExtractElement, {IncomingFromEarlyExit, FirstActiveLane},
+          VPInstruction::ExtractLane, {FirstActiveLane, IncomingFromEarlyExit},
           nullptr, "early.exit.value");
       ExitIRI->setOperand(EarlyExitIdx, IncomingFromEarlyExit);
     }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
index b89cd21595efd..871e37ef3966a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -363,6 +363,13 @@ void UnrollState::unrollBlock(VPBlockBase *VPB) {
       continue;
     }
     VPValue *Op0;
+    if (match(&R, m_VPInstruction<VPInstruction::ExtractLane>(
+                      m_VPValue(Op0), m_VPValue(Op1)))) {
+      addUniformForAllParts(cast<VPInstruction>(&R));
+      for (unsigned Part = 1; Part != UF; ++Part)
+        R.addOperand(getValueForPart(Op1, Part));
+      continue;
+    }
     if (match(&R, m_VPInstruction<VPInstruction::ExtractLastElement>(
                       m_VPValue(Op0))) ||
         match(&R, m_VPInstruction<VPInstruction::ExtractPenultimateElement>(
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll b/llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll
index 61ef3cef603fa..c7be4593c6a9c 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll
@@ -14,15 +14,16 @@ define i64 @same_exit_block_pre_inc_use1() #0 {
 ; CHECK-NEXT:    call void @init_mem(ptr [[P1]], i64 1024)
 ; CHECK-NEXT:    call void @init_mem(ptr [[P2]], i64 1024)
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 16
-; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 64
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 510, [[TMP1]]
+; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP3:%.*]] = mul nuw i64 [[TMP2]], 16
+; CHECK-NEXT:    [[TMP3:%.*]] = mul nuw i64 [[TMP2]], 64
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 510, [[TMP3]]
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 510, [[N_MOD_VF]]
 ; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP5:%.*]] = mul nuw i64 [[TMP4]], 16
+; CHECK-NEXT:    [[TMP5:%.*]] = mul nuw i64 [[TMP4]], 64
 ; CHECK-NEXT:    [[INDEX_NEXT:%.*]] = add i64 3, [[N_VEC]]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       vector.body:
@@ -30,13 +31,43 @@ define i64 @same_exit_block_pre_inc_use1() #0 {
 ; CHECK-NEXT:    [[OFFSET_IDX:%.*]] = add i64 3, [[INDEX1]]
 ; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[OFFSET_IDX]]
 ; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[TMP7]], i32 0
+; CHECK-NEXT:    [[TMP18:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP19:%.*]] = mul nuw i64 [[TMP18]], 16
+; CHECK-NEXT:    [[TMP11:%.*]] = getelementptr inbounds i8, ptr [[TMP7]], i64 [[TMP19]]
+; CHECK-NEXT:    [[TMP29:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP36:%.*]] = mul nuw i64 [[TMP29]], 32
+; CHECK-NEXT:    [[TMP37:%.*]] = getelementptr inbounds i8, ptr [[TMP7]], i64 [[TMP36]]
+; CHECK-NEXT:    [[TMP15:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP38:%.*]] = mul nuw i64 [[TMP15]], 48
+; CHECK-NEXT:    [[TMP54:%.*]] = getelementptr inbounds i8, ptr [[TMP7]], i64 [[TMP38]]
 ; CHECK-NEXT:    [[WIDE_LOAD4:%.*]] = load <vscale x 16 x i8>, ptr [[TMP8]], align 1
+; CHECK-NEXT:    [[WIDE_LOAD2:%.*]] = load <vscale x 16 x i8>, ptr [[TMP11]], align 1
+; CHECK-NEXT:    [[WIDE_LOAD3:%.*]] = load <vscale x 16 x i8>, ptr [[TMP37]], align 1
+; CHECK-NEXT:    [[WIDE_LOAD5:%.*]] = load <vscale x 16 x i8>, ptr [[TMP54]], align 1
 ; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 [[OFFSET_IDX]]
 ; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i8, ptr [[TMP9]], i32 0
+; CHECK-NEXT:    [[TMP20:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP21:%.*]] = mul nuw i64 [[TMP20]], 16
+; CHECK-NEXT:    [[TMP22:%.*]] = getelementptr inbounds i8, ptr [[TMP9]], i64 [[TMP21]]
+; CHECK-NEXT:    [[TMP23:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP24:%.*]] = mul nuw i64 [[TMP23]], 32
+; CHECK-NEXT:    [[TMP25:%.*]] = getelementptr inbounds i8, ptr [[TMP9]], i64 [[TMP24]]
+; CHECK-NEXT:    [[TMP26:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP27:%.*]] = mul nuw i64 [[TMP26]], 48
+; CHECK-NEXT:    [[TMP28:%.*]] = getelementptr inbounds i8, ptr [[TMP9]], i64 [[TMP27]]
 ; CHECK-NEXT:    [[WIDE_LOAD8:%.*]] = load <vscale x 16 x i8>, ptr [[TMP10]], align 1
+; CHECK-NEXT:    [[WIDE_LOAD6:%.*]] = load <vscale x 16 x i8>, ptr [[TMP22]], align 1
+; CHECK-NEXT:    [[WIDE_LOAD7:%.*]] = load <vscale x 16 x i8>, ptr [[TMP25]], align 1
+; CHECK-NEXT:    [[WIDE_LOAD9:%.*]] = load <vscale x 16 x i8>, ptr [[TMP28]], align 1
 ; CHECK-NEXT:    [[TMP32:%.*]] = icmp ne <vscale x 16 x i8> [[WIDE_LOAD4]], [[WIDE_LOAD8]]
+; CHECK-NEXT:    [[TMP30:%.*]] = icmp ne <vscale x 16 x i8> [[WIDE_LOAD2]], [[WIDE_LOAD6]]
+; CHECK-NEXT:    [[TMP31:%.*]] = icmp ne <vscale x 16 x i8> [[WIDE_LOAD3]], [[WIDE_LOAD7]]
+; CHECK-NEXT:    [[TMP59:%.*]] = icmp ne <vscale x 16 x i8> [[WIDE_LOAD5]], [[WIDE_LOAD9]]
 ; CHECK-NEXT:    [[INDEX_NEXT3]] = add nuw i64 [[INDEX1]], [[TMP5]]
-; CHECK-NEXT:    [[TMP12:%.*]] = call i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1> [[TMP32]])
+; CHECK-NEXT:    [[TMP33:%.*]] = or <vscale x 16 x i1> [[TMP32]], [[TMP30]]
+; CHECK-NEXT:    [[TMP34:%.*]] = or <vscale x 16 x i1> [[TMP33]], [[TMP31]]
+; CHECK-NEXT:    [[TMP35:%.*]] = or <vscale x 16 x i1> [[TMP34]], [[TMP59]]
+; CHECK-NEXT:    [[TMP12:%.*]] = call i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1> [[TMP35]])
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT3]], [[N_VEC]]
 ; CHECK-NEXT:    [[TMP14:%.*]] = or i1 [[TMP12]], [[TMP13]]
 ; CHECK-NEXT:    br i1 [[TMP14]], label [[MIDDLE_SPLIT:%.*]], label [[LOOP]], !llvm.loop [[LOOP0:![0-9]+]]
@@ -46,8 +77,27 @@ define i64 @same_exit_block_pre_inc_use1() #0 {
 ; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 510, [[N_VEC]]
 ; CHECK-NEXT:    br i1 [[CMP_N]], label [[LOOP_END:%.*]], label [[SCALAR_PH]]
 ; CHECK:       vector.early.exit:
+; CHECK-NEXT:    [[TMP39:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP40:%.*]] = mul nuw i64 [[TMP39]], 16
+; CHECK-NEXT:    [[TMP41:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.nxv16i1(<vscale x 16 x i1> [[TMP59]], i1 true)
+; CHECK-NEXT:    [[TMP42:%.*]] = mul i64 [[TMP40]], 3
+; CHECK-NEXT:    [[TMP43:%.*]] = add i64 [[TMP42]], [[TMP41]]
+; CHECK-NEXT:    [[TMP44:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.nxv16i1(<vscale x 16 x i1> [[TMP31]], i1 true)
+; CHECK-NEXT:    [[TMP45:%.*]] = mul i64 [[TMP40]], 2
+; CHECK-NEXT:    [[TMP46:%.*]] = add i64 [[TMP45]], [[TMP44]]
+; CHECK-NEXT:    [[TMP47:%.*]] = icmp ne i64 [[TMP44]], [[TMP40]]
+; CHECK-NEXT:    [[TMP48:%.*]] = select i1 [[TMP47]], i64 [[TMP46]], i64 [[TMP43]]
+; CHECK-NEXT:    [[TMP49:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.nxv16i1(<vscale x 16 x i1> [[TMP30]], i1 true)
+; CHECK-NEXT:    [[TMP50:%.*]] = mul i64 [[TMP40]], 1
+; CHECK-NEXT:    [[TMP51:%.*]] = add i64 [[TMP50]], [[TMP49]]
+; CHECK-NEXT:    [[TMP52:%.*]] = icmp ne i64 [[TMP49]], [[TMP40]]
+; CHECK-NEXT:    [[TMP53:%.*]] = select i1 [[TMP52]], i64 [[TMP51]], i64 [[TMP48]]
 ; CHECK-NEXT:    [[TMP61:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.nxv16i1(<vscale x 16 x i1> [[TMP32]], i1 true)
-; CHECK-NEXT:    [[TMP16:%.*]] = add i64 [[INDEX1]], [[TMP61]]
+; CHECK-NEXT:    [[TMP55:%.*]] = mul i64 [[TMP40]], 0
+; CHECK-NEXT:    [[TMP56:%.*]] = add i64 [[TMP55]], [[TMP61]]
+; CHECK-NEXT:    [[TMP57:%.*]] = icmp ne i64 [[TMP61]], [[TMP40]]
+; CHECK-NEXT:    [[TMP58:%.*]] = select i1 [[TMP57]], i64 [[TMP56]], i64 [[TMP53]]
+; CHECK-NEXT:    [[TMP16:%.*]] = add i64 [[INDEX1]], [[TMP58]]
 ; CHECK-NEXT:    [[TMP17:%.*]] = add i64 3, [[TMP16]]
 ; CHECK-NEXT:    br label [[LOOP_END]]
 ; CHECK:       scalar.ph:
diff --git a/llvm/test/Transforms/LoopVectorize/single-early-exit-interleave-hint.ll b/llvm/test/Transforms/LoopVectorize/single-early-exit-interleave-hint.ll
index de8a3c5a8eaf2..6c2ae2048cf7f 100644
--- a/llvm/test/Transforms/LoopVectorize/single-early-exit-interleave-hint.ll
+++ b/llvm/test/Transforms/LoopVectorize/single-early-exit-interleave-hint.ll
@@ -1,13 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
-; REQUIRES: asserts
-; RUN: opt -p loop-vectorize -enable-early-exit-vectorization -force-vector-width=4 \
-; RUN:   -debug-only=loop-vectorize -S %s 2>%t | FileCheck --check-prefix=VF4IC4 %s
-; RUN: cat %t | FileCheck --check-prefix=DEBUG %s
+; RUN: opt -p loop-vectorize -enable-early-exit-vectorization -force-vector-width=4 -S %s | FileCheck --check-prefix=VF4IC4 %s
 
 declare void @init_mem(ptr, i64);
 
-; DEBUG: Interleaving not supported for loops with uncountable early exits
-
 define i64 @multi_exiting_to_different_exits_live_in_exit_values() {
 ; VF4IC4-LABEL: define i64 @multi_exiting_to_different_exits_live_in_exit_values() {
 ; VF4IC4-NEXT:  [[ENTRY:.*]]:
@@ -20,10 +15,22 @@ define i64 @multi_exiting_to_different_exits_live_in_exit_values() {
 ; VF4IC4-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
 ; VF4IC4-NEXT:    [[TMP0:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[INDEX]]
 ; VF4IC4-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 0
+; VF4IC4-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 4
+; VF4IC4-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 8
+; VF4IC4-NEXT:    [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 12
 ; VF4IC4-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP1]], align 4
+; VF4IC4-NEXT:    [[WIDE_LOAD1:%.*]] = load <4 x i32>, ptr [[TMP12]], align 4
+; VF4IC4-NEXT:    [[WIDE_LOAD2:%.*]] = load <4 x i32>, ptr [[TMP13]], align 4
+; VF4IC4-NEXT:    [[WIDE_LOAD3:%.*]] = load <4 x i32>, ptr [[TMP14]], align 4
 ; VF4IC4-NEXT:    [[TMP2:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD]], splat (i32 10)
-; VF4IC4-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
-; VF4IC4-NEXT:    [[TMP3:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP2]])
+; VF4IC4-NEXT:    [[TMP6:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD1]], splat (i32 10)
+; VF4IC4-NEXT:    [[TMP7:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD2]], splat (i32 10)
+; VF4IC4-NEXT:    [[TMP8:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD3]], splat (i32 10)
+; VF4IC4-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
+; VF4IC4-NEXT:    [[TMP9:%.*]] = or <4 x i1> [[TMP2]], [[TMP6]]
+; VF4IC4-NEXT:    [[TMP10:%.*]] = or <4 x i1> [[TMP9]], [[TMP7]]
+; VF4IC4-NEXT:    [[TMP11:%.*]] = or <4 x i1> [[TMP10]], [[TMP8]]
+; VF4IC4-NEXT:    [[TMP3:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP11]])
 ; VF4IC4-NEXT:    [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 128
 ; VF4IC4-NEXT:    [[TMP5:%.*]] = or i1 [[TMP3]], [[TMP4]]
 ; VF4IC4-NEXT:    br i1 [[TMP5]], label %[[MIDDLE_SPLIT:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
diff --git a/llvm/test/Transforms/LoopVectorize/single-early-exit-interleave.ll b/llvm/test/Transforms/LoopVectorize/single-early-exit-interleave.ll
index 0f99ed576f1fe..6b758785f2512 100644
--- a/llvm/test/Transforms/LoopVectorize/single-early-exit-interleave.ll
+++ b/llvm/test/Transforms/LoopVectorize/single-early-exit-interleave.ll
@@ -15,10 +15,22 @@ define i64 @multi_exiting_to_different_exits_live_in_exit_values() {
 ; VF4IC4-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; VF4IC4-NEXT:    [[TMP0:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[INDEX]]
 ; VF4IC4-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 0
+; VF4IC4-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 4
+; VF4IC4-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 8
+; VF4IC4-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 12
 ; VF4IC4-NEXT:    [[WIDE_LOAD3:%.*]] = load <4 x i32>, ptr [[TMP1]], align 4
+; VF4IC4-NEXT:    [[WIDE_LOAD1:%.*]] = load <4 x i32>, ptr [[TMP2]], align 4
+; VF4IC4-NEXT:    [[WIDE_LOAD2:%.*]] = load <4 x i32>, ptr [[TMP12]], align 4
+; VF4IC4-NEXT:    [[WIDE_LOAD4:%.*]] = load <4 x i32>, ptr [[TMP13]], align 4
 ; VF4IC4-NEXT:    [[TMP8:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD3]], splat (i32 10)
-; VF4IC4-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
-; VF4IC4-NEXT:    [[TMP3:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP8]])
+; VF4IC4-NEXT:    [[TMP6:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD1]], splat (i32 10)
+; VF4IC4-NEXT:    [[TMP7:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD2]], splat (i32 10)
+; VF4IC4-NEXT:    [[TMP14:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD4]], splat (i32 10)
+; VF4IC4-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
+; VF4IC4-NEXT:    [[TMP9:%.*]] = or <4 x i1> [[TMP8]], [[TMP6]]
+; VF4IC4-NEXT:    [[TMP10:%.*]] = or <4 x i1> [[TMP9]], [[TMP7]]
+; VF4IC4-NEXT:    [[TMP11:%.*]] = or <4 x i1> [[TMP10]], [[TMP14]]
+; VF4IC4-NEXT:    [[TMP3:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP11]])
 ; VF4IC4-NEXT:    [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 128
 ; VF4IC4-NEXT:    [[TMP5:%.*]] = or i1 [[TMP3]], [[TMP4]]
 ; VF4IC4-NEXT:    br i1 [[TMP5]], ...
[truncated]

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

david-arm

Hi @fhahn, sorry to do this but I think the lane calculation when interleaving is broken at the moment.

david-arm · 2025-07-23T12:44:48Z

llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll

I just realised this IR is broken I think. Since the cttz intrinsics have been split up into parts you can have situations like this:

[[TMP41:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.nxv16i1(<vscale x 16 x i1> [[TMP59]], i1 true) -> 3
and
[[TMP44:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.nxv16i1(<vscale x 16 x i1> [[TMP31]], i1 true) -> 7

which is because there can be multiple triggering exit conditions, but the first exit condition was in part 1. Therefore, it's not valid to use the condition [[TMP44]] != [[TMP40]] for selecting between [[TMP46]] and [[TMP43]]. I think really this should be

; CHECK-NEXT: [[TMP47:%.*]] = icmp eq i64 [[TMP41]], [[TMP40]] ; CHECK-NEXT: [[TMP48:%.*]] = select i1 [[TMP47]], i64 [[TMP46]], i64 [[TMP43]]

Or something like that!

Hmm I might be missing something, but I think we should check & select the parts in reverse order, so that the final select (TMP58), picks the offset based on part 0, if part 0 has an active lane, otherwise select offset for part 1 if it has an active lane and so on. Some of the check pattern variables don't have consecutive numbers, can be a bit confusing.

For the example, below is what I think should happen, but I am not sure if I missed anything.

[[TMP41] should handle the mask for part 3 ([[TMP59]] should be the compare for the last part)
[[TMP44]] should handle the mask for part 2 ([[TMP31]] should be the compare for the second-to-last part)

[[TMP47]] should be true if there is an active lane in part 2. (llvm.experimental.cttz.elts should return the number of vector elements if no lane is active)

[[TMP47:%.*]] = icmp ne i64 [[TMP44]], [[TMP40]] [[TMP48:%.*]] = select i1 [[TMP47]], i64 [[TMP46]], i64 [[TMP43]]

So [[TMP48]] should select [[TMP46], if there's an active lane for part 2, which is the count based on the active lanes in part 2. Only if there's no active lane in part 2 we should select the one based on part 3.

Ah sorry, my bad. I got confused because the IR for each part is being generated in reverse order. I'll remove the "request changes"!

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

lukel97

LGTM. With only one vector this behaves like a ExtractElement VPInstruction right? Is it possible to replace ExtractElement with this later?

This patch adds a new ExtractLane VPInstruction which extracts across multiple parts using a wide index, to be used in combination with FirstActiveLane. The patch updates early-exit codegen to use it instead ExtractElement, which is only per-part. With this change, interleaving should work correctly with early-exit loops. The patch removes the restrictions added in 6f43754 (llvm#145877), but does not yet automatically select interleave counts > 1 for early-exit loops. I'll share a patch as follow-up. The cost of extracting a lane adds non-trivial overhead in the exit block, so that should be considered when picking the interleave count.

fhahn · 2025-07-27T07:08:01Z

LGTM. With only one vector this behaves like a ExtractElement VPInstruction right? Is it possible to replace ExtractElement with this later?

Yep that should be possible, will look into this a potential follow-up, thanks!

llvm-ci · 2025-07-27T07:22:54Z

LLVM Buildbot has detected a new failure on builder fuchsia-x86_64-linux running on fuchsia-debian-64-us-central1-b-1 while building llvm at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/11/builds/20470

Here is the relevant piece of the build log for the reference

Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/fuchsia-linux.py ...' (failure)
...
[473/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.issignaling.dir/issignaling.cpp.obj
[474/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_width_uc.dir/stdc_bit_width_uc.cpp.obj
[475/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_width_ui.dir/stdc_bit_width_ui.cpp.obj
[476/2522] Building CXX object libc/src/inttypes/CMakeFiles/libc.src.inttypes.imaxdiv.dir/imaxdiv.cpp.obj
[477/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.issignalingf.dir/issignalingf.cpp.obj
[478/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_floor_uc.dir/stdc_bit_floor_uc.cpp.obj
[479/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_floor_us.dir/stdc_bit_floor_us.cpp.obj
[480/2522] Building CXX object libc/src/__support/CMakeFiles/libc.src.__support.freelist.dir/freelist.cpp.obj
[481/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_floor_ul.dir/stdc_bit_floor_ul.cpp.obj
[482/2522] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.sscanf.dir/sscanf.cpp.obj
FAILED: libc/src/stdio/CMakeFiles/libc.src.stdio.sscanf.dir/sscanf.cpp.obj 
/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-hblaw1l3/./bin/clang++ --target=armv8m.main-none-eabi -DLIBC_NAMESPACE=__llvm_libc_22_0_0_git -I/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc -isystem /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-hblaw1l3/include/armv8m.main-unknown-none-eabi --target=armv8m.main-none-eabi -Wno-atomic-alignment "-Dvfprintf(stream, format, vlist)=vprintf(format, vlist)" "-Dfprintf(stream, format, ...)=printf(format)" "-Dfputs(string, stream)=puts(string)" -D_LIBCPP_PRINT=1 -mthumb -mfloat-abi=softfp -march=armv8m.main+fp+dsp -mcpu=cortex-m33 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-hblaw1l3/runtimes/runtimes-armv8m.main-none-eabi-bins=../../../../llvm-project -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/= -no-canonical-prefixes -Os -DNDEBUG --target=armv8m.main-none-eabi -DLIBC_QSORT_IMPL=LIBC_QSORT_HEAP_SORT -DLIBC_TYPES_TIME_T_IS_32_BIT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES)" -DLIBC_ERRNO_MODE=LIBC_ERRNO_MODE_EXTERNAL -fpie -ffreestanding -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -DLIBC_COPT_PUBLIC_PACKAGING -MD -MT libc/src/stdio/CMakeFiles/libc.src.stdio.sscanf.dir/sscanf.cpp.obj -MF libc/src/stdio/CMakeFiles/libc.src.stdio.sscanf.dir/sscanf.cpp.obj.d -o libc/src/stdio/CMakeFiles/libc.src.stdio.sscanf.dir/sscanf.cpp.obj -c /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdio/sscanf.cpp
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdio/sscanf.cpp:14:
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdio/scanf_core/scanf_main.h:14:
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdio/scanf_core/converter.h:15:
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdio/scanf_core/core_structs.h:16:
/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-hblaw1l3/./lib/clang/22/include/inttypes.h:24:15: fatal error: 'inttypes.h' file not found
   24 | #include_next <inttypes.h>
      |               ^~~~~~~~~~~~
1 error generated.
[483/2522] Building CXX object libc/src/inttypes/CMakeFiles/libc.src.inttypes.imaxabs.dir/imaxabs.cpp.obj
[484/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.issignalingl.dir/issignalingl.cpp.obj
[485/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_width_ull.dir/stdc_bit_width_ull.cpp.obj
[486/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_floor_ui.dir/stdc_bit_floor_ui.cpp.obj
[487/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_floor_ull.dir/stdc_bit_floor_ull.cpp.obj
[488/2522] Building CXX object libc/src/__support/CMakeFiles/libc.src.__support.freetrie.dir/freetrie.cpp.obj
[489/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_ceil_uc.dir/stdc_bit_ceil_uc.cpp.obj
[490/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_ceil_ull.dir/stdc_bit_ceil_ull.cpp.obj
[491/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_ceil_ui.dir/stdc_bit_ceil_ui.cpp.obj
[492/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.iscanonicalf.dir/iscanonicalf.cpp.obj
[493/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_zero_uc.dir/stdc_first_leading_zero_uc.cpp.obj
[494/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_zero_us.dir/stdc_first_leading_zero_us.cpp.obj
[495/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_ceil_ul.dir/stdc_bit_ceil_ul.cpp.obj
[496/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_zero_ul.dir/stdc_first_leading_zero_ul.cpp.obj
[497/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.iscanonical.dir/iscanonical.cpp.obj
[498/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_ceil_us.dir/stdc_bit_ceil_us.cpp.obj
[499/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_zero_ui.dir/stdc_first_leading_zero_ui.cpp.obj
[500/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_one_us.dir/stdc_first_leading_one_us.cpp.obj
[501/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_one_ui.dir/stdc_first_leading_one_ui.cpp.obj
[502/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.iscanonicall.dir/iscanonicall.cpp.obj
[503/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_trailing_zero_ui.dir/stdc_first_trailing_zero_ui.cpp.obj
[504/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_one_ull.dir/stdc_first_leading_one_ull.cpp.obj
[505/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_one_uc.dir/stdc_first_leading_one_uc.cpp.obj
[506/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_zero_ull.dir/stdc_first_leading_zero_ull.cpp.obj
[507/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_trailing_zero_us.dir/stdc_first_trailing_zero_us.cpp.obj
[508/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_trailing_zero_uc.dir/stdc_first_trailing_zero_uc.cpp.obj
[509/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_one_ul.dir/stdc_first_leading_one_ul.cpp.obj
[510/2522] Building CXX object libc/src/__support/CMakeFiles/libc.src.__support.freelist_heap.dir/freelist_heap.cpp.obj
[511/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_trailing_one_us.dir/stdc_first_trailing_one_us.cpp.obj
Step 6 (build) failure: build (failure)
...
[473/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.issignaling.dir/issignaling.cpp.obj
[474/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_width_uc.dir/stdc_bit_width_uc.cpp.obj
[475/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_width_ui.dir/stdc_bit_width_ui.cpp.obj
[476/2522] Building CXX object libc/src/inttypes/CMakeFiles/libc.src.inttypes.imaxdiv.dir/imaxdiv.cpp.obj
[477/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.issignalingf.dir/issignalingf.cpp.obj
[478/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_floor_uc.dir/stdc_bit_floor_uc.cpp.obj
[479/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_floor_us.dir/stdc_bit_floor_us.cpp.obj
[480/2522] Building CXX object libc/src/__support/CMakeFiles/libc.src.__support.freelist.dir/freelist.cpp.obj
[481/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_floor_ul.dir/stdc_bit_floor_ul.cpp.obj
[482/2522] Building CXX object libc/src/stdio/CMakeFiles/libc.src.stdio.sscanf.dir/sscanf.cpp.obj
FAILED: libc/src/stdio/CMakeFiles/libc.src.stdio.sscanf.dir/sscanf.cpp.obj 
/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-hblaw1l3/./bin/clang++ --target=armv8m.main-none-eabi -DLIBC_NAMESPACE=__llvm_libc_22_0_0_git -I/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc -isystem /var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-hblaw1l3/include/armv8m.main-unknown-none-eabi --target=armv8m.main-none-eabi -Wno-atomic-alignment "-Dvfprintf(stream, format, vlist)=vprintf(format, vlist)" "-Dfprintf(stream, format, ...)=printf(format)" "-Dfputs(string, stream)=puts(string)" -D_LIBCPP_PRINT=1 -mthumb -mfloat-abi=softfp -march=armv8m.main+fp+dsp -mcpu=cortex-m33 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-hblaw1l3/runtimes/runtimes-armv8m.main-none-eabi-bins=../../../../llvm-project -ffile-prefix-map=/var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/= -no-canonical-prefixes -Os -DNDEBUG --target=armv8m.main-none-eabi -DLIBC_QSORT_IMPL=LIBC_QSORT_HEAP_SORT -DLIBC_TYPES_TIME_T_IS_32_BIT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES)" -DLIBC_ERRNO_MODE=LIBC_ERRNO_MODE_EXTERNAL -fpie -ffreestanding -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -DLIBC_COPT_PUBLIC_PACKAGING -MD -MT libc/src/stdio/CMakeFiles/libc.src.stdio.sscanf.dir/sscanf.cpp.obj -MF libc/src/stdio/CMakeFiles/libc.src.stdio.sscanf.dir/sscanf.cpp.obj.d -o libc/src/stdio/CMakeFiles/libc.src.stdio.sscanf.dir/sscanf.cpp.obj -c /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdio/sscanf.cpp
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdio/sscanf.cpp:14:
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdio/scanf_core/scanf_main.h:14:
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdio/scanf_core/converter.h:15:
In file included from /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/libc/src/stdio/scanf_core/core_structs.h:16:
/var/lib/buildbot/fuchsia-x86_64-linux/build/llvm-build-hblaw1l3/./lib/clang/22/include/inttypes.h:24:15: fatal error: 'inttypes.h' file not found
   24 | #include_next <inttypes.h>
      |               ^~~~~~~~~~~~
1 error generated.
[483/2522] Building CXX object libc/src/inttypes/CMakeFiles/libc.src.inttypes.imaxabs.dir/imaxabs.cpp.obj
[484/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.issignalingl.dir/issignalingl.cpp.obj
[485/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_width_ull.dir/stdc_bit_width_ull.cpp.obj
[486/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_floor_ui.dir/stdc_bit_floor_ui.cpp.obj
[487/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_floor_ull.dir/stdc_bit_floor_ull.cpp.obj
[488/2522] Building CXX object libc/src/__support/CMakeFiles/libc.src.__support.freetrie.dir/freetrie.cpp.obj
[489/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_ceil_uc.dir/stdc_bit_ceil_uc.cpp.obj
[490/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_ceil_ull.dir/stdc_bit_ceil_ull.cpp.obj
[491/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_ceil_ui.dir/stdc_bit_ceil_ui.cpp.obj
[492/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.iscanonicalf.dir/iscanonicalf.cpp.obj
[493/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_zero_uc.dir/stdc_first_leading_zero_uc.cpp.obj
[494/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_zero_us.dir/stdc_first_leading_zero_us.cpp.obj
[495/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_ceil_ul.dir/stdc_bit_ceil_ul.cpp.obj
[496/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_zero_ul.dir/stdc_first_leading_zero_ul.cpp.obj
[497/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.iscanonical.dir/iscanonical.cpp.obj
[498/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_bit_ceil_us.dir/stdc_bit_ceil_us.cpp.obj
[499/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_zero_ui.dir/stdc_first_leading_zero_ui.cpp.obj
[500/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_one_us.dir/stdc_first_leading_one_us.cpp.obj
[501/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_one_ui.dir/stdc_first_leading_one_ui.cpp.obj
[502/2522] Building CXX object libc/src/math/generic/CMakeFiles/libc.src.math.generic.iscanonicall.dir/iscanonicall.cpp.obj
[503/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_trailing_zero_ui.dir/stdc_first_trailing_zero_ui.cpp.obj
[504/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_one_ull.dir/stdc_first_leading_one_ull.cpp.obj
[505/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_one_uc.dir/stdc_first_leading_one_uc.cpp.obj
[506/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_zero_ull.dir/stdc_first_leading_zero_ull.cpp.obj
[507/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_trailing_zero_us.dir/stdc_first_trailing_zero_us.cpp.obj
[508/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_trailing_zero_uc.dir/stdc_first_trailing_zero_uc.cpp.obj
[509/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_leading_one_ul.dir/stdc_first_leading_one_ul.cpp.obj
[510/2522] Building CXX object libc/src/__support/CMakeFiles/libc.src.__support.freelist_heap.dir/freelist_heap.cpp.obj
[511/2522] Building CXX object libc/src/stdbit/CMakeFiles/libc.src.stdbit.stdc_first_trailing_one_us.dir/stdc_first_trailing_one_us.cpp.obj

…parts. (#148817) This patch adds a new ExtractLane VPInstruction which extracts across multiple parts using a wide index, to be used in combination with FirstActiveLane. The patch updates early-exit codegen to use it instead ExtractElement, which is only per-part. With this change, interleaving should work correctly with early-exit loops. The patch removes the restrictions added in 6f43754 (#145877), but does not yet automatically select interleave counts > 1 for early-exit loops. I'll share a patch as follow-up. The cost of extracting a lane adds non-trivial overhead in the exit block, so that should be considered when picking the interleave count. PR: llvm/llvm-project#148817

…m#148817) This patch adds a new ExtractLane VPInstruction which extracts across multiple parts using a wide index, to be used in combination with FirstActiveLane. The patch updates early-exit codegen to use it instead ExtractElement, which is only per-part. With this change, interleaving should work correctly with early-exit loops. The patch removes the restrictions added in 6f43754 (llvm#145877), but does not yet automatically select interleave counts > 1 for early-exit loops. I'll share a patch as follow-up. The cost of extracting a lane adds non-trivial overhead in the exit block, so that should be considered when picking the interleave count. PR: llvm#148817

…149042) Building on top of #148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also #148603 PR: #149042

…l-folding. (#149042) Building on top of llvm/llvm-project#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm/llvm-project#148603 PR: llvm/llvm-project#149042

…lvm#149042) Building on top of llvm#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm#148603 PR: llvm#149042

…folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * #169298 * #167897 * #168949 Original message: Building on top of #148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also #148603 PR: #149042

… when tail-folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * llvm/llvm-project#169298 * llvm/llvm-project#167897 * llvm/llvm-project#168949 Original message: Building on top of llvm/llvm-project#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm/llvm-project#148603 PR: llvm/llvm-project#149042

…folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * #169298 * #167897 * #168949 Original message: Building on top of #148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also #148603 PR: #149042

… when tail-folding. (#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * llvm/llvm-project#169298 * llvm/llvm-project#167897 * llvm/llvm-project#168949 Original message: Building on top of llvm/llvm-project#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm/llvm-project#148603 PR: llvm/llvm-project#149042

…folding. (llvm#149042)" This reverts commit a6edeed. The following fixes have landed, addressing issues causing the original revert: * llvm#169298 * llvm#167897 * llvm#168949 Original message: Building on top of llvm#148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop the extract the value of the last actice lane. See also llvm#148603 PR: llvm#149042

fhahn requested review from aniragil, ayalz and david-arm July 15, 2025 09:53

llvmbot added vectorizers llvm:transforms labels Jul 15, 2025

This was referenced Jul 16, 2025

[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. #149042

Merged

Vectorize loops with exit users with tail folding #148603

Closed

lukel97 reviewed Jul 17, 2025

View reviewed changes

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp Outdated Show resolved Hide resolved

david-arm requested changes Jul 23, 2025

View reviewed changes

lukel97 reviewed Jul 23, 2025

View reviewed changes

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp Outdated Show resolved Hide resolved

fhahn force-pushed the vplan-extract-lane-early-exit-interleave branch from 0b5ce1a to f71b4ec Compare July 23, 2025 14:18

david-arm self-requested a review July 23, 2025 14:35

lukel97 reviewed Jul 24, 2025

View reviewed changes

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp Outdated Show resolved Hide resolved

fhahn force-pushed the vplan-extract-lane-early-exit-interleave branch from f71b4ec to efa987c Compare July 25, 2025 09:32

lukel97 approved these changes Jul 25, 2025

View reviewed changes

fhahn added 3 commits July 26, 2025 20:33

!fixup fix onlyFirstLaneUsed for ExtractLane

4da0b7a

!fixup update tests after rebase

58b39b0

fhahn force-pushed the vplan-extract-lane-early-exit-interleave branch from efa987c to 58b39b0 Compare July 26, 2025 19:42

fhahn merged commit 80c43b6 into llvm:main Jul 27, 2025
9 checks passed

fhahn deleted the vplan-extract-lane-early-exit-interleave branch July 27, 2025 07:08

fhahn mentioned this pull request Jul 27, 2025

[LV] Vectorize FMax via OrderedFCmpSelect w/o fast-math flags. #146711

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[VPlan] Add ExtractLane VPInst to extract across multiple parts. #148817

[VPlan] Add ExtractLane VPInst to extract across multiple parts. #148817

Uh oh!

fhahn commented Jul 15, 2025

Uh oh!

llvmbot commented Jul 15, 2025 •

edited

Loading

Uh oh!

Uh oh!

david-arm left a comment

Uh oh!

david-arm Jul 23, 2025 •

edited

Loading

Uh oh!

david-arm Jul 23, 2025

Uh oh!

fhahn Jul 23, 2025

Uh oh!

david-arm Jul 23, 2025

Uh oh!

Uh oh!

Uh oh!

lukel97 left a comment

Uh oh!

fhahn commented Jul 27, 2025

Uh oh!

Uh oh!

llvm-ci commented Jul 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[VPlan] Add ExtractLane VPInst to extract across multiple parts. #148817

[VPlan] Add ExtractLane VPInst to extract across multiple parts. #148817

Uh oh!

Conversation

fhahn commented Jul 15, 2025

Uh oh!

llvmbot commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

david-arm left a comment

Choose a reason for hiding this comment

Uh oh!

david-arm Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

david-arm Jul 23, 2025

Choose a reason for hiding this comment

Uh oh!

fhahn Jul 23, 2025

Choose a reason for hiding this comment

Uh oh!

david-arm Jul 23, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

lukel97 left a comment

Choose a reason for hiding this comment

Uh oh!

fhahn commented Jul 27, 2025

Uh oh!

Uh oh!

llvm-ci commented Jul 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

llvmbot commented Jul 15, 2025 •

edited

Loading

david-arm Jul 23, 2025 •

edited

Loading