[TTI][Vectorize] Migrate masked/gather-scatter/strided/expand-compress costing (NFCI) #165532
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
7551383 to 47d0b7a
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-risc-v
Author: Shih-Po Hung (arcbbb)
Changes: In #160470, there is a discussion about the possibility of exploring a general approach for handling memory intrinsics. This patch adds alignment to IntrinsicCostAttributes for the type-based cost queries used by the SLP and Loop Vectorizers before IR is materialized. Candidates to adopt this are getMaskedMemoryOpCost, getGatherScatterOpCost, getExpandCompressMemoryOpCost, and getStridedMemoryOpCost.
Full diff: https://github.com/llvm/llvm-project/pull/165532.diff (7 files affected)
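For orientation, here is a minimal sketch of how a type-based cost query would use the constructor this patch adds; the helper and its arguments are illustrative stand-ins for the SLPVectorizer call sites shown in the diff below.

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

// Sketch only: build a type-based IntrinsicCostAttributes carrying an
// alignment and a variable-mask flag (the new constructor), then query the
// generic intrinsic-cost entry point. Mirrors the SLPVectorizer changes below.
static InstructionCost
getStridedLoadCost(const TargetTransformInfo &TTI, VectorType *SubVecTy,
                   Align CommonAlignment,
                   TargetTransformInfo::TargetCostKind CostKind) {
  IntrinsicCostAttributes Attrs(Intrinsic::experimental_vp_strided_load,
                                /*RTy=*/SubVecTy, /*Tys=*/{},
                                CommonAlignment, /*VariableMask=*/false);
  return TTI.getIntrinsicInstrCost(Attrs, CostKind);
}
```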
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 7b7dc1b46dd80..0270a65eac776 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -135,6 +135,9 @@ class IntrinsicCostAttributes {
InstructionCost ScalarizationCost = InstructionCost::getInvalid();
TargetLibraryInfo const *LibInfo = nullptr;
+ MaybeAlign Alignment;
+ bool VariableMask = false;
+
public:
LLVM_ABI IntrinsicCostAttributes(
Intrinsic::ID Id, const CallBase &CI,
@@ -146,6 +149,10 @@ class IntrinsicCostAttributes {
FastMathFlags Flags = FastMathFlags(), const IntrinsicInst *I = nullptr,
InstructionCost ScalarCost = InstructionCost::getInvalid());
+ LLVM_ABI IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
+ ArrayRef<Type *> Tys, Align Alignment,
+ bool VariableMask = false);
+
LLVM_ABI IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
ArrayRef<const Value *> Args);
@@ -160,6 +167,8 @@ class IntrinsicCostAttributes {
const IntrinsicInst *getInst() const { return II; }
Type *getReturnType() const { return RetTy; }
FastMathFlags getFlags() const { return FMF; }
+ MaybeAlign getAlign() const { return Alignment; }
+ bool getVariableMask() const { return VariableMask; }
InstructionCost getScalarizationCost() const { return ScalarizationCost; }
const SmallVectorImpl<const Value *> &getArgs() const { return Arguments; }
const SmallVectorImpl<Type *> &getArgTypes() const { return ParamTys; }
@@ -1586,20 +1595,6 @@ class TargetTransformInfo {
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
const Instruction *I = nullptr) const;
- /// \return The cost of strided memory operations.
- /// \p Opcode - is a type of memory access Load or Store
- /// \p DataTy - a vector type of the data to be loaded or stored
- /// \p Ptr - pointer [or vector of pointers] - address[es] in memory
- /// \p VariableMask - true when the memory access is predicated with a mask
- /// that is not a compile-time constant
- /// \p Alignment - alignment of single element
- /// \p I - the optional original context instruction, if one exists, e.g. the
- /// load/store to transform or the call to the gather/scatter intrinsic
- LLVM_ABI InstructionCost getStridedMemoryOpCost(
- unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask,
- Align Alignment, TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
- const Instruction *I = nullptr) const;
-
/// \return The cost of the interleaved memory operation.
/// \p Opcode is the memory operation code
/// \p VecTy is the vector type of the interleaved access.
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 4cd607c0d0c8d..a9591704c9d14 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -862,14 +862,6 @@ class TargetTransformInfoImplBase {
return 1;
}
- virtual InstructionCost
- getStridedMemoryOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr,
- bool VariableMask, Align Alignment,
- TTI::TargetCostKind CostKind,
- const Instruction *I = nullptr) const {
- return InstructionCost::getInvalid();
- }
-
virtual InstructionCost getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 76b6c8ec68c72..fbd481d1c794f 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1574,18 +1574,6 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
/*IsGatherScatter*/ true, CostKind);
}
- InstructionCost getStridedMemoryOpCost(unsigned Opcode, Type *DataTy,
- const Value *Ptr, bool VariableMask,
- Align Alignment,
- TTI::TargetCostKind CostKind,
- const Instruction *I) const override {
- // For a target without strided memory operations (or for an illegal
- // operation type on one which does), assume we lower to a gather/scatter
- // operation. (Which may in turn be scalarized.)
- return thisT()->getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask,
- Alignment, CostKind, I);
- }
-
InstructionCost getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
@@ -1958,27 +1946,26 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
}
case Intrinsic::experimental_vp_strided_store: {
const Value *Data = Args[0];
- const Value *Ptr = Args[1];
const Value *Mask = Args[3];
const Value *EVL = Args[4];
bool VarMask = !isa<Constant>(Mask) || !isa<Constant>(EVL);
Type *EltTy = cast<VectorType>(Data->getType())->getElementType();
Align Alignment =
I->getParamAlign(1).value_or(thisT()->DL.getABITypeAlign(EltTy));
- return thisT()->getStridedMemoryOpCost(Instruction::Store,
- Data->getType(), Ptr, VarMask,
- Alignment, CostKind, I);
+ return thisT()->getCommonMaskedMemoryOpCost(
+ Instruction::Store, Data->getType(), Alignment, VarMask,
+ /*IsGatherScatter*/ true, CostKind);
}
case Intrinsic::experimental_vp_strided_load: {
- const Value *Ptr = Args[0];
const Value *Mask = Args[2];
const Value *EVL = Args[3];
bool VarMask = !isa<Constant>(Mask) || !isa<Constant>(EVL);
Type *EltTy = cast<VectorType>(RetTy)->getElementType();
Align Alignment =
I->getParamAlign(0).value_or(thisT()->DL.getABITypeAlign(EltTy));
- return thisT()->getStridedMemoryOpCost(Instruction::Load, RetTy, Ptr,
- VarMask, Alignment, CostKind, I);
+ return thisT()->getCommonMaskedMemoryOpCost(
+ Instruction::Load, RetTy, Alignment, VarMask,
+ /*IsGatherScatter*/ true, CostKind);
}
case Intrinsic::stepvector: {
if (isa<ScalableVectorType>(RetTy))
@@ -2418,17 +2405,21 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
}
case Intrinsic::experimental_vp_strided_store: {
auto *Ty = cast<VectorType>(ICA.getArgTypes()[0]);
- Align Alignment = thisT()->DL.getABITypeAlign(Ty->getElementType());
- return thisT()->getStridedMemoryOpCost(
- Instruction::Store, Ty, /*Ptr=*/nullptr, /*VariableMask=*/true,
- Alignment, CostKind, ICA.getInst());
+ Align Alignment = ICA.getAlign().value_or(
+ thisT()->DL.getABITypeAlign(Ty->getElementType()));
+ return thisT()->getCommonMaskedMemoryOpCost(
+ Instruction::Store, Ty, Alignment,
+ /*VariableMask=*/true,
+ /*IsGatherScatter*/ true, CostKind);
}
case Intrinsic::experimental_vp_strided_load: {
auto *Ty = cast<VectorType>(ICA.getReturnType());
- Align Alignment = thisT()->DL.getABITypeAlign(Ty->getElementType());
- return thisT()->getStridedMemoryOpCost(
- Instruction::Load, Ty, /*Ptr=*/nullptr, /*VariableMask=*/true,
- Alignment, CostKind, ICA.getInst());
+ Align Alignment = ICA.getAlign().value_or(
+ thisT()->DL.getABITypeAlign(Ty->getElementType()));
+ return thisT()->getCommonMaskedMemoryOpCost(
+ Instruction::Load, Ty, Alignment,
+ /*VariableMask=*/true,
+ /*IsGatherScatter*/ true, CostKind);
}
case Intrinsic::vector_reduce_add:
case Intrinsic::vector_reduce_mul:
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index c47a1c1b23a37..821017e985cc2 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -96,6 +96,14 @@ IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
ParamTys.insert(ParamTys.begin(), Tys.begin(), Tys.end());
}
+IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
+ ArrayRef<Type *> Tys,
+ Align Alignment,
+ bool VariableMask)
+ : RetTy(RTy), IID(Id), Alignment(Alignment), VariableMask(VariableMask) {
+ ParamTys.insert(ParamTys.begin(), Tys.begin(), Tys.end());
+}
+
IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, Type *Ty,
ArrayRef<const Value *> Args)
: RetTy(Ty), IID(Id) {
@@ -1210,15 +1218,6 @@ InstructionCost TargetTransformInfo::getExpandCompressMemoryOpCost(
return Cost;
}
-InstructionCost TargetTransformInfo::getStridedMemoryOpCost(
- unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask,
- Align Alignment, TTI::TargetCostKind CostKind, const Instruction *I) const {
- InstructionCost Cost = TTIImpl->getStridedMemoryOpCost(
- Opcode, DataTy, Ptr, VariableMask, Alignment, CostKind, I);
- assert(Cost >= 0 && "TTI should not produce negative costs!");
- return Cost;
-}
-
InstructionCost TargetTransformInfo::getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 7bc0b5b394828..da74320af0821 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1172,29 +1172,6 @@ InstructionCost RISCVTTIImpl::getExpandCompressMemoryOpCost(
LT.first * getRISCVInstructionCost(Opcodes, LT.second, CostKind);
}
-InstructionCost RISCVTTIImpl::getStridedMemoryOpCost(
- unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask,
- Align Alignment, TTI::TargetCostKind CostKind, const Instruction *I) const {
- if (((Opcode == Instruction::Load || Opcode == Instruction::Store) &&
- !isLegalStridedLoadStore(DataTy, Alignment)) ||
- (Opcode != Instruction::Load && Opcode != Instruction::Store))
- return BaseT::getStridedMemoryOpCost(Opcode, DataTy, Ptr, VariableMask,
- Alignment, CostKind, I);
-
- if (CostKind == TTI::TCK_CodeSize)
- return TTI::TCC_Basic;
-
- // Cost is proportional to the number of memory operations implied. For
- // scalable vectors, we use an estimate on that number since we don't
- // know exactly what VL will be.
- auto &VTy = *cast<VectorType>(DataTy);
- InstructionCost MemOpCost =
- getMemoryOpCost(Opcode, VTy.getElementType(), Alignment, 0, CostKind,
- {TTI::OK_AnyValue, TTI::OP_None}, I);
- unsigned NumLoads = getEstimatedVLFor(&VTy);
- return NumLoads * MemOpCost;
-}
-
InstructionCost
RISCVTTIImpl::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) const {
// FIXME: This is a property of the default vector convention, not
@@ -1561,6 +1538,43 @@ RISCVTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
cast<VectorType>(ICA.getArgTypes()[0]), {}, CostKind,
0, cast<VectorType>(ICA.getReturnType()));
}
+ case Intrinsic::experimental_vp_strided_load:
+ case Intrinsic::experimental_vp_strided_store: {
+ if (CostKind == TTI::TCK_CodeSize)
+ return TTI::TCC_Basic;
+
+ auto *DataTy = (ICA.getID() == Intrinsic::experimental_vp_strided_load)
+ ? cast<VectorType>(ICA.getReturnType())
+ : cast<VectorType>(ICA.getArgTypes()[0]);
+ Type *EltTy = DataTy->getElementType();
+
+ Align ABITyAlign = DL.getABITypeAlign(EltTy);
+
+ const IntrinsicInst *I = ICA.getInst();
+ Align Alignment;
+ if (ICA.isTypeBasedOnly())
+ Alignment = ICA.getAlign().value_or(ABITyAlign);
+ else {
+ unsigned Index =
+ (ICA.getID() == Intrinsic::experimental_vp_strided_load) ? 0 : 1;
+ Alignment = I->getParamAlign(Index).value_or(ABITyAlign);
+ }
+
+ if (!isLegalStridedLoadStore(DataTy, Alignment))
+ return BaseT::getIntrinsicInstrCost(ICA, CostKind);
+
+ unsigned Opcode = ICA.getID() == Intrinsic::experimental_vp_strided_load
+ ? Instruction::Load
+ : Instruction::Store;
+ // Cost is proportional to the number of memory operations implied. For
+ // scalable vectors, we use an estimate on that number since we don't
+ // know exactly what VL will be.
+ InstructionCost MemOpCost =
+ getMemoryOpCost(Opcode, EltTy, Alignment, 0, CostKind,
+ {TTI::OK_AnyValue, TTI::OP_None}, I);
+ unsigned NumLoads = getEstimatedVLFor(DataTy);
+ return NumLoads * MemOpCost;
+ }
case Intrinsic::fptoui_sat:
case Intrinsic::fptosi_sat: {
InstructionCost Cost = 0;
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 6886e8964e29e..af456cdd41bd7 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -202,12 +202,6 @@ class RISCVTTIImpl final : public BasicTTIImplBase<RISCVTTIImpl> {
Align Alignment, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr) const override;
- InstructionCost getStridedMemoryOpCost(unsigned Opcode, Type *DataTy,
- const Value *Ptr, bool VariableMask,
- Align Alignment,
- TTI::TargetCostKind CostKind,
- const Instruction *I) const override;
-
InstructionCost
getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) const override;
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 4fcaf6dabb513..262201dac131a 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -7224,10 +7224,13 @@ BoUpSLP::LoadsState BoUpSLP::canVectorizeLoads(
VectorGEPCost;
break;
case LoadsState::StridedVectorize:
- VecLdCost += TTI.getStridedMemoryOpCost(Instruction::Load, SubVecTy,
- LI0->getPointerOperand(),
- /*VariableMask=*/false,
- CommonAlignment, CostKind) +
+ VecLdCost += TTI.getIntrinsicInstrCost(
+ {Intrinsic::experimental_vp_strided_load,
+ SubVecTy,
+ {},
+ CommonAlignment,
+ /*VariableMask=*/false},
+ CostKind) +
VectorGEPCost;
break;
case LoadsState::CompressVectorize:
@@ -13191,9 +13194,13 @@ void BoUpSLP::transformNodes() {
BaseLI->getPointerAddressSpace(), CostKind,
TTI::OperandValueInfo()) +
::getShuffleCost(*TTI, TTI::SK_Reverse, VecTy, Mask, CostKind);
- InstructionCost StridedCost = TTI->getStridedMemoryOpCost(
- Instruction::Load, VecTy, BaseLI->getPointerOperand(),
- /*VariableMask=*/false, CommonAlignment, CostKind, BaseLI);
+ InstructionCost StridedCost =
+ TTI->getIntrinsicInstrCost({Intrinsic::experimental_vp_strided_load,
+ VecTy,
+ {},
+ CommonAlignment,
+ /*VariableMask=*/false},
+ CostKind);
if (StridedCost < OriginalVecCost || ForceStridedLoads) {
// Strided load is more profitable than consecutive load + reverse -
// transform the node to strided load.
@@ -13226,9 +13233,13 @@ void BoUpSLP::transformNodes() {
BaseSI->getPointerAddressSpace(), CostKind,
TTI::OperandValueInfo()) +
::getShuffleCost(*TTI, TTI::SK_Reverse, VecTy, Mask, CostKind);
- InstructionCost StridedCost = TTI->getStridedMemoryOpCost(
- Instruction::Store, VecTy, BaseSI->getPointerOperand(),
- /*VariableMask=*/false, CommonAlignment, CostKind, BaseSI);
+ InstructionCost StridedCost = TTI->getIntrinsicInstrCost(
+ {Intrinsic::experimental_vp_strided_store,
+ Type::getVoidTy(VecTy->getContext()),
+ {VecTy},
+ CommonAlignment,
+ /*VariableMask=*/false},
+ CostKind);
if (StridedCost < OriginalVecCost)
// Strided store is more profitable than reverse + consecutive store -
// transform the node to strided store.
@@ -14991,9 +15002,13 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
case TreeEntry::StridedVectorize: {
Align CommonAlignment =
computeCommonAlignment<LoadInst>(UniqueValues.getArrayRef());
- VecLdCost = TTI->getStridedMemoryOpCost(
- Instruction::Load, VecTy, LI0->getPointerOperand(),
- /*VariableMask=*/false, CommonAlignment, CostKind);
+ VecLdCost =
+ TTI->getIntrinsicInstrCost({Intrinsic::experimental_vp_strided_load,
+ VecTy,
+ {},
+ CommonAlignment,
+ /*VariableMask=*/false},
+ CostKind);
break;
}
case TreeEntry::CompressVectorize: {
@@ -15084,9 +15099,13 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
if (E->State == TreeEntry::StridedVectorize) {
Align CommonAlignment =
computeCommonAlignment<StoreInst>(UniqueValues.getArrayRef());
- VecStCost = TTI->getStridedMemoryOpCost(
- Instruction::Store, VecTy, BaseSI->getPointerOperand(),
- /*VariableMask=*/false, CommonAlignment, CostKind);
+ VecStCost = TTI->getIntrinsicInstrCost(
+ {Intrinsic::experimental_vp_strided_store,
+ Type::getVoidTy(VecTy->getContext()),
+ {VecTy},
+ CommonAlignment,
+ /*VariableMask=*/false},
+ CostKind);
} else {
assert(E->State == TreeEntry::Vectorize &&
"Expected either strided or consecutive stores.");
return thisT()->getCommonMaskedMemoryOpCost(
    Instruction::Store, Data->getType(), Alignment, VarMask,
    /*IsGatherScatter*/ true, CostKind);
I don't think we currently expand vp.strided.load/store if it's not supported by the target. I think we should probably just return an invalid cost in BasicTTIImpl?
Thanks for catching this. I am reworking this to use the new getMemIntrinsicInstrCost and will address it in my next update.
47d0b7a to 2c2e772
2c2e772 to 5481271
fhahn left a comment
This approach doesn’t work for getGatherScatterOpCost: ARMTTIImpl::getGatherScatterOpCost walks the use chain of a provided LoadInst/StoreInst, whereas ICA currently expects an IntrinsicInst. Consider introducing getMemIntrinsicInstrCost to cover this scenario.
Thanks for pushing to unify this. I think the description is currently out of date with the implementation and would need updating.
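As a purely hypothetical sketch of the suggested entry point, following the wrapper pattern the existing TTI cost functions use; the hook name comes from the comment above, while the parameter list is a guess and not code from this PR:

```cpp
// Hypothetical sketch: a TargetTransformInfo wrapper for a dedicated memory
// intrinsic cost hook, in the same style as the other cost wrappers (forward
// to the implementation, assert on negative costs). The parameter list here
// is an assumption modeled on the legacy memory hooks.
InstructionCost TargetTransformInfo::getMemIntrinsicInstrCost(
    Intrinsic::ID IID, Type *DataTy, Align Alignment, bool VariableMask,
    TTI::TargetCostKind CostKind, const Instruction *I) const {
  InstructionCost Cost = TTIImpl->getMemIntrinsicInstrCost(
      IID, DataTy, Alignment, VariableMask, CostKind, I);
  assert(Cost >= 0 && "TTI should not produce negative costs!");
  return Cost;
}
```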
unsigned Opcode = (Id == Intrinsic::experimental_vp_strided_load)
                      ? Instruction::Load
                      : Instruction::Store;
return thisT()->getStridedMemoryOpCost(Opcode, DataTy, Ptr, VariableMask,
Should those also be migrated, possibly not in the initial PR but as follow-ups?
Yes, getStridedMemoryOpCost was straightforward (RISCV-only). The others are implemented across several backends, so I’ll stage those as follow-ups with the relevant target reviewers.
Sounds great, thanks!
lukel97 left a comment
The getMemIntrinsicInstrCost hook is fine by me. But I also wonder if it would be possible to store the Instruction + VariableMask inside IntrinsicCostAttributes and reuse getIntrinsicInstrCost. Is that difficult to do? Not strongly opinionated on this either way.
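A rough illustration of that alternative, as a hypothetical constructor declaration only (not part of this patch); note that IntrinsicCostAttributes currently stores the instruction as a const IntrinsicInst *, which is the tension raised above for ARM's gather/scatter costing:

```cpp
// Hypothetical sketch of the alternative: have IntrinsicCostAttributes carry
// the originating instruction alongside alignment and the variable-mask flag,
// so existing getIntrinsicInstrCost call sites can be reused. Parameter order
// and defaults are assumptions.
IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy, ArrayRef<Type *> Tys,
                        Align Alignment, bool VariableMask = false,
                        const IntrinsicInst *I = nullptr);
```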
/// \p I - the optional original context instruction, if one exists, e.g. the
/// load/store to transform or the call to the gather/scatter intrinsic
We should probably find a better way to model the extending/truncating logic that ARM's MVE TTI needs. I don't think plumbing through the instruction is ideal, but for another PR :)
What is the benefit of this approach over having separate functions? It would seem that adding intrinsic-specific information would be simpler to pass to individual cost functions if needed. There is also getInterleavedMemoryOpCost that is conceptually similar but doesn't have an intrinsic at the moment. (Although I think it might be worth adding one.)
It is tempting to reuse getIntrinsicInstrCost by stuffing the memory extras into IntrinsicCostAttributes, but I have only figured out two imperfect ways to do it, and so far I am not satisfied with either of them.
When I added the vp.load.ff cost hooks (#160470), I noticed vp_load, vp_load_ff, vp_gather, masked_load, and masked_expandload share almost the same interface.
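The TTI hooks that cost these today also take a nearly identical set of parameters, which is what makes a single entry point attractive. The shapes below are approximate, taken partly from the diff above and partly from memory; defaults and recently added parameters may differ in tree.

```cpp
// Approximate shapes of the existing per-kind hooks; note the recurring
// (Opcode, DataTy, Alignment, VariableMask, CostKind, I) pattern that a
// single memory-intrinsic entry point could absorb.
InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
                                      Align Alignment, unsigned AddressSpace,
                                      TTI::TargetCostKind CostKind) const;
InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
                                       const Value *Ptr, bool VariableMask,
                                       Align Alignment,
                                       TTI::TargetCostKind CostKind,
                                       const Instruction *I = nullptr) const;
InstructionCost getExpandCompressMemoryOpCost(
    unsigned Opcode, Type *DataTy, bool VariableMask, Align Alignment,
    TTI::TargetCostKind CostKind, const Instruction *I = nullptr) const;
InstructionCost getStridedMemoryOpCost(unsigned Opcode, Type *DataTy,
                                       const Value *Ptr, bool VariableMask,
                                       Align Alignment,
                                       TTI::TargetCostKind CostKind,
                                       const Instruction *I = nullptr) const;
```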
OK, sounds good. I'm not against this, I was just thinking of things like what the stride was in a StridedMemoryOp (constants / small values could be cheaper on some hypothetical architecture). GatherScatters could do with better cost modelling, including possibly whether the addresses are base + vector offset or vector-base + constant offset, etc. This combined interface might make passing that information harder.
I agree that a constant stride helps the cost model.
This approach just resembles getIntrinsicInstrCost, and how to fold MemAttr(...) into IntrinsicCostAttributes is an open question, as noted in the reply to @lukel97 above.
Added MemIntrinsicCostAttributes for flexible mem-intrinsic costing.
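The definition isn't part of the diff shown above; as a hypothetical sketch of what such an attribute bundle might carry (every member and accessor name here is an assumption):

```cpp
// Hypothetical sketch only: one possible shape for MemIntrinsicCostAttributes,
// bundling what the legacy memory cost hooks take as separate parameters.
// Not the actual definition from this PR.
class MemIntrinsicCostAttributes {
  Intrinsic::ID IID;
  Type *DataTy;                    // vector type loaded or stored
  const Value *Ptr = nullptr;      // address (or vector of addresses), if known
  Align Alignment;                 // alignment of a single element
  bool VariableMask = false;       // mask is not a compile-time constant
  const Instruction *I = nullptr;  // optional original context instruction

public:
  MemIntrinsicCostAttributes(Intrinsic::ID IID, Type *DataTy, Align Alignment,
                             bool VariableMask = false,
                             const Value *Ptr = nullptr,
                             const Instruction *I = nullptr)
      : IID(IID), DataTy(DataTy), Ptr(Ptr), Alignment(Alignment),
        VariableMask(VariableMask), I(I) {}

  Intrinsic::ID getID() const { return IID; }
  Type *getDataType() const { return DataTy; }
  const Value *getPointer() const { return Ptr; }
  Align getAlign() const { return Alignment; }
  bool getVariableMask() const { return VariableMask; }
  const Instruction *getInst() const { return I; }
};
```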
fhahn left a comment
Another advantage of a single entry point is that it should hopefully be easier for people to discover when new cases are queried/supported.
The constant stride for a StridedMemoryOp was just a hypothetical; I don't have a real place where it would be useful, so don't worry about it too much. It was more of a general question about higher-level information being passed through to each of the functions, and whether this is really better.
I was initially skeptical from the review description, but reading over the code, this doesn't seem unreasonable. My main worry with the change is arranging the staging of changes to be actually NFC. Towards that end, could we consider introducing MemIntrinsicCostAttributes in a separate patch (using the prior API names), and updating each routine in its own commit, with the update going all the way through to the backend implementations? Doing that would make it easier to lean on the compiler and audit the results. Once each was updated, we could then do a final change which does only the API name update and the dispatch in TTI. Please take my suggestion as only a suggestion; please don't consider the review blocked. If someone else approves the current direction, I have no problems with that.
In #160470, there is a discussion about the possibility of exploring a general approach for handling memory intrinsics.
API changes:
In BasicTTIImpl, map intrinsic IDs to the existing target implementations until the legacy TTI hooks are retired (a minimal sketch follows below).
TODO: add support for vp_load_ff.
No functional change intended; costs continue to route to the same target-specific hooks.
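A minimal sketch of that dispatch, assuming a getMemIntrinsicInstrCost hook on BasicTTIImplBase and the MemIntrinsicCostAttributes shape sketched earlier in the thread; the actual patch may cover more intrinsics and details.

```cpp
// Sketch only: route each memory intrinsic ID to the existing per-kind hook
// until the legacy hooks are retired. Accessor names follow the hypothetical
// MemIntrinsicCostAttributes sketch above; the address space is simplified.
InstructionCost
getMemIntrinsicInstrCost(const MemIntrinsicCostAttributes &MICA,
                         TTI::TargetCostKind CostKind) const {
  Type *DataTy = MICA.getDataType();
  Align Alignment = MICA.getAlign();
  switch (MICA.getID()) {
  case Intrinsic::masked_load:
    return thisT()->getMaskedMemoryOpCost(Instruction::Load, DataTy, Alignment,
                                          /*AddressSpace=*/0, CostKind);
  case Intrinsic::masked_store:
    return thisT()->getMaskedMemoryOpCost(Instruction::Store, DataTy, Alignment,
                                          /*AddressSpace=*/0, CostKind);
  case Intrinsic::masked_gather:
    return thisT()->getGatherScatterOpCost(
        Instruction::Load, DataTy, MICA.getPointer(), MICA.getVariableMask(),
        Alignment, CostKind, MICA.getInst());
  case Intrinsic::masked_scatter:
    return thisT()->getGatherScatterOpCost(
        Instruction::Store, DataTy, MICA.getPointer(), MICA.getVariableMask(),
        Alignment, CostKind, MICA.getInst());
  default:
    return InstructionCost::getInvalid();
  }
}
```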