
Commit 768c654

[VPlan] Get Addr computation cost with scalar type if it is uniform for gather/scatter.
This patch queries `getAddressComputationCost()` with the scalar type when the address is uniform, which makes the gather/scatter cost more accurate. Currently, LV accounts for the address computation cost of non-consecutive VPWidenMemoryRecipes (gathers/scatters), but in some cases the address is uniform across all lanes, so it can be computed with a scalar type and then broadcast. A follow-up optimization will try to convert gathers/scatters with a uniform memory access into a scalar load/store plus a broadcast; once that lands, this temporary change can be removed.
1 parent 65c2e93 commit 768c654
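
For illustration, a minimal standalone sketch of the costing idea (the actual change, shown in the diff below, lives inside VPWidenMemoryRecipe::computeCost; the helper name and parameters here are hypothetical, not part of the patch):

    #include "llvm/Analysis/TargetTransformInfo.h"
    #include "llvm/IR/DerivedTypes.h"

    using namespace llvm;

    // Illustrative only: cost of computing a gather/scatter address. When every
    // lane uses the same address, it can be computed once as a scalar and
    // broadcast, so it is costed with the scalar pointer type rather than
    // with a <VF x ptr> vector of pointers.
    static InstructionCost getAddrCost(const TargetTransformInfo &TTI,
                                       Type *ScalarPtrTy, ElementCount VF,
                                       bool AddrIsUniform) {
      Type *PtrTy =
          AddrIsUniform ? ScalarPtrTy : VectorType::get(ScalarPtrTy, VF);
      return TTI.getAddressComputationCost(PtrTy);
    }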

2 files changed (+12, -5 lines)

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

Lines changed: 9 additions & 2 deletions
@@ -3106,10 +3106,17 @@ InstructionCost VPWidenMemoryRecipe::computeCost(ElementCount VF,
     // TODO: Using the original IR may not be accurate.
     // Currently, ARM will use the underlying IR to calculate gather/scatter
     // instruction cost.
-    const Value *Ptr = getLoadStorePointerOperand(&Ingredient);
-    Type *PtrTy = toVectorTy(Ptr->getType(), VF);
     assert(!Reverse &&
            "Inconsecutive memory access should not have the order.");
+
+    const Value *Ptr = getLoadStorePointerOperand(&Ingredient);
+    Type *PtrTy = Ptr->getType();
+
+    // If the address value is uniform across all lanes, then the address can
+    // be calculated with scalar type and broadcast.
+    if (!vputils::isSingleScalar(getAddr()))
+      PtrTy = toVectorTy(PtrTy, VF);
+
     return Ctx.TTI.getAddressComputationCost(PtrTy) +
            Ctx.TTI.getGatherScatterOpCost(Opcode, Ty, Ptr, IsMasked, Alignment,
                                           Ctx.CostKind, &Ingredient);

llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll

Lines changed: 3 additions & 3 deletions

Note: because the uniform scatter address is now costed with the scalar pointer type, the address-computation cost no longer grows with VF, and the RISC-V cost model selects vscale x 8 instead of vscale x 2 for this loop.
@@ -12,13 +12,13 @@ define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr %src) {
 ; CHECK-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
 ; CHECK:       [[VECTOR_PH]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP3]], 2
+; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP3]], 8
 ; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
 ; CHECK:       [[VECTOR_BODY]]:
 ; CHECK-NEXT:    [[EVL_BASED_IV:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[AVL:%.*]] = phi i64 [ 9, %[[VECTOR_PH]] ], [ [[AVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
-; CHECK-NEXT:    [[TMP7:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 2, i1 true)
-; CHECK-NEXT:    call void @llvm.vp.scatter.nxv2i8.nxv2p0(<vscale x 2 x i8> zeroinitializer, <vscale x 2 x ptr> align 1 zeroinitializer, <vscale x 2 x i1> splat (i1 true), i32 [[TMP7]])
+; CHECK-NEXT:    [[TMP7:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 8, i1 true)
+; CHECK-NEXT:    call void @llvm.vp.scatter.nxv8i8.nxv8p0(<vscale x 8 x i8> zeroinitializer, <vscale x 8 x ptr> align 1 zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP7]])
 ; CHECK-NEXT:    [[TMP9:%.*]] = zext i32 [[TMP7]] to i64
 ; CHECK-NEXT:    [[INDEX_EVL_NEXT]] = add nuw i64 [[TMP9]], [[EVL_BASED_IV]]
 ; CHECK-NEXT:    [[AVL_NEXT]] = sub nuw i64 [[AVL]], [[TMP9]]
