[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. #106431
@@ -646,7 +646,9 @@ Value *VPInstruction::generatePerPart(VPTransformState &State, unsigned Part) {
            "can only generate first lane for PtrAdd");
     Value *Ptr = State.get(getOperand(0), Part, /* IsScalar */ true);
     Value *Addend = State.get(getOperand(1), Part, /* IsScalar */ true);
-    return Builder.CreatePtrAdd(Ptr, Addend, Name);
+    return Builder.CreatePtrAdd(Ptr, Addend, Name,
+                                isInBounds() ? GEPNoWrapFlags::inBounds()
+                                             : GEPNoWrapFlags::none());
Suggested change:
-    return Builder.CreatePtrAdd(Ptr, Addend, Name,
-                                isInBounds() ? GEPNoWrapFlags::inBounds()
-                                             : GEPNoWrapFlags::none());
+    auto Flags = isInBounds() ? GEPNoWrapFlags::inBounds()
+                              : GEPNoWrapFlags::none();
+    return Builder.CreatePtrAdd(Ptr, Addend, Name, Flags);
?
Updated to use CreateInBoundsPtrAdd/CreatePtrAdd.
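For reference, a minimal sketch of that follow-up shape, reusing Ptr/Addend/Name from the hunk above (the exact code lives in the updated commit, so treat this as illustrative):

    // Pick the IRBuilder helper based on the recipe's inbounds flag instead
    // of passing GEPNoWrapFlags explicitly.
    Value *Ptr = State.get(getOperand(0), Part, /* IsScalar */ true);
    Value *Addend = State.get(getOperand(1), Part, /* IsScalar */ true);
    return isInBounds() ? Builder.CreateInBoundsPtrAdd(Ptr, Addend, Name)
                        : Builder.CreatePtrAdd(Ptr, Addend, Name);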
Does the comment above about adjusting the index from the first vector lane (lane zero?), rather than using the pointer of the last lane, still hold? If so, this adjustment is invariant and would be better placed in the preheader, possibly later by LICM?
Yes, I think so; we just don't have to compute the index for the first lane (which used to be done via the removed add).
Indeed, better keep Idx null if not needed, to refrain from generating a redundant gep with zero.
Could pull that in here at the cost of a number of additional test changes, or land it separately.
Your call; generating gep with zero also causes some test discrepancies.
Thanks, I'll leave it as-is for now and drop it separately.
Dropped in 3ec6f80
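To illustrate the redundant-gep point in code (variable names here are illustrative, not the code from 3ec6f80):

    // Keep the index null when no adjustment is needed and reuse the base
    // pointer directly; a PtrAdd with a zero index would only produce a
    // gep with offset 0 that later cleanup has to fold away.
    VPValue *Addr = BasePtr;
    if (Index)
      Addr = new VPInstruction(BasePtr, Index, /*InBounds=*/false, DL,
                               "ptr.add");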
nit: can now use Index instead of Idx, if preferred.
Updated, thanks!
Should the "Notice current instruction ..." explanation below be updated?
It can be dropped now, I think; done, thanks!
@@ -596,8 +596,7 @@ static void legalizeAndOptimizeInductions(VPlan &Plan, ScalarEvolution &SE) {
         Plan, InductionDescriptor::IK_IntInduction, Instruction::Add, nullptr,
         SE, nullptr, StartV, StepV, InsertPt);

-    auto *Recipe = new VPInstruction(VPInstruction::PtrAdd,
-                                     {PtrIV->getStartValue(), Steps},
+    auto *Recipe = new VPInstruction(PtrIV->getStartValue(), Steps, false,
                                      PtrIV->getDebugLoc(), "next.gep");

     Recipe->insertAfter(Steps);
@@ -1522,14 +1521,19 @@ void VPlanTransforms::dropPoisonGeneratingRecipes(
 }

 void VPlanTransforms::createInterleaveGroups(
-    const SmallPtrSetImpl<const InterleaveGroup<Instruction> *> &InterleaveGroups,
+    VPlan &Plan,
+    const SmallPtrSetImpl<const InterleaveGroup<Instruction> *>
+        &InterleaveGroups,
     VPRecipeBuilder &RecipeBuilder, bool ScalarEpilogueAllowed) {
   if (InterleaveGroups.empty())
     return;

   // Interleave memory: for each Interleave Group we marked earlier as relevant
   // for this VPlan, replace the Recipes widening its memory instructions with a
   // single VPInterleaveRecipe at its insertion point.
+  VPDominatorTree VPDT;
+  VPDT.recalculate(Plan);
   for (const auto *IG : InterleaveGroups) {
     auto *Recipe =
         cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IG->getInsertPos()));
     SmallVector<VPValue *, 4> StoredValues;
     for (unsigned i = 0; i < IG->getFactor(); ++i)
       if (auto *SI = dyn_cast_or_null<StoreInst>(IG->getMember(i))) {
@@ -1539,9 +1543,39 @@ void VPlanTransforms::createInterleaveGroups(

     bool NeedsMaskForGaps =
         IG->requiresScalarEpilogue() && !ScalarEpilogueAllowed;
-    auto *VPIG = new VPInterleaveRecipe(IG, Recipe->getAddr(), StoredValues,
-                                        Recipe->getMask(), NeedsMaskForGaps);
-    VPIG->insertBefore(Recipe);

+    Instruction *IRInsertPos = IG->getInsertPos();
+    auto *InsertPos =
+        cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IRInsertPos));
+    VPRecipeBase *IP = InsertPos;
+    // Get or create the start address for the interleave group.
+    auto *Start =
+        cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IG->getMember(0)));
+    VPValue *Addr = Start->getAddr();
+    if (!VPDT.properlyDominates(Addr->getDefiningRecipe(), InsertPos)) {
+      bool InBounds = false;
+      if (auto *gep = dyn_cast<GetElementPtrInst>(
+              getLoadStorePointerOperand(IRInsertPos)->stripPointerCasts()))
+        InBounds = gep->isInBounds();
+
+      // We cannot re-use the address of the first member because it does not
Suggested change:
-      // We cannot re-use the address of the first member because it does not
+      // We cannot re-use the address of member zero because it does not
(This may be confusing: the insertion point appears "first" for loads but last for stores; here we mean "first" in terms of memory addresses.)
Updated, thanks!
Suggested change:
-      // dominate the insert position. Use the address of the insert position
+      // dominate the insert position. Instead, use the address of the insert position
Updated, thanks.
Suggested change:
-      // and create a PtrAdd to adjust the index to start at the first member.
+      // and create a PtrAdd adjusting it to the address of member zero.
Updated, thanks!
Worth asserting that the Offset or index of IRInsertPos is non-zero?
I.e., if the insert position is the first member, its address operand must dominate it.
Added assert, thanks!
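Putting the pieces of this thread together, the guarded branch roughly takes the following shape (a sketch: the offset arithmetic, names such as OffsetVPV and "member0.addr", and the exact constructor arguments are assumptions based on the hunks above, not a copy of the committed code):

      // The insert position is member IG->getIndex(IRInsertPos) of the group,
      // so stepping back that many members from its address gives the address
      // of member 0. If the insert position were member 0, its address operand
      // would already dominate it and we would not reach this branch.
      assert(IG->getIndex(IRInsertPos) != 0 &&
             "index of insert position shouldn't be zero");
      APInt Offset(32,
                   getLoadStoreType(IRInsertPos)->getScalarSizeInBits() / 8 *
                       IG->getIndex(IRInsertPos),
                   /*IsSigned=*/true);
      VPValue *OffsetVPV = Plan.getOrAddLiveIn(
          ConstantInt::get(IRInsertPos->getParent()->getContext(), -Offset));
      auto *MemberZeroAddr =
          new VPInstruction(InsertPos->getAddr(), OffsetVPV, InBounds,
                            InsertPos->getDebugLoc(), "member0.addr");
      MemberZeroAddr->insertBefore(IP);
      Addr = MemberZeroAddr;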
@@ -1419,13 +1419,11 @@ define dso_local void @masked_strided2(ptr noalias nocapture readonly %p, ptr no
 ; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr [[TMP2]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison)
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = or disjoint i32 [[TMP1]], 1
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = call <8 x i8> @llvm.smax.v8i8(<8 x i8> [[STRIDED_VEC]], <8 x i8> [[STRIDED_VEC1]])
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[Q:%.*]], i32 [[TMP1]]
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = sub <8 x i8> zeroinitializer, [[TMP4]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[Q:%.*]], i32 [[TMP3]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = getelementptr i8, ptr [[TMP6]], i32 -1
 ; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP4]], <8 x i8> [[TMP5]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP7]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
+; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP6]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
 ; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024
@@ -2555,13 +2553,11 @@ define dso_local void @masked_strided2_unknown_tc(ptr noalias nocapture readonly
 ; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr [[TMP3]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison)
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = or disjoint i32 [[TMP2]], 1
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = call <8 x i8> @llvm.smax.v8i8(<8 x i8> [[STRIDED_VEC]], <8 x i8> [[STRIDED_VEC3]])
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = getelementptr i8, ptr [[Q:%.*]], i32 [[TMP2]]
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = sub <8 x i8> zeroinitializer, [[TMP6]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = getelementptr i8, ptr [[Q:%.*]], i32 [[TMP5]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr i8, ptr [[TMP8]], i32 -1
 ; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP6]], <8 x i8> [[TMP7]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP9]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
+; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP8]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
 ; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
@@ -2989,13 +2985,11 @@ define dso_local void @unconditional_masked_strided2_unknown_tc(ptr noalias noca
 ; ENABLED_MASKED_STRIDED-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr [[TMP2]], i32 1, <16 x i1> [[INTERLEAVED_MASK]], <16 x i8> poison)
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
 ; ENABLED_MASKED_STRIDED-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <16 x i8> [[WIDE_MASKED_VEC]], <16 x i8> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = or disjoint i32 [[TMP1]], 1
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = call <8 x i8> @llvm.smax.v8i8(<8 x i8> [[STRIDED_VEC]], <8 x i8> [[STRIDED_VEC3]])
+; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = getelementptr inbounds i8, ptr [[Q:%.*]], i32 [[TMP1]]
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = sub <8 x i8> zeroinitializer, [[TMP4]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = getelementptr inbounds i8, ptr [[Q:%.*]], i32 [[TMP3]]
-; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, ptr [[TMP6]], i32 -1
 ; ENABLED_MASKED_STRIDED-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[TMP4]], <8 x i8> [[TMP5]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
-; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP7]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
+; ENABLED_MASKED_STRIDED-NEXT: call void @llvm.masked.store.v16i8.p0(<16 x i8> [[INTERLEAVED_VEC]], ptr [[TMP6]], i32 1, <16 x i1> [[INTERLEAVED_MASK]])
Note: Q[(TMP1+1)-1] --> Q[TMP1] — the address of member 0 is now computed directly instead of via member 1 minus one element.
 ; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
 ; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
 ; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP8]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
nit: it is a bit odd to construct a PtrAdd given two VPValues and a bool. Perhaps pass a GEPFlagsTy as the third parameter instead of a bool, analogous to passing FMFs?
Updated, thanks!
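A sketch of the constructor shape being suggested, assuming the GEPFlagsTy helper already carried by VPRecipeWithIRFlags (the exact signature in the updated patch may differ):

  // Take the GEP flags as a dedicated type rather than a raw bool, mirroring
  // how fast-math flags are passed for FP operations.
  VPInstruction(VPValue *Ptr, VPValue *Offset, GEPFlagsTy Flags,
                DebugLoc DL = {}, const Twine &Name = "");

  // The call site from the earlier hunk, rewritten accordingly:
  auto *Recipe = new VPInstruction(PtrIV->getStartValue(), Steps,
                                   GEPFlagsTy(/*IsInBounds=*/false),
                                   PtrIV->getDebugLoc(), "next.gep");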