-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[VPlan] Add narrowToSingleScalarRecipe transform. #139150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
50be059
df295ec
a8dda3f
a7e9545
cd4b3bf
c0bf19d
451d82a
fbf5f9d
5b15a9b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -1084,6 +1084,40 @@ void VPlanTransforms::simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy) { | |||||||||||||||||||
| } | ||||||||||||||||||||
| } | ||||||||||||||||||||
|
|
||||||||||||||||||||
| static void convertToUniformRecipes(VPlan &Plan) { | ||||||||||||||||||||
| auto TryToNarrow = [](VPBasicBlock *VPBB) { | ||||||||||||||||||||
| for (VPRecipeBase &R : make_early_inc_range(reverse(*VPBB))) { | ||||||||||||||||||||
| // Try to narrow wide and replicating recipes to uniform recipes, based on | ||||||||||||||||||||
| // VPlan analysis. | ||||||||||||||||||||
| auto *Def = dyn_cast<VPSingleDefRecipe>(&R); | ||||||||||||||||||||
| if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) || | ||||||||||||||||||||
| !Def->getUnderlyingValue()) | ||||||||||||||||||||
|
||||||||||||||||||||
| !Def->getUnderlyingValue()) |
can be asserted instead if desired - these recipes must have underlying values - in order to know what to replicate or widen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be asserted for VPReplicateRecipe, but not for VPWidenRecipe which does not require a underlying value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if VPWidenRecipe uses the (optional) underlying value only for metadata and FMF, could it be replaced with VPInstruction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep we should be able to do that soon. I'll give it a try, need to see if we rely on the encoded facts that certain recipes are widened in various places.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| auto *Def = dyn_cast<VPSingleDefRecipe>(&R); | |
| if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) || | |
| !Def->getUnderlyingValue()) | |
| continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| auto *RepR = dyn_cast<VPReplicateRecipe>(&R); | |
| if (RepR && RepR->isUniform()) | |
| continue; | |
| auto *RepR = dyn_cast<VPReplicateRecipe>(&R); | |
| if (!RepR && !isa<VPWidenRecipe>(&R)) | |
| continue; | |
| if (RepR && RepR->isUniform()) | |
| continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| auto *SingleDef = cast<VPSingleDefRecipe>(&R); |
or RepOrWidenR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Skip recipes that aren't uniform and don't have only their scalar | |
| // Skip recipes that aren't single scalar or don't have only their scalar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The and here should be accurate, it skips cases that have non-scalar uses, as this may require introducing broadcasts. This is something that will be generalized in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code has an ||?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, was thinking about the recipes we process below, updated, thanks
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // results used. In the later case, we would introduce extra broadcasts. | |
| // results used. In the latter case, we would introduce extra broadcasts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The term "UniformAfterVectorization" and VPReplicateRecipe's "uniform" field should be renamed. Being uniform (having same value for all lanes) is independent of being before or after vectorization. The term stands for "singleLane" or "singleScalar", which is typically associated with the first lane (as in unit-stride, i.e., clearly non-uniform, GEPs whose first lane is the only one used), and with all lanes when the value is uniform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this is long overdue! #140134
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason we're using a lambda here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As if this pass had a class of its own where its main runPass() called a TryToNarrow() method on each basic block, as in runOnBasicBlock(), independently (and conceptually in parallel).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some follow-up patches that may calls this on additional blocks, which was why I had the lambda originally. Inlined for now, thanks
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to apply uniform analysis if VF=1? If not, could we skip it when VF=1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, suspect so, given that in such case all replicate recipes should already be "uniform" and widen recipes are irrelevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, added a check isScalarVFOnly() to the transform
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,86 +7,87 @@ define void @test(ptr %p, i40 %a) { | |
| ; CHECK-NEXT: entry: | ||
| ; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] | ||
| ; CHECK: vector.ph: | ||
| ; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i1 true, false | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this the said degradation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, trivial folding that at the moment happens in IRBuilder on VPWidenRecipe, but not on replicate recipes which clone the original instruction. Will be fixed by pending VP constant folder |
||
| ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]] | ||
| ; CHECK: vector.body: | ||
| ; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]] | ||
| ; CHECK: pred.store.if: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]] | ||
| ; CHECK: pred.store.continue: | ||
| ; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2:%.*]] | ||
| ; CHECK: pred.store.if1: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE2]] | ||
| ; CHECK: pred.store.continue2: | ||
| ; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF3:%.*]], label [[PRED_STORE_CONTINUE4:%.*]] | ||
| ; CHECK: pred.store.if3: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE4]] | ||
| ; CHECK: pred.store.continue4: | ||
| ; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6:%.*]] | ||
| ; CHECK: pred.store.if5: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE6]] | ||
| ; CHECK: pred.store.continue6: | ||
| ; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]] | ||
| ; CHECK: pred.store.if7: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE8]] | ||
| ; CHECK: pred.store.continue8: | ||
| ; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]] | ||
| ; CHECK: pred.store.if9: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE10]] | ||
| ; CHECK: pred.store.continue10: | ||
| ; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12:%.*]] | ||
| ; CHECK: pred.store.if11: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE12]] | ||
| ; CHECK: pred.store.continue12: | ||
| ; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF13:%.*]], label [[PRED_STORE_CONTINUE14:%.*]] | ||
| ; CHECK: pred.store.if13: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE14]] | ||
| ; CHECK: pred.store.continue14: | ||
| ; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF15:%.*]], label [[PRED_STORE_CONTINUE16:%.*]] | ||
| ; CHECK: pred.store.if15: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE16]] | ||
| ; CHECK: pred.store.continue16: | ||
| ; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF17:%.*]], label [[PRED_STORE_CONTINUE18:%.*]] | ||
| ; CHECK: pred.store.if17: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE18]] | ||
| ; CHECK: pred.store.continue18: | ||
| ; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF19:%.*]], label [[PRED_STORE_CONTINUE20:%.*]] | ||
| ; CHECK: pred.store.if19: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE20]] | ||
| ; CHECK: pred.store.continue20: | ||
| ; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF21:%.*]], label [[PRED_STORE_CONTINUE22:%.*]] | ||
| ; CHECK: pred.store.if21: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE22]] | ||
| ; CHECK: pred.store.continue22: | ||
| ; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF23:%.*]], label [[PRED_STORE_CONTINUE24:%.*]] | ||
| ; CHECK: pred.store.if23: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE24]] | ||
| ; CHECK: pred.store.continue24: | ||
| ; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF25:%.*]], label [[PRED_STORE_CONTINUE26:%.*]] | ||
| ; CHECK: pred.store.if25: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE26]] | ||
| ; CHECK: pred.store.continue26: | ||
| ; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF27:%.*]], label [[PRED_STORE_CONTINUE28:%.*]] | ||
| ; CHECK: pred.store.if27: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE28]] | ||
| ; CHECK: pred.store.continue28: | ||
| ; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF29:%.*]], label [[PRED_STORE_CONTINUE30:%.*]] | ||
| ; CHECK: pred.store.if29: | ||
| ; CHECK-NEXT: store i1 false, ptr [[P]], align 1 | ||
| ; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1 | ||
| ; CHECK-NEXT: br label [[PRED_STORE_CONTINUE30]] | ||
| ; CHECK: pred.store.continue30: | ||
| ; CHECK-NEXT: br label [[MIDDLE_BLOCK:%.*]] | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this captures both uniformity and only-first-lane-used? Also affects title of patch.
Analogous to
truncateToMinimalBitwidths()which aims to reduce each lane to fewer bits, this aims to reduce each part to fewest lanes - to one. Perhaps both should start withnarrow, as used in the now inlined lambda.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to narrowToSingleScalarRecipe, thanks