[InferAlignment] Propagate alignment between loads/stores of the same base pointer #145733
```llvm
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -passes=load-store-vectorizer -S < %s | FileCheck %s

; The IR has the first float3 labeled with align 16, and that 16 should be
; propagated such that the second set of 4 values can also be vectorized
; together.
```
```cpp
for (uint64_t i = 0; i < NumElements; i++) {
  Value *Indices[2] = {
      Zero,
      ConstantInt::get(IdxType, i),
  };
  auto *Ptr = IC.Builder.CreateInBoundsGEP(AT, Addr, ArrayRef(Indices),
                                           Name + ".elt");
  auto EltAlign = commonAlignment(Align, Offset.getKnownMinValue());
  auto *L = IC.Builder.CreateAlignedLoad(AT->getElementType(), Ptr,
                                         EltAlign, Name + ".unpack");
  L->setAAMetadata(LI.getAAMetadata());
  V = IC.Builder.CreateInsertValue(V, L, i);
  Offset += EltSize;
}
```
I feel like implementing that would be a bit of a mess, and it leads to questions like "how many layers deep should it recurse?" Unpacking one layer at a time and then adding the nested struct elements to the IC worklist to be independently operated on in a later iteration is a much cleaner solution and feels more in line with the design philosophy of InstCombine. I could be wrong though; open to feedback or challenges to my assumptions.
And just to be clear, in case my previous explanation was confusing, the reason it would have to recurse through all layers at once is because we cannot store the knowledge that "the second element of load struct.float3 align 4 is aligned to 16" on the instruction; there isn't a syntax available to express that.
Edit: We would also have to change unpackStoreToAggregate to do this too, further increasing the code complexity
This has the align 16 with opt -attributor-enable=module -O3. The default attribute passes do a worse job it seems
Are there any existing passes in particular that would benefit from this upgraded alignment?
ISel benefits from better alignment information in general.
I think the compile time tradeoffs that come with implementing it in InstCombine or InferAlignment would not be worth it. For InstCombine specifically, we have a lot of practical issues with the compile time stress it puts on our compiler at NVIDIA, so we tend to avoid adding complexity to it when possible.
We should definitely not infer better alignments in InstCombine. We specifically split out the InferAlignments pass out of InstCombine to reduce compile-time from repeatedly inferring alignment.
I think with the scheme I proposed above doing this in InferAlignment should be close to free, but of course we won't know until we try.
The problem is that I think if we want to avoid those expensive calls and limit the analysis to "shallow" geps that offset directly off of aligned base pointers, we will miss a lot of cases. There are transformations like StraightLineStrengthReduce that build chains of geps that increase the instruction depth needed to find the underlying object.
I'm not sure we'll be losing much in practice. I'd generally expect that after CSE, cases where alignment can be increased in this manner will have the form of constant offset GEPs from a common base. All of the tests you added in this PR are of this form. Did you see motivating cases where this does not hold?
We should definitely not infer better alignments in InstCombine. We specifically split out the InferAlignments pass out of InstCombine to reduce compile-time from repeatedly inferring alignment.
I agree. There is technically a solution that involves expanding the aggregate unpacking logic to traverse multiple layers at once, which would theoretically solve this problem for "free", but I think there are other tradeoffs that come with that.
I'm not sure we'll be losing much in practice. I'd generally expect that after CSE, cases where alignment can be increased in this manner will have the form of constant offset GEPs from a common base.
This is a reasonable argument. Let me investigate this possibility and get back to you all with my findings.
This has the align 16 with opt -attributor-enable=module -O3. The default attribute passes do a worse job it seems
Sorry, can you clarify what "this" is? What test/command are you running?
That is the command, opt -attributor-enable=module -O3
Thanks, I see what you mean now. That Attributor pass (which does not run by default) labels the argument pointer with the align 16 attribute, and then later InferAlignment uses computeKnownBits (which sees that attribute) to upgrade the load/store.
I don't know much about the history of the Attributor pass. Is that something we should consider enabling by default? Or maybe adding the alignment attribute to the argument at some other point in the pipeline, such as in InstCombine when the aggregate is split? Edit: or maybe an addition to InferAlignments that attempts to add param attributes before the computeKnownBits pass?
Also, it's looking like the InferAlignment change is working. I'll put it up for review soon, and we can weigh it against a possible Attributor fix.
Can you use more representative tests with some uses? As is these all optimize to memset or no-op
Slight pushback, my understanding is that lit tests are most useful when they are minimal reproducers of the problem the optimization is targeting. Adding uses would not really change the nature of this optimization. Tests like llvm/test/Transforms/LoadStoreVectorizer/NVPTX/vectorize_i1.ll follow this thinking.
If you think it would be better, I could combine each pair of load and store tests into individual tests, storing the result of the loads. Other LSV tests use that pattern a lot.